Voice datasets are structured collections of audio recordings paired with corresponding text transcriptions, metadata, and annotations. These datasets serve as the foundation for training […]
Indigenous communities in Bolivia want access to practical, actionable and timely early warning systems in a language that they can understand. They are increasingly […]
Changing direction is never very easy; in 2022 we developed a new Direction of Travel, focusing more on developing partnerships and language technology to […]
Learn how we are exploring the potential of synthetic data to improve automatic speech recognition for low-resource African languages. Africa is home to over […]
The TWB Voice Playbook is a practical guide to planning and managing voice data collection projects for low-resource languages. It is aimed at both […]
Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Chichewa language. Text-to-Speech […]
Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Hausa language. Text-to-Speech […]
Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Dholuo language. Text-to-Speech […]
Text data: This dataset contains sentences in the Marma language (ISO code: rmz), with both original and normalized forms. The dataset is designed to […]