Voice datasets are structured collections of audio recordings paired with corresponding text transcriptions, metadata, and annotations. These datasets serve as the foundation for training and evaluating speech recognition systems, text-to-speech engines, and other voice-enabled applications. High-quality voice datasets are essential for developing accurate and robust speech technologies. 
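The pairing of audio with transcriptions and metadata described above can be sketched as a minimal record type. All field names and values here are illustrative assumptions, not the schema of any particular dataset:

```python
from dataclasses import dataclass, field

@dataclass
class VoiceSample:
    """One record in a hypothetical voice dataset: audio plus its annotations."""
    audio_path: str              # path to the recording, e.g. a WAV file
    transcription: str           # verbatim text of what was said
    speaker_id: str              # anonymized speaker identifier
    sample_rate_hz: int = 16000  # audio sample rate in Hz
    metadata: dict = field(default_factory=dict)  # e.g. dialect, recording device

# A single illustrative record
sample = VoiceSample(
    audio_path="clips/utt_0001.wav",
    transcription="example transcription",
    speaker_id="spk_01",
    metadata={"dialect": "Shuwa Arabic"},
)
print(sample.speaker_id)
```

Real dataset formats vary, but most pair each audio clip with at least a transcription and per-speaker or per-recording metadata along these lines.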

Voice models are machine learning systems trained on voice datasets to perform specific speech-related tasks. These models learn patterns in human speech and can be used for a variety of applications, including automatic speech recognition and text-to-speech synthesis.

Data for low-resource languages is scarce. Our purpose is to advance language technology for marginalized languages by making tools and resources such as these available.

You can explore our collected Shuwa Arabic voice dataset on Hugging Face.

See our voice datasets and models in other languages.