Voice datasets are structured collections of audio recordings paired with corresponding text transcriptions, metadata, and annotations. These datasets serve as the foundation for training and evaluating speech recognition systems, text-to-speech engines, and other voice-enabled applications. High-quality voice datasets are essential for developing accurate and robust speech technologies.
Voice models are machine learning systems trained on voice datasets to perform specific speech-related tasks. These models learn patterns in human speech and power applications such as automatic speech recognition and text-to-speech synthesis.
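To make the dataset structure concrete, the sketch below builds a toy voice dataset with the Hugging Face `datasets` library. The file paths, transcriptions, and metadata fields are illustrative placeholders, not a real Kanuri corpus.

```python
from datasets import Dataset, Audio

# Each recording is paired with a transcription plus metadata (placeholder values).
records = {
    "audio": ["clips/utt_0001.wav", "clips/utt_0002.wav"],
    "transcription": [
        "first example sentence",
        "second example sentence",
    ],
    "speaker_id": ["spk_01", "spk_02"],
    "language": ["kr", "kr"],  # "kr" is the ISO 639-1 code for Kanuri
}

ds = Dataset.from_dict(records)

# Casting the column to the Audio feature means each row's audio is decoded into
# a waveform and sampling rate only when that row is accessed.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

print(ds)                   # column names, features, and number of rows
print(ds["transcription"])  # the text paired with each recording
```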
Data for low-resource languages is scarce. By making resources such as these openly available, we aim to advance language technology for marginalized languages.
Below is a curated collection of open resources for text-to-speech (TTS) and automatic speech recognition (ASR) in the Kanuri language.
Text-to-speech (TTS) models
Explore our collection of Kanuri TTS models.
View Kanuri TTS models on Hugging Face.
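A minimal usage sketch for loading one of these checkpoints with the `transformers` text-to-speech pipeline is shown below. The model ID `your-org/kanuri-tts` and the input sentence are placeholders; substitute an actual repository name from the collection.

```python
from transformers import pipeline
import scipy.io.wavfile as wavfile

# Placeholder model ID; replace with a repository from the Kanuri TTS collection.
tts = pipeline("text-to-speech", model="your-org/kanuri-tts")

# The pipeline returns the synthesized waveform and its sampling rate.
speech = tts("Placeholder Kanuri sentence to synthesize.")

# Save the waveform to a WAV file for listening.
wavfile.write(
    "kanuri_tts_sample.wav",
    speech["sampling_rate"],
    speech["audio"].squeeze(),
)
```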
Automatic speech recognition (ASR) models
ASR models for Kanuri trained on varying proportions of human-recorded and synthetic speech data.
View Kanuri ASR models on Hugging Face.
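The sketch below shows how one of these checkpoints could be used to transcribe a recording with the `transformers` automatic-speech-recognition pipeline. The model ID `your-org/kanuri-asr` and the audio path are placeholders; substitute an actual repository name from the collection.

```python
from transformers import pipeline

# Placeholder model ID; replace with a repository from the Kanuri ASR collection.
asr = pipeline("automatic-speech-recognition", model="your-org/kanuri-asr")

# Pass a local audio file (or a URL) to get the predicted transcription.
result = asr("path/to/kanuri_recording.wav")
print(result["text"])
```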