Resource Library

Shuwa Arabic voice dataset

November 21, 2025

Voice datasets are structured collections of audio recordings paired with corresponding text transcriptions, metadata, and annotations. These datasets serve as the foundation for training […]

Kanuri TTS and ASR models

November 18, 2025

Voice datasets are structured collections of audio recordings paired with corresponding text transcriptions, metadata, and annotations. These datasets serve as the foundation for training […]

Chichewa synthetic voice dataset, TTS models, ASR models

July 30, 2025

Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Chichewa language. Text-to-Speech […]

Hausa synthetic voice dataset, TTS models, ASR models

July 30, 2025

Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Hausa language. Text-to-Speech […]

Dholuo synthetic voice dataset, TTS models, ASR models

July 30, 2025

Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Dholuo language. Text-to-Speech […]

Marma TTS and text data resources

June 18, 2025

Text data: This dataset contains sentences in the Marma language (ISO code: rmz), with both original and normalized forms. The dataset is designed to […]