Language technology Archives

Playbook for voice data collection for low-resource languages

The TWB Voice Playbook is a practical guide to planning and managing voice data collection projects for low-resource languages. It is aimed at both […]

Chichewa synthetic voice dataset, TTS models, ASR models

Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Chichewa language. Text-to-Speech […]

Hausa synthetic voice dataset, TTS models, ASR models

Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Hausa language. Text-to-Speech […]

Dholuo synthetic voice dataset, TTS models, ASR models

Below is a curated collection of open resources for text-to-speech (TTS), automatic speech recognition (ASR), and synthetic voice datasets in the Dholuo language. Text-to-Speech […]

Marma TTS and text data resources

Text data: This dataset contains sentences in the Marma language (ISO code: rmz), with both original and normalized forms. The dataset is designed to […]

© 2026 All rights reserved.