Speech technology is one of the fastest-growing sectors in AI, but it’s failing the 4 billion people worldwide who speak under-resourced languages.
Current systems only work well in a handful of globally dominant languages, leaving numerous communities without access to voice-enabled services, information, and digital tools.
The technical know-how to solve this problem exists, but the lack of data is a barrier to progress.
Speech technology requires extensive voice recordings, transcriptions, and linguistic metadata that don’t exist for the majority of the world’s languages. This leaves social impact opportunities unexplored and communities underserved.
Join us in building speech technology that works for the 4 billion people excluded by language barriers
TWB Voice is a scalable voice data collection platform that enables organizations to build inclusive speech technology infrastructure.
It leverages our global network of 100,000+ linguists to crowdsource high-quality voice datasets for low-resource languages.
The platform also supports onboarding new contributors in collaboration with partners.
Platform capabilities
Scalable voice data collection for low-resource languages.
Quality assurance systems with community-driven validation.
Open dataset creation for researchers and developers.
Ethical data governance with informed consent.
Pathways to self-managed projects.
How we delivered speech technology in northeast Nigeria
In 2025, we collected voice data and built datasets in Hausa, Kanuri, and Shuwa Arabic. From these datasets, we developed speech-to-text and text-to-speech models in Hausa and Kanuri.
After mobilizing the local community, we worked with 135 contributors to compile and validate more than 111 hours of automatic speech recognition data and 20 hours of text-to-speech data.
What voice data can do
Voice datasets enable transformative services across multiple sectors.
Organizations can deploy voice-powered chatbots that answer frequently asked questions on critical topics like health information or agricultural best practices, making essential knowledge accessible regardless of literacy levels.
Additionally, speech-to-text models allow organizations to listen to, transcribe, and analyze incoming information and questions from communities, creating feedback loops that improve service delivery and ensure programs respond to actual community needs.
Proven success that is ready to scale
Partner with us
Tech companies
Expand your market reach
Access new language markets
Collaborate on model development for low-resource languages
Leverage the TWB Community for testing and validation
Build truly inclusive solutions
Social impact organizations
Transform your delivery
Integrate voice technology into existing programs
Scale access to essential information and services
Researchers
Advance AI that benefits everyone
Access unique datasets
Collaborate on ethical AI development
Study real-world applications of inclusive technology
Publish findings on marginalized language communities
Funders
Invest in systemic change
Support infrastructure that benefits millions
Address digital equity at scale
Enable change through sustainable and community-led initiatives
Leverage technology for maximum impact
Community
Be part of a global movement
Participate in the transformation of speech technology
Have direct impact in creating tools that serve your language community
Develop your skills in voice data collection
Earn recognition and certificates for your contributions
How TWB Voice can boost your mission
An established infrastructure
Built on CLEAR Global’s experience in language services, with proven systems for community engagement, quality assurance, and ethical data handling.
Our team includes researchers and engineers with deep expertise in low-resource languages.
A global network
Our access to the TWB Community connects you directly to dozens of marginalized language communities.
We have worked with hundreds of social impact partners and funders over the years, creating a solid network for collaboration.
Ethical leadership
We take an industry-leading approach to informed consent, data privacy, and community ownership for sustainable and responsible solutions.
Scalable architecture
The TWB Voice platform is designed to support simultaneous projects across multiple languages, contexts, and partner organizations.
Learn how we did it
Setting up TWB Voice was an exciting endeavor full of challenges and learning.
We captured all of them in the TWB Voice playbook, a resource that can help other organizations plan and run voice data projects for low-resource languages.
Learn more about available datasets, models, and existing partnerships
Community
If you are interested in contributing to voice data collection in your language, you can join our community of volunteer contributors.
To contribute to active TWB Voice projects in your language, you need to be a member of the Translators without Borders Community. If you are not yet a member, you can sign up for an account.