LANGUAGE TECHNOLOGY
Of the world’s 7,000+ languages, only a few are meaningfully represented online. Just 17 languages dominate digital content, and these are not the first languages of many crisis-affected communities.
For speakers of globally dominant languages such as English, Spanish, and Mandarin, accessing information and language technologies is straightforward. Resources are abundant, and tools for translation, communication, and content creation are readily available. But for those who speak less globally influential languages, essential information is often unavailable in their language, and the technology to bridge that gap simply doesn't exist.
This creates a growing information divide that isolates and disempowers these communities and limits their ability to participate in global conversations.
Our research identifies three key challenges driving this gap: a shortage of diverse, high-quality language data (especially voice data) for marginalized languages; limited awareness of language technology's potential among social impact organizations; and a lack of insight into the specific needs of these communities. Many languages around the world have little or no voice data, and the datasets that do exist are often narrow, dominated by young male voices and urban accents. This makes it difficult to reach crucial groups such as women farmers in rural areas. And while language AI is advancing in the social impact sector, many organizations still lack the knowledge or resources to harness its full potential.
We have years of experience building language technology for languages with few digital resources. We've built:
MULTILINGUAL CHATBOTS
We’ve built chatbots in multiple languages, providing essential information on COVID, sustainable business practices, farming, and migrant/refugee services for regions including the Democratic Republic of Congo, Nigeria, Kenya, and Central America.

We have built automatic speech recognition and machine translation models for marginalized languages such as Kurdish and Tigrinya; these models outperformed those available at the time. This investment enabled solutions that help people in emergencies access vital information in their own language.
We deployed an offline information kiosk in Bihar, India, that answered farmers' spoken questions on climate adaptation in their own language. With the right technology, they no longer struggled to access information: the information they wanted was available in audio, the form that was easiest for them to use.
We built chatbots using conversational AI to answer people's questions on COVID-19 in neglected languages such as Lingala, Congolese Swahili, Hausa, and Kanuri. Unlike the menu-based bots commonly deployed during the pandemic, Uji in the DRC and Shehu in Nigeria allowed users to ask questions in their own words.
Our chatbot Hajiya, in northeast Nigeria, uses conversational AI to respond to questions in four languages: Shuwa Arabic, Hausa, Kanuri, and English. That's not all: it can accommodate the common practice of switching between those languages, understanding, for instance, when a user drops an English word into a Kanuri sentence.
TWB Voice: The next step is to enable users to engage with our chatbots using speech rather than text, so that everyone, regardless of background or education, can ask questions directly and confidently. Our newest tool, TWB Voice, addresses the severe shortage of voice data in marginalized languages by providing a platform for collecting the speech data needed to build voice technology for languages like Shuwa Arabic, Hausa, and Kanuri.
Translation Resources: We’ve developed translation tools and resources for nearly 100 languages, ensuring vital information is accessible to underserved communities.
Machine Translation Engine: We created a machine translation engine specifically for Levantine Arabic to combat food insecurity, improving access to critical resources.
With your help, we can do more.
Our goal is to engage the four billion people who don't speak the dominant global languages by providing communication tools in the languages they understand. We want to do this by:
Partnering with local technology organizations to develop scalable solutions that fit the people, place, and problem.
Working with global technology leaders to develop simple, smart, scalable solutions.