CLEAR Global has a plan to radically expand the availability of language AI for speakers of marginalized languages, in partnership with other social impact organizations.

We have a record of pioneering and advocating for the development of language technology for marginalized communities – opening the way for others. 

  • Our early automatic speech recognition and machine translation for Kurdish and Tigrinya outperformed the models available at the time.
  • Our offline information kiosk in Bihar, India answered farmers’ spoken questions on climate adaptation in their own language, in a setting where limited literacy and connectivity were barriers.
  • We built chatbots using conversational AI to answer people’s questions on Covid-19 in Lingala, Congolese Swahili, Hausa and Kanuri when most Covid chatbots were menu-based.

Working with other language tech experts in Africa and South Asia in particular, we have shown how language technology can be made accessible and useful for more of the 4 billion people worldwide excluded by language from information, services and conversations.

You can help make that change.

Advances in technology and untapped potential in the social impact sector present a unique opportunity to accelerate that progress and ensure it benefits speakers of less powerful languages. We would love to work with you on making that happen.

What needs to happen?

Our analysis is that the biggest bottlenecks at present are:

  • A shortage of diverse, good quality language data (especially voice) for marginalized languages
  • A lack of awareness among social impact organizations about what is possible
  • A lack of information on what communities need and want

Very little voice data exists for most of the world’s languages, and what does exist lacks diversity: young male voices from the cities predominate, as do the accents and dialects of more prosperous regions. That means that an application built on this data is likely to fail if the aim is to communicate with women farmers in rural areas, for instance.

While language AI is bringing change to the social impact sector as it is elsewhere, many of the organizations involved are unfamiliar with language AI or lack the capacity to use or help build it.

We aim to:

  • Build diverse voice datasets and support other social impact organizations to do the same – tapping into a largely unmined potential of language data for marginalized language speakers, in ways that reflect their needs and wishes and are durably safe for the individuals concerned
  • Collaborate on building voice models, integrating them into useful applications, and documenting their impact
  • Research, collaborate on, and advocate for language technology development that is centered on the needs of people excluded by language

How you can be part of it

  • Explore the potential of language technology for reaching and hearing from marginalized language speakers: integrate it into your programs and services and contribute to improving and promoting its impact. 
  • Help build consensus on, and support for, the safe and ethical pooling of language data for marginalized languages and its use to expand access to services, information and conversations.
  • Collect and share language data safely and ethically.
  • Work with us to understand what marginalized communities need from language tech and how best to consult them.

How CLEAR Global can help

  • To learn more, read about and share information on our work and learning to date, or contact us to set up a call.
  • To help build language data, contact us about using our TWB Voice platform or getting our support to share text or voice data safely in other ways.
  • To integrate language tech for marginalized language speakers into your work, get in touch to discuss how we can support you with information, capacity building and language models.
  • To get a better understanding of community needs, we can support you with research and data collection.

About TWB Voice

TWB Voice is CLEAR Global’s latest contribution to building voice data. It facilitates quality-controlled voice recordings in any language to meet the needs of automatic speech recognition (ASR) and text-to-speech (TTS) technology. It offers:

  • A platform for quality-controlled voice data collection
  • A growing repository of open-source voice data
  • Access to the 100,000-strong TWB Community of linguists
  • Voices accurately classified by age group and gender to aid appropriate use

How you can use TWB Voice

If you are looking to expand into new languages, we can work with you to build the voice datasets and develop the models needed for speech recognition and text-to-speech. We can also advise on integrating them into your existing services if needed.

Because we aim to expand the reach of language AI for marginalized languages, our goal is to publish datasets as open source whenever possible, with the full consent of contributors and in line with ethical data management and AI use practices. If that won’t work for your datasets, talk to our experts about the needs of your language data project.

Other language tech support

If you are interested in building language datasets over time from your own digital communication with communities, we can advise you and support you in setting this up.

We also offer user research services to help tailor technology-enabled solutions to the needs of your intended users.

Please get in touch!

We work with a wide range of technologists, civil society organizations, international aid providers and governments to build language AI and other solutions to language exclusion. We want to hear from you if you have needs we can help address, and if you have capacity, ideas or learning we can build on together.
