Supporting digital inclusion for Kinyarwanda speakers

Local partnerships boost language technology for communities in Rwanda

Digital services can offer vital information, communication, and community – as long as they’re designed for the people who need them. Yet despite innovations in connectivity, service provision, and language technology like large language modeling, only a fraction of the world’s 7,000 languages are meaningfully online. For the billions already locked out, the gap is widening. 

With relevant and appropriate language technology, more of the world’s most marginalized people can access information and be empowered to make decisions that affect their lives. Learn how we work with local organizations and language communities to build relevant, sustainable solutions. 

Paul Warambo, CLEAR Global’s Senior Community Officer for 4 Billion Conversations.

Building language technology in marginalized languages helps close this digital divide

Kinyarwanda is the most widely spoken first language in Rwanda. Yet, like many other non-European languages, it is disproportionately underrepresented in the digital space. We partnered with Digital Umuganda, a Rwandan language technology company specializing in African languages, to address language-based digital exclusion for Kinyarwanda speakers. For this project, we built a machine translation plugin and integrated it into Moodle, an online learning management system, so users can switch between Kinyarwanda and English. This enables users of digital learning in Rwanda to access content in their language while also improving their English. The topics in this case were entrepreneurship, digital literacy, and Rwanda’s tourism experience. By building bi-directional machine translation (MT) capacity between English and Kinyarwanda, we can help the public sector improve communication with communities and improve access to services.

Our collaboration enabled Digital Umuganda to strengthen its technical capacity to support further projects promoting digital inclusion through language technology. It also showed how our model can catalyze sustainable technology development for marginalized languages, and make good on our commitment to localize aid. In this blog, we explore how we help local language technology experts equip themselves to address language marginalization in their own contexts. The next blog in this series on digital inclusion will look at what we learned about generating language data and building the technology to reach people in their languages.

Engaging Kinyarwanda speakers to generate language data

To begin creating language technology like machine translation, you need language data – digitized voice or text datasets in the right languages. Even languages with millions of speakers like Kinyarwanda may not have language datasets that are good enough to create accurate, viable, and domain-specific language technology capacity – yet.  

To build machine translation capacity in Kinyarwanda, we mobilized speakers from our Translators without Borders Community, the Mbaza NLP community coordinated by FAIR Forward and Digital Umuganda, and speakers from local universities such as the University of Rwanda’s School of Art Languages. We took a collaborative approach by sharing information about the project goals, the tool they would use to collect and validate language data, and the project’s intended impact. We aimed to ensure our community members had full transparency about the project and how their language data would be used. Demonstrating our commitment to transparency and open communication helped strengthen relationships and foster a sense of shared ownership among the community and our project team. This approach helped build trust in the technology and the overall project.

"I felt empowered knowing that our voices were being heard and valued in the development of language solutions that directly impact our Kinyarwanda community."

People talking around a desk covered in colorful post-it notes
Photo: Yagazie Emezi/Getty Images/Images of Empowerment

We also explored different methods of data collection, on- and offline. In collaboration with Digital Umuganda, we organized a data collection hackathon in Kigali, Rwanda, where community members met in person to work together on generating Kinyarwanda language data. While in-person data collection is more costly, it ensures that people without access to devices or a stable internet connection can engage. It gave community members the opportunity to share their opinions, ask questions, and express any concerns about the data collection tool we were piloting. The datasets collected can be accessed online: e-learning content and tourism experience.

“It was refreshing to see that the project team genuinely cared about our input and feedback. This collaborative approach made me feel confident that the language solutions being developed would truly meet our needs.”

Understanding communities’ linguistic challenges and needs to design user-centered solutions

Community engagement played a pivotal role in ensuring that the language solutions we created were tailored to the local community’s specific needs and preferences. CLEAR Global’s project team gained valuable insights into linguistic challenges, cultural nuances, accessibility challenges, and user expectations through active involvement and collaboration with Digital Umuganda. The sense of ownership fostered by involving Kinyarwanda linguists online and on-site ultimately led to more effective and impactful language solutions that have since been applied in use cases beyond this project’s scope. 

Communities and organizations know their context best

We collected localized, domain-specific language data on relevant topics offered through digital learning – entrepreneurship, digital literacy, and Rwanda’s tourism experience. One linguistic challenge was how to render concepts related to education systems and knowledge sharing in Kinyarwanda, where knowledge acquisition is embedded in traditional customs and practices. Linguists had to find phrases that were accurate and would not risk representing educational content as elitist or reproducing colonial ways of thinking. The Kinyarwanda linguists working on the data collection adapted and contextualized the text. Their input helped ensure the text we used to develop our machine learning tool was appropriate and relevant to the users’ needs.

To make the most of the community’s engagement, we facilitated two-way communication channels between our project team, linguists, and hackathon contributors. This allowed for continuous feedback and iteration to enhance the data collection tool. By actively inviting input and feedback from the tool’s end-users, the project team gained insights into some of the requirements of a more user-friendly tool. For example, Kinyarwanda linguists expressed a preference for a solution with intuitive navigation and a user-friendly interface. Language data collectors also emphasized the importance of making the data collection platform easily accessible to individuals with varying levels of digital literacy, keeping interaction simple and reducing the need for extensive training. They highlighted the need for more logical features that enhance the overall user experience.

Kinyarwanda-speaking TWB Community members in Kigali, Rwanda at the Digital Transformation Center Rwanda

Collaboration builds agency, trust, and more effective language technology 

Sustainable social impact requires local ownership and long-term commitment. We value the insights of communities such as Mbaza NLP and local organizations – they simply know their context best. When planning projects, we prioritize participatory decision-making to ensure key stakeholders have the agency to shape effective, inclusive, and sustainable initiatives that benefit their communities.

Partnering with local community-based organizations and people experiencing digital exclusion helps us develop digital initiatives that address their unique challenges. Collaboration with end-users from the start also promotes acceptance and adoption of digital interventions within the community. Considering localized challenges, community needs, language and format preferences, and sociocultural dynamics helps us identify relevant use cases for language technology – and assess when a digital solution might not be the best option. 

We have now handed over ownership of the machine translation tool to Mbaza NLP, ensuring the community continues to develop and apply the technology to other use cases beyond this project’s lifespan. Our collaborative approach strengthens our partners’ capacity to address access challenges, helping communities get vital information and be heard long after the project is completed. Digital Umuganda and the local community are now better placed to develop future language technology in even more languages to support other communities at risk of digital exclusion. By pooling our resources and leveraging existing technology infrastructure, we can increase the quality of existing technology, avoid redundancies, and scale our social impact solutions more efficiently.

Starting 4 billion more conversations

Four billion people – half the world’s population – are still excluded from important global conversations because their languages are underrepresented online. Our Four Billion Conversations movement (#4BC) aims to change that with initiatives like the Language AI Playbook, which helps social good partners integrate technology and mobilize communities.

Our tech team has supported digital language inclusion in various contexts and languages:

– Learn how our pilot project, TILES (Touch Interface for Language Enabled Services), supported Hindi-speaking farmers in India to access information about climate change mitigation strategies.

– Explore Kompas, our multilingual artificial intelligence platform, which curates verified, up-to-date information for people affected by the war in Ukraine.

– Discover chatbots like Shehu, which uses natural language understanding to answer questions about COVID-19 in Hausa, Kanuri, and English in Nigeria.

– Read our ebook to learn more about how language and communication are key to achieving sustainable development, climate change action, and health care for all.

Do you want to work with us to support digital inclusion in your language?

Click here to partner with us.

With thanks to our technology and funding partners:

Digital Umuganda 

Digital Umuganda is an AI and open data company with a mission to enable access to information in local African languages. Digital Umuganda creates open-source datasets, models, and tools that make it possible for NLP, including large language models, to work for marginalized communities that speak under-resourced languages. Learn more at digitalumuganda.com.

Digital Transformation Center Rwanda

The Digital Transformation Center is a Rwandan-German initiative aimed at developing impact-driven digital solutions in Africa. To that end, it provides not only advisory services and training for government institutions and local tech companies, but also a modern space to boost creativity and collaboration. Learn more at digicenter.rw.

GIZ FAIR Forward

On behalf of the German Federal Ministry for Economic Cooperation and Development (BMZ), the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) implements the project “FAIR Forward – Artificial Intelligence for All”. The project strives to create a more open, inclusive, and sustainable approach to AI at the international level and, more specifically, to develop local artificial intelligence ecosystems across its seven partner countries (Rwanda, Uganda, Kenya, South Africa, Ghana, India, and Indonesia). For more information, visit FAIR Forward – Open data for AI (bmz-digital.global).

Written by Paul Warambo, Senior Community Officer, and Emily Elderfield, Advocacy Officer, CLEAR Global
