Language data by country

26 countries | 58 datasets | 80 maps

Click on a country on the map below to see language data, resources, and maps that we have available for that country.
This map will be updated as new data is published.

Find, share, and use the datasets for all of these countries on the Humanitarian Data Exchange (HDX).

All datasets are available on HDX in .xlsx and .csv formats, and detailed metadata states the source and known limitations of each one.
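If you work with the .csv files in a script, a few lines of Python are enough to summarize them. The sketch below is purely illustrative: it assumes a hypothetical file with a `language` column and a `speakers` column of whole numbers. These names are placeholders, not the actual HDX headers, so check each dataset's metadata for the real column names before adapting it. The sample rows are made up.

```python
import csv
import io
from collections import Counter


def speakers_by_language(csv_file, language_col="language", count_col="speakers"):
    """Sum speaker counts per language from an open CSV file object.

    language_col and count_col are HYPOTHETICAL column names; replace them
    with the headers documented in the metadata of the dataset you use.
    """
    totals = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(count_col) or "").strip()
        if not value:
            continue  # skip rows where no count was recorded
        totals[row[language_col].strip()] += int(value)
    return totals


# Made-up sample rows standing in for a downloaded file. With a real file,
# open it with: open("your_file.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "region,language,speakers\n"
    "North,Hausa,1200\n"
    "South,Yoruba,900\n"
    "North,Yoruba,200\n"
)

for language, count in speakers_by_language(sample).most_common():
    print(language, count)
```

Reading rows with `csv.DictReader` keeps the lookup tied to column names rather than positions, so the sketch still works if a file adds or reorders columns.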

We have made all of these datasets available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0). This means that you are free to use and adapt them as long as you credit the source and do not use them for commercial purposes. You can also share derivatives of the data, provided you release them under the same license.

Data on the languages people speak around the world is scarce and hard to use.

Existing language data is often protected by restrictive copyrights or locked behind paywalls. In the datasets that do exist, languages are often visualized as discrete polygons or single points on a map, which do not reflect the complexity of where and how languages are actually spoken.

In short, language data is rarely accessible, easily verifiable, or in a format that aid workers can readily use.

These are the first openly available language datasets for humanitarian and development use.

The majority of these datasets are based on existing sources — census and other government data. We have curated, cleaned, and reformatted the data to be more accessible for humanitarian and development purposes.

We are also exploring ways of deriving new language data in countries without existing data sources, and extracting language information from digital sources.

This project is built on four main principles: accessibility, interoperability, openness and ethics.

Read more about the Language Data Initiative in our blog.

How can you help?

This is just the beginning of our effort to provide more accessible language data for humanitarian purposes. Our goal is to make language data openly available for every humanitarian crisis, and we can’t do it alone.
We need your help to:

Integrate and share this data.

Our strategy is to make these datasets as accessible and interoperable as possible using existing platforms. But we need your feedback so we can improve and expand them.

Add language-related questions to your ongoing surveys.

Existing language data is often outdated and may not reflect large-scale population movements. We have successfully worked with partners to integrate standard language questions into ongoing surveys. This is essential if we are to develop language data for countries that don’t hold regular censuses.

Use this language data to improve communication strategies.

As we develop more data, we hope to provide the tools for service providers to design more appropriate communication strategies. Decisions to hire interpreters and field workers, develop radio messaging, or create new posters and flyers should all be data-driven. That’s only possible if we know which languages people speak. An inclusive and participatory system requires two-way communication strategies that use languages and formats that people understand.