Monday, March 27, 2017

Half of the world's languages are not Indo-European (but almost half of the population speak an Indo-European language)

Big Think recently published an article titled "Half of All Languages Come from One Root Language. How it Spread Is Something of Debate" by Philip Perry. This is a nice article that introduces the public to the stormy debate on the origins of the Indo-European language family. It is always nice to see current research debates covered in the popular press!

The article informs the reader that "half of the languages spoken today by some 3 billion people come from a single root language". This is unfortunately misleading, so I'll provide some context here.

TL;DR: Only 6% of the 7,000-ish languages alive today are Indo-European. However, 46% of people speak an Indo-European language. Languages are not evenly distributed across the population, so it is not true that half of the world's languages are Indo-European, even though almost half of all people speak one.

The writer is referring to the Indo-European language family, the most studied and best-known language family in the world. The Indo-European family includes English, Hindi, Sanskrit, Greek, Swedish, Russian, Nepali and many more well-known languages. (It does not, however, contain the European languages Hungarian, Basque, Finnish or Maltese.)

While it is true that slightly less than half (46%) of the earth's population speak an Indo-European language as their first language, it is not true that half of the world's languages are Indo-European. How can this be? Let's learn about the distribution of languages across the earth's population!

A language family is a group of languages that are hypothesized to share a common ancestor. Similarly to how groups of people can be traced back via DNA to common ancestors, linguists use vocabulary, sounds and other features of language to trace the history of languages. This is not exactly comparable to genetics, but the two approaches to learning about human history have their similarities.

Now, here are languages of the Indo-European family:
Capture of map of the distribution of Indo-European languages at
(NB that they include contact languages, like Nigerian Pidgin.) The colours represent sub-groupings.
And here's the world:
Map of languages of the world by Ethnologue, one dot per language.

Let's tease this out! How come the language family that is spoken by almost half the population does not represent half of the languages? Well, it all comes down to the fact that languages are not evenly distributed across the population. Most people speak one of a small set of very large languages, and a small group of people speak a lot of different languages!

The majority of the people of the world speak one of 9 languages: "Chinese"/Mandarin, English, Spanish, Russian, Hindi, Japanese, Portuguese, Bengali or "Arabic" (click here to understand more).

We have a great diversity of languages alive today, roughly 7,000 languages and 140-260 language families (depending on which historical linguist you trust). About half of the world's languages (3,517 of 7,099) are, however, spoken by fewer than 1,000 people each.

What about the families then? We can learn from the Ethnologue catalogue of languages that almost 89% of the world's population (5.9 billion people) speak a language from one of only 6 language families. Linguists think that there are between 141 and 260 language families in the world, so this is just a small subset of the total diversity of families (read more here). Below are numbers taken from the latest edition of Ethnologue*.

The 6 language families with the most speakers

Language family | Living languages (count) | Percent of all languages | Number of speakers (total) | Percent of all speakers
Indo-European | 440 | 6.2% | 3,077,112,005 | 46.32%
Sino-Tibetan | 452 | 6.37% | 1,355,708,295 | 20.41%
Niger-Congo | 1,526 | 21.5% | 458,899,441 | 6.91%
Afro-Asiatic | 366 | 5.16% | 444,845,814 | 6.7%
Austronesian | 1,224 | 17.24% | 324,883,805 | 4.89%
Dravidian | 85 | 1.2% | 228,108,690 | 3.43%
Total | 4,093 | 57.67% | 5,889,558,050 | 88.66%
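If you want to check the arithmetic yourself, here is a minimal Python sketch. The numbers are copied from the table above; the names `FAMILIES`, `ALL_LANGUAGES` and `share_of_languages` are made up for this illustration, and the total of 7,099 living languages is the figure used elsewhere in this post.

```python
# Figures copied from the table above (Ethnologue, 20th edition).
# Each entry: (family, living languages, number of speakers).
FAMILIES = [
    ("Indo-European", 440, 3_077_112_005),
    ("Sino-Tibetan", 452, 1_355_708_295),
    ("Niger-Congo", 1_526, 458_899_441),
    ("Afro-Asiatic", 366, 444_845_814),
    ("Austronesian", 1_224, 324_883_805),
    ("Dravidian", 85, 228_108_690),
]

# Total number of living languages, the 7,099 figure used in this post.
ALL_LANGUAGES = 7_099


def share_of_languages(count, total=ALL_LANGUAGES):
    """Return count as a percentage of all living languages."""
    return 100 * count / total


# Column totals for the six families.
total_languages = sum(count for _, count, _ in FAMILIES)
total_speakers = sum(speakers for _, _, speakers in FAMILIES)

for family, count, speakers in FAMILIES:
    print(f"{family:<14} {share_of_languages(count):6.2f}% of languages, "
          f"{speakers / 1e9:.2f} billion speakers")

print(f"Six families together: {total_languages:,} languages, "
      f"{total_speakers:,} speakers")
```

The share of speakers is not recomputed here, because that would need a figure for the world's total population, which is only implied by the percentages in the table.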

(It's worth noting that there is a language family that has more languages than Dravidian, but far fewer speakers: the Trans-New Guinean language family has 478 languages, but "only" 3,553,780 speakers.)
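One way to make this contrast concrete is to divide the number of speakers by the number of languages in a family. The sketch below does that for three of the families mentioned above. The helper name `mean_speakers_per_language` is made up for this illustration. Keep in mind that a mean is a rough measure: within a family, a few languages are usually huge and most are small.

```python
# (languages, speakers) per family, copied from the figures quoted above.
FAMILY_SIZES = {
    "Indo-European": (440, 3_077_112_005),
    "Niger-Congo": (1_526, 458_899_441),
    "Trans-New Guinean": (478, 3_553_780),
}


def mean_speakers_per_language(languages, speakers):
    """Average number of speakers per language in one family."""
    return speakers / languages


for family, (languages, speakers) in FAMILY_SIZES.items():
    mean = mean_speakers_per_language(languages, speakers)
    print(f"{family:<18} {mean:>12,.0f} speakers per language on average")
```

On these numbers, the average Indo-European language has close to 7 million speakers, while the average Trans-New Guinean language has fewer than 8 thousand.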

While Indo-European is the language family with the most speakers, the reader will notice that it is not the family with the most languages(!). Only 6% of the living languages of the world are Indo-European (440/7,099). 6% is not half. 46% is almost half though!

Of the people who speak an Indo-European language (46% of the world's population), most of them (54%) speak one of only 6 languages. The table below gives the 6 most populous Indo-European languages today and their speaker numbers in millions. 

Language | Speaker population (millions)
Spanish | 437
English | 372
Hindi | 260
Bengali | 242
Portuguese | 219
Russian | 154
Total | 1,684
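The 54% figure can be checked with a few lines of Python. This is only a sketch: the speaker numbers are the ones in the table above, the Indo-European total is the one from the earlier family table, and the variable names are made up for this illustration.

```python
# Speaker populations in millions, copied from the table above.
TOP_SIX_MILLIONS = {
    "Spanish": 437,
    "English": 372,
    "Hindi": 260,
    "Bengali": 242,
    "Portuguese": 219,
    "Russian": 154,
}

# All Indo-European speakers (from the family table), converted to millions.
INDO_EUROPEAN_MILLIONS = 3_077_112_005 / 1_000_000

top_six_total = sum(TOP_SIX_MILLIONS.values())
top_six_share = 100 * top_six_total / INDO_EUROPEAN_MILLIONS

print(f"Top six languages: {top_six_total:,} million speakers")
print(f"Share of all Indo-European speakers: {top_six_share:.1f}%")
```

The share lands between 54% and 55%, in line with the figure quoted above.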

It's not only the six languages above that are very big: "Chinese"/Mandarin, Japanese, "Arabic", Lahnda and others are also massive (see more here). 80% of the earth's total population speak one of only 100 languages.

To learn more about how linguists classify languages and families, please see this previous post on practicalities of counting in the two catalogues of languages, Glottolog and Ethnologue.

Remember, there are roughly 7,000 languages out there. What about the others? Well, 6,660 languages in the world are in fact spoken by a total population of only 1.3 billion people. Most of them (3,517) are spoken by fewer than 1,000 people each. Most research is carried out on the massive families, but in order to understand human history, we're going to have to dig into the other languages and families as well!

Linguists do not rank languages by importance depending on how many speakers they have. Each language carries a full system of expression and can reveal something about the history of humankind, so they all need to be studied. A more diverse sample gives us a more accurate picture.

These languages with fewer than 1,000 speakers are increasingly losing ground, and we're losing the heritage of thousands of years of humankind's history at a rapid rate. Within 100 years, most of these languages will be gone. Let that sink in. Welcome to Monday.

Also, thanks Philip Perry for writing this article and giving us a chance to enlighten everyone's day with some language stats!


Here are two bonus maps that illustrate the above discussion that languages are not evenly distributed over the world's population. The first map shows languages per country, the second shows population per country. As you can see, the two maps are not the same.

Map from Worldmapper, where each country is scaled relative to how many languages it has.
© Copyright Sasi Group (University of Sheffield) and Mark Newman (University of Michigan).

Map from Worldmapper, where each country is scaled relative to how many people live there.
© Copyright Sasi Group (University of Sheffield) and Mark Newman (University of Michigan).
Simons, Gary F. and Charles D. Fennig (eds.). 2017. Ethnologue: Languages of the World, Twentieth edition. Dallas, Texas: SIL International. Online version:

* Ethnologue is not infallible and has its problems, but it is the most comprehensive source of population statistics on languages that I know of and that is freely available. If you have any other suggestions, please do contact us.

