Wednesday, July 5, 2017

What languages are grammars of the world written in?

Humans have been writing grammars for a long time. The serious expansion into non-European languages is fairly recent, though, and is associated with colonialism and Christian missionary work. Because of this, it's interesting to see what language grammars are written in (meta-language) as well as what language they're about (target-language). The map above shows precisely this: the meta-languages of the language descriptions in Glottolog.

There are roughly 7,000 languages alive in the world today, and we have some kind of description of approximately 4,000 of them. If you want to find them, go and search Glottolog.

Harald Hammarström, one of the editors of Glottolog, recently shared some interesting data on these descriptions with me, and I want to pass it on to all of you. In Glottolog, descriptive references are tagged for which language they are written in (meta-language) as well as which language they are about (target-language)*. The map above gives the distribution of meta-languages of the descriptions of 4,005 languages in Glottolog. Each language on the map has exactly one dot in one color. The color shows the meta-language of the Most Extensive Description for that language**.

In this map we can clearly see the dominance of English as a world language, but we can also see the prevalence of French in former French colonies in Africa and, naturally, of national languages in modern nation states such as Brazil (Portuguese) and Indonesia (Indonesian).

If we look a bit closer at this data, we can see exactly how many target-languages there are per meta-language in total, as well as how many documents in Glottolog there are per meta-language. For those documents where it's possible, Hammarström has also compiled a corpus of the actual text content of each document and calculated how many types (distinct word forms) and tokens (running words) it contains.
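To make the type/token distinction concrete, here is a minimal Python sketch. It is not Hammarström's actual procedure: the tokenizer (lowercasing and splitting on runs of word characters) and the function name `count_types_and_tokens` are assumptions made purely for illustration.

```python
import re


def count_types_and_tokens(text):
    """Return (number of types, number of tokens) for a text.

    Assumption: a token is a lowercased run of word characters.
    A real corpus pipeline would likely tokenize more carefully.
    """
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)), len(tokens)


types, tokens = count_types_and_tokens("The cat saw the other cat.")
print(types, tokens)
```

In the example sentence, "the" and "cat" each occur twice, so there are fewer types than tokens.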

The table below summarizes this information for all references in Glottolog, i.e. not only the Most Extensive Description per language. There are 96 meta-languages in Glottolog in total; the table summarizes the 9 most common.
Here is an interactive graphic showing the same data as the table above:

We hope you enjoyed that. Be sure to explore Glottolog yourself if you haven't already!

* In the BibTeX entries for Glottolog references, meta-languages are given in the entry field "inlg" and target-languages in "lgcode".

** The Most Extensive Description is chosen by sorting first by descriptive type (Grammar > Grammar Sketch > etc.), then by number of pages, and lastly by publication year.
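
The three-step ordering in the second footnote can be sketched as a sort key in Python. This is an illustration only, not Glottolog's code: the `Description` class, the `TYPE_RANK` table, and the choice to let a newer publication year win ties are all assumptions, since the footnote gives neither the full list of descriptive types nor the direction of the year comparison.

```python
from dataclasses import dataclass

# Hypothetical ranking: the footnote only names Grammar > Grammar Sketch > etc.
TYPE_RANK = {"grammar": 0, "grammar_sketch": 1}
OTHER_RANK = 2  # every other descriptive type, lumped together here


@dataclass
class Description:
    title: str
    doctype: str
    pages: int
    year: int


def med_key(d):
    # Best descriptive type first, then more pages, then
    # (assumption) the more recent publication year.
    return (TYPE_RANK.get(d.doctype, OTHER_RANK), -d.pages, -d.year)


def most_extensive(descriptions):
    """Pick the description that sorts first under med_key."""
    return min(descriptions, key=med_key)


docs = [
    Description("A", "grammar_sketch", 500, 2010),
    Description("B", "grammar", 200, 1950),
    Description("C", "grammar", 200, 1990),
    Description("D", "wordlist", 900, 2000),
]
print(most_extensive(docs).title)
```

Note that under this ordering a short full grammar outranks a much longer grammar sketch, because descriptive type is compared before page count.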
