Counting time!

Let's start with some shameless self-promotion: this week, Andreea Calude and I published a paper on Indo-European numerals. We investigate how these languages form numerals beyond 1-10: 11-99, the hundreds, and the thousands. Many languages are famous for their crazy numeral systems, but we were interested in both the regular and the crazy. To that end, we used data from Eugene Chan's amazing database 'Numeral Systems of the World's Languages' to get an overview of how regular and how crazy the Indo-European languages are.

Turns out there are indeed some well-known crazies to be found in Indo-European: Welsh, which forms 18 by 2 * 9, Breton, which forms 50 by 1/2 * 100, and Danish, which uses a base 10 for 20, 30, and 40, but a base 20 for 50, 60, 70, 80, and 90.
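To make the arithmetic behind these oddities explicit, here is a minimal Python sketch that writes each pattern as multiplier * base and checks that it yields the right value. The Welsh and Breton entries come straight from the paragraph above; the Danish multipliers (2.5, 3, 3.5, 4, 4.5 times 20) are my own illustrative assumption about how the base-20 crowns break down, not data taken from the paper, and the names `PATTERNS` and `check` are made up for this example.

```python
from fractions import Fraction

# (language, target value, multiplier, base) for the multiplicative patterns
# mentioned above. The Danish rows are an assumption added for illustration
# (a base-20 reading of 50-90), not data from the paper.
PATTERNS = [
    ("Welsh", 18, Fraction(2), 9),
    ("Breton", 50, Fraction(1, 2), 100),
    ("Danish", 50, Fraction(5, 2), 20),
    ("Danish", 60, Fraction(3), 20),
    ("Danish", 70, Fraction(7, 2), 20),
    ("Danish", 80, Fraction(4), 20),
    ("Danish", 90, Fraction(9, 2), 20),
]


def check(patterns):
    """Return True if every multiplier * base equals the target value."""
    return all(mult * base == value for _, value, mult, base in patterns)


for lang, value, mult, base in PATTERNS:
    print(f"{lang}: {value} = {mult} * {base}")
```

Using `Fraction` rather than floats keeps the halves exact, so the check is plain equality.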

But aside from these rara in the expression of particular numbers, most languages form higher numerals in very regular ways. One particular interest of ours is the point at which languages start using syntagms rather than atoms. Atoms are lower numerals with a unique, non-compositional lexical expression, such as English four, and syntagms are composites of atoms. So English eighty-five is a syntagm composed of the atoms eight and five (and a base 10, ty). Turns out all of the languages in our sample make the switch from atoms to syntagms somewhere between 11 and 13: most languages have a syntagm for 11, a few start at 12 (Catalan and Marwari), and famously the Germanic languages, including English, start at 13. So there is a little variation, but none of the languages in our sample uses atoms all the way up to 19, for instance. Welsh might be weird in that it uses 2*9 to form 18, but at least it doesn't have an atom for 18...

For the numerals that are syntagms, we looked at the order of atom and base in what we call teens (11-19), crowns (20, 30, 40, 50, 60, 70, 80, 90), and running numbers (21-29, 31-39, 41-49, 51-59, 61-69, 71-79, 81-89, 91-99). I had a personal interest in studying the last category, as my two languages (English and Dutch) have opposing orders: English eighty-five '80-5' is base-then-atom, whereas Dutch vijf-en-tachtig '5-and-80' is atom-then-base, and I am always struggling to get it right. It just so happens that the famous typologist Joseph Greenberg published a cool article on numeral systems that also discusses the order of atom and base.
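The two orders can be sketched in a few lines of Python. This is only an illustration of the glossing convention used above ('80-5' versus '5-and-80'); the function name `gloss` and its `linker` parameter are hypothetical, and nothing here is taken from the paper's actual analysis.

```python
def gloss(n, order, linker=""):
    """Gloss a running number (21-99, excluding crowns) as atom and base.

    order  -- "base-then-atom" (English-style) or "atom-then-base" (Dutch-style)
    linker -- optional element placed between the two parts, e.g. "and"
    """
    if not 20 < n < 100 or n % 10 == 0:
        raise ValueError("expected a running number such as 21-29 or 81-89")
    tens, atom = divmod(n, 10)
    base = tens * 10  # the crown, e.g. 80 for 85
    if order == "base-then-atom":
        first, second = base, atom
    elif order == "atom-then-base":
        first, second = atom, base
    else:
        raise ValueError(f"unknown order: {order!r}")
    middle = [linker] if linker else []
    return "-".join([str(first), *middle, str(second)])


print(gloss(85, "base-then-atom"))                # English eighty-five
print(gloss(85, "atom-then-base", linker="and"))  # Dutch vijf-en-tachtig
```

The first call gives '80-5' and the second '5-and-80', matching the glosses in the text.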

He finds that if languages have both atom-then-base AND base-then-atom order, it's always the case that they have atom-then-base for the lower numerals, and base-then-atom for the higher numerals, never the other way around. Many Indo-European languages, like English, switch from having atom-then-base order in the teens (eighteen '8-10'), to base-then-atom order in the running numbers (eighty-one '80-1'). Others have atom-then-base order for both the teens and the running numbers (most of the Indian languages in our sample) and a few have base-then-atom order (Wakhi, Modern Armenian, Tocharian). But, in line with Greenberg's universal, no language in our sample changes from base-then-atom order to atom-then-base order.
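Greenberg's generalization is easy to state as a check. The sketch below assumes a language is summarized by just two values, its order in the teens and its order in the running numbers; that is a simplification for illustration, and the names (`violates_greenberg`, `ORDERS`) are made up here. English and Wakhi are coded as described above; the last entry is a deliberately invented counterexample, not a real language.

```python
ATOM_BASE = "atom-then-base"
BASE_ATOM = "base-then-atom"


def violates_greenberg(teens, running):
    """True if the order switches the 'wrong' way: base-then-atom in the
    lower numerals (teens) but atom-then-base in the higher ones."""
    return teens == BASE_ATOM and running == ATOM_BASE


# (order in teens, order in running numbers)
ORDERS = {
    "English": (ATOM_BASE, BASE_ATOM),  # eighteen '8-10', eighty-one '80-1'
    "Wakhi": (BASE_ATOM, BASE_ATOM),    # base-then-atom throughout
    "invented counterexample": (BASE_ATOM, ATOM_BASE),  # hypothetical, not attested
}

for language, (teens, running) in ORDERS.items():
    verdict = "violates" if violates_greenberg(teens, running) else "is consistent with"
    print(f"{language} {verdict} the generalization")
```

Only the invented entry is flagged, which is the pattern that no language in our sample shows.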

As for English having base-then-atom order for running numbers, I believe this must be due to pressure from the conquering Scandinavians and/or Normans during the formation of Middle English, as Old English still had the ancestral West Germanic atom-then-base order. Damn those Vikings and Normans for making my bilingual life difficult!

In the paper, we then go on to reconstruct the ancestral order of atom and base, and we look at correlations between the order of atom and base in numerals and other word orders. We do this using phylogenetic comparative methods which are great for studying historical change in typological features such as these. You can read all about that in the paper.

But here I'd like to expand a little on one of the other questions that arose when we were writing this up. Not all languages are like the Indo-European languages: some languages do not have numerals at all, or they have a restricted set that stops somewhere and cannot be used to derive ever higher numerals. Bernard Comrie's WALS chapter on numeral bases lists 20 languages with such a 'restricted' system, out of 196 languages. We started wondering about the dynamics of change between 'restricted' and 'productive' numeral systems: given that productive numeral systems are so useful for counting, once you have one, can you lose it? Are productive systems faithfully inherited as language families diverge, or are they frequently borrowed? We know that languages with restricted numeral systems can lose numbers, as investigated by Kevin Zhou and Claire Bowern for the Pama-Nyungan languages of Australia. But hardly anything is known about the dynamics of change between restricted and productive systems.

In the paper, we briefly mention the Arawakan language family, which includes languages with both restricted and productive systems. Comrie (2005) samples three Arawakan languages, two of which (Baré and Achagua) have restricted number systems, while the third, Arawak (Lokono), has a vigesimal number system. The 'Numeral Systems of the World's Languages' database has information on 36 Arawakan languages, of which more than half, 20, have restricted number systems (they have numerals for 1, 2, 3, sometimes up to 5). Another 8 have traditional numerals up to 20. Only 8 have truly productive systems. What is interesting about the Arawakan languages in the database is the comments on where these systems come from: for several of the languages with productive numeral systems, it is remarked that the language has "developed" this system, suggesting that ancestrally, all Arawakan languages had restricted numeral systems. However, for many of the languages with restricted systems we find comments that these have "lost" their numerals, which would suggest that Arawakan languages had productive systems to start with! The fact that many speakers of Arawakan languages have now adopted the colonial French, Spanish, or Portuguese numeral systems does not help with uncovering changes between restricted and productive systems in the Arawakan language family.

As a last note, the Arawakan numeral systems seem to be based on body counting (see here for the Mehináku system), using the fingers (and toes) to count. Some nice pics on different methods of counting can be found here.

