Monday, April 13, 2015

The USA census only has 382 language categories

I like digging around in censuses, especially when it comes to language use. I was just poking through the one from the US of A and learned that the census collapses all the languages of the world into 382 categories. I thought this was rather interesting, and maybe you will too.

In the census, citizens and residents of the USA are basically asked:

  • Do you speak a language other than English at home?
  • Which?
  • How well do you speak English?

I'm guessing the 382 categories exist for practical reasons: it seems that interviewers have mostly been using, and probably still are using, pen and paper. I tried to find out whether that is still the case and whether it will be for 2020, but I couldn't find that information. I did find out that you can answer by mail, by phone or in an interview, which probably means it's all still on paper.

Collapsing the 7,000-plus languages of the world into 382 categories makes sense if your resources are limited, i.e. pen and paper rather than a smartphone or computer. For comparison, Ethnologue counts 422 languages in the USA, 216 of them indigenous. It's also interesting to note that many of the speaker population figures Ethnologue cites for languages in the USA come from the censuses of 1990, 2000 or 2010.

The 382 language categories are further collapsed into 39 languages and language groups, and there is more detailed information on these. You can read the report from 2011 here. Here's the table of the details of these 39 language categories. Notice how "Spanish" and "Spanish creole" are one category whilst "French" and "French creole" are two (yes, we know there is more than one Spanish creole and more than one French creole). I also can't help but wonder if "other Indo-European" shouldn't say "other Indo-European excl. Indic".

Here are some other interesting quotes from the Census Bureau's website:

For most people residing in the United States, English is the only language spoken in the home. However, many languages other than English are spoken in homes across the country. Data on speakers of languages other than English and on their English-speaking ability provide more than an interesting portrait of our nation. Routinely, these data are used in a wide variety of legislative, policy, legal, and research applications.


The coding operations used by the Census Bureau puts the reported answers from the question "What is this language?" into 382 language categories of single languages or language families. These 382 language categories represent the most commonly spoken language other than English at home. Linguists recognize several thousand languages in the world and as languages are reported by respondents, they are coded and added to the language list. Due to small sample counts, data tabulations are not generally available for all 382 detailed languages. Instead, the Census Bureau collapses languages into smaller sets. For the list of the 382 individual language codes, click here [PDF – 55k].

Presenting data for all 382 languages is not sensible due to sample size and confidentiality concerns. Therefore we collapse the 382 language codes into more manageable categories. These categories were originally developed following the 1970 Census and are grouped linguistically and geographically. These groups are based generally on Classification and Index of the World's Languages (Voegelin, C.F. and F.M., 1977) and are updated constantly using linguistic books and online resources.

The simplest collapse recodes the 382 language codes into four major language groups: Spanish; Other Indo-European languages; Asian and Pacific Island languages; and All Other languages. A more detailed collapsing puts the 382 codes into 39 languages and language groups. The table below shows how the 382 codes go into the four and 39 language groups. For information on how to get more detail than the four or 39 languages, go to the FAQ.

Why is language information collected?

One of the main purposes of collecting information on languages is for Voting Rights determination. Information about languages spoken at home and English-speaking ability is used to determine bilingual election requirements under the Voting Rights Act.

Does the Census Bureau provide the number of people who use American Sign Language (ASL)?

The three questions used to capture languages spoken and English-speaking ability are not designed to identify those who use ASL. The design of the question is to gather the number of people speaking languages other than English at home, identify which languages are being spoken, and to get the number of people who have difficulty with English (see the FAQ question Why is language information collected?). With that in mind, those who use ASL are presumed to know English. Those who report using American Sign Languages, ASL, or some variation of those words are coded as being English speakers.
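To make the recoding a bit more concrete, here's a tiny Python sketch of what the simplest collapse (plus the ASL rule) boils down to: a lookup from the reported language to one of the four broad groups. The handful of languages and their group assignments are my own illustrative picks to the best of my knowledge, not the Census Bureau's actual 382-code table, and the name `collapse` is just made up for this example.

```python
# Illustrative only: a few reported languages and the four-group collapse
# they would fall into. This is NOT the Census Bureau's real code list,
# which maps 382 language codes.
FOUR_GROUPS = {
    "Spanish": "Spanish",
    "Spanish Creole": "Spanish",
    "French": "Other Indo-European languages",
    "German": "Other Indo-European languages",
    "Hindi": "Other Indo-European languages",
    "Tagalog": "Asian and Pacific Island languages",
    "Japanese": "Asian and Pacific Island languages",
    "Navajo": "All other languages",
    "Arabic": "All other languages",
}

# Per the FAQ quoted above, ASL answers are coded as English.
CODED_AS_ENGLISH = {"English", "American Sign Language", "ASL"}


def collapse(reported: str) -> str:
    """Map a reported home language onto one of the four broad groups (hypothetical helper)."""
    if reported in CODED_AS_ENGLISH:
        return "English"
    # Anything outside this small sample table gets a placeholder label.
    return FOUR_GROUPS.get(reported, "Not in this sketch's table")


if __name__ == "__main__":
    for lang in ["Spanish Creole", "Hindi", "Tagalog", "ASL", "Klingon"]:
        print(f"{lang} -> {collapse(lang)}")
```

The point is just that a plain lookup table is all you need once the answers are coded, which is probably why such a scheme worked fine back in the pen-and-paper days.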

I just thought this all was neat to know, and now you know too ^^!
