Ethnologue changes access, again! Clarifying points

News in brief.

As of October 26, Ethnologue have changed the access conditions on their site. Instead of getting 3 free page views per month, users can now see all pages on the website but not all the information on them. To the right are examples of what the views look like for Country and Language pages. This has sparked negative emotions.


They are also pushing their guide pages more, which old users may notice are very similar to the "Statistics" pages of older editions but with less information. These guide pages seem directed more at educators than academics.


Just like with the previous access restrictions, these are not levied against users in certain countries with low mean incomes. 
They have also launched a contributor program, which will enable people who contribute to access Ethnologue freely.


SIL International is the publisher of Ethnologue. They are a "faith-based" organisation, and while they claim not to be missionary, they work closely with and are funded by their sister organisation Wycliffe Bible Translators, who are explicitly Christian missionaries. You can read more about the finances of SIL International here. They also publish other resources besides the Ethnologue.


SIL International are also the official registration authority for ISO 639-3 - the most popular of the ISO standards for language names & codes. It is hosted on a separate website from the Ethnologue, and that website is not under any access restrictions. You can see all old change requests and updates to the classifications there. Even though that ISO 639-3 information is technically stored separately, it is not that usable on its own, since it lacks information on geography, genealogy, or alternative names (as Cysouw pointed out last time around). One would have to swap back and forth between Ethnologue and ISO 639-3 and hope there's enough information in the limited view to figure out what is what.

See also our old blog post about the 2016 change.


Alternatives to Ethnologue

Ethnologue is a great resource that has served academics well for a long time, and the ISO 639-3 code standard is very practical. However, perhaps the time has come for Ethnologue to redefine their target audience and for academics to go elsewhere. The information provided to non-subscribers is very minimal, and it is not clear that Ethnologue offers enough added value compared to other resources to warrant asking your local university library to subscribe.

There are several other resources that provide similar services to Ethnologue for free, and of these Glottolog is the most comprehensive. Glottolog.org offers many of the same functionalities as Ethnologue and ISO 639-3. You can find the following information there:

  1. Language classification (what counts as language versus dialect, by their standards)
  2. Language codes for languages, dialects and families and all nodes in between (handy if you disagree with their classification in (1))
  3. Language locations (points, not polygons)
  4. Endangerment status and descriptive status per language
  5. References per language
  6. Alternative names

Glottolog's codes are also mapped to ISO 639-3, so you can quite easily convert your old data to Glottocodes.
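
To make the conversion concrete, here is a minimal sketch in Python. It assumes you have a CSV export of languoids with one column holding the Glottocode and one holding the ISO 639-3 code. The column names `id` and `iso639P3code` and the tiny inline sample are assumptions made for illustration, so check them against the file you actually download before relying on this.

```python
import csv
import io

# Small illustrative sample standing in for a downloaded languoid table.
# The column names ("id", "iso639P3code") are assumptions; adjust as needed.
SAMPLE_LANGUOID_CSV = """id,name,level,iso639P3code
stan1293,English,language,eng
stan1295,German,language,deu
swed1254,Swedish,language,swe
indo1319,Indo-European,family,
"""


def build_iso_to_glottocode(csv_file):
    """Return a dict from ISO 639-3 code to Glottocode, skipping rows without an ISO code."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        iso = (row.get("iso639P3code") or "").strip()
        if iso:
            mapping[iso] = row["id"]
    return mapping


def convert_codes(iso_codes, mapping):
    """Pair each ISO 639-3 code with its Glottocode, or None if there is no match."""
    return [(code, mapping.get(code)) for code in iso_codes]


iso_to_glottocode = build_iso_to_glottocode(io.StringIO(SAMPLE_LANGUOID_CSV))
converted = convert_codes(["eng", "swe", "xxx"], iso_to_glottocode)
```

Codes that come back as `None` are the ones to check by hand, for example codes that have been retired or split in ISO 639-3 since your data was collected.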

Below is a table comparing Glottolog and products by SIL International on more points:


| | SIL International | Glottolog | Other resources |
|---|---|---|---|
| Language codes | Yes | Yes (also for families and dialects) | |
| Open Access? | No, mostly behind paywall | Yes, Open Access (CC-BY) | |
| Alternative language names | Yes | Yes, including names from Ethnologue, OLAC, MultiTree, AIATSIS etc | OLAC, MultiTree, WALS, AIATSIS |
| Population stats | Yes | No | |
| Language bibliography | Yes, 42.000+ references | Yes, 180.000+ references | OLAC |
| Endangerment information | Yes | Yes, but derived from Ethnologue and other resources (→) | UNESCO Atlas of Languages in Danger, ELCat |
| Descriptive status | No | Yes | |
| Genealogies | Yes, but not referenced | Yes, and referenced | MultiTree, D-place Phylogenies |
| Language area polygons | Yes, but not freely available (costs est 5.000 USD) | No | Partial: https://native-land.ca/ and others |
| Countries per language | Yes | Yes | |
| Long/lat point per language | Derivable from polygons | Yes | |
| Genealogical classification tendencies | Merge | Split | |
| Handling of contact languages | Creoles, Pidgins and Mixed all in their own 3 separate families | Creoles appear within their lexifier's family, pidgins and mixed in own 2 families | |
| Handling of sign languages | In their own family with no hierarchy | In their own family with some hierarchy based on history and type | |
| Handling of isolates | All in one family | Separated out (no Family_ID = Isolate) | |
| Requests for changes | Form at iso639-3.sil.org | GitHub Issues | |
| Transparency in decisions | Changes in ISO 639-3 are mostly well described, most other information per language is not referenced. | Almost everything is tied to a published reference | |
| Dialects | Yes, listed but not as meticulously managed as “languages” | Yes, listed but not as meticulously managed as “languages” | |
| Criteria for being a language | Mutual intelligibility, shared cultural identity, shared literature | Mutual intelligibility, lexical similarity | |
| “Faith-based” | Yes | No | |
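
The "Handling of isolates" row can be illustrated with a short Python sketch: if an isolate is a language-level entry with no family ID, picking them out of a languoid table is a simple filter. The column names `family_id` and `level` and the three sample rows are assumptions for illustration only, so adapt them to the table you are working with.

```python
import csv
import io

# Illustrative sample; the column names are assumptions modelled on a languoid table.
SAMPLE_ROWS = """id,family_id,name,level
basq1248,,Basque,language
stan1293,indo1319,English,language
indo1319,,Indo-European,family
"""


def find_isolates(csv_file):
    """Return IDs of language-level rows that have no family ID."""
    return [
        row["id"]
        for row in csv.DictReader(csv_file)
        if row["level"] == "language" and not row["family_id"].strip()
    ]


isolates = find_isolates(io.StringIO(SAMPLE_ROWS))
```

Note that top-level families also have an empty family ID in this sample, which is why the filter checks the level as well.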



Problems Ethnologue and Glottolog share
It can be tricky for users of both catalogues to understand the reasoning behind certain decisions and to lodge requests. For genealogy, Ethnologue does state that the sources used are available on request, but they are not provided for each tree and language up front (they are for Glottolog). Furthermore, changes to Ethnologue and ISO 639-3 should be submitted in different places (here and here resp.). For Glottolog, one must have basic GitHub skills to navigate the backlog of decisions and submit new ones. For example, you shouldn't go to the clld/glottolog3 repository for data decisions, which is where one of the links on the site takes you, but to glottolog/glottolog.

These obstacles are by no means insurmountable, but they are there and they will most likely result in certain changes not being lodged and certain users not being involved.

Usage and aims 

When providing a comprehensive resource, like Glottolog and Ethnologue do, it is key to be entirely clear on what the aim and target audience is (and what they are not). The Ethnologue user audience is currently changing, whether SIL International wants it to or not. Glottolog will be a good resource for some of those lost users within academia, but probably not all.

ISO 639-3 is not just used by academics; it is also used in NLP, Wikimedia, HTML, Unicode, libraries and more.

Wikipedia pages on languages now list both the ISO 639-3 code and the Glottocode (and Linguasphere codes).


***

Hopefully this will clear things up for many disappointed Ethnologue users and clarify whether Glottolog is the right choice for you in your future research.

All the best, 

Hedders.

Comments

  1. It seems that

    > "The Ethnologue user audience is currently changing, whether SIL International wants it to or not"

    contradicts your claim that

    > "When providing a comprehensive resource [...] it is key to be entirely clear on what the aim and target audience is"

    Of course, providing something without hope that anyone wants it would be pointless. But other than that, I'm convinced the "throwing things over the fence" publication strategy has led to a lot of value for both data and software publications.

    1. I don't think that is a contradiction. They are clearly communicating what they imagine that target audience to be. The fact that the real world is changing so that it doesn't line up with their intentions is unfortunate, and should maybe cause them to change their aims, but the communication is still clear.

      I also think that there are benefits to "throwing things over the fence". Neither extreme position (overanalysing target audience or not caring at all) is good, I'm not falling for slipping down some slope here.
