Language population data from Google (FREE!)

Google's linguistics research team have put together a great big table of speaker population sizes and other language metadata: language codes, names, writing systems, countries, official status and geographic coordinates. There is population data for over 5,000 languages.

It's free, it includes sources and you can get it here: https://github.com/google-research/url-nlp/tree/main/linguameta

The proper citation is:

@InProceedings{ritchie-etal-2024-linguameta-unified,
  author    = {Ritchie, Sandy and van Esch, Daan and Okonkwo, Uche and Vashishth, Shikhar and Drummond, Emily},
  title     = {LinguaMeta: Unified metadata for thousands of languages},
  booktitle      = {Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
  month          = {May},
  year           = {2024},
  address        = {Torino, Italy},
  publisher      = {European Language Resources Association},
  pages     = {10530--10538},
  abstract  = {We introduce LinguaMeta, a unified resource for language metadata for thousands of languages, including language codes, names, number of speakers, writing systems, countries, official status, and geographic coordinates. The resources are drawn from various existing repositories and supplemented with our own research. Each data point is tagged for its origin, allowing us to easily trace back to and improve existing resources with more up-to-date and complete metadata. The resource is intended for use by researchers and organizations who aim to extend technology to thousands of languages.},
  url       = {https://aclanthology.org/2024.lrec-main.921},
}



