That infographic, again ;)

In 2015 I wrote a blog post about Alberto Lucas López's visualisation of the world's languages. I answered some frequently asked questions about that visualisation, mostly to do with Ethnologue's definitions of languages, macro-languages and speakers. There's a lot more context needed to fully understand that infographic, and every time I see it re-shared I see the same questions pop up. It's a good infographic, so I understand why it goes viral - but when the same questions come up every time, it means that more context is needed.
Since then, Alberto (who is now Senior Graphics Editor at the National Geographic) has released an updated version, which among other things fixes the color of Mexico. I haven't gone through to check what else has been adjusted, but many of the same questions will remain. This is because Ethnologue's classification of what is and what is not a language (which still underlies the visualisation) is still controversial at times and the g…

Brust, breast, borst: an encounter with r-metathesis

Two months ago I gave birth to our second daughter. To prepare for this joyous event, I tried to get some of the local (German) vocabulary on labour & babies into my head. One of the words I had some trouble with was Brust 'breast'. Basically, my German reading is pretty decent, but speaking and writing are another matter: I just don't have enough vocabulary at the ready, hence my quest. Until now I could get away with blaming my high school education, where I suffered from a then-new policy to split second language education into a compulsory reading module and an optional speaking & writing module that I did not take. Having lived in Germany for over two years now, it's getting rather embarrassing though.

Anyway, back to Brust. The reason I found it confusing is that compared to my native Dutch, the r is in the wrong place: in Dutch it's borst 'breast'. Hmm. English breast has the r in the same place as German though. Then wh…

Having fun with phrase structure grammars: Midsomer Murders and Beatles

This post is about phrase-structure grammars, which can be both entertaining and educational. If you're a linguistics student, you will be interested in this. We’re going to learn how to define a little set of rules for a made-up language, and then generate possible sentences in that language based on the rules. We can also use the rules to test whether something is grammatical in that language.
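To make the idea of rules, generation and grammaticality testing concrete, here is a minimal sketch in plain Python. It does not use Nearley, and the toy grammar, its vocabulary and all function names are invented for illustration only. Each rule says what a symbol can be rewritten as; anything without a rule is treated as a word.

```python
import random
from itertools import product

# Hypothetical toy grammar (an assumption for illustration, not from any real language).
# Each symbol maps to a list of alternatives; each alternative is a sequence of symbols.
# Symbols that have no rule of their own are treated as words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N": [["detective"], ["vicar"]],
    "V": [["suspects"], ["sleeps"]],
}


def expand(symbol):
    """Return every word sequence a symbol can produce.

    This only terminates because the toy grammar has no recursive rules.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        # Combine every expansion of each part with every expansion of the others.
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


def sentences(start="S"):
    """List all sentences of the toy language as strings."""
    return [" ".join(words) for words in expand(start)]


def is_grammatical(sentence):
    """A sentence is grammatical if the rules can generate it."""
    return sentence in sentences()


def random_sentence(symbol="S", rng=random):
    """Generate one sentence by picking a random alternative at every step."""
    if symbol not in GRAMMAR:
        return symbol
    alternative = rng.choice(GRAMMAR[symbol])
    return " ".join(random_sentence(s, rng) for s in alternative)


if __name__ == "__main__":
    print(random_sentence())
    print(is_grammatical("the vicar sleeps"))
    print(is_grammatical("vicar the sleeps"))
```

Listing every sentence only works for a tiny grammar like this one; a real parser checks grammaticality without enumerating the whole language, which is what makes recursive rules usable.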

You may already be familiar with phrase structure from linguistics class, or with parsing from programming. Regardless, this introduction is accessible to everyone - including novices.

We will first learn the basics of these little rules, and then illustrate them by generating random plot summaries for possible episodes of the TV show Midsomer Murders (à la the Midsomer Murders Bot on Twitter) and also Beatles lyrics.

Even Barnaby can see the templatic nature of the show.
How many nas do we need to generate this song?

Nearley parser
We will be using the Nearley parser, a computer program that helps parse se…