"Le Petit Prince" as a parallel text has just become even more interesting

We've talked here before about researching the diversity of the world's languages through parallel corpora, and I provided a little list of some interesting items in this respect (also found below). Basically, the idea is that by comparing texts that are very similar to each other in meaning, we can find differences and similarities of lexicon and structure without having to consult descriptions of the languages (which, for example, involve a lot of interpretation). This is not a perfect method, but it can serve as a complement and answer some interesting questions that cannot be solved by descriptions alone.
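To make the idea a bit more concrete, here is a minimal sketch in Python of one common way to exploit sentence-aligned parallel texts: for a word in one language, rank the words of another language by how similar their distribution over the aligned sentences is (here using the Dice coefficient). The function name `best_matches` and the toy "sentences" are invented for illustration only; they are not real language data and not taken from any actual translation, and real work would need proper alignment and tokenisation.

```python
def best_matches(source_sents, target_sents, word, top=3):
    """Rank target tokens by Dice overlap with `word` across aligned sentences."""
    if len(source_sents) != len(target_sents):
        raise ValueError("texts must be sentence-aligned")
    # Indices of the aligned sentences in which the source word occurs.
    hits = {i for i, s in enumerate(source_sents) if word in s.lower().split()}
    # For every target token, the set of sentence indices it occurs in.
    occ = {}
    for i, s in enumerate(target_sents):
        for tok in set(s.lower().split()):
            occ.setdefault(tok, set()).add(i)
    # Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).
    scores = {
        tok: 2 * len(hits & idx) / (len(hits) + len(idx))
        for tok, idx in occ.items()
    }
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Invented placeholder tokens, NOT real language data.
lang_a = ["x1 x2", "x1 x3", "x4 x2"]
lang_b = ["p q", "p r", "s q"]
print(best_matches(lang_a, lang_b, "x1"))
```

In this toy example the token `p` comes out on top for `x1`, because the two occur in exactly the same aligned sentences. With real translations the scores are of course much noisier, which is part of why the method is a complement to descriptions rather than a replacement.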

In that list one can find "Le Petit Prince" by Antoine de Saint-Exupéry, which has been translated into at least 216 languages. We are very pleased to inform you that the latest addition to that list of translations of "Le Petit Prince" is into Casamance Creole [pov, upper1455]. Casamance Creole is a contact language spoken in southern Senegal that takes the majority of its lexicon from Portuguese. In contact linguistics we call this the lexifier, i.e. Casamance Creole is a Portuguese-lexified creole language of West Africa. It is the native language of at least 10,000 people in and around Ziguinchor. You can read more about the language here. The translation was made by Nicolas Quint, Noël Bernard Biagui and Joseph Jean François Nunez, and you can get it here or at Amazon. They are all linguists who have worked on this and other languages of the area.

Speaking of creoles and parallel texts, among the many languages that "Le Petit Prince" has been translated into we actually find quite a few creoles: Morisyen, Kabuverdianu, Kréyòl Gwadloup*, Kréyol Matinik*, Haitian, Seselwa, Reunion and Guianese. You can see (nearly) all languages that have a translation here.

Now, it needs to be said that all these languages are either French- or Portuguese-lexified, which means that a comparison might not be quite as exciting as if the sample were more diverse (perhaps even containing contact languages with non-Indo-European lexifiers). On the other hand, having a smaller set of closely related languages for comparison means we can make better predictions about what correlates with the differences, since we can control for more variables. We can compare these different languages in terms of the features of APiCS, but we could also complement this with some comparisons of the parallel corpora. In many of the features of APiCS these languages are similar, but perhaps we would find something we didn't think to look for if we used the parallel texts.

The French-lexified languages that have a translation of "Le Petit Prince" are: Morisyen, Kréyòl Gwadloup/Kréyol Matinik, Haitian, Reunion, Seselwa and Guianese. The Portuguese-lexified ones are: Kabuverdianu and the newcomer Casamance Creole.

There are 9 languages in APiCS that are French-lexified; here is a screen dump of an interactive map from APiCS showing where they are:

There are 14 Portuguese-lexified languages; they are here:

If you want to know more about these languages and how they are alike and how they differ, have a look at them in the Atlas of Pidgin and Creole Language Structures Online (APiCS). They're all there, with data filled in for lots of features. Go check it out.

List of interesting items that can be used as parallel corpora
1700+ New Testament
500+ Bible (entire)
419 Universal Declaration of Human Rights
240 Le avventure di Pinocchio
220 The Watchtower, Announcing Jehovah’s Kingdom
216 Le Petit Prince
184 the phrase “My hovercraft is full of eels”
153 Eventyr (H.C. Andersen)
112 Astérix le Gaulois
97 Alice in Wonderland
67 Harry Potter
64 Pippi Långstrump
61 Kalevala
56 O Alquimista
45 L’Etranger
43 Mumintrollen
40 The Hobbit
39 Through the Looking Glass
30 Millenniumtrilogin
21 EuroParl - Proceedings of the European Parliament
3 The Battle of Little Big Horn in English, American Sign Language and Plains Indian Sign Language

* Glottolog and ISO 639-3/Ethnologue lump Kréyòl Gwadloup and Kréyol Matinik into one language, gcf/guad1242. However, the Atlas of Pidgin and Creole Language Structures (APiCS) gives two entries, one for Kréyòl Gwadloup and one for Kréyol Matinik. We cannot know exactly why this is; classifying varieties as different dialects or as different languages is, as we've talked about before here on the blog, not an easy task. To the right is a map of some languages that can be found in APiCS in the Caribbean, to illustrate how close geographically Kréyol Matinik (Martinican Creole) and Kréyòl Gwadloup (Guadeloupean Creole) are to each other.

