What Are Language Families? Part II: Hereditary Messiness

Sam Quillen
Feb 1, 2024

Of the 7,000 or so languages that exist in the world today, most belong to larger families. In Part I, we explored how this works through the example of the Indo-European family, the most prolific of them all. But although we can trace the roots of modern tongues back into prehistory, five thousand years of trade, war, and cultural entropy can make things complicated.

I commented last week that, while working in Pakistan, I have noticed some traces of commonality between Urdu and English. Indeed, they are distant cousins in the Indo-European family, their speakers having parted ways around 3,500 years ago (roughly twice as much divergence as the Romance languages have from one another).

But although many Pakistanis know some English, few are aware of this. Everyone knows that their languages are related to India’s (also in the Indo-Aryan branch), and a bit more distantly to Iran’s (of its own eponymous branch). Many also feel a kinship to Arabic. This is natural — Pakistani languages (each province has its own) have a lot of Arabic words, and use a Persian-adapted version of the Arabic alphabet. But, as it turns out, each of these clues is a linguistic red herring.

I have some issues with this map, but it is generally sound.

Arabic (in its manifold varieties) is actually the principal member of the Afro-Asiatic family (in a more Biblical age, this was known as the Hamito-Semitic family). Its closest living cousin, albeit a harshly estranged one, is Hebrew, which is approximately as close to Arabic as German is to English. Others include Aramaic, the lingua franca of the ancient Near East and native tongue of Jesus; the languages of Ethiopia and Somalia; and Berber, of the mountain tribes of North Africa.

These languages are known, among other things, for grammars built on three-consonant roots. In Arabic, for example, the root K-T-B has to do with writing: kitab is “book,” kutub is “books,” kataba is “he wrote” (the form dictionaries list for “to write”), and maktaba is “library.” The root S-L-M has to do with peace, which gives us Arabic salam, Hebrew shalom (if only the two could get on the same page), and Amharic selam.

For whatever reason, this is the preferred family of a certain monotheistic deity. Divine favour did relatively little to spread Hebrew’s influence, and even less for Aramaic, whose speakers mostly drifted over to Arabic in the Middle Ages. But it did propel Arabic to world prominence. However, the proliferation of Arabic’s vocabulary and script from Morocco to the Indies actually says nothing about linguistic kinship.

Unlike spoken language, writing is not hardwired into the human brain. It is a cultural phenomenon, adopted for historic, rather than natural, reasons. While spoken languages reveal a people’s racial kinship, written ones speak to political, economic, and cultural dominance.

This temple in Seoul is adorned with Chinese characters, whereas modern signage is in the native Hangul script the nation adopted in the 20th Century. (Photo credit: University of Notre Dame.)

In medieval Europe, barbarian peoples like the English and Germans adopted the Latin alphabet (and Slavs the Greek one) for their entirely un-Latin languages. Likewise, in the age when Islamic empires dominated that part of the world, many nations adopted the Arabic alphabet to write their own languages.

The process had a close parallel farther east. There, China reigned supreme. The budding nations of Japan, Korea, and Dai Viet (Vietnam) fell into its orbit, and, in spite of the fact that their languages were as unrelated to Chinese as English is, adopted the Chinese script. Such a process is never smooth, and it is particularly rough when one is dealing with a system that requires memorising thousands of unique characters that make no allowance for the complicated grammatical inflections that pervade Japanese and Korean.

In Northern Europe, scribes developed the letters K, Y (both known to the Romans but used mainly for Greek words), and J (invented by Charlemagne’s brightest as a lowercase I) to express sounds that existed in their languages but not in Latin. Iranians, never keen to let the Arabs have the last word, developed their own flowing version of the Arabic script, which ended up being more popular with non-Afro-Asiatic languages than the original. Meanwhile, the Japanese had to invent two entire parallel writing systems, katakana and hiragana, to be used alongside kanji, the characters borrowed from Chinese.

Unlike more genteel newspapers, these Japanese tabloids have their headlines printed mostly in katakana and hiragana, which are simpler (you can even tell by looking at them), and easier to learn. However, the one featuring the explosive photo favours dramatic kanji. (Photo credit: Satoshi Sugiyama.)

While a language can never escape its roots, it can swap out one script for another. The Japanese still use their Chinese-based triple hybrid, but the Koreans ditched characters in favour of a home-cooked alphabet, written in syllable blocks, that is probably the best writing system in the world. The Vietnamese, meanwhile, switched over to the new hegemon: the Latin alphabet. The Turks (setting the tone for other Muslim nations) did the same, ditching the Persian script as a keystone of their reorientation toward the West.

A government can, as Turkey’s did in 1928, make an entire nation illiterate overnight. However, it is harder to unpick another type of linguistic influence: loanwords. Borrowing happens naturally all the time (whether the Académie Française likes it or not), but massive cultural influence can bring in so many new words that speakers from each side end up thinking their languages are related.

This generally happens by the same process as adoption of a writing system. Japanese and Korean are pervaded with Chinese borrowings, just as many languages of Islamic countries are with Arabic and Persian. But Japanese and Korean still belong to their own unique language families, just as Persian and Urdu are still Indo-European.

If all this seems confusing and even counterintuitive to you, you are in good company. Even professional linguists can be led astray into thinking a Sprachbund (a community of languages that borrow from one another) is actually a family. For much of the 20th Century, linguists believed in an Altaic family embracing most of the languages of Central Asia, including Mongol, Kazakh, Turkish, and even Korean and Japanese. Today, almost all agree that the former “branches” are actually separate language families, which just happen to have been in constant contact over the centuries.

The Altaic family may be dead, but some of its more surprising connections are very much alive. The Turks, for example, left their ancestral homeland near the Aral Sea centuries ago, but their language still ties them to tribes of Central Asia and the Uyghurs of northwest China.

Other controversies persist. Chinese linguists still tend to place Thai in their Sino-Tibetan family, though most of their colleagues elsewhere disagree. The Nilo-Saharan family, which embraces a diverse array of tongues of Central Africa, is the brainchild of a single Columbia linguist named Joseph Greenberg. It enjoys currency in English-language sources, but the Germans (who are often the best at this sort of thing) insist that the whole thing is a fantasy. It is hard to settle the debate, because not that many academics are fluent enough in Luo, Kanuri, Songhai, or Nubian to analyse their potential genetic ties.

This speaks to one of the most frustrating, but also compelling, facts of language generally: you can never truly understand one unless you actually learn a bit of it. This is relevant not just to arcane academic debates, but to anyone who wants to connect with another culture. For all the confusion and controversy, though, there is plenty that is well established. The large majority of human beings speak languages that fall into the Indo-European, Sino-Tibetan, or Afro-Asiatic families. As we will explore in Part III, however, there are a lot more families out there, which are equally fascinating and sometimes a lot more bizarre.

Welsh belongs to the Celtic branch of the Indo-European family, but the Welsh deserve some bizarreness credit for this one. (Photo credit: Business Insider.)



Sam Quillen

Former linguistics student; current investment bank analyst who sometimes thinks about something other than spreadsheets