What Are Language Families? Part III: Unusual Cousins

Sam Quillen
6 min read · Mar 6, 2024
Inhabitants of the world’s most linguistically diverse country (photo credit: Tomas Griger).

A significant majority of human beings today speak languages that fall into the Indo-European, Sino-Tibetan, or Afro-Asiatic language families; the first alone covers 46% of people. However, of the seven thousand or so individual languages that exist today, most belong to other, smaller families. Studying them reveals ancient connections and ways of describing the world that would probably never occur to those who do not venture from their linguistic comfort zone.

One of the largest families worldwide is also one of the most surprising. Of the 380 million or so people who speak Austronesian languages, the vast majority live in what are now Indonesia and Malaysia. But their cousins colonised islands across the Pacific, from Fiji to Hawaii and New Zealand, and reached as far west as Madagascar. The discovery that Indonesians, Hawaiians, Maori, and the Malagasy people of Madagascar, along with countless others, are all related caused some shock, not least amongst Indonesians, Hawaiians, Maori, and Malagasy people themselves. It turns out they all originated in Taiwan, and over millennia had forgotten their ancestral connection; linguistics made it possible to reunite long-lost cousins.

Some families have to travel really far to go home for Christmas.

Another big connection is among the Niger-Congo family of Sub-Saharan Africa. The Bantu were essentially the Aryans of Africa, sweeping across the continent starting around 1000 B.C., and leaving a legacy shared by most black Africans today. They replaced the ancient inhabitants of the continent, known as Pygmies and Bushmen (or Khoisan), who are racially distinct from the rest of humanity.

The latter group, however, has left a major legacy: the use of tongue clicks as consonants was once a unique trait of the natives of Southern Africa. But it was adopted by Niger-Congo colonists like the Zulu (who arrived in the area around the same time as white people), and is now associated worldwide with African languages.

Lest we Indo-European speakers think such unexpected connections are something exotic, some of the most bizarre kinships are in the heart of Europe. Hungarians are descendants of Asian nomads who conquered the Carpathian Basin in the 1st Millennium A.D. Into the Middle Ages, German visitors commented on the Asiatic complexion of Hungarian aristocrats. They have become thoroughly Europeanised, but their language betrays their ancient origins. Viktor Orbán can set himself up as the defender of European civilisation if he likes, but his people’s closest racial kin are Siberian tribes in the Ural Mountains.

Other distant Uralic cousins of the Hungarians are the Lapps (or Sami) of northern Scandinavia. In Arctic isolation, they have retained some of their distinctness. Most of the Uralic peoples of the region were assimilated by the (Indo-European) Swedes and Norwegians, with a notable exception: the Finns.

After around 4,000 years, languages drift apart so much that any connection becomes nigh on impossible to trace. But one can try. Some linguists have marshalled fascinating evidence of kinship between Navajo, a major Native American tongue, and the languages of the Yenisei Valley of Siberia. Twelve thousand years or so after some tribesmen crossed the Bering Strait into the Americas, some deep ancestral connection remains.

Some languages change rapidly, while others remain remarkably similar over time. English and the Romance languages have bounced around like crazy over the centuries, as political chaos and worldwide economic connections brought in new speakers to garble and add to them. Modern English speakers have an easier time understanding Spanish than Old English.

By contrast, today’s Icelanders would have little trouble conversing with their Viking forebears. With a stable population and relatively few new learners, a language can pass down quite intact. Tamil, a Dravidian tongue whose roots in South India predate the Indo-European invasions, is sometimes called the last living Classical language — as if Sicilians still spoke Ciceronian Latin.

Dravidian speakers were probably the original inhabitants of the whole Indian subcontinent — a few pockets still remain in the north. But their language and culture are as strong as ever in the south. Today’s Tamils speak essentially the same language their ancestors did in the days of the Roman Empire. (Photo credit: Times of India.)

In English, bizarre grammatical features like three genders and tacking new suffixes onto nouns depending on their role in the sentence disappeared a millennium ago. But in isolation, languages retain these eccentricities, and can even get weirder.

The high mountains and isolated valleys of the Caucasus region were traditionally considered the homeland of Indo-Europeans, but today the region is inhabited by speakers of some of the strangest language families in the world. Tsez, spoken in Dagestan in southern Russia, has 42 grammatical cases, meaning words transform in 42 different ways depending on their role in a sentence. Kabardian has 48 consonants and only 3 vowels, which are often not pronounced in speech (Hawaiian, by contrast, has five vowels and only eight consonants). Georgian, the largest Caucasian language, has a grammar so bizarre that linguists have had to invent new vocabulary to describe it.

But all that looks like Esperanto compared to Dyirbal, a native language of the Australian Outback. Dyirbal speakers are entirely forbidden from speaking to their mothers-in-law (some other cultures may empathise with this), and must use an entirely different set of vocabulary when addressing older relatives. It has four grammatical genders, one of which is for fruit and another for women, fire, and dangerous items.

South Ossetia is a Russian (and Syrian)-backed breakaway state in Georgia. Its people are the last descendants of Iranian horsemen who terrorised the Roman Empire.

The most linguistically diverse place on the planet is New Guinea, where small tribes are so isolated and hostile to one another that people often speak a language from an entirely different family than their neighbours in the next valley over (again, consider that this means they have been separated for at least 4,000 years). Of the 7,000 or so languages in the world, one in seven is endemic to New Guinea.

In such extreme isolation, languages often lack features we cannot imagine doing without. This is a fascinating issue I have covered at greater length elsewhere, but one example is that many tribal languages lack words for colours. When everyone has the same frame of reference, it is easy enough to define things in terms of a certain flower, or the sky on a sunny day; but when they do need an abstract term, for whatever reason the first colour people come up with a name for is red. Another area is numbers: in many Aboriginal Australian languages, counting is simply “one,” “two,” and “many.”

People typically associate language isolates with outliers like Basque, but both Japanese and Korean are also one-of-a-kind. However, because Chinese characters have no relation to phonetics, a Chinese person and a Japanese person could fairly easily communicate in writing, in spite of speaking entirely unrelated languages.

Some major languages preserve features of ancient tongues that have nothing to do with them. Above sixty, French has a base twenty number system, i.e., 91 would be quatre-vingt-onze, “four twenties-eleven.” This is probably something their forebears picked up from Gaulish, the long-dead Celtic language that reigned before Caesar arrived.

It makes perfect sense that Tagalog, the most popular language in the Philippines, picked up a lot of Spanish vocabulary over centuries of Spanish rule. Perhaps more unusual is that modern Filipinos also use a lot of words from Nahuatl, the language of the Aztecs.

While most languages belong to larger families, some are alone in the world. The most famous of these is Basque, spoken in the rugged mountains of northern Spain. The Basques' language has nothing to do with Spanish, nor with any other language on Earth. While the rest of the continent was conquered and assimilated by Indo-European speakers in the two millennia before Christ, the Basques held strong. In the foothills of the Pyrenees, you can still hear a living heritage that has survived since before Europe was European.

Roman conquerors, the ancestors of most Spaniards today, referred to the native inhabitants of the land as “Celtiberians.” This refers to the Celts, who were the dominant group in Western Europe two millennia ago, and people who were probably the ancestors of today’s Basques. The Celts have since retreated to a few isolated tracts of the British Isles (and Brittany), but the original Iberians still hold their corner of Iberia.

Language is, of course, omnipresent in daily life, and the defining feature of any culture. In our modern age, a lot of people with a lot of different ideas have sought to reshape language to their own ends. But with the exception of some whittling around the edges, they have all failed. There are vast differences between French and Hindi, or between Hawaiian and Malagasy. But the endurance of language families down hundreds of generations reminds us that, no matter how much we develop and are shaped by our experiences, we never really change who we are.



Sam Quillen

Former linguistics student; current investment bank analyst who sometimes thinks about something other than spreadsheets