How Many Languages Does ChatGPT Speak?

Sam Quillen
6 min read · Apr 27, 2023


I was recently languishing on an 11-hour flight from Istanbul to Boston, and decided to watch a science fiction movie called Prometheus. It opens with a cyborg learning Proto-Indo-European, the common ancestor of most of the languages of Europe, Iran, and north India. The idea is not just to kill time on an intergalactic voyage (as I was) during which the human crew are in cryosleep, but to communicate with aliens believed to have been in contact with humans at the dawn of civilisation.

Of course, there are no texts in Proto-Indo-European, but scientists have reconstructed it based on common features of ancient and modern languages. David the cyborg, with his superior processing power, does that so well that he becomes fluent. As I was watching, a question occurred to me that has occurred to many of us several times a day for the past few months: “Can ChatGPT do that?”

Michael Fassbender as ChatGPT’s great-grandson (photo credit: Twentieth Century Fox)

At first, I did not even know if ChatGPT supported languages other than English, so I started simple: “Habla español?” Sí, it does, as well as German and French. I asked it how many languages it does speak, and it rattled off ten of the first languages you would think of, “among others.”

I was impressed, and wanted to know how it had come to learn so many more languages in five months than I have in twenty-five years. The answer, while unsurprising to people familiar with artificial intelligence, is nonetheless fascinating. ChatGPT can “learn” a language the same way it learns anything else: by analysing vast amounts of data.

This may sound mystifying when one considers all the grammar, shades of meaning, et cetera that human language learners have to master. But it is actually pretty much the same as the way babies master their first language. They do not learn rules or memorise flashcards — they just listen to people talk and figure it out.

They can even do it with multiple languages. My cousin and her husband are raising their kids in Spanish and English, and while the kids do mix the two up sometimes as toddlers, eventually human intelligence triumphs. With access to vastly more data, artificial intelligence just does it faster.

My digital interlocutor did mention that, for some languages, it has to “use translators,” which I take to mean finding something online that will translate the query into a language it does speak, and then back out again. Its range is impressive. It appears to speak Farsi pretty fluently, and notoriously tricky Tamil with a few minor errors. When it claimed to know Basque and Acehnese, I suspected maybe it was bluffing, but when I asked it for Norwegian Creole, it did point out that that is not a real language.

After a minutes-long battle of linguistic randomness, I finally found a weakness: when I asked for Quechua, it replied in a broken pidgin of Quechua and Spanish that it could not help me. It made sense, I figured, as the language of the Incas would not have much online data on which to draw.

But that raised another question: might ChatGPT learn Quechua in the future? I asked. It assured me that it was “constantly learning” and might be able to answer my Quechua queries in the future. That raised yet another question. After some back-and-forth, ChatGPT assured me that it is not capable of taking independent initiative to learn a new language, but rather waits for “programmers and developers” to tell it what to do. At least some people in rural Peru will still be relevant in a few years.

This was the state of ChatGPT’s Quechua knowledge on the afternoon of April 26. I thought I had won…

All this is an amusing exercise, but it has some important practical significance. Will ChatGPT-tier AI revolutionise the translation business? Google Translate has adopted AI in the past few years, relying on big data rather than the traditional model of dictionaries and human training. But Google’s program is clearly not very robust. While it is adept at creating rudimentary offerings in Igbo, even for major languages like Turkish its fluency is still pretty shoddy. As I wrote after a confusing trip last summer, adventure is better without relying on a robot.

But I admit that some experimentation with ChatGPT’s translation skills shook my confidence. I caveat this with an admission that my assessment abilities are limited to languages I actually speak (otherwise, I am just using other translators to check), which are mostly major European ones. Google Translate generally does these very well. But I came up with a few tests that opened some daylight between these two artificial intelligences.

ChatGPT speaks Latin far more fluently than Google, in terms of grammar and even coming up with words to express modern concepts — for “racecar,” for example, it gave me currus, a racing chariot. When I entered Spanish sentences riddled with spelling and grammatical errors, Google rendered them into word-for-word nonsense, while ChatGPT managed to figure out what I was trying to say. When I asked it, “Parlez-jij Русский?,” a cacophony of French, Dutch, and Russian, it immediately answered in French that it does speak Russian.

Having done a fair amount of research on this, I can say this is by far the most robust and explanatory rendering of PIE I have ever seen. ChatGPT did it in six seconds.

At this point, I climbed out of the rabbit hole and got back to my original question: can ChatGPT speak Proto-Indo-European? Still riding high on my Quechua triumph, I came out swinging, giving it a sentence that was straightforward but included a fair amount of grammatical complexity (see above). ChatGPT aced it almost instantaneously.

It should be said that PIE is pretty well-established, so it can draw on existing scholarly work, rather than doing its own reconstructing. It also does Ancient Egyptian, in fluent hieroglyphs. But when I asked it to reconstruct Proto-Sino-Tibetan (which scholars have yet to do), it refused to do it, again protesting at its incapacity to take independent initiative. I was disappointed because I could have won a prize for that, but it was eerily capable of reasoning out specific questions about the hypothetical language. And to ChatGPT’s credit, its explanations and fluency with PIE are miles ahead of anything else I have seen — it even comes up with creative guesses for colloquial expressions, or what our ancient forebears would have called “the Mediterranean Sea,” which is by no means established in literature on this subject.

Our ancient forebears were barbarians who were too busy raping and pillaging to write anything down. But by analysing languages their civilised descendants spoke, like Latin, Ancient Greek, Sanskrit, and Lithuanian (which weirdly has not changed that much), we can reconstruct Proto-Indo-European.

Before writing, something possessed me to give ChatGPT a shot at redemption for the one language on which I managed to trip it up. Four hours earlier, it had apologised for its inability to speak Quechua. This time, when I asked it to tell me about Machu Picchu in Quechua, it did so immediately. The first time, it was clumsily rendered with side-by-side Spanish translations, but by this morning, those were gone, too.

ChatGPT insists that it does not take independent initiative in learning languages. But it is hard to believe that, in the time since I first asked yesterday, there has been a sudden flurry of Quechua queries that prompted some human developer to train it in that language. I appreciate the help with my bizarre requests, but it is all a bit unnerving.

Heaven help us.


Written by Sam Quillen

Former linguistics student; current investment bank analyst who sometimes thinks about something other than spreadsheets
