Can AI Save the World’s Threatened Languages?

[Image: printed page with Cree syllabics, alphabetic transliteration, and English translation]

Here’s a fun procrastinatory activity for you. Go on ChatGPT, get it to write something (maybe something technical), and then ask it to translate it into a parade of different languages to see what it can and can’t do.

I did this the other day, and the results were quite interesting.

When it comes to the world’s major languages, it has no problem at all, and this seems to span all continents. I was pleasantly surprised that Chat seems to be able to handle translations into the larger African languages, such as Yoruba, Lingala, Sango, Xhosa, Zulu, Kinyarwanda, Malagasy, and others — particularly languages that have official status. It can also handle most of India’s 22 official languages and all the official languages of Southeast Asia, with all their many written scripts.

It can even translate into a number of archaic languages, such as Latin, Old English, Old Norse, Ancient Greek, Biblical Hebrew, and Sanskrit, although, disappointingly, it can’t do Egyptian hieroglyphs or ancient Near Eastern cuneiform.

Chat has some noteworthy holes, though, when it comes to global language coverage. It notably couldn’t translate into any of the languages of post-Soviet Central Asia, like Kazakh, Kyrgyz, or Tajik. It also can’t seem to do any of China’s minority languages, including the big ones like Tibetan and Uyghur. And while its grasp of the Indian Subcontinent is pretty good, it can’t handle most of the northeastern languages of India, like Santali, Meithei, or Kokborok, despite these being official languages.

And closer to home, it seems to draw a complete blank when it comes to the indigenous languages of North America, even the major ones like Cree, Navajo, and Inuktitut. Curiously, though, it does much better with the major indigenous languages from Mexico southward, including Nahuatl, K’iche’ Maya, Quechua, and Guarani, perhaps owing to the fact that these tongues have longer histories as written languages. The only indigenous language north of the US-Mexico border that Chat seems to be fluent in is Cherokee, which also has a longstanding written tradition.

When it comes to small, threatened European languages, it’s a mixed bag. It handles the Celtic languages well, including Scottish and Irish Gaelic, Welsh, and Breton. It’ll translate into Basque, Low German, Luxembourgish, and Frisian, but can’t do Romansh, Ligurian, Sardinian, or Provençal. In terms of minority European tongues, there seems to be no rhyme or reason for what it can and can’t handle.

Clearly Chat has a way to go before it’s a true Tower of Babel.

Why does any of this matter? Why would anyone care if ChatGPT can speak Ainu or Tlingit or Crimean Tatar? Because AI is a powerful language learning tool, and as such might hold the key to language revitalization, and to the cultural revitalization that comes with it.

I myself can attest to the power of AI as a language instruction tool. In the decade-plus since I’ve been back in Canada from Japan, my Japanese speaking and reading ability has gotten rusty to say the least. In recent months I’ve been making a concerted effort to whip it back into shape, conversing with Japanese friends and reading Japanese material online. And I’ve found Chat to be extremely helpful in doing so. One strategy I use is to get Chat to translate something I’ve already read into Japanese as a means of refreshing myself on the meanings of the different characters. I’ll also try writing in Japanese and then translate it into English to see if what I’ve written makes sense.

I’d love to do the same thing with Cree. Cree (or nêhiyawêwin in the western plains dialect spoken out here) is a language I’ve long wanted to learn, as it’s the traditional language of the land on which I live and is spoken by many of the Indigenous Elders I’ve had the good fortune to meet and learn from. It also represents a dialect continuum that stretches across almost the entire Canadian landmass — from the Rocky Mountains to the east coast of Labrador — and represents one of the most widely spoken indigenous languages of North America, at nearly 100,000 modern speakers.

I’m hoping it’s only a matter of time before ChatGPT is fluent in this and other threatened indigenous languages. While the Wikitongues project has done great work archiving the world’s obscure and endangered languages while also promoting revitalization, it in itself won’t save them. AI just might. There will probably come a time when language learning will simply be a matter of getting the necessary neural implant, but until then, the ability to go back and forth in translation and get real-time answers on the meanings of specific words is the best tool we have on this front.

It wouldn’t surprise me at all if the people behind Wikitongues and other language advocates are already working with AI experts to bring the world’s threatened languages into the 21st century and help ensure their future viability by teaching them to our AI tools. I certainly hope so. This technology holds tremendous promise for empowering cultures that have long been marginalized and promoting multilingualism. Language learning has never been more accessible than it is now. We just need to care enough to put in the study and use the language once we’ve learned some of it.

Oh, and if any ancient language experts are reading this, I’d love to see ChatGPT be able to work with Egyptian or Mayan hieroglyphs or Sumerian or Elamite cuneiform. There’s a ton of ancient material out there that has yet to be translated, and as a historian by background, I would love nothing more than to see all this unlocked.

Postscript: Does ChatGPT Have Political Bias?

One thing that does worry me about AI, aside from the fear that it’ll eventually kill or enslave us all, is the potential for human prejudice and ethno-nationalism to infect these tools. I find it a bit suspicious that neither Tibetan nor Uyghur, both of which are major languages with ancient written traditions, is available on Chat. While this doesn’t necessarily indicate direct Chinese influence on how this tool has evolved, it seems awfully convenient to a Chinese government intent on suppressing these and other troublesome ethno-linguistic minorities.

On a related note, I found it interesting that when you ask Chat to write in Punjabi, it automatically gives you the Gurmukhi (or Sikh) script employed in India by Punjabi Sikhs and Hindus rather than the Arabic-derived Shahmukhi script used by Muslim Punjabis in Pakistan. This strikes me as less nefarious than the above, and probably just a reflection of the outsized presence of India and the Indian diaspora (and the Sikh community specifically) in the tech world. Indian programmers probably just beat their Pakistani counterparts to the punch. Still, Chat should really ask which of the two scripts you want your Punjabi text in.

Similarly, when you ask for “Serbo-Croatian” (as opposed to simply “Serbian” or “Croatian”) it gives you Latin script rather than Cyrillic, indicating a bias towards Croatian as opposed to Serbian. The two are essentially the same language, with Croatian written in Latin script and Serbian traditionally written in Cyrillic. Again, I doubt there’s anything nefarious going on here. As a member of the European Union with a thriving tech sector, Croatia is simply a more integrated and interconnected country than Serbia, and as such its written language probably has greater currency. Perhaps some enterprising Serbian programmers can do something about this.

To me, this is all the more reason for everybody to get their act together and promote their languages via AI. This could be a great level playing field for all, and it should be. Hopefully it’s only a matter of time before it is.
