Preserving Languages With AI

I was watching "Outer Range", a series on Prime Video that shares a common thread with the German Netflix series "Dark": time travel. In "Outer Range", a Native American sheriff in Wyoming falls into a wormhole that takes her back to the 1880s. Within minutes of landing in the past, she is confronted by Native people of that era. She introduces herself in their language, saying something like "Ne Namya Joy". Since the character's name is Joy, it was obvious that "Ne Namya" meant "my name is…".

That fascinated me because in Sanskrit, the root language of my native tongue, the word for "name" is "nāma". To find out which Native American language it was, I turned on the subtitles: it was the language of the Shoshone tribe. Further research revealed that many Native American languages are either extinct or nearly extinct. That got me thinking about how technology could help preserve languages on the verge of extinction.

Alongside other efforts to preserve these languages, AI could play a key role.

Many emerging technologies can come into play, but deep learning is at the core. Building AI models that can recognize a language is not a big feat: for languages with abundant training material, a model can be trained in days. For languages on the brink of extinction, however, the lack of training data is the main challenge.
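To make the data problem concrete, here is a minimal sketch of one classic, lightweight way to identify a language: a character-trigram Naive Bayes classifier. It is a swapped-in illustration, not a deep learning model, and the names `char_ngrams` and `NgramLanguageID`, the uniform prior over languages, and the toy English and Spanish sentences are all hypothetical. What it shows is that the model is nothing more than statistics learned from example text, so a language with only a handful of transcribed sentences leaves most of those statistics empty.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split lowercased text (padded with spaces) into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageID:
    """Hypothetical toy language identifier: multinomial Naive Bayes over character
    n-grams, with add-one smoothing and a uniform prior over languages."""

    def __init__(self, n: int = 3):
        self.n = n
        self.counts: dict[str, Counter] = {}  # n-gram counts per language
        self.totals: dict[str, int] = {}      # total n-grams seen per language
        self.vocab: set[str] = set()          # every n-gram seen in training

    def train(self, samples: dict[str, list[str]]) -> None:
        """samples maps a language label to a list of example sentences."""
        for lang, sentences in samples.items():
            counts = Counter()
            for sentence in sentences:
                counts.update(char_ngrams(sentence, self.n))
            self.counts[lang] = counts
            self.totals[lang] = sum(counts.values())
            self.vocab.update(counts)

    def log_likelihood(self, text: str, lang: str) -> float:
        """Sum of smoothed log-probabilities of the text's n-grams under one language."""
        counts = self.counts[lang]
        denom = self.totals[lang] + len(self.vocab)
        return sum(math.log((counts[g] + 1) / denom) for g in char_ngrams(text, self.n))

    def predict(self, text: str) -> str:
        """Return the language label with the highest log-likelihood."""
        return max(self.counts, key=lambda lang: self.log_likelihood(text, lang))


if __name__ == "__main__":
    model = NgramLanguageID()
    model.train({
        "english": ["the cat sat on the mat", "where is the house"],
        "spanish": ["el gato está en la casa", "dónde está la casa"],
    })
    print(model.predict("the house"))
    print(model.predict("la casa"))
```

With two sentences per language, this toy model only works on phrases that overlap heavily with what it has seen. Every sentence a fluent speaker records or transcribes fills in more of those counts, which is why building the training data matters so much for endangered languages.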

This is where human help comes into play. Speakers who are proficient in these languages can help create sufficient training data. Once you have the data, you can build an algorithm that becomes the "guru", the digital keeper of the language.

For languages that survive mostly in spoken form, like many Native American languages, this first stage of building the training data is the most challenging. The training data needs to be both written and spoken (audio), and it needs to exist in volumes large enough to train deep learning models.

But that addresses only part of the problem. You don't just want to preserve the language; you also want to propagate it. You want the language not only to be available to future generations, but also to interest them enough to learn it, and to be easy for them to learn.

This is where other technologies, such as augmented reality, come into play. Combined with deep learning algorithms, they can power apps, games, and virtual classrooms that introduce the language to children. Adaptive, interactive apps and games, both with and without augmented reality, are already used to teach children many subjects. The deep learning algorithm is the core, but the overall solution needs to be built creatively, and curriculum design will play an important role.
