Why don’t we just use Google Translate for the Bible?

I recently got into a discussion with a programmer about machine translation and the Bible. He seemed convinced that with some online crowd-sourcing and a few edits, a machine translation of the Bible could work, and that once we had a translation to work from, updating it as dialects and languages change would be simple. But in reality, no translation is simple, and especially not the translation of a holy text.

Machine translation is the Star Trek Universal Translator dream: a computer learns the foreign (or alien) language and accurately translates it into the language of the listener. We have rudimentary machine translators now, and they have come a long way even in just the last five years. I remember using Babel Fish, one of the first free online machine translation services, to “help” with my Spanish homework in high school. More often than not, my teacher could tell right away, because the translation was so poor.

Google Translate is one of the most popular and famous machine translation services. It uses statistical machine translation rather than hard-coded grammatical rules and dictionaries. The statistical approach needs a large collection of bilingual texts to analyze so that it can extrapolate patterns. Google Translate uses United Nations documents, which let the software see how human translators have already translated between languages.
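To make the statistical idea concrete, here is a toy sketch in Python. Everything in it is my own illustrative invention: the three-sentence parallel corpus, the co-occurrence scoring, and the `best_translation` helper. Real systems use far more sophisticated word-alignment and phrase-based models trained on millions of sentences, but the core intuition is the same: words that keep showing up in the same sentence pairs are probably translations of each other.

```python
from collections import Counter, defaultdict

# A tiny made-up parallel corpus, standing in for the millions of
# words of UN documents a real system trains on.
parallel = [
    ("the house is red", "la casa es roja"),
    ("the house is big", "la casa es grande"),
    ("the dog is big", "el perro es grande"),
]

src_count = Counter()            # how many sentences each source word appears in
tgt_count = Counter()            # how many sentences each target word appears in
cooc = defaultdict(Counter)      # cooc[s][t]: sentences where s and t co-occur

for src, tgt in parallel:
    src_words, tgt_words = set(src.split()), set(tgt.split())
    src_count.update(src_words)
    tgt_count.update(tgt_words)
    for s in src_words:
        for t in tgt_words:
            cooc[s][t] += 1

def best_translation(word):
    """Pick the target word whose sentence set overlaps most (Jaccard)."""
    candidates = cooc[word]
    return max(
        candidates,
        key=lambda t: candidates[t] / (src_count[word] + tgt_count[t] - candidates[t]),
    )

print(best_translation("red"))  # "roja" — content words align nicely
print(best_translation("big"))  # "grande"
print(best_translation("the"))  # "es" — wrong! frequent function words
                                # co-occur with everything
```

Notice that even this tiny example gets the content words right and the function word wrong, which is exactly why the approach needs an enormous corpus and real alignment models rather than raw co-occurrence counts.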

This method works really well, but it has a couple of drawbacks: you need a large corpus of texts already translated by humans, and new translation is based not on the whole language, with all its slang and poetry and styles, but on the texts that have already been translated (in this case, UN documents). So statistical machine translation services like Google Translate are often quite accurate with straightforward texts like news articles, but worse when presented with idioms or song lyrics.

Those two drawbacks are crucial problems for translating the Bible. You might think that the Bible is a pretty long book at 611,000 words in the original languages, but statistical machine translation can require millions of words for accuracy. So the language data that we have is impoverished.

The other problem is that the books of the Bible are not all the same genre. You don’t translate poetry the same way that you translate laws; the purposes of the two genres are different. A human translator will be able to see this difference and realize that the poet chose the word “dozen” because it rhymed with “cousin,” so in the translation, the target language’s word for “twelve” might be more appropriate. But in a legal text, talking about eggs sold by the dozen might be particularly important for regulating packaging. Statistical machine translation software makes no distinction between texts that should be treated as poetry and texts that are legal documents. Or histories, or how-to instructions, or prophecy, or whatever other sorts of texts we find in the Bible.

These problems with machine translation are not specific to Bible translation, but they do pose a significant obstacle to using machine translation for the Bible. However, computers and machine translation software are always improving. There has been work in statistical machine translation using software that has not been trained on already-translated texts. For now, the translations made with this software come nowhere near usable accuracy (they are less than 50% accurate), but that’s still pretty amazing for a computer that can’t speak any human language. Someday we might have a Universal Translator like in Star Trek, but we’re pretty far off from that at this point.

The field of Bible translation brings up many different fascinating topics in linguistics and anthropology and even, as with machine translation, computational methods. I have a couple more posts lined up on the challenges of translation, especially Bible translation, for the next few weeks.
