Minh’s Notes

Human-readable chicken scratch

Minh Nguyễn
June 3rd, 2005


Something novel

The conventional machine translator tries to retain a limited vocabulary (enough for conversational usage; rarely enough for the Real World) and manages to understand the fundamentals of the language – the present tense, pronouns, and perhaps some basic prefixes and suffixes. Translations that emanate from the software score something like a 2 on the AP exam’s five-point scale. Not good enough.

Google’s now trying something novel that – well, isn’t anything new, in fact. It relies on brute force: feeding great works of literature and their accepted translations into a machine (which happens to jibe with their current efforts to digitize university libraries). Much like your Thunderbird spam filter “learns” over time what you consider spam and what you consider legitimate, the Google translator’s AI is able to spot matching passages in two different-language versions of the same work and figure out which words are equivalent.
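To make the "figure out which words are equivalent" part a bit more concrete, here is a toy sketch in Python. It is not Google's method (I have no idea what they actually run); it just counts how often each word in one language shows up in the same sentence pair as each word in the other, and scores the pairings with a Dice coefficient. The function name `learn_equivalents` and the four-sentence English/French corpus are made up for illustration.

```python
from collections import Counter
from itertools import product


def learn_equivalents(sentence_pairs):
    """Guess a word-for-word mapping from aligned sentence pairs.

    sentence_pairs: list of (source_sentence, target_sentence) strings.
    Assumption: sentences are already aligned one-to-one, and splitting on
    whitespace is a good enough tokenizer for this toy example.
    """
    source_counts = Counter()  # sentence pairs containing each source word
    target_counts = Counter()  # sentence pairs containing each target word
    joint_counts = Counter()   # sentence pairs containing both words

    for source, target in sentence_pairs:
        source_words = set(source.lower().split())
        target_words = set(target.lower().split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        joint_counts.update(product(source_words, target_words))

    equivalents = {}
    for s in source_counts:
        # Dice coefficient: 2 * co-occurrences / (occurrences of s + occurrences of t).
        scores = {
            t: 2 * joint_counts[(s, t)] / (source_counts[s] + target_counts[t])
            for t in target_counts
            if joint_counts[(s, t)] > 0
        }
        equivalents[s] = max(scores, key=scores.get)
    return equivalents


# Hypothetical mini "library" of aligned sentences.
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
    ("a dog eats", "un chien mange"),
]

equivalents = learn_equivalents(corpus)
for english, french in sorted(equivalents.items()):
    print(english, "->", french)
```

With only four sentences the counts happen to line up neatly; the brute-force bet is that a whole library of translated books gives the same kind of signal for a real vocabulary.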

It’s plausible that Google will want to integrate this technology into some of its existing services, such as its translation tool (duh!), Google Toolbar (instead of AutoLink, AutoTranslate), and Google Groups.

Of course, this method on its own probably won’t yield that much success. Since no translator worth their keyboard ever translates literally (often adding helpful inline phrases or taking liberties with the text itself), the software would still have to recognize any deviations from the original.

But it’s something for the Foreign Language Dept. to watch out for.

Thanks to Asa Dotzler for the scoop.