Google unleashes deep learning tech on language
Translating from one language to another is hard, and creating a system that does it automatically is a major challenge, partly because there are just so many words, phrases, and rules to deal with. Fortunately, neural networks eat big, complicated data sets for breakfast. Google has been working on a machine learning translation technique for years, and today is its official debut.
The Google Neural Machine Translation system, deployed today for Chinese-English queries, is a step up in complexity from existing methods. Here’s how things have evolved in a nutshell.
Word by word and phrase by phrase
A very simple technique for translating — one a kid or simple computer could do — would be to simply look up each word encountered and swap it for the equivalent word in another language. Of course, the nuances of speech, and often the meaning of an utterance, can be lost, but this rudimentary word-by-word system can still impart the gist with minimal fuss.
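That word-by-word approach can be sketched in a few lines. The tiny German-English dictionary below is invented for illustration, and word order and grammar are ignored entirely — which is exactly why this method loses nuance.

```python
# A toy word-by-word translator: look up each word in a bilingual
# dictionary and substitute it, ignoring grammar and word order.
# The dictionary entries here are a made-up example.
DICTIONARY = {
    "ich": "I",
    "habe": "have",
    "einen": "a",
    "hund": "dog",
}

def translate_word_by_word(sentence: str) -> str:
    words = sentence.lower().split()
    # Pass unknown words through unchanged rather than failing.
    return " ".join(DICTIONARY.get(w, w) for w in words)

print(translate_word_by_word("Ich habe einen Hund"))  # -> "I have a dog"
```

Even in this best case, the output only works because German and English happen to share the word order here; any sentence where they diverge exposes the method's limits.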
Since language is naturally phrase-based, the logical next move is to learn as many of those phrases and semi-formal rules as possible and apply those as well. But doing so requires a lot of data (not just a German-English dictionary) and serious statistical chops to know the difference between, for example, “run a mile,” “run a test,” and “run a store.” Computers are good at that, so once they took over, phrase-based translation became the norm.
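The core idea of matching phrases rather than single words can be illustrated with a greedy longest-match lookup. The phrase table below is a hand-made toy, not real statistical output; an actual phrase-based system learns millions of weighted phrase pairs from parallel text and scores competing segmentations.

```python
# Toy phrase-based lookup: prefer the longest matching phrase at each
# position, so "run a test" is translated as a unit rather than word
# by word. The German translations here are illustrative examples.
PHRASE_TABLE = {
    ("run", "a", "mile"): "eine Meile laufen",
    ("run", "a", "test"): "einen Test durchf\u00fchren",
    ("run", "a", "store"): "einen Laden f\u00fchren",
    ("run",): "laufen",
}

def translate_phrases(sentence: str) -> str:
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for length in range(len(words) - i, 0, -1):
            phrase = tuple(words[i:i + length])
            if phrase in PHRASE_TABLE:
                out.append(PHRASE_TABLE[phrase])
                i += length
                break
        else:
            out.append(words[i])  # unknown word: pass it through
            i += 1
    return " ".join(out)

print(translate_phrases("run a test"))  # -> "einen Test durchf\u00fchren"
```

Because “run a test” matches as a three-word phrase before “run” alone can match, the idiomatic sense of “run” is preserved — the distinction a pure word-by-word system cannot make.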
Still more complexity lurks in the rest of the sentence, of course, and handling it is another jump in complexity, subtlety, and the computational power required. Ingesting complex rule sets and building a predictive model is a specialty of neural networks, and researchers have been looking into this method for a while, but Google has beaten the others to the punch.
GNMT is the latest and by far the most effective system to successfully leverage machine learning in translation. It looks at the sentence as a whole, while keeping in mind, so to speak, the smaller pieces like words and phrases.