Machine-based translation is amazing, but hundreds of millions of people on our Pale Blue Dot can’t enjoy its benefits–because their language is nowhere to be found in the translator’s pull-down menu. Now, two new artificial intelligence systems–one from the Universidad del País Vasco (UPV) in Spain and another from Carnegie Mellon University (CMU)–promise to change all that, opening the door to true universal translators like the ones in Star Trek.
pull-down menu: a drop-down menu
opening the door to true universal translators like the ones in Star Trek: paving the way for genuinely universal translators like those in Star Trek
To understand the potential of these new systems, it helps to know how current machine translation works. The current de facto standard is Google Translate, a system that covers 103 languages from Afrikaans to Zulu, including the top 10 languages in the world–in order, Mandarin, Spanish, English, Hindi, Bengali, Portuguese, Russian, Japanese, German, and Javanese. Google’s system uses human-supervised neural networks that compare parallel texts–books and articles that have been previously translated by humans. By comparing extremely large amounts of these parallel texts, Google Translate learns the equivalences between any two given languages, thus acquiring the ability to quickly translate between them. Sometimes the translations are funny or don’t really capture the original meaning but, in general, they are functional and, over time, they’re getting better and better.
it helps to know how current machine translation works: first, understand how today’s machine translation operates
current de facto standard: the benchmark in practice
Afrikaans, Mandarin, Hindi, Bengali, Portuguese, Javanese: names of languages
human-supervised neural networks: neural networks trained under human supervision
compare parallel texts: compare texts that exist in both languages
learns the equivalences between any two given languages: learns which expressions correspond across any two specified languages
don’t really capture the original meaning: fail to truly reflect what the source text means
they are functional: they are usable
over time, they’re getting better and better: translation quality keeps improving as time passes
Google’s approach is good, and it works. But unfortunately, it’s not universally functional. That’s partly because supervised training requires a very long time and a lot of supervisors–so many that Google actually uses crowdsourcing–but also because there just aren’t enough parallel texts translated between all the languages in the world. Consider this: According to the Ethnologue catalog of world languages, there are 6,909 living languages on Earth. 414 of those account for 94% of humanity. Since Google Translate covers 103, that leaves 6,806 languages without automated translation–311 with more than one million speakers. In total, at least eight hundred million people can’t enjoy the benefits of automated translation.
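unfortunately, it’s not universally functional: regrettably, it does not work for every language
crowdsourcing: farming a task out to a large crowd of contributors
Ethnologue catalog of world languages: the Ethnologue directory of the world’s languages
there are 6,909 living languages on Earth: 6,909 languages are still spoken on Earth today
414 of those account for 94% of humanity: speakers of 414 of these languages make up 94% of the world’s population
In total: all told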
Most machine learning—in which neural networks and other computer algorithms learn from experience—is “supervised.”
computer algorithms: procedures executed by computers
To start, each of the new systems constructs bilingual dictionaries without the aid of a human teacher telling it when its guesses are right. That’s possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voilà! You have a bilingual dictionary.
construct bilingual dictionaries: build two-language dictionaries
telling them when their guesses are right: judging whether their guesses are correct
in the ways words cluster around one another: in how words tend to occur together
a computer maps out these co-occurrences like a giant road atlas with words for cities: the computer charts these co-occurrences like a huge road atlas in which words take the place of cities
map out: to chart
road atlas: a book of road maps
maps for different languages will resemble each other: the charts for different languages will look alike
overlay one atlas on another: lay one atlas on top of the other
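The atlas-overlay idea can be tried on a toy example. The sketch below is only an illustration under strong assumptions, not the method either research team used: the words, their 2-D coordinates, and every function name are made up, and the Spanish "map" is simply the English one rotated by 40 degrees. It first guesses word pairs by comparing how each word relates to the rest of its own map, then finds the rotation that best overlays the two maps, and finally reads the dictionary off as nearest neighbours.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point about the origin by `angle` radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def profiles(vectors):
    """Each word's sorted dot products with every word on its own map.
    Rotating the whole map leaves these values unchanged."""
    return [sorted(a[0] * b[0] + a[1] * b[1] for b in vectors) for a in vectors]

def squared_gap(p, q):
    """Squared distance between two profiles of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def induce_dictionary(src, tgt):
    """Pair every source word with a target word, using no known translations."""
    src_words, tgt_words = list(src), list(tgt)
    src_vecs = [src[w] for w in src_words]
    tgt_vecs = [tgt[w] for w in tgt_words]

    # Step 1: first guess, pairing words whose profiles look alike.
    src_prof, tgt_prof = profiles(src_vecs), profiles(tgt_vecs)
    guess = [min(range(len(tgt_vecs)), key=lambda j: squared_gap(p, tgt_prof[j]))
             for p in src_prof]

    # Step 2: the single rotation that best lays the source map over the
    # guessed partners (closed-form 2-D orthogonal Procrustes).
    dot = sum(s[0] * tgt_vecs[j][0] + s[1] * tgt_vecs[j][1]
              for s, j in zip(src_vecs, guess))
    cross = sum(s[0] * tgt_vecs[j][1] - s[1] * tgt_vecs[j][0]
                for s, j in zip(src_vecs, guess))
    angle = math.atan2(cross, dot)

    # Step 3: rotate each source word and take the nearest target word.
    dictionary = {}
    for word, vec in zip(src_words, src_vecs):
        moved = rotate(vec, angle)
        nearest = min(range(len(tgt_vecs)),
                      key=lambda j: math.dist(moved, tgt_vecs[j]))
        dictionary[word] = tgt_words[nearest]
    return dictionary

# Hypothetical toy maps: furniture words sit close together, as do animals.
english = {"table": (2.0, 1.0), "chair": (1.8, 1.4), "dog": (-1.0, 2.0),
           "cat": (-1.5, 1.6), "river": (0.5, -2.5)}
tilt = math.radians(40)
spanish = {"río": rotate(english["river"], tilt),
           "gato": rotate(english["cat"], tilt),
           "mesa": rotate(english["table"], tilt),
           "perro": rotate(english["dog"], tilt),
           "silla": rotate(english["chair"], tilt)}

print(induce_dictionary(english, spanish))
```

Real word maps have hundreds of dimensions and only roughly resemble each other, so a first guess like Step 1 would be noisy and Steps 2 and 3 would typically be repeated until the pairing settles.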
The new systems, which use remarkably similar methods, can also translate at the sentence level. They both use two training strategies, called back translation and denoising. In back translation, a sentence in one language is roughly translated into the other, then translated back into the original language. If the back-translated sentence is not identical to the original, the neural networks are adjusted so that next time they’ll be closer. Denoising is similar to back translation, but instead of going from one language to another and back, it adds noise to a sentence (by rearranging or removing words) and tries to translate that back into the original. Together, these methods teach the networks the deeper structure of language.
back translation: translating a sentence into the other language and back again
denoising: removing noise
roughly: approximately
translated back into the original language: rendered back into the source language
Denoising is similar to back translation: denoising resembles back translation
rearranging or removing words: reordering or deleting words
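The two training signals described above, back translation and denoising, can be sketched in a few lines. This is a toy illustration under stated assumptions rather than either team's training code: `add_noise`, `round_trip_error`, `word_by_word`, and the two tiny lexicons are hypothetical names and data, and the word-by-word lookup merely stands in for a neural translator. The first function corrupts a sentence by removing and locally rearranging words; the second measures how far a round trip drifts from the original, which is the kind of error a network would be adjusted to reduce.

```python
import random

def add_noise(sentence, drop_prob=0.1, max_shift=3, seed=None):
    """Corrupt a sentence by dropping some words and shuffling the rest locally."""
    rng = random.Random(seed)
    words = sentence.split()
    # Remove each word with probability drop_prob, but keep at least one.
    kept = [w for w in words if rng.random() >= drop_prob] or words[:1]
    # Rearrange: sort by position plus a random offset, so words move
    # only a few places from where they started.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    return " ".join(w for _, w in sorted(zip(keys, kept)))

def round_trip_error(sentence, forward, backward):
    """Fraction of word positions that differ after translating there and back."""
    original = sentence.split()
    restored = backward(forward(sentence)).split()
    mismatches = sum(a != b for a, b in zip(original, restored))
    mismatches += abs(len(original) - len(restored))
    return mismatches / max(len(original), 1)

def word_by_word(lexicon):
    """Build a stand-in translator that looks up each word in a lexicon."""
    return lambda text: " ".join(lexicon.get(w, w) for w in text.split())

# Hypothetical lexicons; "grande" comes back as "large", not "big".
en_to_es = word_by_word({"the": "la", "table": "mesa", "is": "es", "big": "grande"})
es_to_en = word_by_word({"la": "the", "mesa": "table", "es": "is", "grande": "large"})

sentence = "the table is big"
print(add_noise(sentence, seed=0))                     # a corrupted training input
print(round_trip_error(sentence, en_to_es, es_to_en))  # 1 of 4 words differs
```

In training, the noisy sentence would be the input and the clean sentence the target, while a nonzero round-trip error would be the signal for adjusting the translators.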
Both systems encode a sentence from one language into a more abstract representation before decoding it into the other language, but the Facebook system verifies that the intermediate “language” is truly abstract. Artetxe and Lample both say they could improve their results by applying techniques from the other’s system.
encode a sentence from one language into a more abstract representation: convert a sentence in one language into a more abstract internal form
verifies that the intermediate “language” is truly abstract: checks that the intermediate language is fully abstract
In addition to translating between languages without many parallel texts, both Artetxe and Lample say their systems could help with common pairings like English and French if the parallel texts are all the same kind, like newspaper reporting, but you want to translate into a new domain, like street slang or medical jargon. But, “This is in infancy,” Artetxe’s co-author Eneko Agirre cautions. “We just opened a new research avenue, so we don’t know where it’s heading.”
translating between languages: cross-language translation
common pairings: frequently used language pairs
a new domain: a new subject area
street slang: informal street speech
medical jargon: specialized medical terminology
This is in infancy: this is all at an early stage
open a new research avenue: start a new line of research
where it’s heading: where it will lead
One caveat? The systems are not as accurate as current deep learning systems trained on parallel texts–but the fact that a computer can guess all this without any human guidance is, as Microsoft AI expert Di He points out, nothing short of incredible. We’re just scratching the surface of this new learning method. It seems very likely that sometime soon, a true universal translator that allows us to talk to anyone in their native tongue won’t just be the stuff of sci-fi.
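computer can guess all this without any human guidance: a computer can infer all of this with no human instruction
nothing short of incredible: truly astonishing
scratching the surface of: barely beginning to explore
talk to anyone in their native tongue: converse with anyone in that person’s mother tongue
stuff of sci-fi: something found only in science fiction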