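Google can rely on its search engine and on user interactions with its many applications to train its AI systems, but IBM does not have nearly as many consumer-facing applications and products. So how was the Watson AI system trained?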
https://www.zhihu.com/question/39279528#answer-27898289
……………………………………………………………………
1, A huge amount of publicly available unstructured and semi-structured data is fed into the Watson database, much as a search engine builds its index. This phase is offline, i.e. it is completed before the Jeopardy! show takes place.
2, During the live show, the questions are sent to Watson in text form at the same moment the human players see them. [Thanks to Marcus, see comment for the link]
3, The question text is used as a query to search the database, much as you would search on Google. Only the few hundred best search results are kept.
4, The search results, together with the question, are used to retrieve supporting evidence from the database.
5, Each search result, taken as a candidate answer to the question, now forms a hypothesis. Each hypothesis is then evaluated against the retrieved evidence, and the answer is scored on many dimensions.
6, The answers, each scored on many dimensions, are ranked using a merging algorithm, and one of them comes out on top.
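7, If Watson is confident enough in its final answer, it will try to answer the question. Of course, it first converts the answer into the form of a question, as Jeopardy! requires.
(Background for readers who have not seen the show: Jeopardy! uses a distinctive quiz format. Its clues cover a very wide range of fields, including history, literature, the arts, pop culture, science and technology, sports, geography, and wordplay. Clues are presented in the form of answers, and contestants must give short, correct responses phrased as questions. In contrast to ordinary quiz shows, Jeopardy! asks in answer form and is answered in question form. Contestants need knowledge of history, literature, politics, science, and popular culture, and must also be able to parse subtle meanings, irony, and riddles, which is the kind of complex reasoning computers are not good at.)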
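The seven steps can be pictured as one small pipeline. The sketch below is only a toy illustration of that flow, not Watson's actual code: the three-document corpus, the two scoring dimensions (overlap and brevity), the merge weights, and the confidence threshold are all assumptions made up for the example.

```python
import re
from collections import Counter, defaultdict

# Toy stand-in for the data ingested offline (hypothetical content).
CORPUS = {
    "Paris": "Paris is the capital of France and sits on the river Seine.",
    "Lyon": "Lyon is a large city in France known for its cuisine.",
    "Berlin": "Berlin is the capital of Germany.",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_index(corpus):
    """Step 1 (offline): inverted index from term to document titles."""
    index = defaultdict(set)
    for title, text in corpus.items():
        for term in tokenize(text):
            index[term].add(title)
    return index

def search(index, clue, keep=100):
    """Step 3: rank documents by distinct clue terms matched, keep the best."""
    hits = Counter()
    for term in set(tokenize(clue)):
        for title in index.get(term, ()):
            hits[title] += 1
    return [title for title, _ in hits.most_common(keep)]

def score(candidate, clue, corpus):
    """Steps 4-5: fetch evidence for a hypothesis, score it on two dimensions."""
    evidence = tokenize(corpus[candidate])
    clue_terms = set(tokenize(clue))
    overlap = len(clue_terms & set(evidence)) / len(clue_terms)
    brevity = 1.0 / (1.0 + len(evidence) / 10)  # hypothetical second dimension
    return {"overlap": overlap, "brevity": brevity}

WEIGHTS = {"overlap": 0.9, "brevity": 0.1}  # hypothetical merge weights

def answer(clue, corpus, index, threshold=0.5):
    """Steps 6-7: merge the scores, rank, and respond only if confident."""
    ranked = sorted(
        ((sum(WEIGHTS[k] * v for k, v in score(c, clue, corpus).items()), c)
         for c in search(index, clue)),
        reverse=True,
    )
    if not ranked or ranked[0][0] < threshold:
        return None  # not confident enough: stay silent
    return f"What is {ranked[0][1]}?"  # Jeopardy!-style response

index = build_index(CORPUS)
print(answer("This city on the Seine is the capital of France", CORPUS, index))
```

A clue that matches the corpus poorly, such as "A famous composer of symphonies", falls below the threshold and returns None, which mirrors the idea that Watson only buzzes in when it is confident enough.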
That said, Watson is a complicated system, and each phase described above employs a variety of algorithms. The whole system also runs on a parallel platform so that it can give its answer as quickly as possible.
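To give a rough feel for why parallelism helps, here is a minimal sketch in which candidate answers are scored concurrently, since each hypothesis can be evaluated independently of the others. The scorer, the function names, and the worker count are assumptions for illustration and say nothing about how Watson's platform is actually built.

```python
from concurrent.futures import ThreadPoolExecutor

def score_hypothesis(item):
    """Hypothetical scorer: fraction of question words found in the evidence."""
    question, candidate, evidence = item
    q_words = set(question.lower().split())
    e_words = set(evidence.lower().split())
    return candidate, len(q_words & e_words) / len(q_words)

def score_in_parallel(question, evidence_by_candidate, workers=4):
    """Score every candidate concurrently and return them best-first."""
    items = [(question, c, e) for c, e in evidence_by_candidate.items()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_hypothesis, items))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

evidence = {
    "Paris": "paris is the capital of france",
    "Berlin": "berlin is the capital of germany",
}
print(score_in_parallel("capital of france", evidence))
```

Threads keep the sketch short. For CPU-bound scorers in CPython, a process pool (ProcessPoolExecutor from the same module) would likely be the better fit.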
For further information, search Google for "DeepQA".