Abstract
We investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents with minimal feature engineering.
We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task.
The method uses two DNNs: the first generates word embeddings, and the second performs entity recognition.
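A minimal sketch of the second-stage tagger, assuming a Collobert-style "window approach" (these notes do not say which architecture the paper used). For each token, the embeddings of the tokens in a small window, looked up in the table produced by the first stage, are concatenated and passed through a linear layer, a hardtanh, and a second linear layer that yields one score per tag. Everything concrete here is an assumption: the embedding table and weights are random stand-ins for trained values, training is omitted, the BIO tag set is hypothetical, each Chinese character is treated as a token for simplicity, and decoding is a greedy per-token argmax rather than sentence-level inference.

```python
import random

PAD = "<PAD>"  # hypothetical padding token for window positions outside the sentence


def window_features(tokens, i, emb, half_window=1):
    """Concatenate the embeddings of tokens[i - half_window : i + half_window + 1]."""
    feats = []
    for j in range(i - half_window, i + half_window + 1):
        tok = tokens[j] if 0 <= j < len(tokens) else PAD
        feats.extend(emb[tok])
    return feats


def hardtanh(x):
    # Clamp to [-1, 1], the nonlinearity used in Collobert-style taggers.
    return max(-1.0, min(1.0, x))


def linear(x, W, b):
    # W is a list of rows; returns W @ x + b as a plain list.
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def predict(tokens, emb, params, tags, half_window=1):
    """Greedy per-token tagging: pick the highest-scoring tag for each window."""
    W1, b1, W2, b2 = params
    out = []
    for i in range(len(tokens)):
        x = window_features(tokens, i, emb, half_window)
        h = [hardtanh(v) for v in linear(x, W1, b1)]
        scores = linear(h, W2, b2)
        out.append(tags[max(range(len(tags)), key=lambda k: scores[k])])
    return out


random.seed(0)
DIM, HIDDEN = 4, 5
TAGS = ["O", "B-DIS", "I-DIS"]  # hypothetical tag set
# Stage-1 stand-in: random vectors instead of embeddings learned from unlabeled text.
emb = {tok: [random.uniform(-0.5, 0.5) for _ in range(DIM)]
       for tok in ["患", "者", "高", "血", "压", PAD]}
W1 = [[random.uniform(-0.1, 0.1) for _ in range(3 * DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
W2 = [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(len(TAGS))]
b2 = [0.0] * len(TAGS)
print(predict(list("患者高血压"), emb, (W1, b1, W2, b2), TAGS))
```

Because the weights are untrained, the printed tags are arbitrary; the point is the data flow from the first-stage embedding table into the second-stage scorer.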
Introduction
The introduction covers the value of electronic health records (EHRs) and the entity recognition problem they pose.
Many existing clinical NLP systems, such as MedLEE, MetaMap, and cTAKES, use dictionaries and rule-based methods to identify clinical concepts.
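As a toy illustration of the dictionary approach, the sketch below tags entities by forward maximum matching against a lexicon: at each position it takes the longest lexicon term that starts there, which also sidesteps the lack of word boundaries in Chinese text. The lexicon, its entity labels, and the function name are hypothetical; systems such as MedLEE, MetaMap, and cTAKES are far more elaborate than a plain lookup, so this shows only the core idea.

```python
def dictionary_ner(text, lexicon, max_len=None):
    """Forward maximum matching: at each position, take the longest lexicon term.

    Returns a list of (start, end_exclusive, term, label) tuples.
    """
    if max_len is None:
        max_len = max((len(term) for term in lexicon), default=1)
    entities = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking down to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + length]
            if cand in lexicon:
                match = (i, i + length, cand, lexicon[cand])
                break
        if match:
            entities.append(match)
            i = match[1]  # skip past the matched term
        else:
            i += 1


    return entities


# Hypothetical lexicon mapping terms to entity labels.
lexicon = {"高血压": "DISEASE", "血压": "TEST", "阿司匹林": "DRUG"}
print(dictionary_ner("患者有高血压,服用阿司匹林", lexicon))
```

Note that "血压" inside "高血压" is not reported separately, because the longer match consumes it; that greedy behavior is a design choice of maximum matching, not a property of the cited systems.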
More recently, several challenges with shared NER tasks on clinical text have been organized, including the 2009 i2b2, the 2010 i2b2, the 2013 ShARe/CLEF, and the 2014 Semantic Evaluation (SemEval) challenges. (Note to self: look into these in more detail later.)
Conventional ML-based methods have been applied to Chinese clinical NER tasks.
In summary, current efforts on NER in Chinese clinical text primarily focus on investigating different machine learning algorithms or optimizing combinations of different types of features via human engineering.
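To make "human engineering" of features concrete, here is a hypothetical character-level feature template of the kind a CRF baseline might use for Chinese text. The feature names and choices (neighboring characters, character bigrams, a digit flag, a lexicon flag) are illustrative assumptions, not the features of any system cited here. The output is one feature dict per character, roughly the shape that CRF toolkits such as sklearn-crfsuite take as input.

```python
def char_features(sent, i, lexicon_chars=frozenset()):
    """Hand-crafted features for the character at position i of `sent`."""
    c = sent[i]
    prev_c = sent[i - 1] if i > 0 else "<BOS>"              # sentence-start marker
    next_c = sent[i + 1] if i + 1 < len(sent) else "<EOS>"  # sentence-end marker
    return {
        "c0": c,
        "c-1": prev_c,
        "c+1": next_c,
        "c-1c0": prev_c + c,  # left character bigram
        "c0c+1": c + next_c,  # right character bigram
        "is_digit": c.isdigit(),
        # Hypothetical flag: does this character occur in a medical term list?
        "in_lexicon": c in lexicon_chars,
    }


def sent2features(sent, lexicon_chars=frozenset()):
    """One feature dict per character, in sentence order."""
    return [char_features(sent, i, lexicon_chars) for i in range(len(sent))]


lexicon_chars = frozenset("高血压糖尿病")  # hypothetical character set
for f in sent2features("血压120", lexicon_chars):
    print(f)
```

Every entry in this template is a manual decision; this per-feature tuning is the effort that the DNN approach aims to replace with learned representations.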
Recently, there has been growing interest in NLP systems based on deep learning, which can learn useful feature representations from large unlabeled corpora through unsupervised methods. Deep learning is an area of machine learning research that learns high-level feature representations with deep neural networks, and it has achieved state-of-the-art performance in image processing, automatic speech recognition, and machine translation. NLP researchers have developed DNNs that learn useful features from large amounts of unlabeled data, removing the need to spend a great deal of time searching for task-specific features. Dr. Ronan Collobert's system achieved state-of-the-art performance on many NLP tasks with a single deep neural network.
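For background, Collobert et al. learn word embeddings from unlabeled text with a pairwise ranking criterion. A network $f_\theta$ scores a text window $x$ drawn from the set $\mathcal{X}$ of all windows in the corpus, and the score of a real window should exceed, by a margin of 1, the score of the same window with its center word replaced by another word $w$ from the dictionary $\mathcal{D}$ (written $x^{(w)}$). These notes do not say whether the paper's first DNN uses exactly this objective, so treat it as context rather than as the paper's method:

```latex
\theta \mapsto \sum_{x \in \mathcal{X}} \sum_{w \in \mathcal{D}}
  \max\Bigl\{0,\; 1 - f_\theta(x) + f_\theta\bigl(x^{(w)}\bigr)\Bigr\}
```

Minimizing this sum needs no labels: the corrupted windows $x^{(w)}$ serve as negative examples, which is what lets the embeddings be learned from a large unlabeled corpus.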
This paper is the first to apply DNNs to NER in Chinese clinical records, and it compares them with the conventional CRF method.
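Comparing the DNN with a CRF baseline requires a common metric. These notes do not record which one the paper uses; a common choice for NER is strict entity-level precision, recall, and F1, where a predicted entity counts only if both its boundaries and its type match a gold entity. A minimal sketch, assuming BIO tags (the tag names and example sequences are hypothetical):

```python
def bio_spans(tags):
    """Extract (start, end_exclusive, type) entity spans from a BIO tag list."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray I- tag (no matching open entity) starts a new entity.
                start, etype = i, tag[2:]
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Strict entity-level precision, recall, and F1 over parallel sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-DIS", "I-DIS", "O", "B-DRUG", "I-DRUG"]]
pred = [["B-DIS", "I-DIS", "O", "B-DRUG", "O"]]
print(entity_prf(gold, pred))  # the truncated DRUG span counts as a miss
```

Running the same function over the DNN's and the CRF's predictions on one test set gives directly comparable scores.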