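Application of Data Mining Technology to Medical Data
Chinese Abstract
With the development of big data and artificial intelligence technology, data mining has been applied in more and more fields, including finance, education, and healthcare. In healthcare, its applications extend to frontier disciplines such as precision medicine, genetic engineering, and gene sequencing. This article discusses the role that data mining models and algorithms play in clinical medical data and hospital information system data.
The purpose of applying data mining to medical data is to uncover, from large volumes of medical data, latent factors related to the cause of disease, and in the process to obtain further information, models, and association rules. Applying these findings in clinical practice can help doctors diagnose diseases faster and more accurately. The main work of this article is as follows:
First, Chapter 2 describes in detail the characteristics of medical data and the theoretical basis and method structure of commonly used data mining algorithms, and gives a brief explanation of the various data mining models.
Second, using a breast cancer medical data set, this article explores how logistic regression prediction and random forest (decision tree) classification perform as classifiers on medical data. The classification results reach fairly good accuracy. The models can then serve as a diagnostic aid for doctors: patients predicted to have a higher probability of breast cancer can be given focused observation and diagnosis.
Finally, this article interprets the classification and prediction results obtained on the two data sets and proposes related countermeasures and suggestions for improvement. It closes by discussing the article's shortcomings and directions for future improvement.
Keywords: data mining; regression analysis; decision tree; breast cancer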
Application of Data Mining Technology to Medical Data
Abstract
With the development of big data and artificial intelligence technology, data mining has become a hot topic and has been applied in a great many fields, such as finance, education, and healthcare. Within healthcare, its applications cover precision medicine, genetic engineering, gene sequencing, and other frontier fields. This article discusses the role of data mining models and algorithms in clinical medical data and hospital information system data.
The purpose of applying data mining to medical data is to extract, from a large volume of medical data, latent factors that are related to disease, and to obtain further information, models, and association rules in the process. These findings can then be applied in clinical practice, helping doctors diagnose diseases faster and more accurately. The main work of this article is as follows:
First, the second chapter of this article describes the characteristics of medical data and the theoretical basis and method structure of common data mining algorithms. It also gives a brief explanation of the various data mining models.
Second, this article uses a breast cancer medical data set to explore how logistic regression analysis and random forest (decision tree) classification perform as classifiers on medical data. The classification results achieve fairly good accuracy. The resulting models can serve as a diagnostic aid, helping doctors concentrate observation and diagnosis on patients with a higher predicted probability of breast cancer.
Finally, this article explains the classification and prediction results obtained on the two data sets and puts forward relevant countermeasures and suggestions for improvement. At the end of the article, the author discusses its shortcomings and directions for future improvement.
Keywords: data mining; regression analysis; decision tree; breast cancer
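The diagnostic-aid idea summarized in the abstract (fit a logistic regression classifier, then give priority to patients with a high predicted probability) can be illustrated with a minimal sketch. This is not the author's implementation: the training data below are made-up toy values rather than the breast cancer data set, and the function names, the two features, the learning rate, and the 0.5 threshold are all assumptions chosen for illustration. Only the Python standard library is used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the positive class for one patient."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def flag_for_review(w, b, patients, threshold=0.5):
    """Indices of patients whose predicted probability meets the threshold."""
    return [i for i, x in enumerate(patients)
            if predict_proba(w, b, x) >= threshold]

# Toy, made-up data (assumption): two standardized features per patient,
# label 1 = positive class, label 0 = negative class.
X_train = [[-1.2, -0.8], [-0.9, -1.1], [-0.5, -0.7],
           [0.6, 0.9], [1.1, 0.7], [1.3, 1.2]]
y_train = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X_train, y_train)
new_patients = [[-1.0, -1.0], [1.0, 1.0]]
flagged = flag_for_review(w, b, new_patients)
print(flagged)
```

In practice the threshold would be tuned to the clinical cost of missed cases versus unnecessary follow-ups, and a library implementation (for example scikit-learn's logistic regression and random forest classifiers) would normally replace the hand-written training loop.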