Transparent Medical Image AI via MONET Model
[1] Kim C, Gadgil S U, DeGrave A J, et al. Transparent medical image AI via an image–text foundation model grounded in medical literature[J]. Nature Medicine, 2024, 30(4): 1154-1165.
Overview
The study introduces MONET (medical concept retriever), an image-text foundation model designed to enhance the transparency and trustworthiness of medical artificial intelligence (AI) systems. MONET connects medical images with text and provides dense scoring on concept presence, which is crucial for various tasks in medical AI development and deployment.
Key Features of MONET
- Concept Annotation: MONET can annotate medical images with semantically meaningful concepts.
- Training Data: Trained on 105,550 dermatological images paired with descriptions from medical literature.
- Performance: Competes with supervised models built on clinically annotated datasets.
- Use Cases: Enables AI transparency across the development pipeline, including data auditing, model auditing, and interpretation.
Dermatology as a Use Case
- Dermatology was chosen due to the heterogeneity in diseases, skin tones, and imaging modalities.
- MONET's annotation capability was verified by board-certified dermatologists.
Technical Approach
- Contrastive Learning: Trains the model directly on images paired with natural language descriptions.
- Encoder: Image and text encoders map their inputs into a shared lower-dimensional vector space, where paired elements are pulled close together and unpaired elements are pushed apart.
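Contrastive Learning
Purpose: Contrastive learning is an AI technique that lets a model learn directly from natural language descriptions of images.
Method: Training pulls the representations of a matched image-text pair close together in the representation space and pushes the representations of mismatched pairs apart.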
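Model architecture:
- Image Encoder: Uses a vision transformer architecture (such as ViT-L/14) to convert the input image into a fixed-dimensional embedding vector.
- Text Encoder: Uses a transformer architecture with multiple self-attention layers to convert text into a corresponding embedding vector.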
Data Preprocessing
- Images: Resized, center-cropped, and normalized to match the encoder's input requirements.
- Text: Tokenized with lowercased byte-pair encoding; overlong text is split into segments.
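The preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming a single-channel image stored as nested lists and whitespace tokens standing in for byte-pair encoding. The function names, the 0.5 mean and standard deviation, and the `max_len` parameter are hypothetical and are not MONET's actual values.

```python
def resize_shorter_side(img, size):
    """Nearest-neighbor resize so the shorter side has `size` pixels."""
    h, w = len(img), len(img[0])
    scale = size / min(h, w)
    new_h, new_w = max(size, round(h * scale)), max(size, round(w * scale))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(new_w)] for r in range(new_h)]


def center_crop(img, size):
    """Keep the central size x size window."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def normalize(img, mean, std):
    """Scale 0..255 pixels to 0..1, then standardize."""
    return [[(px / 255.0 - mean) / std for px in row] for row in img]


def preprocess_image(img, size, mean=0.5, std=0.5):
    # mean/std are placeholder assumptions, not the encoder's real statistics.
    return normalize(center_crop(resize_shorter_side(img, size), size), mean, std)


def split_long_text(text, max_len):
    """Lowercase, tokenize (whitespace stand-in for BPE), split into segments."""
    tokens = text.lower().split()
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]
```

A real pipeline would operate on three-channel images and use the tokenizer that ships with the encoder; the sketch only shows the order of operations.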
Training Process
- Loss function: A symmetric cross-entropy loss based on cosine similarity scores.
- Optimizer: Adam with a cosine learning rate schedule.
- Hyperparameter tuning: The dataset is split into training and validation sets to select the best batch size and learning rate.
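A minimal sketch of a symmetric cross-entropy loss over cosine similarities, together with a cosine learning rate schedule, is shown below. It follows the general CLIP-style recipe rather than MONET's exact code; the function names and the `temperature=0.07` default are assumptions made for illustration.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Row k of image_embs is assumed to be paired with row k of text_embs."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity of every image with every text, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    # Image-to-text: each image should pick out its own text (rows).
    loss_i2t = sum(cross_entropy(sim[k], k) for k in range(n)) / n
    # Text-to-image: each text should pick out its own image (columns).
    loss_t2i = sum(cross_entropy([sim[r][k] for r in range(n)], k)
                   for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 to 0 at total_steps."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))
```

The loss is small when each image is most similar to its own description and large when the pairing is scrambled, which is the behavior the training objective rewards.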
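Automatic Concept Annotation
- Principle: Once trained, MONET can measure how close an image is to arbitrary text, which is used to annotate concepts automatically.
- Method: The concept presence score is obtained by computing the cosine similarity between the image embedding and the concept prompt embedding.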
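The scoring step can be illustrated with a short sketch. It assumes the image and concept prompt embeddings have already been produced by the encoders; the function names, concept names, and toy vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_presence_scores(image_emb, concept_embs):
    """Map each concept name to the cosine similarity with the image embedding."""
    return {name: cosine_similarity(image_emb, emb)
            for name, emb in concept_embs.items()}


# Toy 3-dimensional embeddings (hypothetical values).
image_emb = [0.9, 0.1, 0.0]
concept_embs = {"erythema": [1.0, 0.0, 0.0], "ulcer": [0.0, 1.0, 0.0]}
scores = concept_presence_scores(image_emb, concept_embs)
```

Here the image embedding points mostly along the "erythema" direction, so that concept receives the higher score.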
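Data Auditing
- Concept difference analysis: MONET maps image sets into a shared embedding space so that the features distinguishing the image sets can be described in natural language.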
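One plausible way to realize this idea is sketched below: average the embeddings of each image set, take the difference, and rank concept prompts by how well they align with that difference. This is an assumption about how such an analysis could work, not the paper's confirmed procedure, and all names are hypothetical.

```python
import math


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0  # guard against a zero vector


def rank_distinguishing_concepts(set_a, set_b, concept_embs):
    """Rank concepts by alignment with (mean of set A) - (mean of set B).

    Concepts at the top are more characteristic of set A; concepts at the
    bottom are more characteristic of set B.
    """
    diff = [a - b for a, b in zip(mean_vector(set_a), mean_vector(set_b))]
    scored = [(name, cosine_similarity(diff, emb))
              for name, emb in concept_embs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `set_a` and `set_b` could hold image embeddings from two hospitals or two acquisition devices, and the top-ranked concepts would then name what differs between them.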
Model Auditing
- MA-MONET: Clusters test-set images and compares concept presence scores between low-performing and high-performing image sets to identify medical concepts that lead to model errors.
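The comparison step can be sketched as follows, assuming cluster assignments have already been computed (for example by k-means on image embeddings) and each image carries a dictionary of concept presence scores. The accuracy threshold and function names are illustrative assumptions, not details taken from MA-MONET.

```python
from collections import defaultdict


def cluster_accuracy(cluster_ids, correct):
    """Fraction of correctly classified images in each cluster."""
    totals, hits = defaultdict(int), defaultdict(int)
    for cid, ok in zip(cluster_ids, correct):
        totals[cid] += 1
        hits[cid] += int(ok)
    return {cid: hits[cid] / totals[cid] for cid in totals}


def concepts_linked_to_errors(cluster_ids, correct, concept_scores, threshold=0.5):
    """Rank concepts by (mean score in low-accuracy clusters) minus
    (mean score in high-accuracy clusters)."""
    acc = cluster_accuracy(cluster_ids, correct)
    low = [s for cid, s in zip(cluster_ids, concept_scores) if acc[cid] < threshold]
    high = [s for cid, s in zip(cluster_ids, concept_scores) if acc[cid] >= threshold]
    if not low or not high:
        return []  # nothing to compare
    gaps = {}
    for name in concept_scores[0]:
        mean_low = sum(s[name] for s in low) / len(low)
        mean_high = sum(s[name] for s in high) / len(high)
        gaps[name] = mean_low - mean_high
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)
```

Concepts with the largest positive gap appear more strongly in the clusters where the audited model performs poorly, which makes them candidates for further inspection.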
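Building Inherently Interpretable Neural Networks (Concept Bottleneck Models, CBMs)
Purpose: Create an interpretable model so that physicians or developers can understand the factors that influence the model's decisions.
Method: Use the concepts automatically annotated by MONET to build the bottleneck layer, then train a simple linear classifier on top of that layer.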
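A minimal sketch of the second stage is given below: a logistic regression trained by gradient descent on concept presence scores. It assumes the scores have already been computed for each image; the learning rate, epoch count, and function names are illustrative choices, not the paper's settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of the positive class for one vector of concept scores."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_classifier(concepts, labels, lr=0.5, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, n_features = len(concepts), len(concepts[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(concepts, labels):
            err = predict_proba(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Because the classifier is linear, each learned weight corresponds to one named concept, so its sign and magnitude indicate how that concept pushes the prediction.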
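Evaluation Setup
- Prediction targets: Distinguishing malignant from benign lesions, and melanoma from its look-alike lesions.
- Image types: Clinical images and dermoscopic images.
- Training and testing: Evaluation is repeated over different train-test splits to verify model performance.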
Statistical Analysis
- AUROC values: Obtained from runs over different train-test splits; a paired-sample Student's t-test is used to compare MONET with other methods.
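The two quantities involved can be sketched in plain Python. The AUROC is computed as the fraction of positive-negative pairs ranked correctly, and the paired t statistic is computed from per-split AUROC differences. The function names are hypothetical; turning the statistic into a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_rel` provides.

```python
import math
import statistics


def auroc(scores, labels):
    """Probability that a random positive scores above a random negative
    (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1
```

In this setting `a` and `b` would hold the AUROC values of two methods over the same sequence of train-test splits, so that each difference compares the methods on identical data.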
Clinical Trial Evaluation
- PROVE-AI study: MONET is used to replicate and evaluate the clinical trial of the ADAE algorithm, analyzing the concepts associated with low specificity.
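Data and Code Availability
- Datasets: Publicly accessible datasets are used, such as ISIC, Derm7pt, Fitzpatrick 17k, and DDI.
- Code: The code used in the analysis is available on GitHub, including scripts for data collection, model training, and benchmark studies.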
Results
- Automatic Concept Annotation: MONET successfully retrieves relevant clinical and dermoscopic images for various dermatological terms.
- Performance Assessment: Compared favorably with supervised learning and CLIP models.
- Diverse Skin Tones: MONET demonstrated consistent performance across different skin tones.
- Nonclinical Concepts: Identified irrelevant artifacts that can affect AI predictions.
Data and Model Auditing
- Data Auditing: MONET automatically examines datasets for irregularities, aiding in the auditing of large-scale datasets.
- Model Auditing: A method called MA-MONET was developed to detect medical concepts leading to model errors.
Inherently Interpretable Models
- MONET facilitates the creation of Concept Bottleneck Models (CBMs), which are inherently interpretable and allow physicians to understand factors influencing model decisions.
Real-world Application
- MONET was applied to assess a clinical trial of a dermatology AI algorithm, providing insights into cases of lower specificity.
Limitations and Future Work
- MONET may struggle with concepts not present in its training data.
- Performance across skin tones for dermoscopic images was not examined due to dataset limitations.
- MONET is not intended for diagnostic tasks and may exhibit biases present in the training data.
Conclusion
The MONET model presents a significant advancement in the transparency and interpretability of medical image AI, with potential applications in auditing, model development, and clinical deployment.