Ikarus代碼實操：運用機器學(xué)習(xí)鑒定腫瘤細(xì)胞

前言

在之前的一篇推文：Genome Biology：運用機器學(xué)習(xí)鑒定腫瘤細(xì)胞中，Immugent根據(jù)Ikarus的原文簡單介紹了一下它的功能框架源葫，在本期推文中將通過代碼實操的方式來一步步演示如何使用Ikarus來鑒定腫瘤細(xì)胞严就。

不同于之前的Infercnv和CopyKAT軟件椿争，主要都是基于R語言，Ikarus是由更擅長機器學(xué)習(xí)的python所構(gòu)建的禁炒。事實上破托，在同等運算量的情況下吮炕，python處理數(shù)據(jù)的速度更快参淹，這是因為它可以和其它底層語言銜接的更好醉锄。當(dāng)然，大家也不要有畏懼python的心理浙值，其實使用起來和R基本差不多恳不，不斷去適應(yīng)就好啦。

廢話不多說开呐，下面開始展示烟勋。。筐付。

代碼實操

首先是安裝Ikarus軟件卵惦，要求python>=3.8。

pip install ikarus

#也可以通過下面的代碼安裝
python -m pip install git+https://github.com/BIMSBbioinfo/ikarus.git

接著是導(dǎo)入所需要的程序瓦戚，和R中的library()類似沮尿。

import urllib.request
import anndata
import pandas as pd
from pathlib import Path
from ikarus import classifier, utils, data

接下來是載入示例數(shù)據(jù)。较解。畜疾。

Path("data").mkdir(exist_ok=True)

if not Path("data/laughney20_lung/adata.h5ad").is_file():
    Path("data/laughney20_lung").mkdir(exist_ok=True)
    urllib.request.urlretrieve(
        url="https://bimsbstatic.mdc-berlin.de/akalin/Ikarus/part_1/data/laughney20_lung/adata.h5ad",
        filename=Path("data/laughney20_lung/adata.h5ad")
    )
    
if not Path("data/lee20_crc/adata.h5ad").is_file():
    Path("data/lee20_crc").mkdir(exist_ok=True)
    urllib.request.urlretrieve(
        url="https://bimsbstatic.mdc-berlin.de/akalin/Ikarus/part_1/data/lee20_crc/adata.h5ad",
        filename=Path("data/lee20_crc/adata.h5ad")
    )

if not Path("data/tirosh17_headneck/adata.h5ad").is_file():
    Path("data/tirosh17_headneck").mkdir(exist_ok=True)
    urllib.request.urlretrieve(
        url="https://bimsbstatic.mdc-berlin.de/akalin/Ikarus/part_1/data/tirosh17_headneck/adata.h5ad",
        filename=Path("data/tirosh17_headneck/adata.h5ad")
    )
    
paths = [
    Path("data/laughney20_lung/"),
    Path("data/lee20_crc/"),
    Path("data/tirosh17_headneck/")
]
names = [
    "laughney",
    "lee",
    "tirosh"
]
adatas = {}
for path, name in zip(paths, names):
    adatas[name] = anndata.read_h5ad(path / "adata.h5ad")
    # Uncomment to perform preprocessing. Here, the loaded anndata objects are already preprocessed. 
    # adatas[name] = data.preprocess_adata(adatas[name])

關(guān)鍵一步，訓(xùn)練模型印衔。庸疾。。

signatures_path = Path("out/signatures.gmt")
pd.read_csv(signatures_path, sep="\t", header=None)

model = classifier.Ikarus(signatures_gmt=signatures_path, out_dir="out")

train_adata_list = [adatas["laughney"], adatas["lee"]]
train_names_list = ["laughney", "lee"]
obs_columns_list = ["tier_0_hallmark_corrected", "tier_0_hallmark_corrected"]

model.fit(train_adata_list, train_names_list, obs_columns_list, save=True)

model_path = Path("out/core_model.joblib")
model = classifier.Ikarus(signatures_gmt=signatures_path, out_dir="out")
model.load_core_model(model_path)

image.png

基因的名稱(見示例中的第一列)在簽名中列出当编，GMT文件對應(yīng)于它們有意義的單元格類型，并且是以制表符分隔徒溪。

截止到現(xiàn)在忿偷，我們的模型已經(jīng)搭建完畢，接下來是對新數(shù)據(jù)集進(jìn)行預(yù)測臊泌。

_ = model.predict(adatas["tirosh"], "tirosh", save=True)

import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
from sklearn import metrics

def plot_confusion_matrix(
    y_true, y_pred, classes, normalize=False, title=None, cmap=plt.cm.Blues, ax=None
):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.rcParams["figure.figsize"] = [6, 4]
    # print(classes)
    if not title:
        if normalize:
            title = "Normalized confusion matrix"
        else:
            title = "Confusion matrix, without normalization"

    # Compute confusion matrix
    cm = metrics.confusion_matrix(y_true, y_pred, labels=classes)
    # Only use the labels that appear in the data
    # classes = classes[unique_labels(y_true, y_pred)]
    if normalize:
        cm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]

    if ax is None:
        (fig, ax) = plt.subplots()

    im = ax.imshow(cm, interpolation="nearest", cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(
        xticks=np.arange(cm.shape[1]),
        yticks=np.arange(cm.shape[0]),
        # ... and label them with the respective list entries
        xticklabels=classes,
        yticklabels=classes,
        title=title,
        ylabel="True label",
        xlabel="Predicted label",
    )
    for item in (
        [ax.title, ax.xaxis.label, ax.yaxis.label]
        + ax.get_xticklabels()
        + ax.get_yticklabels()
    ):
        item.set_fontsize(12)

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    fmt = ".2f" if normalize else "d"
    thresh = cm.max() / 2.0
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(
                j,
                i,
                format(cm[i, j], fmt),
                ha="center",
                va="center",
                color="white" if cm[i, j] > thresh else "black",
            )

    return fig, ax

_ = model.get_umap(adatas["tirosh"], "tirosh", save=True)

path = Path("out/tirosh")
results = pd.read_csv(path / "prediction.csv", index_col=0)
adata = anndata.read_h5ad(path / "adata_umap.h5ad")

y = adata.obs.loc[:, "tier_0"]
y_pred_lr = results["final_pred"]
acc = metrics.balanced_accuracy_score(y, y_pred_lr)
print(metrics.classification_report(y, y_pred_lr, labels=["Normal", "Tumor"]))
fig, ax = plot_confusion_matrix(
    y,
    y_pred_lr,
    classes=["Normal", "Tumor"],
    title=f"confusion matrix \n train on laughney+lee, test on tirosh \n balanced acc: {acc:.3f}",
)
fig.tight_layout()

image.png

adata.obs.loc[:, "tier_0_pred_correctness"] = "wrongly assigned"
adata.obs.loc[
    adata.obs["tier_0"] == adata.obs["final_pred"],
    "tier_0_pred_correctness"
] = "correctly assigned"
adata.obs.loc[:, "tier_0_pred_wrong"] = pd.Categorical(
    adata.obs["tier_0"].copy(),
    categories=np.array(["Normal", "Tumor", "wrongly assigned"]),
    ordered=True
)
adata.obs.loc[
    adata.obs["tier_0_pred_correctness"] == "wrongly assigned",
    "tier_0_pred_wrong"
] = "wrongly assigned"

plt.rcParams["figure.figsize"] = [9, 6]

colors = [
    ["major"],
    ["tier_0_pred_wrong"]
    ]
titles = [
    ["major types"],
    ["prediction"]
    ]
palettes = [
    ["#7f7f7f", "#98df8a", "#aec7e8", "#9467bd", "#ff0000"],
    ["#aec7e8", "#ff0000", "#0b559f"], 
]
for color, title, palette in zip(colors, titles, palettes):
    ax = sc.pl.umap(
        adata, ncols=1, size=20, 
        color=color,
        title=title,
        wspace=0.25,
        vmax="p99",
        legend_fontsize=12,
        palette=palette,
        show=False
    )
    for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] + 
                 ax.get_xticklabels() + ax.get_yticklabels()):
        item.set_fontsize(12)
    ax.legend(loc="best")
    plt.tight_layout()
    plt.show()

image.png

從這個圖可以看出Ikarus預(yù)測的結(jié)果還是相當(dāng)準(zhǔn)確的鲤桥。

說在最后

Ikarus其實不僅可以預(yù)測腫瘤細(xì)胞，也可以預(yù)測其它特殊的細(xì)胞渠概，如各種干細(xì)胞茶凳。這個主要取決于使用者是通過什么數(shù)據(jù)來訓(xùn)練出的模型。Immugent有一個大膽的猜想播揪，Ikarus甚至都可以用于對各種免疫細(xì)胞的注釋贮喧。但是有一個問題是，在不同模型或者疾病中猪狈，各種免疫細(xì)胞的功能狀態(tài)可能差異很大箱沦，鑒于此種考慮，我們不太適合用統(tǒng)一的模型去定義所有同類型免疫細(xì)胞雇庙。

此外谓形，有一說一灶伊，使用python分析數(shù)據(jù)的感覺確實比R更絲滑。而且基于python出的圖寒跳，無論是配色還是構(gòu)圖都看起來高達(dá)上很多聘萨，小伙伴門趕緊實操起來吧。

好啦童太，本期分享到這里就結(jié)束啦米辐，我們下期再會~~

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個濱河市康愤，隨后出現(xiàn)的幾起案子儡循，更是在濱河造成了極大的恐慌，老刑警劉巖征冷，帶你破解...
沈念sama閱讀 216,324評論 6贊 498
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件择膝，死亡現(xiàn)場離奇詭異，居然都是意外死亡检激，警方通過查閱死者的電腦和手機肴捉，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 92,356評論 3贊 392
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來叔收，“玉大人齿穗，你說我怎么就攤上這事〗嚷桑” “怎么了窃页？”我有些...
開封第一講書人閱讀 162,328評論 0贊 353
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長复濒。經(jīng)常有香客問我脖卖，道長，這世上最難降的妖魔是什么巧颈？我笑而不...
開封第一講書人閱讀 58,147評論 1贊 292
?港島之戀（遺憾婚禮）
正文為了忘掉前任畦木，我火速辦了婚禮，結(jié)果婚禮上砸泛，老公的妹妹穿的比我還像新娘十籍。我一直安慰自己，他們只是感情好唇礁，可當(dāng)我...
茶點故事閱讀 67,160評論 6贊 388
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布勾栗。她就那樣靜靜地躺著，像睡著了一般盏筐。火紅的嫁衣襯著肌膚如雪械姻。梳的紋絲不亂的頭發(fā)上，一...
開封第一講書人閱讀 51,115評論 1贊 296
城市分裂傳說
那天，我揣著相機與錄音楷拳，去河邊找鬼绣夺。笑死，一個胖子當(dāng)著我的面吹牛欢揖，可吹牛的內(nèi)容都是我干的陶耍。我是一名探鬼主播，決...
沈念sama閱讀 40,025評論 3贊 417
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼她混，長吁一口氣：“原來是場噩夢啊……” “哼烈钞！你這毒婦竟也來了？” 一聲冷哼從身側(cè)響起坤按，我...
開封第一講書人閱讀 38,867評論 0贊 274
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤毯欣，失蹤者是張志新（化名）和其女友劉穎，沒想到半個月后臭脓，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體酗钞，經(jīng)...
沈念sama閱讀 45,307評論 1贊 310
?護林員之死
正文獨居荒郊野嶺守林人離奇死亡，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點故事閱讀 37,528評論 2贊 332
?白月光啟示錄
正文我和宋清朗相戀三年来累，在試婚紗的時候發(fā)現(xiàn)自己被綠了砚作。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
茶點故事閱讀 39,688評論 1贊 348
活死人
序言：一個原本活蹦亂跳的男人離奇死亡嘹锁，死狀恐怖葫录，靈堂內(nèi)的尸體忽然破棺而出，到底是詐尸還是另有隱情领猾，我是刑警寧澤米同，帶...
沈念sama閱讀 35,409評論 5贊 343
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站摔竿，受9級特大地震影響面粮，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜拯坟，卻給世界環(huán)境...
茶點故事閱讀 41,001評論 3贊 325
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望韭山。院中可真熱鬧郁季，春花似錦、人聲如沸钱磅。這莊子的主人今日做“春日...
開封第一講書人閱讀 31,657評論 0贊 22
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽盖淡。三九已至年柠，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間褪迟，已是汗流浹背冗恨。一陣腳步聲響...
開封第一講書人閱讀 32,811評論 1贊 268
情欲美人皮
我被黑心中介騙來泰國打工答憔，沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留，地道東北人掀抹。一個月前我還...
沈念sama閱讀 47,685評論 2贊 368
代替公主和親
正文我出身青樓虐拓，卻偏偏與公主長得像，于是被迫代替她去往敵國和親傲武。傳聞我的和親對象是個殘疾皇子蓉驹，可洞房花燭夜當(dāng)晚...
茶點故事閱讀 44,573評論 2贊 353

Ikarus代碼實操：運用機器學(xué)習(xí)鑒定腫瘤細(xì)胞

前言

代碼實操

說在最后

推薦閱讀更多精彩內(nèi)容