單細(xì)胞Scanpy常用腳本2023-08-28

前言

Scanpy是基于Python語(yǔ)言的類似Seurat的單細(xì)胞轉(zhuǎn)錄組分析工具,本文匯集了Scanpy的常規(guī)操作。

  • 加載模塊和設(shè)置

import time
print('Start time: ' + time.strftime('%Y-%m-%d %H:%M:%S',time.localtime()))

import numpy as np
import pandas as pd
import scanpy as sc
import louvain

import matplotlib
matplotlib.use('AGG')
import matplotlib.pyplot as plt
from matplotlib import rcParams

sc.settings.verbosity = 3
sc.settings.autosave = True
results_file = './write/81libs.h5ad'
sc.settings.autosave = True
sc.settings.set_figure_params(dpi=300, frameon=False)
  • 讀入矩陣

inpath= 'Matrix/scanpy_mat'
adata2 = sc.read_10x_mtx(path, var_names='gene_symbols', cache=True)
adata2.obs['batch1'] = 'C7-4'

adata= sc.read_10x_h5("./matrix.h5", genome=None, gex_only=True)
adata = sc.read_csv('./matrix.tsv',delimiter='\t')
adata = sc.read_h5ad('./result.h5ad')
adata = sc.read('./result.h5ad')
  • 合并多個(gè)對(duì)象

#load data
adata1 = sc.read("30libs.h5ad")
adata2 = sc.read("22libs.h5ad")
adata3 = sc.read("28libs.h5ad")

nadata1 = adata1.raw.to_adata()
nadata1.obs['batch1'] = adata1.obs['batch']
nadata2 = adata2.raw.to_adata()
nadata2.obs['batch1'] = adata2.obs['batch']
nadata3 = adata3.raw.to_adata()
nadata3.obs['batch1'] = adata3.obs['batch']

datalist = [nadata2, nadata3]
nadata1 = nadata1[nadata1.obs.batch != "C7-4" , :]

path= mat+'C7-4'+'/04.Matrix/scanpy_mat'
datalist.append(sc.read_10x_mtx(path, var_names='gene_symbols', cache=True))
datalist[2].obs['batch1'] = 'C7-4'


path= mat+'C5-2'+'/04.Matrix/scanpy_mat'
datalist.append(sc.read_10x_mtx(path, var_names='gene_symbols', cache=True))
datalist[3].obs['batch1'] = 'C5-2'

adata = nadata1.concatenate(datalist, join='outer')
  • 添加注釋

adata.obs['lable'] = "NA"
c=[ i for i, word in enumerate(list(adata.obs['batch1'])) if word.startswith('A') ]
adata.obs['lable'][c]= "A"
c=[ i for i, word in enumerate(list(adata.obs['batch1'])) if word.startswith('B') ]
adata.obs['lable'][c]= "B"
c=[ i for i, word in enumerate(list(adata.obs['batch1'])) if word.startswith('C') ]
adata.obs['lable'][c]= "C"
c=[ i for i, word in enumerate(list(adata.obs['batch1'])) if word.startswith('Y') ]
adata.obs['lable'][c]= "Y"
  • 輸出矩陣

#save raw matrix
pd.DataFrame(data=adata.X.todense().T, index=adata.var_names,columns=adata.obs_names).to_csv('raw_mat.csv',sep="\t",
#                                                                                             float_format='%.0f')
print(str(len(datalist)+1)+' datasets were merged!')
  • 質(zhì)控

#quality control
sc.pp.filter_cells(adata, min_genes=500)
sc.pp.filter_genes(adata, min_cells=3)

adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

adata.obs['percent_mito'] = adata.obs['pct_counts_mt']
adata.obs['n_counts'] = adata.obs['total_counts']

adata = adata[adata.obs.pct_counts_mt < 10, :]
sc.pl.violin(adata, ['n_genes', 'n_counts', 'percent_mito'],jitter=0.4, multi_panel=True, save='_qc.pdf', stripplot=False)
plt.savefig('./figures/01.22libs_qc.pdf')

sc.pp.calculate_qc_metrics(data, percent_top=None, log1p=False, inplace=True)
sc.pl.violin(data, ['n_genes_by_counts', 'total_counts'],jitter=False, multi_panel=True,stripplot=False)
plt.savefig("QC_violin.pdf")
data = data[data.obs.n_genes < 2500, :]
print(data.obs['n_genes_by_counts'].median(),'\n')
print(data.obs['total_counts'].median(),"\n")
  • 常規(guī)聚類

#normalize scale pca umap
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)
sc.pp.log1p(adata)
adata.raw = adata

sc.pp.highly_variable_genes(adata, n_top_genes=3000)
#sc.pl.highly_variable_genes(adata)
adata = adata[:, adata.var['highly_variable']]
sc.pp.regress_out(adata, ['n_counts', 'percent_mito'])
sc.pp.scale(adata, max_value=10)

sc.tl.pca(adata, n_comps=50, random_state=int(2020), svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40, random_state=int(2020))
sc.tl.umap(adata, random_state=int(2020))

#cluster
sc.tl.louvain(adata, random_state=int(2020))
sc.pl.umap(adata, color=['louvain'], save='_cluster1.pdf')
sc.pl.umap(adata, color=['lable'], save='_cluster2.pdf')
#plt.savefig('./figures/02.22libs_cluster.pdf')

adata.obs['clusters'] = adata.obs['louvain']
  • 聚類點(diǎn)圖

data.obsm['X_umap']=data.obsm['spatial']
sc.pl.umap(data, color=['leiden'])
plt.savefig("Umap2.pdf")
  • 基因表達(dá)點(diǎn)圖

#plot markers
rcParams['figure.figsize'] = 3,3
marker = ["CD68","CD14","CD163","CD3G","CD4","CD8A"]
marker = [gene for gene in marker if (gene in adata.raw.var._stat_axis.values.tolist())]
sc.pl.umap(adata, color=marker, s=50, frameon=False, ncols=3, vmax='p99', save='_l1.pdf')

rcParams['figure.figsize'] = 3,3
marker = ["RTKN2","CHN2","LRRD1","KIAA0408","CCDC144A"]
marker = [gene for gene in marker if (gene in adata.raw.var._stat_axis.values.tolist())]
sc.pl.umap(adata, color=marker, s=50, frameon=False, ncols=3, vmax='p99', save='_l2.pdf')
  • 保存結(jié)果

adata.write(results_file)
adata.obs.to_csv('./write/Scanpy_cell_info.csv', sep='\t', header=True, index=True)

adata.write(results_file, compression='gzip')
  • 找DEG

# Finding marker genes
sc.settings.verbosity = 2
sc.tl.rank_genes_groups(adata, 'louvain', method='wilcoxon')
sc.pl.rank_genes_groups_heatmap(adata, n_genes=10, swap_axes=True, show_gene_labels=True, vmin=-3, vmax=3, cmap='bwr')

result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
trgpc = pd.DataFrame({group + '_' + key[:1]: result[key][group] for group in groups for key in ['names', 'scores', 'logfoldchanges','pvals_adj']}).head(20)
trgpc.to_csv(path_or_buf="./write/top20_genes_per_cluster.csv", sep='\t', header=True, index=True)
  • 更改基因名或細(xì)胞名

adata.var_names = new_gene_names
adata.obs_names='cell_'+adata.obs_names
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末汞斧,一起剝皮案震驚了整個(gè)濱河市琳省,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌员舵,老刑警劉巖扯躺,帶你破解...
    沈念sama閱讀 206,013評(píng)論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件捉兴,死亡現(xiàn)場(chǎng)離奇詭異,居然都是意外死亡录语,警方通過(guò)查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評(píng)論 2 382
  • 文/潘曉璐 我一進(jìn)店門禾乘,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)澎埠,“玉大人,你說(shuō)我怎么就攤上這事始藕∑盐龋” “怎么了氮趋?”我有些...
    開封第一講書人閱讀 152,370評(píng)論 0 342
  • 文/不壞的土叔 我叫張陵,是天一觀的道長(zhǎng)江耀。 經(jīng)常有香客問(wèn)我剩胁,道長(zhǎng),這世上最難降的妖魔是什么祥国? 我笑而不...
    開封第一講書人閱讀 55,168評(píng)論 1 278
  • 正文 為了忘掉前任昵观,我火速辦了婚禮,結(jié)果婚禮上舌稀,老公的妹妹穿的比我還像新娘啊犬。我一直安慰自己,他們只是感情好壁查,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,153評(píng)論 5 371
  • 文/花漫 我一把揭開白布觉至。 她就那樣靜靜地躺著,像睡著了一般睡腿。 火紅的嫁衣襯著肌膚如雪语御。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 48,954評(píng)論 1 283
  • 那天席怪,我揣著相機(jī)與錄音沃暗,去河邊找鬼。 笑死何恶,一個(gè)胖子當(dāng)著我的面吹牛孽锥,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播细层,決...
    沈念sama閱讀 38,271評(píng)論 3 399
  • 文/蒼蘭香墨 我猛地睜開眼惜辑,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼!你這毒婦竟也來(lái)了疫赎?” 一聲冷哼從身側(cè)響起盛撑,我...
    開封第一講書人閱讀 36,916評(píng)論 0 259
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤,失蹤者是張志新(化名)和其女友劉穎捧搞,沒想到半個(gè)月后抵卫,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 43,382評(píng)論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡胎撇,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 35,877評(píng)論 2 323
  • 正文 我和宋清朗相戀三年介粘,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片晚树。...
    茶點(diǎn)故事閱讀 37,989評(píng)論 1 333
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡姻采,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出爵憎,到底是詐尸還是另有隱情慨亲,我是刑警寧澤婚瓜,帶...
    沈念sama閱讀 33,624評(píng)論 4 322
  • 正文 年R本政府宣布,位于F島的核電站刑棵,受9級(jí)特大地震影響巴刻,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜蛉签,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,209評(píng)論 3 307
  • 文/蒙蒙 一胡陪、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧正蛙,春花似錦督弓、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,199評(píng)論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)。三九已至锻全,卻和暖如春狂塘,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背鳄厌。 一陣腳步聲響...
    開封第一講書人閱讀 31,418評(píng)論 1 260
  • 我被黑心中介騙來(lái)泰國(guó)打工荞胡, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人了嚎。 一個(gè)月前我還...
    沈念sama閱讀 45,401評(píng)論 2 352
  • 正文 我出身青樓泪漂,卻偏偏與公主長(zhǎng)得像,于是被迫代替她去往敵國(guó)和親歪泳。 傳聞我的和親對(duì)象是個(gè)殘疾皇子萝勤,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,700評(píng)論 2 345

推薦閱讀更多精彩內(nèi)容