近日10X終于發(fā)布了他們擁有亞細(xì)胞分辨率的空間轉(zhuǎn)錄組產(chǎn)品鸳粉,并給出了一些列的測試數(shù)據(jù)捌蚊。今天帶大家進(jìn)行一些基本的探索與實(shí)戰(zhàn)
1 數(shù)據(jù)下載
這里筆者選擇了10X給到的小鼠的腦組織進(jìn)行測試集畅,具體的下載路徑可以見
https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-brain-he
Space ranger默認(rèn)會(huì)給出8和16兩種Bin的結(jié)果,這里我們選擇8bin用于測試缅糟。
2 數(shù)據(jù)分析
空轉(zhuǎn)轉(zhuǎn)錄組的分析主要使用R挺智、python。由于Visium HD的超高的spot數(shù)量窗宦,在8bin的情況下整個(gè)表達(dá)矩陣包含了近400W個(gè)基因赦颇,因此我們后續(xù)的分析默認(rèn)使用scanpy完成。
import os
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import anndata
#使用jutypter notebook加上這兩句
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
#首先導(dǎo)入數(shù)據(jù)迫摔,我們將數(shù)據(jù)下載到/data/outs/binned_outputs中
#這里出現(xiàn)了一個(gè)坑,由于scanpy并沒有適配Visium HD泥从,會(huì)導(dǎo)致無法讀取數(shù)據(jù)句占,這是由于坐標(biāo)文件由原來的csv轉(zhuǎn)為了parquet,這里首先進(jìn)行轉(zhuǎn)換
tissue_position_file = '/data/outs/binned_outputs/square_008um/spatial/tissue_positions.parquet'
tissue_position_csv = '/data/outs/binned_outputs/square_008um/spatial/tissue_positions_list.csv'
if not os.path.exists(tissue_position_csv):
tissue_position_df = pd.read_parquet(tissue_position_file)
tissue_position_df.to_csv(tissue_position_csv, index=False, header=None)
adata = sc.read_visium(path = '/data/outs/binned_outputs/square_008um')
整個(gè)數(shù)據(jù)無比龐大躯嫉,我們接著往后進(jìn)行細(xì)胞質(zhì)控和過濾等操作
adata.var_names_make_unique()
# qc
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(
adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)
sc.pl.violin(
adata,
["n_genes_by_counts", "total_counts", "pct_counts_mt"],
jitter=0.4,
multi_panel=True
)
sc.pl.scatter(adata, x='total_counts', y='pct_counts_mt')
sc.pl.scatter(adata, x="total_counts", y="n_genes_by_counts")
sc.pl.scatter(adata, "total_counts", "n_genes_by_counts", color="pct_counts_mt")
#這里當(dāng)然也可以選擇不進(jìn)行過濾纱烘,由于數(shù)據(jù)過大,導(dǎo)致spot的數(shù)據(jù)相比我們常規(guī)的單細(xì)胞明顯的不足祈餐,進(jìn)而單個(gè)spot的轉(zhuǎn)錄本較少擂啥,所以比較適合較為寬松的閾值。過濾太狠的情況下出現(xiàn)較多的空spot
sc.pp.filter_cells(adata, min_genes=50)
sc.pp.filter_cells(adata, max_genes=8000)
adata = adata[(adata.obs.pct_counts_mt < 20)]
sc.pp.filter_genes(adata, min_cells=3)
隨后就是比較常規(guī)的HVG鑒定帆阳、歸一化和降維聚類乳丰,這一步需要一定的時(shí)間端姚,請耐心等待
sc.pp.highly_variable_genes(adata, n_top_genes= 2000, flavor='seurat_v3')
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.pca(adata, n_comps=50, use_highly_variable=True)
sc.pp.neighbors(adata, n_neighbors=20, n_pcs=30)
sc.tl.umap(adata, min_dist = 0.3)
sc.tl.leiden(adata, key_added="seurat_clusters", resolution=0.5)
sc.tl.tsne(adata, n_pcs = 30)
然后我們看一下結(jié)果,保存到clustering.png文件中
fig, ax = plt.subplots(2,2,figsize=(12, 10), dpi=300)
sc.pl.tsne(adata, color = "seurat_clusters", wspace=0.4, title='clustering using tsne', ax = ax[0,0], show=False)
sc.pl.umap(adata, color = "seurat_clusters", wspace=0.4, title = 'clustering using umap', ax = ax[0,1], show=False)
point_size = 1
sc.pl.spatial(adata, img_key="hires", color="n_genes_by_counts", size=point_size, ax = ax[1,0], show=False)
sc.pl.spatial(adata, img_key="hires", color="seurat_clusters", size=point_size, ax = ax[1,1], show=False)
fig.tight_layout()
fig.savefig('clustering.png')
fig.savefig('clustering.pdf')
整個(gè)分辨率的提升真是令人嘆為觀止
3 題外
如果使用Seurat
進(jìn)行分析時(shí),記得使用最新的版本庙睡,安裝命令如下。如果使用先前的版本也會(huì)出現(xiàn)不兼容的問題
# packages required for Visium HD
install.packages("hdf5r")
install.packages("arrow")
remotes::install_github(repo = "satijalab/seurat", ref = "visium-hd")