10X單細胞（10X空間轉(zhuǎn)錄組）轉(zhuǎn)錄組 & VDJ 聯(lián)合分析（14）之CoNGA

hello锅风，今天我們繼續(xù)來分享分析代碼部分牢撼，這一次，我們直接用單細胞Seurat 的分析結(jié)果聯(lián)合VDJ進行分析爽锥，話不多說涌韩，直接開始。

import scanpy as sc
import numpy as np
import anndata2ri as ad
# Activate the anndata2ri conversion between SingleCellExperiment and AnnData
ad.activate()

#Loading the rpy2 extension enables cell magic to be used
#This runs R code in jupyter notebook cells
%load_ext rpy2.ipython

sc.settings.verbosity = 3
sc.logging.print_versions()
%%R
# IMPORTANT! Use Seurat v3.0.2 as subsequent versions are packaged with "Reticulate" which interferes with rpy2
library('Seurat') 
library('base')
library('patchwork')

#read your Seurat object in
V1 <- readRDS('../conga_datasets/processed/vdj_v1_hs_V1_sc_5gex.rds')

DimPlot(V1) | FeaturePlot( V1, "CD3E")
# Keep in mind, conga will keep only the T cells with paired TCR information

圖片.png

寫出h5ad

%%R -o V1_sce
# R-magics line above tells anndata2ri to convert the dataset to AnnData

#  but first convert your object to SingleCellExperiment Format 
V1_sce <- as.SingleCellExperiment(V1)
V1_sce
#write to h5ad for now. Ideally be nice to get away from strictly using the path
V1_sce.write("../conga_datasets/processed/vdj_v1_hs_V1_sc_5gex.h5ad")

開始分析,數(shù)據(jù)大家換成自己的數(shù)據(jù)

%matplotlib inline

#begin conga workflow
import conga
import pandas as pd
import matplotlib.pyplot as plt
from conga.tcrdist.make_10x_clones_file import make_10x_clones_file


# call the Anndata file and conga on
gex_datafile = "../conga_datasets/processed/vdj_v1_hs_V1_sc_5gex.h5ad"
gex_datatype = 'h5ad' # other possibilities right now: ['10x_mtx', 'h5ad'] (h5ad from scanpy)
tcr_datafile = './testDrive/vdj_v1_hs_pbmc_t_filtered_contig_annotations.csv'
organism = 'human'
clones_file = 'tmp_hs_pbmc_clones.tsv'
outfile_prefix = 'tmp_hs_pbmc'
# this creates the TCRdist 'clones file'
make_10x_clones_file( tcr_datafile, organism, clones_file )
# this command will create another file with the kernel PCs for subsequent reading by conga
conga.preprocess.make_tcrdist_kernel_pcs_file_from_clones_file( clones_file, organism )

數(shù)據(jù)聯(lián)合

adata = conga.preprocess.read_dataset(gex_datafile, gex_datatype, clones_file )

# store the organism info in adata
adata.uns['organism'] = organism

# top 50 TCRdist kPCS 
adata.obsm['X_pca_tcr'].shape

(2287, 50)

# CDR3-alpha regions:
adata.obs['cdr3a'].head(3)

AAACCTGAGTCTTGCA-1 CAQSDSNYQLIW
AAACCTGCAGATGGGT-1 CAVLTNDMRF
AAACCTGTCCTTGCCA-1 CATDFNTGANSKLTF
Name: cdr3a, dtype: object

前處理

adata = conga.preprocess.filter_and_scale( adata )

挑選單個克隆的細胞

adata = conga.preprocess.reduce_to_single_cell_per_clone( adata )
adata

compute pca to find rep cell for each clone (1630, 1324)
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:00)
num_clones: 1539
Normalize and logging matrix...
unable to find feature_types varname
Index(['HES4', 'TNFRSF4', 'RP3-395M20.12', 'MEGF6', 'ACOT7', 'RP1-202O8.3',
'RP11-312B8.1', 'UTS2', 'TNFRSF9', 'SPSB1',
...
'PKNOX1', 'AP001055.6', 'ITGB2', 'ITGB2-AS1', 'COL18A1', 'COL6A1',
'COL6A2', 'C21orf58', 'S100B', 'MT-ND6'],
dtype='object', length=1324)
found 489 of 512 good_genes in var_names organism=human
reduce from 1630 cells to 1539 cells (one per clonotype)
normalize_and_log_the_raw_matrix:: matrix is already logged
AnnData object with n_obs × n_vars = 1539 × 1324
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'RNA_snn_res.0.8', 'seurat_clusters', 'ident', 'va', 'ja', 'cdr3a', 'cdr3a_nucseq', 'vb', 'jb', 'cdr3b', 'cdr3b_nucseq', 'n_genes', 'percent_mito', 'n_counts', 'clone_sizes'
var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
uns: 'organism', 'log1p', 'pca', 'raw_matrix_is_logged', 'X_igex_genes'
obsm: 'X_umap', 'X_pca_tcr', 'X_igex'
varm: 'PCs'
layers: 'logcounts'

降維聚類

adata = conga.preprocess.cluster_and_tsne_and_umap( adata )

adata

computing PCA
on highly variable genes
with n_comps=50
finished (0:00:00)
computing neighbors
using 'X_pca' with n_pcs = 40
finished: added to .uns['neighbors']
.obsp['distances'], distances for each pair of neighbors
.obsp['connectivities'], weighted adjacency matrix (0:00:01)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:02)
running Louvain clustering
using the "louvain" package of Traag (2017)
finished: found 8 clusters and added
'louvain_gex', the cluster labels (adata.obs, categorical) (0:00:00)
ran louvain clustering: louvain_gex
computing neighbors
using 'X_pca' with n_pcs = 50
finished: added to .uns['neighbors']
.obsp['distances'], distances for each pair of neighbors
.obsp['connectivities'], weighted adjacency matrix (0:00:00)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:02)
running Louvain clustering
using the "louvain" package of Traag (2017)
finished: found 10 clusters and added
'louvain_tcr', the cluster labels (adata.obs, categorical) (0:00:00)
ran louvain clustering: louvain_tcr
AnnData object with n_obs × n_vars = 1539 × 1324
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'RNA_snn_res.0.8', 'seurat_clusters', 'ident', 'va', 'ja', 'cdr3a', 'cdr3a_nucseq', 'vb', 'jb', 'cdr3b', 'cdr3b_nucseq', 'n_genes', 'percent_mito', 'n_counts', 'clone_sizes', 'louvain_gex', 'clusters_gex', 'louvain_tcr', 'clusters_tcr'
var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
uns: 'organism', 'log1p', 'pca', 'raw_matrix_is_logged', 'X_igex_genes', 'neighbors', 'umap', 'louvain'
obsm: 'X_pca_tcr', 'X_igex', 'X_pca_gex', 'X_umap_gex', 'X_gex_2d', 'X_umap_tcr', 'X_tcr_2d'
varm: 'PCs'
layers: 'logcounts'
obsp: 'distances', 'connectivities'

聯(lián)合分析

plt.figure(figsize=(12,6))
plt.subplot(121)
xy = adata.obsm['X_gex_2d']
clusters = np.array(adata.obs['clusters_gex'])
cmap = plt.get_cmap('tab20')
colors = [ cmap.colors[x] for x in clusters]
plt.scatter( xy[:,0], xy[:,1], c=colors)
plt.title('GEX UMAP colored by GEX clusters')

plt.subplot(122)
xy = adata.obsm['X_tcr_2d']
clusters = np.array(adata.obs['clusters_tcr'])
cmap = plt.get_cmap('tab20')
colors = [ cmap.colors[x] for x in clusters]
plt.scatter( xy[:,0], xy[:,1], c=colors)
plt.title('TCR UMAP colored by TCR clusters');
# these are the nbrhood sizes, as a fraction of the entire dataset:
nbr_fracs = [0.01, 0.1]

# we use this nbrhood size for computing the nndists
nbr_frac_for_nndists = 0.01

all_nbrs, nndists_gex, nndists_tcr = conga.preprocess.calc_nbrs(
    adata, nbr_fracs, also_calc_nndists=True, nbr_frac_for_nndists=nbr_frac_for_nndists)

# stash these in obs array, they are used in a few places...                                                                                                                
adata.obs['nndists_gex'] = nndists_gex
adata.obs['nndists_tcr'] = nndists_tcr

conga.preprocess.setup_tcr_cluster_names(adata) #stores in adata.uns

圖片.png

graph_vs_graph

results = conga.correlations.run_graph_vs_graph(adata, all_nbrs)
conga_scores = adata.obs['conga_scores']
good_mask = ( conga_scores <= 1.0 )
adata.obs['good_score_mask'] = good_mask
print(f'found {np.sum(good_mask)} conga hits')
results.sort_values('conga_score', inplace=True)
results.head(3)

conga_score	num_neighbors_gex	num_neighbors_tcr	overlap	overlap_corrected	mait_fraction	clone_index	nbr_frac	overlap_type	num_neighbors	cluster_size
5.722190e-49	NaN	NaN	72	59	0.888889	103	0.1	cluster_nbr	153.0	110.0
1.738370e-47	NaN	NaN	71	58	0.887324	77	0.1	cluster_nbr	153.0	110.0
1.738370e-47	NaN	NaN	71	58	0.887324	52	0.1	cluster_nbr	153.0	110.0

# write the results to a tsv file
clusters_gex = np.array(adata.obs['clusters_gex'])
clusters_tcr = np.array(adata.obs['clusters_tcr'])

indices = results['clone_index']
results['gex_cluster'] = clusters_gex[indices]
results['tcr_cluster'] = clusters_tcr[indices]
for tag in 'va ja cdr3a vb jb cdr3b'.split():
    results[tag] = list(adata.obs[tag][indices])
tsvfile = outfile_prefix+'_graph_vs_graph_hits.tsv'
print('saving graph-vs-graph results to file:',tsvfile)

results.to_csv(tsvfile, sep='\t', index=False)
#put the conga hits on top
colors = np.sqrt(np.maximum(-1*np.log10(conga_scores),0.0))
reorder = np.argsort(colors)

plt.figure(figsize=(12,6))
plt.subplot(121)
xy = adata.obsm['X_gex_2d']
plt.scatter( xy[reorder,0], xy[reorder,1], c=colors[reorder], vmin=0, vmax=np.sqrt(5))
plt.title('GEX UMAP colored by conga score')

plt.subplot(122)
xy = adata.obsm['X_tcr_2d']
plt.scatter( xy[reorder,0], xy[reorder,1], c=colors[reorder], vmin=0, vmax=np.sqrt(5))
plt.title('TCR UMAP colored by conga score');

圖片.png

最后的結(jié)果

nbrs_gex, nbrs_tcr = all_nbrs[0.1]

min_cluster_size = 5

# calc tcr sequence features of good cluster pairs
good_bicluster_tcr_scores = conga.correlations.calc_good_cluster_tcr_features(
    adata, good_mask, clusters_gex, clusters_tcr, conga.tcr_scoring.all_tcr_scorenames, verbose=False,
    min_count=min_cluster_size)

# run rank_genes on most common biclusters
rank_genes_uns_tag = 'rank_genes_good_biclusters'
conga.correlations.run_rank_genes_on_good_biclusters(
    adata, good_mask, clusters_gex, clusters_tcr, min_count=min_cluster_size, key_added= rank_genes_uns_tag)

gex_header_tcr_score_names = ['mhci2', 'cdr3len', 'cd8', 'nndists_tcr']

logo_pngfile = outfile_prefix+'_bicluster_logos.png'

conga.plotting.make_logo_plots(
    adata, nbrs_gex, nbrs_tcr, min_cluster_size, logo_pngfile,
    good_bicluster_tcr_scores=good_bicluster_tcr_scores,
    rank_genes_uns_tag = rank_genes_uns_tag,
    gex_header_tcr_score_names = gex_header_tcr_score_names )

圖片.png

pval_threshold = 1.
results = []
for nbr_frac in nbr_fracs:
    nbrs_gex, nbrs_tcr = all_nbrs[nbr_frac]
    print('finding biased GEX features for nbrhoods with size', nbr_frac, nbrs_gex.shape)
    results.append( conga.correlations.tcr_nbrhood_rank_genes_fast( adata, nbrs_tcr, pval_threshold, verbose=False))
    results[-1]['nbr_frac'] = nbr_frac

# save the results to a file
tsvfile = outfile_prefix+'_tcr_nbr_graph_vs_gex_features.tsv'
print('making:', tsvfile)
results_df = pd.concat(results, ignore_index=True)
results_df.to_csv(tsvfile, index=False, sep='\t')

pngfile = outfile_prefix+'_tcr_nbr_graph_vs_gex_features.png'
print('making:', pngfile)
conga.plotting.plot_ranked_strings_on_cells(
    adata, results_df, 'X_tcr_2d', 'clone_index', 'mwu_pvalue_adj', 1.0, 'feature', pngfile)

圖片.png

pngfile = outfile_prefix+'_tcr_nbr_graph_vs_gex_features_panels.png'
print('making:', pngfile)
conga.plotting.make_feature_panel_plots(adata, 'tcr', all_nbrs, results_df, pngfile, 'gex')

圖片.png

pval_threshold = 1.
results = []
tcr_score_names = conga.tcr_scoring.all_tcr_scorenames # the TCR features to use
for nbr_frac in nbr_fracs:
    nbrs_gex, nbrs_tcr = all_nbrs[nbr_frac]
    results.append( conga.correlations.gex_nbrhood_rank_tcr_scores(
        adata, nbrs_gex, tcr_score_names, pval_threshold, verbose=False ))
    results[-1]['nbr_frac'] = nbr_frac
results_df = pd.concat(results, ignore_index=True)

tsvfile = outfile_prefix+'_gex_nbr_graph_vs_tcr_features.tsv'
print('making:', tsvfile)
results_df.to_csv(tsvfile, index=False, sep='\t')

pngfile = outfile_prefix+'_gex_nbr_graph_vs_tcr_features.png'
print('making:', pngfile)

conga.plotting.plot_ranked_strings_on_cells(
    adata, results_df, 'X_gex_2d', 'clone_index', 'mwu_pvalue_adj', 1.0, 'feature', pngfile,
    direction_column='ttest_stat')

圖片.png

pngfile = outfile_prefix+'_gex_nbr_graph_vs_tcr_features_panels.png'
print('making:', pngfile)
conga.plotting.make_feature_panel_plots(adata, 'gex', all_nbrs, results_df, pngfile, 'tcr')

圖片.png

代碼奉上氯夷，生活很好臣樱，有你更好

最后編輯于：2021.10.13 15:09:53

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

禁止轉(zhuǎn)載，如需轉(zhuǎn)載請通過簡信或評論聯(lián)系作者腮考。

人面猴
序言：七十年代末雇毫，一起剝皮案震驚了整個濱河市，隨后出現(xiàn)的幾起案子踩蔚，更是在濱河造成了極大的恐慌棚放，老刑警劉巖，帶你破解...
沈念sama閱讀 206,013評論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件寂纪，死亡現(xiàn)場離奇詭異席吴，居然都是意外死亡赌结，警方通過查閱死者的電腦和手機，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,205評論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進店門孝冒，熙熙樓的掌柜王于貴愁眉苦臉地迎上來柬姚，“玉大人，你說我怎么就攤上這事庄涡×砍校” “怎么了？”我有些...
開封第一講書人閱讀 152,370評論 0贊 342
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵穴店，是天一觀的道長撕捍。經(jīng)常有香客問我，道長泣洞，這世上最難降的妖魔是什么忧风？我笑而不...
開封第一講書人閱讀 55,168評論 1贊 278
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮球凰，結(jié)果婚禮上狮腿，老公的妹妹穿的比我還像新娘。我一直安慰自己呕诉，他們只是感情好缘厢，可當我...
茶點故事閱讀 64,153評論 5贊 371
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布。她就那樣靜靜地躺著甩挫，像睡著了一般贴硫。火紅的嫁衣襯著肌膚如雪。梳的紋絲不亂的頭發(fā)上伊者，一...
開封第一講書人閱讀 48,954評論 1贊 283
城市分裂傳說
那天英遭，我揣著相機與錄音，去河邊找鬼亦渗。笑死贪绘，一個胖子當著我的面吹牛，可吹牛的內(nèi)容都是我干的央碟。我是一名探鬼主播税灌，決...
沈念sama閱讀 38,271評論 3贊 399
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼，長吁一口氣：“原來是場噩夢啊……” “哼亿虽！你這毒婦竟也來了菱涤？” 一聲冷哼從身側(cè)響起，我...
開封第一講書人閱讀 36,916評論 0贊 259
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤洛勉，失蹤者是張志新（化名）和其女友劉穎粘秆，沒想到半個月后，有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體收毫，經(jīng)...
沈念sama閱讀 43,382評論 1贊 300
?護林員之死
正文獨居荒郊野嶺守林人離奇死亡攻走，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點故事閱讀 35,877評論 2贊 323
?白月光啟示錄
正文我和宋清朗相戀三年殷勘，在試婚紗的時候發(fā)現(xiàn)自己被綠了。大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片昔搂。...
茶點故事閱讀 37,989評論 1贊 333
活死人
序言：一個原本活蹦亂跳的男人離奇死亡玲销，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出摘符，到底是詐尸還是另有隱情贤斜，我是刑警寧澤，帶...
沈念sama閱讀 33,624評論 4贊 322
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布逛裤，位于F島的核電站瘩绒，受9級特大地震影響，放射性物質(zhì)發(fā)生泄漏带族。R本人自食惡果不足惜锁荔，卻給世界環(huán)境...
茶點故事閱讀 39,209評論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望蝙砌。院中可真熱鬧堕战，春花似錦、人聲如沸拍霜。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,199評論 0贊 19
一樁弒父案薪介，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽祠饺。三九已至，卻和暖如春汁政，著一層夾襖步出監(jiān)牢的瞬間道偷，已是汗流浹背。一陣腳步聲響...
開封第一講書人閱讀 31,418評論 1贊 260
情欲美人皮
我被黑心中介騙來泰國打工记劈，沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留勺鸦，地道東北人。一個月前我還...
沈念sama閱讀 45,401評論 2贊 352
代替公主和親
正文我出身青樓目木，卻偏偏與公主長得像换途，于是被迫代替她去往敵國和親。傳聞我的和親對象是個殘疾皇子刽射，可洞房花燭夜當晚...
茶點故事閱讀 42,700評論 2贊 345

10X單細胞（10X空間轉(zhuǎn)錄組）轉(zhuǎn)錄組 & VDJ 聯(lián)合分析（14）之CoNGA

hello锅风，今天我們繼續(xù)來分享分析代碼部分牢撼，這一次，我們直接用單細胞Seurat 的分析結(jié)果聯(lián)合VDJ進行分析爽锥，話不多說涌韩，直接開始。

寫出h5ad

開始分析,數(shù)據(jù)大家換成自己的數(shù)據(jù)

數(shù)據(jù)聯(lián)合

前處理

挑選單個克隆的細胞

降維聚類

聯(lián)合分析

graph_vs_graph

最后的結(jié)果

推薦閱讀更多精彩內(nèi)容