Keywords: Transposable Element; ERV (endogenous retrovirus); single-cell sequencing analysis; Seurat; scTE.
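Background:
scTE is used to quantify TE expression in 10x single-cell sequencing data, and the result is then imported into Seurat for downstream analysis. scTE comes from Jiekai's lab and was published in Nature Communications in March 2021.
Transposable elements (TEs) make up the majority of a typical eukaryotic genome and contribute to cellular heterogeneity in ways that are still poorly understood. Single-cell sequencing is a powerful tool for exploring cells, but analyses are usually gene-centric and have so far not addressed TE expression.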
Methods:
1. Install scTE
# scTE works with python >=3.6.
$ git clone https://github.com/JiekaiLab/scTE.git ## run this in the folder where you want to download scTE
$ cd scTE
$ python setup.py install ## install
# Building genome indices
$ scTE_build -g mm10 # Mouse
$ scTE_build -g hg38 # Human
2. Run scTE on the BAM file produced by 10x (Cell Ranger)
$ scTE -i ../run_cellranger_count/run_count_YL002273_S2/outs/possorted_genome_bam.bam \
    -o YL002272_S2 \
    -x /home/ye.liu/yang-secondary/ye/biotools/scTE/mm10.exclusive.idx \
    --hdf5 True -CB CR -UMI UB
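--hdf5 True
The output is written as an HDF5-based .h5ad file. To analyse it with Seurat, it must first be converted into a Seurat object (step 3).
-CB
The cell barcode tag. Check whether the cell barcodes in your BAM file are stored under the CR or the CB tag: if CR, use -CB CR; if CB, use -CB CB. In Cell Ranger BAMs, CR holds the barcode as read by the sequencer and CB holds the error-corrected barcode with a -1 suffix.
-UMI
The UMI tag. UB is the error-corrected UMI; UR holds the raw UMI.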
View the first few reads of an example BAM; the fourth field from the end is the CB tag:
$ samtools view test.bam | head -n 4
A00519:758:HTCCHDSXY:3:2535:21296:19774 16 chr1 14021 0 90M * 0 0 TGGATTTCTATCTCCCTGGCTTGGTGCCAGTTCCTCCAAGTCGATGGCACCTCCCTCCCTCTCAACCACTTGAGCAAACTCCAAGACATC ,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFF:FFFFF NH:i:5 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3 RE:A:I xf:i:0 CR:Z:CTCCCTCCACTGCGAC CY:Z:FFFFFFFFFFFFFFFF CB:Z:CTCCCTCCACTGCGAC-1 UR:Z:AAGGCGTAGTAG UY:Z:FFFFFFFFFFFF UB:Z:AAGGCGTAGTAG
A00519:758:HTCCHDSXY:1:1355:17237:31720 0 chr1 14260 0 90M * 0 0 CTCCCTCTCATCCCAGAGAAACAGGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAGGGACCAACAGGGGCAGGAGGCAGTCACTGACCCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:5 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:1 RE:A:I xf:i:0 CR:Z:TCGTCCACAGTATGAA CY:Z:FFFFFFFFFFFFFFFF CB:Z:TCGTCCACAGTATGAA-1 UR:Z:GACTTATTTTTT UY:Z:FFFFFFFFFFFF UB:Z:GACTTATTTTTT
A00519:758:HTCCHDSXY:3:2227:16703:32080 16 chr1 14411 1 90M * 0 0 TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3 RE:A:I xf:i:0 CR:Z:TTGAGTGGTTGTGGCC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTGAGTGGTTGTGGCC-1 UR:Z:TATAATGCTCAG UY:Z:FFFFFFFFFFFF UB:Z:TATAATGCTCAG
A00519:758:HTCCHDSXY:3:2563:23665:33802 16 chr1 14411 1 90M * 0 0 TCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAG FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:88 nM:i:0 RG:Z:SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K:0:1:HTCCHDSXY:3 RE:A:I xf:i:0 CR:Z:TGTTGAGAGGCAATGC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGTTGAGAGGCAATGC-1 UR:Z:ACGGGTGTGGAG UY:Z:FFFFFFFFFFFF UB:Z:ACGGGTGTGGAG
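Because a Cell Ranger BAM usually carries both tags (as in the reads above), it can help to count how often each one appears before choosing the -CB value. The counts can differ, since CB may be absent on reads whose barcode could not be corrected. Below is a minimal sketch, assuming samtools is installed; the function name count_barcode_tags is hypothetical. The reads above are shown space-separated, but real samtools view output is tab-separated, which is what the awk script expects.

```shell
#!/usr/bin/env bash
# Sketch: count SAM records and how many of them carry a CB:Z: or a CR:Z:
# tag, to help choose between -CB CB and -CB CR for scTE.
# Input: tab-separated SAM text on stdin (e.g. from samtools view).
# count_barcode_tags is a hypothetical helper, not part of scTE.
count_barcode_tags() {
  awk -F'\t' '
    {
      n++
      # optional SAM tags start at field 12
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "CB:Z:") cb++
        if (substr($i, 1, 5) == "CR:Z:") cr++
      }
    }
    # "+ 0" prints 0 instead of an empty value when a tag never occurs
    END { printf "reads=%d CB=%d CR=%d\n", n + 0, cb + 0, cr + 0 }
  '
}

# Usage (the BAM path is a placeholder):
# samtools view possorted_genome_bam.bam | head -n 10000 | count_barcode_tags
```

Use whichever tag is present on the reads you want to keep; scTE then groups reads by the values of that tag.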
3. Convert the .h5ad output into a Seurat object
Use the Convert() function from SeuratDisk.
# R
library(SeuratDisk)
library(Seurat)
# convert the .h5ad file to an .h5seurat file
Convert("../../../YL002272_S1.h5ad", dest = "h5seurat", overwrite = TRUE)
# then load it into R as a Seurat object
Seurat.obj <- LoadH5Seurat("../../../YL002272_S1.h5seurat")
Split the genes and TEs in the count matrix
# R
## load TE names (one name per line; see below for how to generate this file)
te = read.csv('../data/mm10.TEname.txt', sep = '\t', header = FALSE)
## flag which features are TEs, then split the object into genes and TEs
is_te = rownames(Seurat.obj) %in% te$V1
Gene = subset(Seurat.obj, features = rownames(Seurat.obj)[!is_te])
TEs = subset(Seurat.obj, features = rownames(Seurat.obj)[is_te])
TEs
You can then run the usual Seurat analyses on each object.
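Before relying on the split, it can be worth checking how many feature names actually match the TE list, because a near-zero overlap usually means the names differ between the two files. Seurat's CreateSeuratObject(), for instance, replaces underscores in feature names with dashes; whether anything similar affects your object depends on how it was loaded. A minimal shell sketch of the same exact-match test as the %in% line in the R code, assuming you have exported the feature names one per line (for example with writeLines(rownames(Seurat.obj), "features.txt") in R). The function name split_features and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split feature names into TE and gene lists by exact name match,
# mirroring the %in% test in the R code. split_features is hypothetical.
#   $1  features file, one feature name per line
#   $2  TE name list, one name per line (e.g. mm10.TEname.txt)
#   $3  output prefix; writes <prefix>.TEs.txt and <prefix>.genes.txt
split_features() {
  local features="$1" te_list="$2" prefix="$3"
  # -F fixed strings, -x whole-line match, -f read patterns from a file;
  # "|| true" because grep exits 1 when nothing matches
  grep -Fxf "$te_list" "$features" > "${prefix}.TEs.txt" || true
  grep -vFxf "$te_list" "$features" > "${prefix}.genes.txt" || true
  # awk prints the line count without the padding some wc versions add
  local n_te n_gene
  n_te=$(awk 'END { print NR }' "${prefix}.TEs.txt")
  n_gene=$(awk 'END { print NR }' "${prefix}.genes.txt")
  echo "TEs=${n_te} genes=${n_gene}"
}

# Usage:
# split_features features.txt ../data/mm10.TEname.txt sample1
```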
How to get the mm10.TEname.txt file
# hg38
$ wget -c http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz -O hg38.te.txt
$ zcat hg38.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11 | sort | uniq > hg38.TEname.txt
# mm10
$ wget -c http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz -O mm10.te.txt
$ zcat mm10.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11 | sort | uniq > mm10.TEname.txt
# if you also need the class and family of each TE name (columns 11, 12, 13 = repName, repClass, repFamily)
$ zcat hg38.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11,12,13 | sort | uniq > hg38.TEnamefamilyclass.txt
$ zcat mm10.te.txt | grep -E 'LINE|SINE|LTR|DNA|Retroposon' | cut -f 11,12,13 | sort | uniq > mm10.TEnamefamilyclass.txt
### Note: check this page https://github.com/jphe/scTE/issues/3
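The grep -E pattern above matches the class keywords anywhere on a line. It therefore also keeps rows whose repClass is an uncertain call such as DNA? or LTR? (where the table contains them), and rows where only the repName happens to contain one of the words. If you want the selection to depend only on the repClass column (column 12 of UCSC rmsk.txt, with repName in column 11), a stricter sketch could look like this. The function name te_names_from_rmsk is hypothetical. Dropping the "?" classes is a judgement call, and the resulting list can differ from the one the grep version produces.

```shell
#!/usr/bin/env bash
# Sketch: keep only rmsk.txt rows whose repClass (column 12) is exactly one
# of the listed TE classes, then print the unique repName (column 11) values.
# Input: tab-separated rmsk.txt on stdin. te_names_from_rmsk is hypothetical.
te_names_from_rmsk() {
  awk -F'\t' '
    BEGIN {
      split("LINE SINE LTR DNA Retroposon", cls, " ")
      for (i in cls) keep[cls[i]] = 1
    }
    # exact match: classes such as "DNA?" or "LTR?" are not kept
    ($12 in keep) { print $11 }
  ' | sort -u
}

# Usage (output file name is a placeholder):
# zcat mm10.te.txt | te_names_from_rmsk > mm10.TEname.strict.txt
```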
References:
https://github.com/JiekaiLab/scTE
https://www.nature.com/articles/s41467-021-21808-x