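Polyadenylation (poly(A)) is an important modification step that takes place at the 3' end during transcript maturation. Alternative polyadenylation (APA) is a fundamental regulatory mechanism that is widespread in eukaryotes. It not only increases the complexity of the transcriptome and proteome in the cell, but also affects the function, stability, localization, and translation efficiency of the target RNA. Poly(A) sites mark the end of a transcript, and identifying them accurately is the basis for gene annotation and for studying transcriptional regulatory mechanisms. APA is tissue-specific and plays an important role in cell proliferation and differentiation.
Alternative polyadenylation (APA) plays a key post-transcriptional regulatory role in mRNA stability and function in eukaryotes. Single-cell RNA-seq (scRNA-seq) is a powerful tool for discovering cell-to-cell heterogeneity in gene expression. The most widely used 10x scRNA-seq library protocol is 3'-enriched, which makes it possible to raise the resolution of APA studies to the single-cell level. However, no computational tool is currently available for investigating APA profiles from scRNA-seq data.
Here we present scDAPA, a software package for detecting and visualizing dynamic APA from scRNA-seq data. Taking BAM/SAM files and cell cluster labels as input, scDAPA detects APA dynamics using a histogram-based method and the Wilcoxon rank-sum test, and visualizes candidate genes with dynamic APA. Benchmarking results show that scDAPA can effectively identify genes with dynamic APA among different cell groups from scRNA-seq data. The package is available at https://scdapa.sourceforge.io.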
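To make the idea of a histogram-based comparison concrete, here is a minimal shell/awk sketch. It is not scDAPA's implementation: it assumes the difference between two cell groups is measured as half the summed absolute difference between their normalized 3'-end histograms, which is one natural way to obtain a value in [0,1] like the SDD reported in the output table. The function name `sdd` and its arguments are hypothetical, and the package's exact definition may differ.

```shell
# sdd BIN_SIZE "p1,p2,..." "q1,q2,..."
# Hypothetical helper (not part of scDAPA): bins two lists of 3'-end positions
# and prints half the L1 distance between the normalized histograms.
# ASSUMPTION: this mirrors the idea behind SDD; the real formula may differ.
sdd() {
  awk -v bin="$1" -v a="$2" -v b="$3" 'BEGIN {
    na = split(a, xa, ",")
    nb = split(b, xb, ",")
    if (na == 0 || nb == 0) { print "NA"; exit }
    # Count positions per bin for each group
    for (i = 1; i <= na; i++) { k = int(xa[i] / bin); ha[k]++; seen[k] = 1 }
    for (i = 1; i <= nb; i++) { k = int(xb[i] / bin); hb[k]++; seen[k] = 1 }
    # Sum absolute differences of the per-bin fractions
    d = 0
    for (k in seen) {
      diff = ha[k] / na - hb[k] / nb
      d += (diff < 0 ? -diff : diff)
    }
    printf "%.3f\n", d / 2
  }'
}

# Group 1 uses one site only; group 2 splits its reads between two sites.
sdd 100 "110,120,130,140" "110,120,530,540"   # prints 0.500
```

Identical distributions give 0 and completely disjoint ones give 1. In the real workflow the significance of such a difference is assessed separately with the Wilcoxon rank-sum test.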
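1. APA types:
(1) 3' UTR-APA
Most APA sites lie in the 3' UTR, a region that contains cis-acting elements (cis elements). 3' UTR-APA affects post-transcriptional gene regulation in many ways, including mRNA stability, mRNA nuclear export and localization, and localization of the encoded protein.
(2) Upstream Region APA (UR-APA)
UR-APA sites are located upstream of the last exon. UR-APA causes alternative expression of the terminal exon and so changes both the mRNA coding sequence and the 3' UTR. Based on the splicing pattern around the polyadenylation site (PAS), UR-APA can be divided into two classes: skipped terminal exon and composite terminal exon. In the skipped terminal exon class the terminal exon is skipped, whereas a composite terminal exon arises from the extension of an internal exon.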
```shell
# Activate the conda environment (an existing velocyto environment is reused here)
unset PYTHONPATH
source software/miniconda3/bin/activate software/miniconda3/envs/velocyto
# Step 1: extract reads from the Cell Ranger BAM file, using the cluster labels
10X_RNA/Development/scDAPA/extractReads.sh -r 10X_RNA/Development/velocyto/example/CellRanger/pbmc5k/outs/possorted_genome_bam.bam -c 10X_RNA/Development/velocyto/example/CellRanger/pbmc5k/outs/analysis/clustering/kmeans_10_clusters/clusters.csv -o ./result
# Step 2: extract gene records from the reference GTF
10X_RNA/Development/scDAPA/extractGenes.sh -i 10X_RNA/pipeline2.1/database/10X_Ref/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf -o hg38.gene.gff
# Step 3: annotate the 3' ends of reads with genes (bedtools must be on PATH)
export PATH=bedtools2/bin/:$PATH
10X_RNA/Development/scDAPA/annotate3Ends.sh -d 10X_RNA/Development/scDAPA/example/result/ -g 10X_RNA/Development/scDAPA/example/hg38.gene.gff
```
Column Name | Explanation |
---|---|
seqname | The name of the sequence |
source | The program that generated this feature |
feature | The name of this type of feature |
start | The starting position of the feature in the sequence |
end | The ending position of the feature |
score | A score between 0 and 1000 |
strand | Valid entries include "+", "-", or "." |
frame | If the feature is not a coding exon, the value should be "." |
gene | Gene ID and name |
start of read | The starting positions of reads annotated to this gene, separated by commas |
end of read | The ending positions of reads annotated to this gene, separated by commas |
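As a quick sanity check on the annotation step, the columns above can be summarized per gene with awk. This is a sketch under stated assumptions: it assumes the annotation file is tab-delimited with the columns in the order listed in the table (gene in column 9, read end positions in column 11), which should be verified against an actual `.anno` file. The helper name `summarize_anno` is hypothetical and not part of scDAPA.

```shell
# summarize_anno FILE
# Hypothetical helper: prints gene, number of annotated reads, and the mean
# read end position for every record.
# ASSUMPTION: tab-delimited input, column 9 = gene, column 11 = comma-separated
# read end positions, as suggested by the column table.
summarize_anno() {
  awk -F'\t' '
    NF >= 11 {
      n = split($11, pos, ",")
      sum = 0
      for (i = 1; i <= n; i++) sum += pos[i]
      if (n > 0) printf "%s\t%d\t%.1f\n", $9, n, sum / n
    }
  ' "$1"
}

# Usage (path taken from the commands above):
#   summarize_anno ./result/1.anno | sort -k2,2nr | head
```

Genes with very few annotated reads are unlikely to give a stable 3'-end distribution, which is presumably why the detection step exposes a `count_cutoff` parameter.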
Load the results above into the R package scDAPAminer:
```r
library(scDAPAminer)

# create a folder named 'stat' for the output
dir.create('./stat', showWarnings = FALSE)

# 1. only compare two specific cell groups
scDAPAdetect(file1='./result/1.anno',file2='./result/2.anno',type='f2f',output_dir='./stat')

# 2. compare every two cell groups stored in the ./result directory
scDAPAdetect(dir='./result',type='d',output_dir='./stat',bin_size=100,count_cutoff=20)
```
Column Name | Explanation |
---|---|
chr | Name of the chromosome/scaffold |
gene | Gene ID and name |
meanlen1 | Mean length of 3′ ends to gene's start site in cell group 1 |
meanlen2 | Mean length of 3′ ends to gene's start site in cell group 2 |
SDD | Site distribution difference SDD∈[0,1] |
p.value | Statistical test p values |
p.adjust | Adjusted p values |
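Once the comparison has been written out, candidate genes can be shortlisted from the result table on the command line. The sketch below assumes the result file is tab-delimited, has a header row, and lists its columns in the order shown in the table above. The helper name `filter_dapa` and the default thresholds (adjusted p value below 0.05, SDD of at least 0.2) are illustrative choices, not values recommended by the scDAPA authors.

```shell
# filter_dapa FILE [PADJ_MAX] [SDD_MIN]
# Hypothetical helper: keeps genes with p.adjust < PADJ_MAX and SDD >= SDD_MIN,
# and reports which group has the larger mean 3'-end distance.
# ASSUMPTION: tab-delimited file with a header row and columns
# chr, gene, meanlen1, meanlen2, SDD, p.value, p.adjust (in that order).
filter_dapa() {
  awk -F'\t' -v padj="${2:-0.05}" -v sdd="${3:-0.2}" '
    NR == 1 { next }                       # skip the header row
    $7 != "NA" && $7 + 0 < padj + 0 && $5 + 0 >= sdd + 0 {
      dir = ($4 + 0 > $3 + 0) ? "longer_in_group2" : "longer_in_group1"
      printf "%s\t%s\t%s\t%s\n", $2, $5, $7, dir
    }
  ' "$1"
}

# Usage: the output file name is whatever scDAPAdetect wrote into ./stat
#   filter_dapa ./stat/<result file> 0.05 0.2
```

A larger mean length means the 3' ends sit further from the gene's start site on average, so the last column gives a rough hint of which cell group favors more distal poly(A) sites.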
```r
library(ggplot2)  # provides labs() and theme() used below

# NOTE: `gtf` must be defined beforehand; its value is not shown in the original post
dp = scDAPAview(files=c('./result/1.anno','./result/2.anno'),alt_names=c('cell_A','cell_B'),gtf=gtf,gene_id='ENSG00000160062',legend.position = c(0.2,0.8))

# customize colour theme
library(ggsci)
dp + scale_colour_aaas()

# customize legend title
dp + labs(colour = "Cell type")

# customize legend position
dp + theme(legend.position = c(0.6, 0.9))

# customize simultaneously
dp + scale_colour_aaas() + labs(colour = "Cell type") + theme(legend.position = c(0.6, 0.9))
```
[1] Tian B, Manley J L. Alternative polyadenylation of mRNA precursors[J]. Nature Reviews Molecular Cell Biology, 2016, 18(1):18.
[2] Abdelghany S E, Hamilton M, Jacobi J L, et al. A survey of the sorghum transcriptome using single-molecule long reads[J]. Nature Communications, 2016, 7:11706.
http://www.frasergen.com/cn/info_173.aspx?itemid=258
Congting Ye, Qian Zhou, Xiaohui Wu, Chen Yu, Guoli Ji, Daniel R Saban, Qingshun Q Li, scDAPA: detection and visualization of dynamic alternative polyadenylation from single cell RNA-seq data, Bioinformatics, btz701, https://doi.org/10.1093/bioinformatics/btz701