After ASEReadCounter has tallied the per-site coverage counts, each site still needs a gene ID attached before the binomial and Fisher's exact tests are run. GeneiASE is recommended for this step.
ASE (allele-specific expression): ASEReadCounter - Jianshu (jianshu.com)
GeneiASE paper:
https://www.nature.com/articles/srep21134.pdf
A slide deck introducing ASE:
https://scilifelab.github.io/courses/rnaseq/1610/slides/ASE_Olof_Emanuelsson.pdf
1. Download and installation
1.1 Download
https://github.com/edsgard/geneiase/tags
1.2 Installation
$ tar xvf geneiase-1.0.1.tar.gz
$ cd /your/path/geneiase-1.0.1/bin
geneiase is built on R, so first start an R session and install the packages it depends on:
$ R
> install.packages(c('getopt', 'binom', 'VGAM'))
> q()
Once the installation finishes, quit R; geneiase is then ready to use.
$ geneiase
Usage: geneiase [-[-ase.type|t] <character>] [-[-in.file|i] <character>] [-[-out.file|o] <character>] [-[-betabin.p|p] <double>] [-[-betabin.rho|r] <double>] [-[-n.bootstrap.samples|b] <integer>] [-[-min.feat.vars|m] <integer>] [-[-nmax.vars|x] <integer>] [-[-lib.file|l] <character>] [-[-help|h]]
If the usage message appears, the installation succeeded.
2. Parameters
geneiase requires only two parameters, -t and -i:
-t,
"static" or "icd",
selects the type of ASE to test: static ASE ("static") or individual condition-dependent ASE ("icd")
-i,
name of the input file
The test folder in the unpacked package contains example data for both types.
Static data has four columns: gene ID (feature ID), SNP ID, alternative allele count, and reference allele count. Example:
$ less static.test.input.tab
gene snp.id alt.dp ref.dp
10.9 1 4 6
10.9 2 6 4
10.9 3 5 5
10.9 4 0 10
10.9 5 9 1
10.9 6 5 5
10.9 7 3 7
10.9 8 8 2
10.9 9 7 3
101.2 10 6 4
101.2 11 5 5
103.3 12 4 6
103.3 13 9 1
103.3 14 1 9
105.5 15 5 5
105.5 16 0 10
105.5 17 7 3
icd data has six columns: gene ID, SNP ID, untreated alternative allele count, untreated reference allele count, treated alternative allele count, and treated reference allele count. Example:
$ less icd.test.input.tab
gene snp.id U.alt.dp U.ref.dp T.alt.dp T.ref.dp
1.11 1 8 2 7 3
1.11 2 3 7 4 6
1.11 3 8 2 6 4
1.11 4 5 5 7 3
1.11 5 6 4 1 9
1.11 6 9 1 5 5
1.11 7 4 6 5 5
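Before running geneiase on your own data it can help to confirm that the table really has the layout described above. The following is a minimal sketch for the static format, assuming the file is tab-separated and has a header row; the temporary file and the variable name bad_rows are illustrative, not part of geneiase.

```shell
# Build a tiny static-format table (rows copied from the example above).
# Assumption: columns are tab-separated and the first line is a header.
check_dir=$(mktemp -d)
printf 'gene\tsnp.id\talt.dp\tref.dp\n10.9\t1\t4\t6\n10.9\t2\t6\t4\n101.2\t10\t6\t4\n' > "$check_dir/static.tab"

# Count data rows that do not have exactly four fields,
# or whose two count columns are not non-negative integers.
bad_rows=$(awk -F'\t' '
  NR > 1 && (NF != 4 || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/) { n++ }
  END { print n + 0 }
' "$check_dir/static.tab")

echo "malformed rows: $bad_rows"
```

A non-zero count points to rows worth inspecting before the test is run.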
3. ASE testing
After ASEReadCounter has tallied the per-site coverage counts, extract the chromosome and position of each site from its output and arrange them in a table of the following form:
$ less LPF1_MP.pos
Mpar_chr1 2001 2001
Mpar_chr1 2015 2015
Mpar_chr1 2034 2034
Mpar_chr1 2037 2037
Mpar_chr1 2206 2206
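One possible way to build this three-column table is a short awk command over the ASEReadCounter output. This is a sketch only: it assumes the first two columns of the table are contig and position and that the first line is a header, and the temporary file names are placeholders rather than the files used in this post.

```shell
# Toy ASEReadCounter-style table (two rows taken from the ASE table shown in section 3.2).
pos_dir=$(mktemp -d)
printf 'contig\tposition\tvariantID\trefAllele\taltAllele\trefCount\taltCount\ttotalCount\n' > "$pos_dir/ASE.table"
printf 'Mpar_chr1\t4724\t.\tA\tC\t47\t39\t86\n' >> "$pos_dir/ASE.table"
printf 'Mpar_chr1\t4881\t.\tC\tG\t52\t33\t85\n' >> "$pos_dir/ASE.table"

# Skip the header and print contig, position, position (tab-separated),
# matching the chromosome/start/end layout shown above.
awk 'BEGIN { OFS = "\t" } NR > 1 { print $1, $2, $2 }' "$pos_dir/ASE.table" > "$pos_dir/sites.pos"

cat "$pos_dir/sites.pos"
```

Writing the position twice reproduces the start and end columns of the table above, where both columns hold the same value.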
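3.1 Look up gene information for each site
For how to use bedtools, this article gives a detailed introduction:
The most complete Bedtools guide, the only article you need - Jianshu (jianshu.com)
First sort the position file and the genome annotation file. Note that the chromosome names in the pos file and the GFF file must match: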
$ bedtools sort -chrThenSizeA -i LPF1_MP.pos > LPF1_MP_sort.pos
$ bedtools sort -chrThenSizeA -i Mparg_v2.0.gff3 > Mparg_v2.0_sort.gff3
Then report, for each SNP site in the pos file, the annotated features it overlaps in the genome:
$ bedtools intersect -a LPF1_MP_sort.pos -b Mparg_v2.0_sort.gff3 -wb > LPF1_MP_gene.pos
3.2 Add gene information in R
Add the gene information obtained in the previous step to the per-site coverage count table produced by ASEReadCounter.
ASE (allele-specific expression): ASEReadCounter - Jianshu (jianshu.com)
# Read LPF1_MP_ASE.table and LPF1_MP_gene.pos
> ASE<-read.table("LPF1_MP_ASE.table",header = T)
> gene<-read.table("LPF1_MP_gene.pos")
Create snp_id: unite the contig and position columns of LPF1_MP_ASE.table, and likewise the V1 and V2 columns of LPF1_MP_gene.pos, so that both tables share an snp_id key.
> ASE <- tidyr::unite(ASE, "snp_id", contig, position,remove = FALSE)
> head(ASE)
snp_id contig position variantID refAllele altAllele refCount altCount totalCount
1 Mpar_chr1_4724 Mpar_chr1 4724 . A C 47 39 86
2 Mpar_chr1_4881 Mpar_chr1 4881 . C G 52 33 85
3 Mpar_chr1_4900 Mpar_chr1 4900 . T C 46 31 77
4 Mpar_chr1_4962 Mpar_chr1 4962 . T C 49 34 83
5 Mpar_chr1_4995 Mpar_chr1 4995 . G T 45 44 89
lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
1 0 0 88 0 2
2 0 0 86 1 0
3 0 0 77 0 0
4 0 0 83 0 0
5 0 0 89 0 0
> gene <- tidyr::unite(gene, "snp_id", V1, V2,remove = FALSE)
> head(gene)
snp_id V1 V2 V3 V4 V5 V6 V7 V8
1 Mpar_chr1_29618717 Mpar_chr1 29618717 29618717 Mpar_chr1 AUGUSTUS intron 29618269 29618971
2 Mpar_chr1_29618717 Mpar_chr1 29618717 29618717 Mpar_chr1 AUGUSTUS gene 29617909 29621312
3 Mpar_chr1_29618717 Mpar_chr1 29618717 29618717 Mpar_chr1 AUGUSTUS transcript 29617909 29621312
4 Mpar_chr1_29511536 Mpar_chr1 29511536 29511536 Mpar_chr1 AUGUSTUS CDS 29511235 29511554
5 Mpar_chr1_29511536 Mpar_chr1 29511536 29511536 Mpar_chr1 AUGUSTUS exon 29511235 29511554
V9 V10 V11 V12
1 1 - . Parent=MP1G214900.1
2 1 - . ID=MP1G214900
3 1 - . ID=MP1G214900.1
4 1 - 0 Parent=MP1G214200.1
5 . - . Parent=MP1G214200.1
Keep only the CDS records from the annotation; ASE calls at sites inside CDS regions are more reliable:
> gene<-subset(gene,V6=='CDS')
Match on snp_id and add the gene ID to the ASE table:
> merga<-merge(ASE,gene, by = "snp_id", all.x = TRUE)
> write.csv(merga,"LPF1_MP_merga.csv",row.names = F)
3.3 Prepare the input file
Taking static data as the example, four columns are needed; extract them from LPF1_MP_merga.csv:
> raw<-read.csv("LPF1_MP_merga.csv")
> GeneiASE_input<-raw[,c(26,1,8,7)]
> head(GeneiASE_input)
V12 snp_id altCount refCount
1 <NA> Mpar_c2518_pilon_116563 1 395
2 <NA> Mpar_c2518_pilon_132171 3 3
3 <NA> Mpar_c2518_pilon_133271 1 1
4 <NA> Mpar_c2518_pilon_153461 2 5
5 <NA> Mpar_c2518_pilon_155680 2 4
Remove the rows with a missing gene ID:
> GeneiASE_input <- na.omit(GeneiASE_input)
> names(GeneiASE_input)[1] <-"gene_id"
> head(GeneiASE_input)
gene_id snp_id altCount refCount
72 Parent=MP1G130700.1 Mpar_chr1_10006428 14 19
73 Parent=MP1G130700.1 Mpar_chr1_10006455 14 17
87 Parent=MP1G130700.1 Mpar_chr1_10006863 27 24
88 Parent=MP1G130700.1 Mpar_chr1_10006921 23 18
89 Parent=MP1G130700.1 Mpar_chr1_10006970 23 18
Write the table out:
> write.table(GeneiASE_input,"LPF1_MP_GeneiASE_input.tab",quote = FALSE,row.names = FALSE,col.names = T,sep ='\t')
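The gene_id values written above still carry the GFF3 attribute prefix (for example Parent=MP1G130700.1). If plain IDs are preferred in the results, the prefix can be stripped before running geneiase. This is an optional sketch that assumes every gene_id begins with Parent=, as in the CDS rows shown; the file names are placeholders.

```shell
# Toy input in the same layout as LPF1_MP_GeneiASE_input.tab (rows copied from the output above).
strip_dir=$(mktemp -d)
printf 'gene_id\tsnp_id\taltCount\trefCount\n' > "$strip_dir/input.tab"
printf 'Parent=MP1G130700.1\tMpar_chr1_10006428\t14\t19\n' >> "$strip_dir/input.tab"
printf 'Parent=MP1G130700.1\tMpar_chr1_10006455\t14\t17\n' >> "$strip_dir/input.tab"

# Remove a leading "Parent=" from the first column of every data row;
# the header line and the other columns are left unchanged.
awk 'BEGIN { FS = OFS = "\t" } NR > 1 { sub(/^Parent=/, "", $1) } { print }' \
  "$strip_dir/input.tab" > "$strip_dir/input.clean.tab"

cat "$strip_dir/input.clean.tab"
```

The IDs that remain are transcript IDs (ending in .1), so SNPs are still grouped per transcript exactly as before; only the label changes.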
3.4 ASE test
$ cd your/path/geneiase/bin
$ geneiase -t static -i LPF1_MP_GeneiASE_input.tab -b 100
- -b n.bootstrap.samples
The number of bootstrap samples (B) to be used to generate the null distribution. Default: 1e5
The result file contains the following columns:
- feat: gene ID
- n.vars: number of variants in the gene
- mean.s: Mean of s across the variants within the gene
- median.s: Median of s across the variants within the gene
- sd.s: Standard deviation of s across the variants within the gene
- cv.s: Coefficient of variation of s across the variants within the gene
- liptak.s: Stouffer-Liptak combination of s
- p.nom: Nominal p-value
- fdr: Benjamini-Hochberg corrected p-value
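To pull out the genes that pass a significance threshold, the result table can be filtered on the fdr column. This is a minimal sketch, assuming the file is tab-separated with the column names listed above. The rows are invented toy values, the 0.05 cutoff is an arbitrary choice, and the NA guard is a precaution rather than documented geneiase behaviour.

```shell
# Toy result table with the columns listed above (values are made up for illustration).
res_dir=$(mktemp -d)
printf 'feat\tn.vars\tmean.s\tmedian.s\tsd.s\tcv.s\tliptak.s\tp.nom\tfdr\n' > "$res_dir/pval.tab"
printf 'geneA\t3\t1.2\t1.1\t0.4\t0.33\t2.5\t0.001\t0.01\n' >> "$res_dir/pval.tab"
printf 'geneB\t2\t0.3\t0.3\t0.1\t0.33\t0.4\t0.6\t0.8\n' >> "$res_dir/pval.tab"
printf 'geneC\t4\t0.9\t0.8\t0.2\t0.22\t1.9\tNA\tNA\n' >> "$res_dir/pval.tab"

# Locate the fdr column by name in the header, keep the header,
# then keep rows whose fdr is a number below 0.05.
awk 'BEGIN { FS = OFS = "\t" }
NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fdr") col = i; print; next }
col && $col != "NA" && $col + 0 < 0.05' "$res_dir/pval.tab" > "$res_dir/significant.tab"

cat "$res_dir/significant.tab"
```

Looking the column up by name rather than by position keeps the filter working if the column order differs between versions.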
3.5 Organize the ASE test results
> p_value<-read.csv("LPF1_MP_GeneiASE_input.tab.static.gene.pval.tab",sep ='\t')
> names(p_value)[1] <-"gene_id"
> names(raw)[26] <-"gene_id"
> result <- merge(p_value,raw, by = "gene_id", all.x = TRUE)
> result <- result[,c(10:12,14:17,1:9)]
> write.csv(result,"LPF1_MP_result.csv",row.names = F)
Please credit the source when quoting or reposting, and do point out any errors you find.