白利合子酵母菌在乙酸或銅脅迫下早期反應(yīng)的轉(zhuǎn)錄譜分析
摘要:
醋酸在食品工業(yè)和生物技術(shù)中是非常重要的微生物抑制化合物五慈,而非傳統(tǒng)酵母種白利合子酵母菌對(duì)醋酸有顯著的耐受性。Zbhaa1是釀酒酵母haa1基因的功能同源物衫哥,是一種能調(diào)節(jié)白利合子酵母菌對(duì)醋酸和銅脅迫的適應(yīng)性反應(yīng)的雙功能轉(zhuǎn)錄因子。在2018年5月發(fā)表的一篇文章Transcriptional profiling of Zygosaccharomyces bailii early response to acetic acid or copper stress mediated by ZbHaa1中襟锐,作者對(duì)白利合子酵母菌在亞致死濃度的乙酸(140 mM撤逢,pH4.0)或銅(0.08 mM)的早期反應(yīng)中的基因組轉(zhuǎn)錄變化做了相關(guān)研究,他們對(duì)野生型和脅迫條件下酵母的轉(zhuǎn)錄組進(jìn)行了測(cè)序粮坞,我們利用他們的實(shí)驗(yàn)數(shù)據(jù)蚊荣,對(duì)這兩組表達(dá)數(shù)據(jù)計(jì)算了各組的基因表達(dá)情況,進(jìn)行了聚類分析和差異表達(dá)基因的鑒定莫杈。我們將基因表達(dá)數(shù)據(jù)上傳至我們的數(shù)據(jù)庫互例,并且發(fā)現(xiàn)相同的組別在PCA分析中聚在一起,共鑒定了6個(gè)差異表達(dá)基因筝闹。
關(guān)鍵詞:
白利合子酵母菌媳叨,醋酸脅迫,銅脅迫关顷,差異表達(dá)基因
一糊秆、介紹
白利合子酵母菌因其對(duì)弱酸(乙酸)的高耐受性而被認(rèn)為是最重要的食品變質(zhì)酵母。與釀酒酵母相比议双,白利酵母對(duì)這種酸性物質(zhì)的耐受性高出了3倍痘番。Transcriptional profiling of Zygosaccharomyces bailii early response to acetic acid or copper stress mediated by ZbHaa1一文中,對(duì)白利合子酵母菌在亞致死濃度的乙酸(140 mM聋伦,pH4.0)或銅(0.08 mM)的早期反應(yīng)中的基因組轉(zhuǎn)錄情況進(jìn)行了測(cè)序夫偶,我們利用它們的測(cè)序數(shù)據(jù)界睁,計(jì)算了基因表達(dá)量,鑒定了差異表達(dá)基因以及進(jìn)行了PCA等分析兵拢。
我們利用的分析流程是根據(jù)2016年發(fā)表在Nature Protocols上的一篇名為Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown的文章翻斟,主要用到以下三個(gè)軟件:
HISAT 利用大量FM索引,以覆蓋整個(gè)基因組说铃,能夠?qū)NA-Seq的讀取與基因組進(jìn)行快速比對(duì)访惜,相較于STAR、Tophat腻扇,該軟件比對(duì)速度快债热,占用內(nèi)存少。
StringTie能夠應(yīng)用流神經(jīng)網(wǎng)絡(luò)算法和可選的de novo組裝進(jìn)行轉(zhuǎn)錄本組裝并預(yù)計(jì)表達(dá)水平幼苛。與Cufflinks等程序相比窒篱,StringTie實(shí)現(xiàn)了更完整、更準(zhǔn)確的基因重建舶沿,并更好地預(yù)測(cè)了表達(dá)水平墙杯。
Ballgown是R語言中基因差異表達(dá)分析的工具,能利用RNA-Seq實(shí)驗(yàn)的數(shù)據(jù)(StringTie, RSEM, Cufflinks)的結(jié)果預(yù)測(cè)基因括荡、轉(zhuǎn)錄本的差異表達(dá)高镐。
我們同樣利用DESeq2、airway等R包進(jìn)行了數(shù)據(jù)分析和處理畸冲。
二嫉髓、方法
1、原始數(shù)據(jù)下載
(1)二代測(cè)序數(shù)據(jù)
物種:白利合子酵母菌 (Zygosaccharomyces bailii)
對(duì)照組:野生型
實(shí)驗(yàn)組:Zbhaa 1 delta基因缺失型
測(cè)序方法:轉(zhuǎn)錄組測(cè)序邑闲,Illumina 單端測(cè)序得到 SRA 文件
樣本個(gè)數(shù):對(duì)照組和實(shí)驗(yàn)組各3組算行,共6個(gè)
(2)參考基因組數(shù)據(jù)
(3)基因組注釋文件
2、RNA-seq 分析流程
(1)安裝所需軟件:SAMtools苫耸、HISAT2纱意、stringtie、R包(ballgown鲸阔、DESeq2、airway)等迄委。
(2)組裝比對(duì)
1)使用hisat2-build建立索引文件
2)使用生成的索引文件對(duì)RNA-Seq進(jìn)行比對(duì)拼接
3)SAM tools 排序及格式轉(zhuǎn)換
4)stringtie 序列初組裝
(3)差異表達(dá)基因分析
1)計(jì)算基因表達(dá)量
2)鑒定組間差異表達(dá)基因
3)聚類分析
4)PCA分析
三褐筛、結(jié)果
1、原始數(shù)據(jù)下載
nohup wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR705/SRR7058083/SRR7058083.sra &
共下載了對(duì)照組3個(gè)樣本(SRR7058078叙身,SRR7058079渔扎,SRR7058080)和實(shí)驗(yàn)組3個(gè)樣本(SRR7058081,SRR7058082信轿,SRR7058083)SRA數(shù)據(jù)(<u>https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP142338</u>)晃痴。
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/442/885/GCA_000442885.1_ZYBA0/GCA_000442885.1_ZYBA0_genomic.fna.gz
下載得到百利合子酵母菌參考基因組(<u>https://www.ncbi.nlm.nih.gov/genome/?term=Zygosaccharomyces+bailii</u>)
wget [<u>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/442/885/GCA_000442885.1_ZYBA0/GCA_000442885.1_ZYBA0_genomic.gff.gz</u>]
(ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/442/885/GCA_000442885.1_ZYBA0/GCA_000442885.1_ZYBA0_genomic.gff.gz)
下載得到基因組注釋文件(<u>https://www.ncbi.nlm.nih.gov/genome/?term=Zygosaccharomyces+bailii</u>)
2残吩、使用hisat2-build建立索引文件
1)建立基因組索引
hisat2-build -p 4 GCA_000442885.1_ZYBA0_genomic.fna genome/genome
3、使用生成的索引文件對(duì)RNA-Seq進(jìn)行比對(duì)拼接
(1)轉(zhuǎn)換SRA文件為fastaq格式
(2)比對(duì)
hisat2 -x data/indexes/genome/genome -U SRR7058080.fastq -S SRR7058080.sam
4倘核、SAM tools 排序及格式轉(zhuǎn)換
samtools sort -@ 8 -o SRR7058080.bam SRR7058080.sam
5泣侮、stringtie 序列初組裝
stringtie SRR7058083_sorted.bam -p 8 -G ../genes/GCA_000442885.1_ZYBA0_genomic.gtf -l SRR7058083 -o SRR7058083.gtf
stringtie --merge -p 8 -G ../genes/GCA_000442885.1_ZYBA0_genomic.gtf -o stringtie_merged.gtf mergelist.txt
stringtie-1.3.6.OSX_x86_64/stringtie SRR7058083_sorted.bam -p 8 -G stringtie_merged.gtf -o ballgown/SRR7058083/SRR7058083.gtf -e -B
6、差異表達(dá)分析(在R中進(jìn)行)
library(ballgown)
library(RSkittleBrewer)
library(genefilter)
library(dplyr)
library(devtools)
bg_filt = subset(bg,"rowVars(texpr(bg)) >1",genomesubset=TRUE)
#濾掉低豐度的基因紧唱,這里選擇過濾掉樣本間差異少于一個(gè)轉(zhuǎn)錄本的數(shù)據(jù)
results_transcripts = stattest(bg_filt,feature="transcript",covariate="group", getFC=TRUE, meas="FPKM")
#用于確定組件有差異的轉(zhuǎn)錄本活尊,指定的分析參數(shù)是“transcripts”,主變量是“group"漏益,getFC可以指定輸出結(jié)果顯示組間表達(dá)量的foldchange蛹锰。
results_transcripts = data.frame(geneNames = ballgown::geneNames(bg_filt), geneIDs =ballgown::geneIDs(bg_filt), results_transcripts)
#為轉(zhuǎn)錄本添加基因名
results_genes = stattest(bg_filt, feature = "gene", covariate = "group", getFC = TRUE, meas = "FPKM")
#用于確定各組間有差異的基因
results_transcripts = arrange(results_transcripts,pval)
results_genes = arrange(results_genes,pval)
#按照p-value值從小到大排序
write.csv(results_transcripts, "transcript_results.csv",row.names=FALSE)
write.csv(results_genes, "gene_results.csv",row.names=FALSE)
#將兩個(gè)結(jié)果寫到csv文件中
subset(results_transcripts,results_transcripts$qval<0.05)
subset(results_genes,results_genes$qval<0.05)
#選出q值<0.05的轉(zhuǎn)錄本和基因,即組間表達(dá)有差異的基因
tropical= c('darkorange', 'dodgerblue','hotpink', 'limegreen', 'yellow')
palette(tropical)
fpkm = texpr(bg,meas="FPKM")
#以FPKM為參考值作圖绰疤,以組別作為區(qū)分條件
boxplot(fpkm,col=as.numeric(phenotype_table$group),las=2,ylab='log2(FPKM+1)')
#對(duì)所有轉(zhuǎn)錄本做箱線圖
ballgown::transcriptNames(bg)[12]
ballgown::geneNames(bg)[12]
plot(fpkm[12,]~phenotype_table$group,border=c(1,2),main=paste(ballgown::geneNames(bg)[12],' : ',ballgown::transcriptNames(bg)[12]),pch=19, xlab="group",ylab='log2(FPKM+1)')
points(fpkm[12,] ~ jitter(as.numeric(phenotype_table$group)),col=as.numeric(phenotype_table$group))
#對(duì)單個(gè)轉(zhuǎn)錄本做箱線圖
plotTranscripts(ballgown::geneIDs(bg)[1729], bg, main=c('Gene MSTRG.1531 in sample SRR7058083'), sample=c('SRR7058083'))
#查看某一基因位置上所有轉(zhuǎn)錄本(以ID=1729的基因?yàn)槔?
plotMeans('MSTRG.1531', bg_filt,groupvar="group",legend=FALSE)
#以組別為區(qū)分铜犬,查看基因MSTRG.1531的表達(dá)情況
BiocManager::install("airway")
BiocManager::install("Rsamtools")
BiocManager::install("GenomicFeatures")
BiocManager::install("DESeq2")
BiocManager::install("pheatmap")
library("airway")
pData(bg) = data.frame(id=sampleNames(bg), group=rep(c("wild-type", "Zbhaa1delta deletion"), each=3))
sampleTable <- pData(bg)
filenames <- file.path(paste0(sampleTable$id, "_sorted.bam"))
library("Rsamtools")
bamfiles <- BamFileList(filenames, yieldSize=2000000)
library("GenomicFeatures")
gfffile <- file.path("/Users/guoyafei/peppa/data/genes/GCA_000442885.1_ZYBA0_genomic.gff")
(txdb <- makeTxDbFromGFF(gfffile, format="gff", circ_seqs=character()))
(ebg <- exonsBy(txdb, by="gene"))
library("GenomicAlignments")
library("BiocParallel")
register(SerialParam())
se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=TRUE,ignore.strand=TRUE,fragments=FALSE )
#基因計(jì)數(shù)
(colData(se) <- DataFrame(sampleTable))
#填充樣本的具體信息,方便后續(xù)分組轻庆,尋找差異基因
library("DESeq2")
#采用 DESeq2 包進(jìn)行差異表達(dá)基因的分析
countdata <- assay(se)
coldata <- colData(se)
(ddsMat <- DESeqDataSetFromMatrix(countData = countdata,colData = coldata,design = ~group))
dds <- DESeqDataSet(se, design = ~ group)
# design 參數(shù)為 formula癣猾,此處為group一個(gè)因素。
dds <- dds[ rowSums(counts(dds)) > 1, ]
nrow(dds)
rld <- rlog(dds, blind=FALSE)
par( mfrow = c( 1, 2 ) )
plot(log2(counts(dds, normalized=FALSE)[,1:2] + 1),pch=16, cex=0.3)
(sampleDists <- dist( t( assay(rld) ) ))
sampleDistMatrix <- as.matrix( sampleDists )
#構(gòu)建距離矩陣
rownames(sampleDistMatrix) <- rld$group
colnames(sampleDistMatrix) <- NULL
library("RColorBrewer")
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
library("pheatmap")
pheatmap(sampleDistMatrix,clustering_distance_rows=sampleDists,clustering_distance_cols=sampleDists,col=colors)
#差異表達(dá)基因聚類分析
plotPCA(rld, intgroup = group)
#PCA分析
參考文獻(xiàn)
[1] Miguel Antunes, Margarida Palma, Isabel Sa?-Correia. Transcriptional profiling of Zygosaccharomyces bailii early response to acetic acid or copper stress mediated by ZbHaa1.[J] Scientific Reports,2018,DOI:10.1038/s41598-018-32266-9
[2] Mortazavi Ali,Williams Brian A,McCue Kenneth,Schaeffer Lorian,Wold Barbara. Mapping and quantifying mammalian transcriptomes by RNA-Seq.[J]. Nature Methods,2008,5(7).
[3] Pertea Mihaela,Kim Daehwan,Pertea Geo M,Leek Jeffrey T,Salzberg Steven L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.[J]. Nature protocols,2016,11(9).
[4] Heng Li,Bob Handsaker,Alec Wysoker,Tim Fennell,Jue Ruan,Nils Homer,Gabor Marth,Goncalo Abecasis,Richard Durbin. The Sequence Alignment/Map format and SAMtools[J]. Bioinformatics,2009,25(16).
[5] Garber Manuel,Grabherr Manfred G,Guttman Mitchell,Trapnell Cole. Computational methods for transcriptome annotation and quantification using RNA-seq.[J]. Nature Methods,2011,8(6).
[6] HISAT2 http://ccb.jhu.edu/software/hisat2/index.shtml
[7] StringTie http://ccb.jhu.edu/software/stringtie/
[8] Bowtie: An ultrafast, memory-efficient short read aligner http://bowtie-bio.sourceforge.net/index.shtml