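I. Read the paper and download the data
1. Read the paper
I usually get the paper through NCBI and look up the data accession number in it.
2. Download the data
Go to the NCBI GEO database, enter the accession number, download the data over FTP, and use the fastq-dump command to convert the SRA files to FASTQ format.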
# Convert runs SRR3589956 to SRR3589962; --split-files writes one file per mate
for ((i=56; i<=62; i++))
do
fastq-dump --gzip --split-files SRR35899$i.sra
done
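Before launching the conversion it can be worth checking which accessions the loop will touch. The following is a minimal sketch, assuming the run numbers 56 to 62 from the loop above; the `accessions` variable is only an illustrative name.

```shell
# Collect the accession IDs covered by the loop (SRR3589956 ... SRR3589962).
accessions=""
i=56
while [ "$i" -le 62 ]; do
    accessions="$accessions SRR35899$i"
    i=$((i + 1))
done

# Print one accession per line for a quick visual check.
for acc in $accessions; do
    echo "$acc"
done
```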
II. Quality control
(1) fastqc -o ../fastqc *.fastq.gz
(2) Read filtering: trimmomatic or fastx_toolkit
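The filtering step is only named above, so here is a hedged sketch of what a paired-end Trimmomatic call might look like. It prints the command rather than running it. The `trim_cmd` helper, the `trimmomatic` wrapper on PATH (as installed by conda), the adapter file `TruSeq3-PE.fa`, the thread count, and the trimming thresholds are all assumptions for illustration, not settings from the original analysis.

```shell
# Hypothetical helper: print (not run) a Trimmomatic paired-end command for one sample.
# Assumes a `trimmomatic` wrapper on PATH and an adapter file TruSeq3-PE.fa.
trim_cmd() {
    sample="$1"
    echo "trimmomatic PE -threads 4" \
        "${sample}_1.fastq.gz ${sample}_2.fastq.gz" \
        "${sample}_1.paired.fastq.gz ${sample}_1.unpaired.fastq.gz" \
        "${sample}_2.paired.fastq.gz ${sample}_2.unpaired.fastq.gz" \
        "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3" \
        "SLIDINGWINDOW:4:15 MINLEN:36"
}

# Show the command for one of the samples.
trim_cmd SRR3589956
```

To actually run it, the printed line can be piped to a shell, for example `trim_cmd SRR3589956 | sh`.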
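III. Download the reference genome and annotation (for species without a reference genome, transcripts have to be assembled de novo with Trinity)
The FASTA and GTF files are usually downloaded from the Ensembl database: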
ftp://ftp.ensembl.org/pub/
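As a rough illustration of where the files sit under that FTP root, the sketch below prints the human GTF and genome FASTA URLs for a given Ensembl release. The `ensembl_urls` helper is hypothetical, and release 90 is an assumption taken from the GTF file name used in the counting step; check the directory listing for the release you actually need.

```shell
# Hypothetical helper: print the Ensembl FTP URLs of the human GTF and genome FASTA
# for one release number (the directory layout is assumed from release 90).
ensembl_urls() {
    release="$1"
    base="ftp://ftp.ensembl.org/pub/release-${release}"
    echo "${base}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${release}.chr.gtf.gz"
    echo "${base}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
}

# Print the two URLs; each can then be fetched with, for example, wget.
ensembl_urls 90
```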
IV. Alignment
Use HISAT2.
(1) Build the index
# Extract exons and splice sites from the GTF annotation
~/Python-2.7.14/python ~/software/hisat2-2.1.0/extract_exons.py Homo_sapiens.GRCh38.83.chr.gtf > genome.exon
~/Python-2.7.14/python ~/software/hisat2-2.1.0/extract_splice_sites.py Homo_sapiens.GRCh38.83.chr.gtf > genome.ss
# Build the index; the genome FASTA must be the same species and assembly as the GTF
hisat2-build -p 32 --ss genome.ss --exon genome.exon Homo_sapiens.GRCh38.dna.primary_assembly.fa genome_tran
(2) Align the reads
nohup hisat2 -p 16 --dta -x ~/20171014/refs/grch38/genome -1 ~/20171014/raw_data/SRR3589956_1.fastq.gz -2 ~/20171014/raw_data/SRR3589956_2.fastq.gz -S SRR3589956.sam
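The command above handles a single sample. A minimal sketch of extending it to all seven runs follows; it only prints the commands so they can be inspected first. The `align_cmd` helper is hypothetical, and the index and raw-data paths are copied from the single-sample command, with `~` written out as `$HOME`.

```shell
# Paths copied from the single-sample command above.
index="$HOME/20171014/refs/grch38/genome"
raw="$HOME/20171014/raw_data"

# Hypothetical helper: print the HISAT2 command for one paired-end sample.
align_cmd() {
    s="$1"
    echo "hisat2 -p 16 --dta -x $index -1 $raw/${s}_1.fastq.gz -2 $raw/${s}_2.fastq.gz -S $s.sam"
}

# One command per run, SRR3589956 to SRR3589962.
i=56
while [ "$i" -le 62 ]; do
    align_cmd "SRR35899$i"
    i=$((i + 1))
done
```

Saving the printed lines to a script and running that script with nohup keeps the alignments going after logout, as in the original command.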
V. Format conversion
samtools view -S -b    (convert SAM to BAM)
samtools sort -o       (sort the BAM file)
samtools index         (index the sorted BAM file)
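The three samtools steps can be sketched for one sample as follows. The commands are printed rather than executed; the `bam_cmds` helper and the output file names (`.bam`, `.sorted.bam`) are assumptions for illustration, while the input name follows the `SRR3589956.sam` file written by the alignment step.

```shell
# Hypothetical helper: print the SAM -> BAM -> sorted BAM -> index commands for one sample.
bam_cmds() {
    s="$1"
    echo "samtools view -S -b -o $s.bam $s.sam"
    echo "samtools sort -o $s.sorted.bam $s.bam"
    echo "samtools index $s.sorted.bam"
}

# Show the three commands for the sample aligned above.
bam_cmds SRR3589956
```

Piping the output to `sh` would run the steps in order; the index step needs the coordinate-sorted file, which is why sorting comes first.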
VI. Counting
Use HTSeq (htseq-count) for counting. htseq-count writes to standard output, so redirect it to a .count file.
/stor9000/apps/users/NWSUAF/2016050428/Python-2.7.14/python ~/Python-2.7.14/bin/htseq-count ~/20171014/hisat2/SRR3589958_count.sam ~/20171014/human/Homo_sapiens.GRCh38.90.chr.gtf > SRR3589958.count
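htseq-count appends a few summary rows to its output (such as `__no_feature` and `__ambiguous`), and these are not genes. A small sketch of dropping them before the counts are loaded elsewhere is shown below; the file names and the gene rows are made-up stand-ins, not real results.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmpdir="$(mktemp -d)"

# Build a tiny stand-in for an htseq-count output file (values are made up).
printf 'ENSG00000000003\t12\nENSG00000000005\t0\n__no_feature\t100\n__ambiguous\t7\n' \
    > "$tmpdir/example.count"

# Keep only gene rows: drop every line that starts with "__".
grep -v '^__' "$tmpdir/example.count" > "$tmpdir/example.genes.count"

cat "$tmpdir/example.genes.count"
```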
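VII. Differential expression analysis
(1) Read the count files
# Read the htseq-count output of each sample (tab-separated: gene ID, count)
control1 <- read.table("SRR3589959.count", sep="\t", col.names=c("gene_id", "control1"))
control2 <- read.table("SRR3589960.count", sep="\t", col.names=c("gene_id", "control2"))
rep1 <- read.table("SRR3589961.count", sep="\t", col.names=c("gene_id", "akap951"))
rep2 <- read.table("SRR3589962.count", sep="\t", col.names=c("gene_id", "akap952"))
# Merge the four samples into one table keyed by gene ID
raw_count <- merge(merge(control1, control2, by="gene_id"), merge(rep1, rep2, by="gene_id"), by="gene_id")
# Drop the summary rows that htseq-count appends (__no_feature, __ambiguous, ...)
raw_count <- raw_count[!grepl("^__", raw_count$gene_id), ]
# Strip any version suffix (e.g. ".12") from the gene IDs
ensembl <- gsub("\\.\\d+$", "", raw_count$gene_id)
row.names(raw_count) <- ensembl
raw_count_filter <- raw_count[, -1]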
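(2) Build the DDS object
library(DESeq2)
# Two control samples followed by two AKAP95 samples, matching the column order
condition <- factor(c(rep("control", 2), rep("akap95", 2)), levels=c("control", "akap95"))
count_data <- raw_count_filter[, 1:4]
col_data <- data.frame(row.names=colnames(raw_count_filter), condition)
dds <- DESeqDataSetFromMatrix(countData=count_data, colData=col_data, design=~condition)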
(3) Normalize and fit dds with DESeq2
dds2 <- DESeq(dds)
resultsNames(dds2)
res <- results(dds2)
summary(res)
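(4) Extract the differential expression results
# Sort by adjusted p-value
res <- res[order(res$padj), ]
# Keep genes with padj < 0.05 and an absolute log2 fold change above 1
diff_gene_deseq2 <- subset(res, padj < 0.05 & (log2FoldChange > 1 | log2FoldChange < -1))
# Attach the normalized counts to the result table and save it
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds2, normalized=TRUE)), by="row.names", sort=FALSE)
write.csv(resdata, "differential_analysis_results.csv")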
VIII. GO and pathway analysis
library(clusterProfiler)
library(org.Hs.eg.db)
# gene is assumed to be a character vector of Entrez IDs for the differential genes.
# universe is left out, so all annotated genes serve as the background.
ego <- enrichGO(gene, OrgDb = org.Hs.eg.db, keyType = "ENTREZID", ont = "MF",
  pvalueCutoff = 0.05, pAdjustMethod = "BH", qvalueCutoff = 0.2,
  minGSSize = 10, maxGSSize = 500, readable = FALSE, pool = FALSE)
ekegg <- enrichKEGG(gene, organism = "hsa", keyType = "kegg", pvalueCutoff = 0.05,
  pAdjustMethod = "BH", minGSSize = 10, maxGSSize = 500,
  qvalueCutoff = 0.2, use_internal_data = FALSE)