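1. Environment setup
Install Docker on the server and start it
Issues:
1. Docker permissions
2. Choosing the base system
3. Software installation
4. Mounting a directory with -v
5. Parameter configuration in the JSON file
6. Images and containers
7. Packaging
Command to package a container into an image
docker commit <container ID> rna:v1.0
First, run the image in the background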
docker run -itd --name rna_test -v /data/xczhang/RNA_seq/pipline/RNA-seq_pipeline:/data/RNA ubuntu
Enter the container
docker exec -it <container ID> /bin/bash
Configure the container environment
1.apt-get install vim
2.apt-get install wget
3. Install R
apt-get install r-base
4. Install Java
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u141-b15/336fa29ff2bb4ef291e347e091f7f4a7/jdk-8u141-linux-x64.tar.gz"
tar -zxvf jdk-8u141-linux-x64.tar.gz # extract
# rename to jdk8
mv jdk1.8.0_141 jdk8
Configure the environment variables
sudo vim /etc/profile # open the environment variable configuration file
Append the following to the end of that file
export JAVA_HOME=/usr/local/src/jdk8
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Apply the changes
source /etc/profile
2. Software installation
5. Install FastQC-0.11.5
nohup wget -c http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5.zip
Unzip it; no compilation is needed and it can be used directly
6. Install Trimmomatic-0.38
wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip
java -jar trimmomatic-0.38.jar
7. Download samtools-1.4
wget https://github.com/samtools/samtools/releases/download/1.4/samtools-1.4.tar.bz2
./configure --prefix=/home/vip47/biosoft/samtools-1.9
make
make install
8. Download hisat2-2.1.0
Ready to use as downloaded
wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-Linux_x86_64.zip
9. Download StringTie
Ready to use as downloaded
wget http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.3b.Linux_x86_64.tar.gz
10. Download cufflinks
Ready to use as downloaded
wget http://cole-trapnell-lab.github.io/cufflinks/assets/downloads/cufflinks-2.2.1.Linux_x86_64.tar.gz
Command to copy a file from one server to another
scp cufflinks-2.2.1.Linux_x86_64.tar.gz zhangxc@IP:/data/xczhang/RNA_seq/pipline/RNA-seq_pipeline/PE/
Command to copy a file from the server into a Docker container
docker cp cufflinks-2.2.1.Linux_x86_64.tar.gz rna_test:/home/ubuntu/SSD/software/
11. Download STAR
Ready to use as downloaded
wget -c https://github.com/alexdobin/STAR/archive/2.7.3a.tar.gz
12. Download STAR-Fusion
wget https://github.com/STAR-Fusion/STAR-Fusion/releases/download/STAR-Fusion-v1.8.1/STAR-Fusion-v1.8.1.FULL.tar.gz
13. Install the jq command
apt-get install jq
14. Download q30 (q30-master/q30.py)
URL: [https://github.com/dayedepps/q30/find/master](https://github.com/dayedepps/q30/find/master)
The fastq module is also available at this URL
15. Install Python
apt-get install python
The pip command was not found
Add a software source to the /etc/apt/sources.list file
wget https://bootstrap.pypa.io/get-pip.py
After downloading, run python get-pip.py
16. Download the human genome and build the hisat index
Download the hisat2 index files directly
mwget -n 30 https://cloud.biohpc.swmed.edu/index.php/s/grch37_tran/download
17. Install STAR-Fusion
Install conda
conda install -c bioconda star-fusion
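Find STAR-Fusion-v1.8.1.FULL.tar.gz in the pkgs directory, extract it again, and fix the paths of the executables under the bin directory
Note: add the STAR software path to the environment variables first
18. Error: 02-alignSummary.sh: line 18: bc: command not found
Fix: apt-get install bc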
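Errors like the missing bc command only show up once a step is already running. A minimal sketch of a pre-flight check follows, assuming the pipeline scripts call the tools listed in REQUIRED; that list is a guess assembled from the software mentioned in these notes, not taken from the actual scripts.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Hypothetical list: adjust it to whatever the pipeline scripts really call.
REQUIRED = ["bc", "jq", "java", "samtools", "hisat2", "stringtie", "cufflinks", "STAR"]

missing = missing_tools(REQUIRED)
if missing:
    print("Missing commands:", ", ".join(missing))
else:
    print("All required commands were found")
```

Running this inside the container before starting the pipeline lists everything that still needs an apt-get install or a PATH entry.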
19. FASTA file for STAR alignment
wget ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.toplevel.fa.gz
GTF file
20. cufflinks test
bash 03-assemblyCuff.sh: the script runs normally and generates the transcript file for each sample
bash 03-gtfMerge.sh: the script runs normally
Transcript reassembly with cuffmerge, then gffread and quantification
Cuffcompare compares the assembled transcripts with the transcripts of the reference genome, and classifies the comparison results accordingly
For a GTF file assembled by stringtie, gffread can be used to extract the sequences of the assembled transcripts from the corresponding reference genome
21. Transcript quantification with cuffdiff
bash 04-dge.sh
22. Threshold for selecting differentially expressed genes: q-value < 0.05
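The threshold above can be applied with a few lines of Python. This is a minimal sketch, assuming the input is a tab-delimited cuffdiff table with a q_value column (as in gene_exp.diff); the function name and the two demo rows are made up for illustration and are not part of the original pipeline.

```python
import csv
import io


def filter_significant(handle, q_cutoff=0.05):
    """Return the rows of a tab-delimited table whose q_value is below q_cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["q_value"]) < q_cutoff]


# Made-up demo rows; a real run would open the cuffdiff output file instead.
demo = io.StringIO(
    "test_id\tgene\tstatus\tq_value\n"
    "XLOC_000001\tGENE_A\tOK\t0.001\n"
    "XLOC_000002\tGENE_B\tOK\t0.20\n"
)
hits = filter_significant(demo)
print([row["gene"] for row in hits])  # ['GENE_A']
```

The comparison is strict, so a gene with a q-value of exactly 0.05 is not kept.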
23. Plotting the DGE genes
bash 05-cuffPlot.sh
Error, missing R package: there is no package called 'cummeRbund'
Install the dependencies:
install.packages("RSQLite")
install.packages("ggplot2")
install.packages("plyr")
install.packages("fastcluster")
install.packages("BiocManager")
BiocManager::install("rtracklayer")
BiocManager::install("Gviz")
BiocManager::install("BiocGenerics")
BiocManager::install("cummeRbund")
Error:
the library installation failed
apt-get install libcurl4-gnutls-dev
apt-get install libxml2-dev
apt-get install openssl
apt-get install libssl-dev
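Installation succeeded!
bash 05-cuffPlot.sh now runs normally and the figures are generated correctly
The DGE directory is generated
24. GO and KEGG enrichment analysis of the DGE genes
Error: missing R packages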
1.there is no package called 'topGO'
2.there is no package called 'clusterProfiler'
3.there is no package called 'org.Hs.eg.db'
Fix:
BiocManager::install("topGO")
BiocManager::install("Rgraphviz")
BiocManager::install("clusterProfiler")
BiocManager::install("org.Hs.eg.db")
25. KEGG enrichment analysis error
Error in setReadable(kegg, OrgDb = org.Hs.eg.db, keytype = "ENTREZID") :
unused argument (keytype = "ENTREZID")
Fix:
kegg2<-setReadable(kegg, OrgDb = org.Hs.eg.db, keytype = "ENTREZID")
Change to:
kegg2<-setReadable(kegg, OrgDb = org.Hs.eg.db, keyType = "ENTREZID")
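26. Image export and import
After the whole pipeline has been tested, the image name is rna:1.0
Use the save command to pack the image into a tar file, copy it to the other server, and import it there with the load command to complete the pipeline deployment
Image export: docker save -o rna_seq.tar rnaseq:v2.0
Image import: docker load -i rna_seq.tar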
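If the export and import are driven from a script, the two commands can be assembled as argument lists. The sketch below is only an illustration: the helper names and the dry_run switch are assumptions, and only the docker save -o and docker load -i forms shown above are used.

```python
import subprocess


def save_cmd(image, tar_path):
    """Build the argument list for exporting an image to a tar file."""
    return ["docker", "save", "-o", tar_path, image]


def load_cmd(tar_path):
    """Build the argument list for importing an image from a tar file."""
    return ["docker", "load", "-i", tar_path]


def run(cmd, dry_run=True):
    """Print the command when dry_run is set; otherwise execute it."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


run(save_cmd("rnaseq:v2.0", "rna_seq.tar"))  # on the source server
run(load_cmd("rna_seq.tar"))                 # on the target server
```

With dry_run left at its default, the script only prints the two commands, which is a quick way to check the image tag and file name before touching Docker.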
27. Fusion gene analysis
Database download
[https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.8/GRCh37_gencode_v19_CTAT_lib_Oct012019.plug-n-play.tar.gz](https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.8/GRCh37_gencode_v19_CTAT_lib_Oct012019.plug-n-play.tar.gz)
Choose a version to download
bash 06-fusionDetc.sh fails with an error:
Can't locate Set/IntervalTree.pm in @INC (you may need to install the Set::IntervalTree module)
Install the missing Perl module
install Set::IntervalTree
Another error about a missing module
install JSON::XS
Another error:
Argument "brkpt_donorA" isn't numeric in preincrement (++) at /home/ubuntu/SSD/software/star-fusion/lib/STAR-Fusion/util/STAR-Fusion.map_chimeric_reads_to_genes line 379, <$fh> line 1.
Can't use an undefined value as an ARRAY reference at /home/ubuntu/SSD/software/star-fusion/lib/STAR-Fusion/util/STAR-Fusion.map_chimeric_reads_to_genes line 511, <$fh> line 1.
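A Google search indicates that the versions are incompatible
Fix:
Software version: star-fusion-1.6.0-0.tar.bz2
STAR-2.7.3a
Database version
Re-download star-fusion-1.9.0; keep STAR unchanged for now; the database version matches
Download URL: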
[https://github.com/STAR-Fusion/STAR-Fusion/releases/download/STAR-Fusion-v1.9.0/STAR-Fusion.v1.9.0.FULL.tar.gz](https://github.com/STAR-Fusion/STAR-Fusion/releases/download/STAR-Fusion-v1.9.0/STAR-Fusion.v1.9.0.FULL.tar.gz)
No compilation needed
Run the script again
* STAR-Fusion complete. See output: /data/RNA/RNA_test/nCov9/FusionDetc/star-fusion.fusion_predictions.tsv (or .abridged.tsv version)
![image](https://upload-images.jianshu.io/upload_images/14081483-7b8c71155a26459c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
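Generating the progress JSON file
Approach:
After each module runs, generate that module's result JSON and check whether an error occurred; this is controlled by err = 0
Parameters are passed in through the config file, which provides the sample, task ID, module name, index, and other information
Create a fixed dictionary, update it with the values from the JSON file above, and write the result JSON file
Example of running the script: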
python star_pipline.py -i /data/analysis-dir/config.json
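The approach described above could be sketched roughly as follows. This is not the code of star_pipline.py: the key names in RESULT_TEMPLATE, the helper names, and the demo values are all assumptions chosen to mirror the description (sample, task ID, module name, index, and an err flag where 0 means success).

```python
import json

# Fixed dictionary for one module's result; every key name here is hypothetical.
RESULT_TEMPLATE = {"sample": None, "task_id": None, "module": None, "index": None, "err": 0}


def build_result(config, err=0):
    """Copy the template, update it from the config values, and set the err flag."""
    result = dict(RESULT_TEMPLATE)
    # Only keys that the template already defines are taken from the config.
    result.update({key: value for key, value in config.items() if key in result})
    result["err"] = err
    return result


def write_result(config, path, err=0):
    """Write the module's result JSON to `path` and return the dictionary."""
    result = build_result(config, err)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(result, fh, ensure_ascii=False, indent=2)
    return result


# Made-up config values for illustration.
config = {"sample": "nCov9", "task_id": "demo-001", "module": "fastqc", "index": 1, "extra": "ignored"}
print(json.dumps(build_result(config, err=0)))
```

Keeping the template fixed means every module emits the same set of keys, so whatever reads the progress files does not need per-module handling.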
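Pipeline modifications
1. Create run_rna_seq.py to organize the JSON, generate the job config file, and run pipeline.sh
2. Modify pipeline.sh to add step.py, so that the pipeline reports its progress:
after each bash script runs, check err, then move on to the next bash script
3. Send the QC report and the result report
run_rna_seq.py: the main run script; its purpose is to generate the final JSON file and run the bash modules
sendMessage.sh: the message-sending script; the sh it calls is another bash script
step.py: checks for errors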
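The "run one bash script, check err, then continue" control flow can be sketched as below. This is an illustration under assumptions, not the actual pipeline.sh or step.py: run_steps and its runner parameter are hypothetical names, and the demo uses a fake runner so that no real script is executed.

```python
import subprocess


def run_steps(scripts, runner=None):
    """Run scripts in order and stop at the first failure.

    Returns (err, completed); err is 0 only if every script succeeded.
    """
    if runner is None:
        # Default: execute each script with bash and use its exit code as err.
        runner = lambda script: subprocess.run(["bash", script]).returncode
    completed = []
    for script in scripts:
        err = runner(script)
        if err != 0:
            return err, completed
        completed.append(script)
    return 0, completed


# Demo with a fake runner that pretends 04-dge.sh fails.
steps = ["03-assemblyCuff.sh", "03-gtfMerge.sh", "04-dge.sh", "05-cuffPlot.sh"]
fake_runner = lambda script: 1 if script == "04-dge.sh" else 0
print(run_steps(steps, runner=fake_runner))
```

In the demo, the two scripts before 04-dge.sh are reported as completed and 05-cuffPlot.sh is never started, which is the point at which a progress message with a non-zero err would be sent.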
Send the QC report
Fill in the template QC_json file, then send it
Send the final project report
Pipeline testing issues:
1. Chinese character display issue: it caused an error when sending the progress log
export LANG="C.UTF-8"
source /etc/profile
Build the image with a Dockerfile
docker build -t second:v1.0 .
Then launch the bash script on the cluster, mounting the directory with -v. It ran successfully!