Introduction
The first step of metatranscriptomic data analysis.
I. KneadData: download, installation, and quality control
Link: https://bitbucket.org/biobakery/kneaddata/wiki/Home#markdown-header-installation
wget -c https://files.pythonhosted.org/packages/a4/6a/4176eee7a83b80ac12ca6727df6cb9dd3fec2051cca8a707ccbebc5962d3/kneaddata-0.7.3.tar.gz
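# Download
tar -zxvf kneaddata-0.7.3.tar.gz
# Extract
rm kneaddata-0.7.3.tar.gz
# Delete the downloaded archive
cd kneaddata-0.7.3
python setup.py install
# Install. This also installs the dependencies Trimmomatic and Bowtie2; pass "--bypass-dependencies-install" to skip them.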
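Before running the pipeline it can help to confirm that the required executables are actually on PATH. The following is a minimal sketch; the command names `kneaddata`, `bowtie2`, and `fastqc` are assumptions about how the tools are exposed on your system (Trimmomatic is a jar file and is not checked here).

```shell
#!/usr/bin/env bash
# Report which of the given command names are present on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tool names below are assumptions; adjust them to your installation.
check_tools kneaddata bowtie2 fastqc || echo "Install the missing tools before continuing."
```

If a tool was installed into a non-standard location, add its directory to PATH or call it by its full path, as the commands in this post do.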
threads=52 # Number of CPU threads; adjust to your machine
for i in $(ls 1.rawdata); do
/[route]/kneaddata-0.7.3/kneaddata/knead_data.py \
-i 1.rawdata/$i/${i}_1.fq.gz \
-i 1.rawdata/$i/${i}_2.fq.gz \
-o result/qc/kneaddata \
-db /[route]/Databases/hg38 \
--trimmomatic /[route]/Trimmomatic-0.39 \
-t $threads \
--trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \
--bowtie2-options "--very-sensitive --dovetail --al-gz" \
--remove-intermediate-output \
--run-fastqc-start \
--run-fastqc-end
done
# Quality control and host-read removal
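The loop above assumes that every sample directory contains both mate files. A small dry-run helper can list the pairs first and flag incomplete samples; this is only a sketch that follows the `1.rawdata/<sample>/<sample>_1.fq.gz` layout used above, and the function name `list_pairs` and the demo sample names are hypothetical.

```shell
#!/usr/bin/env bash
# Print "sample<TAB>R1<TAB>R2" for every sample directory that holds both mates.
# Samples with a missing mate are reported on stderr and skipped.
list_pairs() {
    local rawdir=$1 dir sample r1 r2
    for dir in "$rawdir"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        sample=$(basename "$dir")
        r1="$rawdir/$sample/${sample}_1.fq.gz"
        r2="$rawdir/$sample/${sample}_2.fq.gz"
        if [ -f "$r1" ] && [ -f "$r2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
        else
            echo "skipping $sample: missing mate file" >&2
        fi
    done
}

# Demo on a throwaway directory with made-up sample names.
demo=$(mktemp -d)
mkdir -p "$demo/S1" "$demo/S2"
touch "$demo/S1/S1_1.fq.gz" "$demo/S1/S1_2.fq.gz" "$demo/S2/S2_1.fq.gz"
list_pairs "$demo"
rm -r "$demo"
```

On real data, `list_pairs 1.rawdata` gives a quick overview of what the KneadData loop will process.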
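II. SortMeRNA installation
Link: https://bioinfo.lifl.fr/RNA/sortmerna/
1 Installing with apt
SortMeRNA version 2.0, 29/11/2014
One-step download and installation, but the apt package is an old version and is not recommended.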
sudo apt install sortmerna
sortmerna --version
2 Installing with conda
conda create -n python3.6 python=3.6
conda activate python3.6
conda config --show channels
conda install sortmerna
sortmerna --version
# SortMeRNA version 4.2.0
# Build Date: Mar 12 2020
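Because the version 2 and version 4 workflows differ, a script may want to branch on the major version. The sketch below only parses the two banner lines shown above; the function name `sortmerna_major` is hypothetical, and other releases may format their banner differently.

```shell
#!/usr/bin/env bash
# Extract the major version number from a SortMeRNA version banner line.
sortmerna_major() {
    local banner=$1 rest
    rest=${banner#*version }     # drop everything up to and including "version "
    echo "${rest%%.*}"           # keep what precedes the first dot
}

sortmerna_major "SortMeRNA version 2.0, 29/11/2014"   # prints 2
sortmerna_major "SortMeRNA version 4.2.0"             # prints 4
```

In practice the banner would come from `sortmerna --version`; whether it is written to stdout or stderr is not checked here.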
III. Building the index and removing rRNA (version 2 only)
Version 4 and later use the FASTA sequences directly, with no need to build an index beforehand, which is very nice.
wget -c http://bioinfo.lifl.fr/RNA/sortmerna/code/sortmerna-2.1-linux-64-multithread.tar.gz
# I could not find the databases on the Linux machine, so I downloaded the package again; it contains the required databases.
indexdb_rna --ref \
./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:\
./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:\
./rRNA_databases/silva-arc-16s-id95.fasta,./index/silva-arc-16s-db:\
./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-db:\
./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-db:\
./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s:\
./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:\
./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db
# Build the index
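The long `--ref` value is a colon-separated list of `fasta,index` pairs, which is easy to mistype. As a sketch, it can be assembled from a list of database names. The helper name `build_ref_arg` is hypothetical, and it derives each index name from the FASTA name (for example `silva-bac-16s-id90-db`), which differs slightly from the index names used above.

```shell
#!/usr/bin/env bash
# Join "fasta,index" pairs with ":" in the form used by --ref in SortMeRNA 2.x.
# Usage: build_ref_arg <db_dir> <index_dir> <name>...
build_ref_arg() {
    local db_dir=$1 index_dir=$2 name ref=""
    shift 2
    for name in "$@"; do
        # "${ref:+:}" inserts the ":" separator only when ref is not empty
        ref+="${ref:+:}${db_dir}/${name}.fasta,${index_dir}/${name}-db"
    done
    echo "$ref"
}

build_ref_arg ./rRNA_databases ./index silva-bac-16s-id90 silva-bac-23s-id98
```

The same string could then be passed to both `indexdb_rna --ref` and `sortmerna --ref`, so that the two commands cannot drift apart.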
# version 2
for i in /[route]/*kneaddata_paired_[12].fastq; do
base=${i##*/}
head=${base%%_*}
tail=${base#*kneaddata_}
sortmerna --ref /[route]/sortmerna-2.1b/rRNA_databases/silva-bac-16s-id90.fasta,/[route]/sortmerna-2.1b/index/silva-bac-16s-db:\
/[route]/sortmerna-2.1b/rRNA_databases/silva-bac-23s-id98.fasta,/[route]/sortmerna-2.1b/index/silva-bac-23s-db:\
/[route]/sortmerna-2.1b/rRNA_databases/silva-arc-16s-id95.fasta,/[route]/sortmerna-2.1b/index/silva-arc-16s-db:\
/[route]/sortmerna-2.1b/rRNA_databases/silva-arc-23s-id98.fasta,/[route]/sortmerna-2.1b/index/silva-arc-23s-db:\
/[route]/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,/[route]/sortmerna-2.1b/index/silva-euk-18s-db:\
/[route]/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta,/[route]/sortmerna-2.1b/index/silva-euk-28s:\
/[route]/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,/[route]/sortmerna-2.1b/index/rfam-5s-db:\
/[route]/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,/[route]/sortmerna-2.1b/index/rfam-5.8s-db \
--reads $i \
--aligned result/qc/sortmerna/${head}_${tail}.rRNA \
--sam --num_alignments 1 --fastx -a $threads \
--other result/qc/sortmerna/${head}_${tail}.non.rRNA --log -v
done
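The three parameter expansions at the top of the loop derive the output names from each input path. The snippet below traces them on one hypothetical file path, assuming KneadData's `<sample>_1_kneaddata_paired_1.fastq` naming.

```shell
#!/usr/bin/env bash
i="/data/qc/CONT1_1_kneaddata_paired_1.fastq"   # hypothetical input path

base=${i##*/}              # strip the longest prefix ending in "/": file name only
head=${base%%_*}           # strip the longest suffix starting with "_": sample name
tail=${base#*kneaddata_}   # strip the shortest prefix ending in "kneaddata_"

echo "$base"                       # CONT1_1_kneaddata_paired_1.fastq
echo "$head"                       # CONT1
echo "$tail"                       # paired_1.fastq
echo "${head}_${tail}.non.rRNA"    # CONT1_paired_1.fastq.non.rRNA
```

Note that `${base%%_*}` cuts at the first underscore, so a sample name that itself contains an underscore would be truncated.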
IV. Removing rRNA (latest version 4)
# version4
route_db="/home/cheng/huty/softwares/sortmerna-2.1b/rRNA_databases"
route_index="/home/cheng/huty/softwares/sortmerna-2.1b/index"
threads=52
mkdir non_rrna
sortmerna \
--ref $route_db/silva-bac-16s-id90.fasta \
--ref $route_db/silva-bac-23s-id98.fasta \
--ref $route_db/silva-arc-16s-id95.fasta \
--ref $route_db/silva-arc-23s-id98.fasta \
--ref $route_db/silva-euk-18s-id95.fasta \
--ref $route_db/silva-euk-28s-id98.fasta \
--ref $route_db/rfam-5s-database-id98.fasta \
--ref $route_db/rfam-5.8s-database-id98.fasta \
--reads CONT1_R1.fastq \
--reads CONT1_R2.fastq \
--fastx \
--paired_out \
--threads $threads \
-v \
--out2 \
--workdir run \
--other non_rrna/cont1
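rm -r run # Delete intermediate files
# Remove rRNA from the KneadData output
Parameters
--fastx [boolean] write the reads out as FASTA/FASTQ (FASTQ here)
--paired_out [boolean] output paired results
-v [boolean] verbose output
--out2 [boolean] split the output into two files (forward and reverse reads)
--workdir dir directory for intermediate files
--other dir/prefix output directory and file prefix for the non-rRNA reads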
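Rather than listing every database by hand, the reference arguments can be collected from the database directory. This is a sketch that assumes, as the SortMeRNA 4 help describes, that `--ref` is given once per reference file; the function name `collect_ref_args` and the array `REF_ARGS` are hypothetical, and the demo only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Collect one "--ref <file>" pair per FASTA file found in a directory.
# The result is stored in the global array REF_ARGS.
collect_ref_args() {
    local db_dir=$1 fasta
    REF_ARGS=()
    for fasta in "$db_dir"/*.fasta; do
        [ -f "$fasta" ] || continue     # no match: leave the array empty
        REF_ARGS+=(--ref "$fasta")
    done
}

# Demo with empty placeholder files in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/silva-bac-16s-id90.fasta" "$demo/rfam-5s-database-id98.fasta"
collect_ref_args "$demo"
# Dry run: print the command instead of executing it.
echo sortmerna "${REF_ARGS[@]}" --reads CONT1_R1.fastq --reads CONT1_R2.fastq \
    --fastx --other non_rrna/cont1
rm -r "$demo"
```

Using an array keeps each path as a separate argument, so directory names containing spaces do not break the command.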
V. Results (latest version 4)
# Input files
-rw-r--r-- 1 bayegy WST 13911342438 Aug 4 11:00 CONT1_R1.fastq
-rw-r--r-- 1 bayegy WST 13911342438 Aug 4 10:58 CONT1_R2.fastq
# Output files
tree non_rrna/
non_rrna/
├── cont1_fwd.fastq
└── cont1_rev.fastq
-rw-rw-r-- 1 cheng WST 13903074018 Aug 27 18:24 cont1_fwd.fastq
-rw-rw-r-- 1 cheng WST 13903074018 Aug 27 18:24 cont1_rev.fastq
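The listings above only show file sizes. To put a number on how much was filtered, one can compare sizes or, better, read counts. The helpers below are a minimal sketch with hypothetical names (`count_reads`, `pct_removed`); the size-based figure is only an estimate, since bytes are not reads.

```shell
#!/usr/bin/env bash
# Number of reads in an uncompressed FASTQ file (4 lines per record).
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Percentage by which "after" is smaller than "before".
# Usage: pct_removed <before> <after>
pct_removed() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.4f\n", (b - a) / b * 100 }'
}

# File sizes in bytes taken from the listings above (size-based estimate).
pct_removed 13911342438 13903074018
```

By size, roughly 0.06% of the data was removed as rRNA. For a read-based figure, compare `count_reads CONT1_R1.fastq` with `count_reads non_rrna/cont1_fwd.fastq`; the forward and reverse output files should also report the same count.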
Related reading: Metatranscriptome analysis: identifying and filtering rRNA with SortMeRNA