好久沒更新啦责静,因為最近沒有在跑項目哦袁滥,所以就擱置了,最近要做一兩張圖泰演,所以把前面寫的教程先發(fā)布一下~
前期文章
一個超簡單的轉(zhuǎn)錄組項目全過程--iMac+RNA-Seq(一)
一個超簡單的轉(zhuǎn)錄組項目全過程--iMac+RNA-Seq(二)QC
一個超簡單的轉(zhuǎn)錄組項目全過程--iMac+RNA-Seq(三)Alignment 比對
4 featureCounts
比對結(jié)束后需要進行計數(shù)咯呻拌!
老規(guī)矩,先看說明書睦焕。
featureCounts -h ## 好像發(fā)現(xiàn)了什么不得了的
$featureCounts -h
featureCounts: invalid option -h
Version 1.6.2
Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...
-T number of threads
-p If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads.
-t <string> Specify feature type in GTF annotation. 'exon' by default. Features used for read counting will be extracted from annotation using the provided value.
-g <string> Specify attribute type in GTF annotation. 'gene_id' by default. Meta-features used for read counting will be extracted from annotation using the provided value.
-a <string> Name of an annotation file. GTF/GFF format by default. See -F option for more format information. Inbuilt annotations (SAF format) is available in 'annotation' directory of the package. Gzipped file is also accepted.
-o <string> Name of the output file including read counts. A separate file including summary statistics of counting results is also included in the output ('<string>.summary')
用法一
## 教程代碼藐握,使用for循環(huán)
mkdir $wkd/align #這樣會生成單個的,每個都有自己的名字
cd $wkd/align
source activate rna
for fn in {512..520}
do
featureCounts -T 4 -p -t exon -g gene_id \
-a /Users/bioinformatic/reference/hg38/hg38.gtf \
-o counts.SRR1039$fn.txt /home/rna/clean/SRR1039$fn.hisat.bam
done
source deactivate
用法二:酷酷的
vim count.sh ## 創(chuàng)建一個腳本叫做count
#!/bin/bash
set -e
set -u
set -o pipefail
# set PATH 設置路徑
HUMAN=/Users/bioinformatic/reference/hg38
OUTDIR=/Users/Desktop/project/clean/align
# source activate rna
cd $OUTDIR # 打開目標文件夾
pwd ## 顯示當前路徑
ls *bam | cut -d"_" -f 1| sort -u |\
while read id
do # 用echo語句來提示腳本運行進度
echo "processing ${id}_combined.hisat.bam"
if [ ! -f ${id}.hisat.done ]
then
featureCounts -T 4 -p \
-t exon -g gene_id -a $HUMAN/hg38.gtf \
-o counts.${id}.txt \
$OUTDIR/${id}_combined.hisat.bam && touch ${id}.hisat.done
fi
done
# source deactivate
Note:腳本很長的時候垃喊,記得使用“\”符號加回車猾普,讓行文變得整潔易讀。
運行腳本
nohup bash count.sh &
查看結(jié)果
cat nohup.out就可以看到featureCounts的運行日志本谜,請注意以下兩個數(shù)據(jù)
26834745 (92.89%) aligned concordantly exactly 1 time
97.89% overall alignment rate
小結(jié):本章所需知識背景和技能
(1)基因表達量的計算到底在算什么初家?
(2)TPM, RPKM, FPKM這幾個計量方式的區(qū)別
(3)學會查看幫助文檔,了解軟件用法
(4)nohup是什么,nohup和&的區(qū)別等等
用法二的知識點
在酷酷的用法二里面有幾個很酷的地方:
#!/bin/bash
set -e
set -u
set -o pipefail
# touch的用法
# if then fi 語句的使用
shell腳本和正則表達式:必學
感謝洲更