一個超簡單的轉(zhuǎn)錄組項目全過程--iMac+RNA-Seq（四）featureCounts

好久沒更新啦责静，因為最近沒有在跑項目哦袁滥，所以就擱置了，最近要做一兩張圖泰演，所以把前面寫的教程先發(fā)布一下~

前期文章

一個超簡單的轉(zhuǎn)錄組項目全過程--iMac+RNA-Seq（一）
一個超簡單的轉(zhuǎn)錄組項目全過程--iMac+RNA-Seq（二）QC
一個超簡單的轉(zhuǎn)錄組項目全過程--iMac+RNA-Seq（三）Alignment 比對

4 featureCounts

比對結(jié)束后需要進行計數(shù)咯呻拌！

老規(guī)矩，先看說明書睦焕。

featureCounts -h ## 好像發(fā)現(xiàn)了什么不得了的
$featureCounts -h
featureCounts: invalid option -h

Version 1.6.2

Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...

-T number of threads

-p If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads.

-t <string> Specify feature type in GTF annotation. 'exon' by default. Features used for read counting will be extracted from annotation using the provided value.

-g <string> Specify attribute type in GTF annotation. 'gene_id' by default. Meta-features used for read counting will be extracted from annotation using the provided value.

-a <string> Name of an annotation file. GTF/GFF format by default. See -F option for more format information. Inbuilt annotations (SAF format) is available in 'annotation' directory of the package. Gzipped file is also accepted.

-o <string> Name of the output file including read counts. A separate file including summary statistics of counting results is also included in the output ('<string>.summary')

用法一

## 教程代碼藐握，使用for循環(huán)
mkdir $wkd/align   #這樣會生成單個的，每個都有自己的名字
cd $wkd/align 

source activate rna 
for fn in {512..520} 
do 
featureCounts -T 4 -p -t exon -g gene_id \
-a /Users/bioinformatic/reference/hg38/hg38.gtf \
-o counts.SRR1039$fn.txt /home/rna/clean/SRR1039$fn.hisat.bam 
done 
source deactivate

用法二：酷酷的

vim count.sh ## 創(chuàng)建一個腳本叫做count

#!/bin/bash
set -e
set -u
set -o pipefail

# set PATH 設置路徑
HUMAN=/Users/bioinformatic/reference/hg38
OUTDIR=/Users/Desktop/project/clean/align

# source activate rna
cd $OUTDIR  # 打開目標文件夾

pwd  ## 顯示當前路徑

ls *bam | cut -d"_" -f 1| sort -u |\
while read id 
do   # 用echo語句來提示腳本運行進度
     echo "processing  ${id}_combined.hisat.bam" 
     if [ ! -f ${id}.hisat.done ] 
     then
     featureCounts -T 4 -p \
         -t exon -g gene_id -a $HUMAN/hg38.gtf \
         -o counts.${id}.txt \
         $OUTDIR/${id}_combined.hisat.bam && touch ${id}.hisat.done
      fi

done

# source deactivate

Note：腳本很長的時候垃喊，記得使用“\”符號加回車猾普，讓行文變得整潔易讀。

運行腳本

nohup bash count.sh &

查看結(jié)果

cat nohup.out就可以看到featureCounts的運行日志本谜，請注意以下兩個數(shù)據(jù)

26834745 (92.89%) aligned concordantly exactly 1 time
97.89% overall alignment rate

小結(jié)：本章所需知識背景和技能

（1）基因表達量的計算到底在算什么初家？

（2）TPM, RPKM, FPKM這幾個計量方式的區(qū)別

（3）學會查看幫助文檔，了解軟件用法

（4）nohup是什么，nohup和&的區(qū)別等等

用法二的知識點

在酷酷的用法二里面有幾個很酷的地方：

#!/bin/bash
set -e
set -u
set -o pipefail

# touch的用法

# if then fi 語句的使用

shell腳本和正則表達式：必學

感謝洲更

最后編輯于：2019.08.01 01:01:45

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者