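The first workflow I wrote after learning Snakemake covers upstream RNA-seq quantification together with downstream quality control and differential expression analysis.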
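fastq files are processed with fastp and then aligned to the genome with STAR, which also produces raw counts. The non-redundant exon length of each gene is used as the gene length to compute FPKM and TPM, and CPM values are generated as well.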
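As a rough illustration of how the three measures relate, here is a minimal sketch using the standard textbook definitions of CPM, FPKM and TPM. It is not the workflow's own code (the workflow delegates this step to bioinfokit); the function names and toy numbers are hypothetical, and library size is assumed to be the sum of counts over the genes supplied.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def fpkm(counts, lengths):
    """Fragments per kilobase of exon per million mapped fragments."""
    total = sum(counts.values())
    return {gene: c * 1e9 / (total * lengths[gene]) for gene, c in counts.items()}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rate = {gene: c / lengths[gene] for gene, c in counts.items()}
    rate_sum = sum(rate.values())
    return {gene: r / rate_sum * 1e6 for gene, r in rate.items()}


# Toy data (hypothetical): raw counts and non-redundant exon lengths in bp.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

cpm_values = cpm(counts)
fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

Unlike FPKM, TPM values sum to one million within each sample, which makes them easier to compare across samples.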
For how the non-redundant exon length is computed, see the earlier post "Transcriptome in Practice 02: Computing the Sum of Non-Redundant Exon Lengths".
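The idea behind a non-redundant exon length can be sketched as an interval union: overlapping exons from different transcripts of the same gene are merged so that each base is counted once. The function below is a hypothetical illustration of that idea, assuming 1-based inclusive coordinates as in GTF files; it is not taken from gtftools or from the referenced post.

```python
def nonredundant_exon_length(exons):
    """Return the total length of the union of exon intervals for one gene.

    exons: iterable of (start, end) pairs, 1-based and inclusive.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # This exon does not overlap the current block: close the block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping exon: extend the current block if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Toy exons (hypothetical): the first two overlap and merge into 100-300.
example_exons = [(100, 200), (150, 300), (400, 450)]
example_length = nonredundant_exon_length(example_exons)
```

Here the merged blocks are 100-300 (201 bp) and 400-450 (51 bp), so the gene length used for FPKM and TPM would be 252 bp rather than the 303 bp obtained by summing the exons naively.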
Quality control of the quantification results uses the three standard plots from 生信技能樹: a PCA plot, a dendrogram and a heatmap.
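Differential analysis between groups is performed with PyDESeq2, the Python implementation of DESeq2, and visualized with volcano and MA plots.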
The workflow code is available at https://jihulab.com/BioQuest/SnakeMake-RNA-seq or https://github.com/BioQuestX/SnakeMake-RNA-seq
A SnakeMake workflow for Bulk RNA-seq
Adapters were removed with fastp, and the trimmed reads were mapped to the Ensembl genome with STAR.
For normalization, gtftools was used to calculate gene lengths and bioinfokit was used to compute TPM, FPKM and CPM values.
For quality control, a PCA plot, a dendrogram and a heatmap were used to show differences among samples or groups.
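One common input for a sample heatmap or dendrogram is a matrix of pairwise sample correlations on log-transformed expression. The sketch below computes Pearson correlations on log2(value + 1) in plain Python; it is an assumption about how such a matrix could be built, not a description of what this workflow does internally, and the sample names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sample_correlations(expr):
    """Correlate every pair of samples on log2(value + 1).

    expr: dict mapping sample name to a list of expression values,
    with genes in the same order for every sample.
    """
    logged = {s: [math.log2(v + 1) for v in vals] for s, vals in expr.items()}
    return {(a, b): pearson(logged[a], logged[b]) for a in logged for b in logged}


# Toy expression values (hypothetical), four genes per sample.
expr = {
    "ctrl_1": [10, 200, 3000, 50],
    "ctrl_2": [12, 180, 3100, 55],
    "treat_1": [500, 20, 100, 900],
}
corr = sample_correlations(expr)
```

Replicates of the same group are expected to correlate more strongly with each other than with samples from another group; a sample that breaks this pattern is worth inspecting before differential analysis.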
PyDESeq2 was used to perform differential expression analysis.
General settings
To configure this workflow, modify config/config.yaml according to your needs, following the explanations provided in the file.
Sample sheet
Add samples to config/samples.tsv. Only the column Sample is mandatory, but any additional columns can be added.
For each sample, add one or more sequencing units (runs, lanes or replicates) to the Unit column of config/samples.tsv.
For each sample, define the Group column (experimental or clinical attribute).
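To make the expected layout concrete, the sketch below parses a small tab-separated sample sheet with the Sample, Unit and Group columns described above. The sample names, unit labels and group labels are hypothetical, and the one-row-per-unit layout is an assumption; check the samples.tsv shipped with the repository for the authoritative format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of config/samples.tsv (tab-separated).
SAMPLE_SHEET = (
    "Sample\tUnit\tGroup\n"
    "ctrl_1\tlane1\tcontrol\n"
    "ctrl_1\tlane2\tcontrol\n"
    "treat_1\tlane1\ttreated\n"
)


def read_samples(handle):
    """Return (units per sample, group per sample) from a sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    if "Sample" not in (reader.fieldnames or []):
        raise ValueError("samples.tsv must contain a 'Sample' column")
    units = defaultdict(list)
    groups = {}
    for row in reader:
        sample = row["Sample"]
        # Unit and Group are optional columns; fall back to an empty string.
        units[sample].append(row.get("Unit") or "")
        groups[sample] = row.get("Group") or ""
    return dict(units), groups


units, groups = read_samples(io.StringIO(SAMPLE_SHEET))
```

With a real file, the same function can be called as `read_samples(open("config/samples.tsv", newline=""))`.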
Report