Protocol for RNA-seq data analysis and target gene identification

1. Rice is used as the example.

2. All scripts and raw data can be found in '/data/dta/shared/rnaseqworkflow' (for lab members).

Before you start:

Create a root directory to store all future data.
Create a subdirectory and download the reference genome and its annotations into it.
Build a genome index with the alignment software of your choice (HISAT2 is used here).
Create further subdirectories for the different kinds of data, such as raw data, matrices, and scripts.

code:

$ mkdir Drought_stress
$ mkdir Drought_stress/Rice && cd Drought_stress/Rice
$ mkdir data matrix homology olddata reference src_rice
$ mkdir reference/IRGSP && cd reference/IRGSP
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gtf.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gff3.gz
$ gunzip *.gz
$ module load Anaconda3 hisat2
$ mkdir hsindex
$ hisat2-build -p 8 Oryza_sativa.IRGSP-1.0.dna.toplevel.fa hsindex/IRGSP
$ module unload Anaconda3 hisat2
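
Before moving on, it can be worth confirming that the index was actually written. hisat2-build produces eight files named <prefix>.1.ht2 to <prefix>.8.ht2 (or .ht2l for very large genomes). The helper below is a minimal sketch and not part of the original scripts; the function name check_hisat2_index is made up for illustration.

```shell
# Hypothetical helper (not from the original scripts): verify that all eight
# HISAT2 index files exist and are non-empty for a given index prefix.
check_hisat2_index() {
    prefix=$1
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        # Accept either the small (.ht2) or the large (.ht2l) index format.
        if [ ! -s "${prefix}.${i}.ht2" ] && [ ! -s "${prefix}.${i}.ht2l" ]; then
            echo "missing: ${prefix}.${i}.ht2" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage, from reference/IRGSP after hisat2-build has finished:
#   check_hisat2_index hsindex/IRGSP && echo "HISAT2 index looks complete"
```

If any file is reported missing, rerunning hisat2-build is usually the simplest fix.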

Workflow:

Steps 1-3 run on the server, steps 4-7 on a personal computer, and steps 8-9 on the server again.

1. Find BioProjects that match the conditions of interest (drought, roots, and so on).
2. Create samplelist.txt in the data subdirectory and list the SRA accession numbers to download.
3. Start the workflow in the background with: nohup sh RNAseq_workflow.sh &

code:

$ cd ~/Drought_stress/Rice/data
$ vim samplelist.txt  # Enter the SRA accession numbers we want to download
$ cd ../src_rice
$ nohup sh RNAseq_workflow.sh &  # This script can be found in the attachment
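
A typo in samplelist.txt only shows up once the download fails, so a quick check before launching the workflow can save a rerun. The sketch below assumes one run accession per line in the form SRR, ERR, or DRR followed by digits; the format that RNAseq_workflow.sh actually expects is not described here, and validate_samplelist is a hypothetical name.

```shell
# Hypothetical helper: print every non-blank line that does not look like an
# SRA run accession (SRR/ERR/DRR + digits) and return non-zero if any is found.
validate_samplelist() {
    awk 'NF && $0 !~ /^[SED]RR[0-9]+$/ {
             printf "line %d: %s\n", NR, $0   # report the offending line
             bad = 1
         }
         END { exit bad + 0 }' "$1"
}

# Example usage, from ~/Drought_stress/Rice/data:
#   validate_samplelist samplelist.txt && echo "sample list OK"
```

Because the workflow is started with nohup from a terminal, its output is appended to nohup.out in src_rice by default, so `tail -f nohup.out` is a simple way to follow progress.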

4. Transfer the count files to your local machine for downstream analysis (the R version on the server is too new to support the R package "biomaRt").
(The scp command or the FileZilla software can be used to transfer files between the local machine and the server.)
5. Create an R project in RStudio locally and use DESeq2 and biomaRt for differential expression analysis and annotation.
6. Run the following R scripts in sequence: downstream.R > Deseq2analysis.R > merge_desingn.R (the whole project can be found in the attachment named Rice4.zip).
7. Send the differential gene table and the gene count table to the server and put them in the '~/Drought_stress/Rice/homology' directory.
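
The two transfers between the server and the local machine can be sketched with scp as follows. Everything specific here is an assumption: the login user@server.example.org, the idea that the count files sit in the matrix subdirectory, and the helper names. With DRY_RUN=1 (the default) the commands are only printed, so nothing is copied until you have checked them.

```shell
# Assumed values: replace with your own account, host, and paths.
SERVER="user@server.example.org"     # hypothetical login
REMOTE_ROOT="Drought_stress/Rice"    # relative to the remote home directory
DRY_RUN=${DRY_RUN:-1}                # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Pull the count files (assumed to be in matrix/) into a local folder.
fetch_counts() {
    run scp -r "${SERVER}:${REMOTE_ROOT}/matrix" "$1"
}

# Push one or more result tables to the homology directory on the server.
push_tables() {
    run scp "$@" "${SERVER}:${REMOTE_ROOT}/homology/"
}

# Example usage on the local machine (file names are placeholders):
#   fetch_counts ./counts
#   push_tables diff_genes.txt gene_counts.txt
```

Setting DRY_RUN=0 in the environment makes the same calls perform the real transfers.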

8. Go to the src_rice subdirectory.
9. Run the annotation and merge scripts.

code:

$ cd ~/Drought_stress/Rice/src_rice
$ nohup sh anno.sh &
$ nohup sh merge.sh &
# The scripts can be found in the attachment
# Rice.anno.txt can be found in the attachment
# head.txt holds the column names for the final output table; it was edited and combined from the column names of the raw files we used.
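
The contents of anno.sh and merge.sh are not shown here, so the following is only a sketch of the general idea of combining an annotation file with a differential gene table and putting a prepared header line on top. It assumes tab-separated files, the gene ID in the first column of both, an annotation file without a header row, and a table with one header row; the function name merge_with_annotation is hypothetical.

```shell
# Hypothetical sketch (not the attached scripts): append an annotation column
# to a tab-separated gene table and replace its header with a prepared one.
#   $1 = annotation file: gene_id <TAB> annotation, no header row
#   $2 = gene table with one header row, gene_id in column 1
#   $3 = file holding the header line for the final table
merge_with_annotation() {
    cat "$3"
    awk -F'\t' -v OFS='\t' '
        NR == FNR { anno[$1] = $2; next }   # first file: gene ID -> annotation
        FNR > 1 {                           # second file: skip its header row
            print $0, (($1 in anno) ? anno[$1] : "NA")
        }' "$1" "$2"
}

# Example usage (the table and output names are placeholders):
#   merge_with_annotation Rice.anno.txt diff_genes.txt head.txt > final_table.txt
```

Genes without an entry in the annotation file get "NA" in the last column rather than being dropped, which keeps the row count of the table unchanged.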

Note:

If you have any suggestions or comments, please contact the author via xuyp8121@mail.ustc.edu.cn

We look forward to hearing from anyone who shares our interest in systems biology and comparative biology!

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.