1. Using rice as an example
2. All scripts and raw data can be found in '/data/dta/shared/rnaseqworkflow' (for lab members)
Before working:
- Create a root directory to store all future data
- Create a subdirectory, then download the reference genome and annotation files
- Build a genome index with the alignment software of your choice (HISAT2 is used below)
- Create other subdirectories to store the different kinds of data, such as raw data, matrices, and scripts
code:
$ mkdir Drought_stress
$ mkdir Drought_stress/Rice && cd Drought_stress/Rice
$ mkdir data matrix homology olddata reference src_rice
$ mkdir reference/IRGSP && cd reference/IRGSP
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gtf.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gff3.gz
$ gunzip *.gz
$ module load Anaconda3 hisat2
$ mkdir hsindex
$ hisat2-build -p 8 Oryza_sativa.IRGSP-1.0.dna.toplevel.fa hsindex/IRGSP
$ module unload Anaconda3 hisat2
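Before moving on, it can be worth confirming that the index build finished. The helper below is a minimal sketch, not part of the original scripts: the function name check_hisat2_index is hypothetical, and it relies only on the fact that hisat2-build writes eight files named PREFIX.1.ht2 to PREFIX.8.ht2 for a genome of this size.

```shell
# check_hisat2_index PREFIX   (hypothetical helper, not from the lab scripts)
# Succeeds only if PREFIX.1.ht2 .. PREFIX.8.ht2 all exist and are non-empty.
# Very large genomes get .ht2l files instead; the rice genome does not.
check_hisat2_index() {
    prefix=$1
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "${prefix}.${i}.ht2" ]; then
            echo "missing: ${prefix}.${i}.ht2"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] && echo "index OK: ${prefix}"
    return "$missing"
}
```

After defining the function in the current shell, run it from reference/IRGSP as: check_hisat2_index hsindex/IRGSP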
Workflow:
Steps 1-3: run on the server. Steps 4-7: run on a personal computer. Steps 8-9: run on the server.
- Find BioProjects by searching for drought, root, and other conditions of interest
- Create samplelist.txt under the data subdirectory and save the SRA accession numbers to be downloaded in it
- Run the workflow script in the background: nohup sh RNAseq_workflow.sh &
code:
$ cd ~/Drought_stress/Rice/data
$ vim samplelist.txt # Enter the SRA accession numbers to download
$ cd ../src_rice
$ nohup sh RNAseq_workflow.sh & # This script can be found in the attachment
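A typo in samplelist.txt is easiest to catch before the long background job starts. The following is a minimal sketch under one assumption that the original does not state: that the file holds one SRA run accession per line (SRR, ERR, or DRR followed by digits). The function name validate_samplelist is hypothetical.

```shell
# validate_samplelist FILE   (hypothetical helper, not from the lab scripts)
# Assumes one run accession per line: SRR, ERR or DRR followed by digits.
# Prints every non-empty line that does not match and returns non-zero.
validate_samplelist() {
    file=$1
    [ -r "$file" ] || { echo "cannot read: $file"; return 2; }
    # Strip Windows carriage returns, skip blank lines, keep only bad lines
    bad=$(tr -d '\r' < "$file" \
        | grep -v '^[[:space:]]*$' \
        | grep -Ev '^[SED]RR[0-9]+$' || true)
    if [ -n "$bad" ]; then
        echo "unexpected lines in $file:"
        echo "$bad"
        return 1
    fi
    echo "all accessions look valid"
}
```

Run it as: validate_samplelist ~/Drought_stress/Rice/data/samplelist.txt
If your list uses a different format (for example BioSample or experiment IDs), adjust the pattern accordingly.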
- Send the count files to the local machine for downstream analysis (the R version on the server is too new to support the R package "biomaRt")
(Use the scp command or FileZilla to transfer files between the local machine and the server)
- Create an R project in RStudio locally and use DESeq2 and biomaRt for differential expression analysis and annotation
- Run the following R scripts in sequence: downstream.R > Deseq2analysis.R > merge_desingn.R (the whole project can be found in the attachment named Rice4.zip)
- Send the differential gene table and the gene count table to the server and put them in the '~/Drought_stress/Rice/homology' directory
- Go to src_rice subdirectory
- Run the related scripts (anno.sh, then merge.sh)
code:
$ cd ~/Drought_stress/Rice/src_rice
$ nohup sh anno.sh &
$ nohup sh merge.sh &
# The scripts can be found in the attachment
# Rice.anno.txt can be found in the attachment
# head.txt holds the column names for the final output table; it was edited and combined from the column names of the raw files we used
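The two transfer steps in the workflow (count files from the server to the local machine, result tables back to the server) could be scripted with scp roughly as below. This is only a sketch: the host user@server.example.org is a placeholder, the function names are hypothetical, and the assumption that the count files sit in the matrix subdirectory is not confirmed by the workflow script. By default the commands are only printed, so nothing is copied until DRY_RUN is set to 0.

```shell
# Hypothetical connection details; replace with your own.
REMOTE="user@server.example.org"
REMOTE_ROOT="Drought_stress/Rice"   # relative to the remote home directory
DRY_RUN=${DRY_RUN:-1}               # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Server -> local: fetch the count files (ASSUMED to be in matrix/)
fetch_counts() {
    run scp -r "${REMOTE}:${REMOTE_ROOT}/matrix" "$1"
}

# Local -> server: upload result tables into homology/
push_tables() {
    run scp "$@" "${REMOTE}:${REMOTE_ROOT}/homology/"
}
```

On the local machine, fetch_counts ./counts prints the download command, and push_tables followed by your table files prints the upload command. Set DRY_RUN=0 before defining the functions to perform the copies.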
Attention:
If you have any suggestions or comments, please contact the author via xuyp8121@mail.ustc.edu.cn
We look forward to hearing from anyone who shares our interest in systems biology and comparative biology!