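Introduction
SRA Toolkit is the tool for downloading files from the NCBI SRA database and converting them to FASTQ.
Installing SRA Toolkit
First, go to the official site and download the SRA Toolkit build that matches your system: Download : Software : Sequence Read Archive : NCBI/NLM/NIH
Installation is very simple: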
cd /local/txm/software
wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.11.0/sratoolkit.2.11.0-centos_linux64.tar.gz
tar -zvxf sratoolkit.2.11.0-centos_linux64.tar.gz
vi ~/.bashrc
export PATH=$PATH:/local/txm/software/sratoolkit.2.11.0-centos_linux64/bin
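The vi step above can also be done without an editor. The following is a minimal sketch that appends the export line to a startup file only when it is not already there. The helper name add_path_line is hypothetical and not part of SRA Toolkit; the directory in the example call is the one used in this post.

```shell
# Append an "export PATH" line for a directory to a shell startup file,
# skipping the write if the exact line is already present.
# add_path_line is a hypothetical helper name, not part of SRA Toolkit.
add_path_line() {
    dir=$1
    rcfile=$2
    line="export PATH=\$PATH:$dir"
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! grep -Fxq -- "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Example call (commented out so that running this file changes nothing):
# add_path_line /local/txm/software/sratoolkit.2.11.0-centos_linux64/bin ~/.bashrc
# source ~/.bashrc
```

Calling the helper twice leaves a single line in the file, so it is safe to rerun after an upgrade as long as the path is unchanged.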
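After adding the environment variable and reloading the shell configuration (for example with source ~/.bashrc), running the prefetch command prints: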
This sra toolkit installation has not been configured.
Before continuing, please run: vdb-config --interactive
For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
The official guide at https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-Toolkit shows that a configuration step is still required, so enter the following in the terminal:
vdb-config -i
An interactive configuration screen appears, in which you need to set an empty directory as the file download location.
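Because the configuration screen expects an empty directory, it may help to prepare one first. The sketch below creates the directory if needed and refuses to continue when it already contains files. The helper name ensure_empty_dir and the path in the example call are hypothetical.

```shell
# Create a directory if needed and report whether it is empty.
# Returns 0 when the directory exists and contains no entries, 1 otherwise.
# ensure_empty_dir is a hypothetical helper name.
ensure_empty_dir() {
    dir=$1
    mkdir -p "$dir" || return 1
    # ls -A lists hidden entries too, but omits "." and ".."
    if [ -n "$(ls -A "$dir")" ]; then
        echo "not empty: $dir" >&2
        return 1
    fi
    return 0
}

# Example call (hypothetical path), followed by the interactive configuration:
# ensure_empty_dir "$HOME/sra_cache" && vdb-config -i
```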
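Using SRA Toolkit
Downloading SRA files
First, open the main page of a GSE entry in the GEO database. If raw data is provided, there is an SRA Run Selector button at the bottom of the page; clicking it shows the associated SRR files.
To download a single SRR file: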
prefetch SRR********
To download multiple SRR files, first download the Accession List, then run:
prefetch --option-file SRR_Acc_List.txt
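prefetch --option-file reads the list directly. As an alternative, the sketch below cleans the list first (blank lines, Windows line endings, stray text) and then fetches one accession at a time, so that a single failure does not go unnoticed. The function names and the file name failed_runs.txt are hypothetical; the accession pattern assumes run accessions of the form SRR, ERR, or DRR followed by digits.

```shell
# Print the run accessions found in an accession list, one per line.
# Carriage returns and surrounding blanks are stripped; lines that do not
# look like SRR/ERR/DRR followed by digits are dropped.
clean_accessions() {
    tr -d '\r' < "$1" |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -E '^[SED]RR[0-9]+$'
}

# Download each accession separately and record the ones that failed.
# failed_runs.txt is a hypothetical file name.
prefetch_each() {
    clean_accessions "$1" | while read -r acc; do
        prefetch "$acc" || echo "$acc" >> failed_runs.txt
    done
}

# Example call:
# prefetch_each SRR_Acc_List.txt
```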
If you see the message:
'SRRXXXX' (316GB) is larger than maximum allowed: skipped
add the --max-size parameter:
prefetch --max-size 999999999999 SRR6367155
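Raising the size limit only helps if the disk can actually hold the file. As a precaution, free space can be checked before starting a download of that size. This is a sketch under the assumption that the download lands on the filesystem of the directory you pass in; the helper name has_free_kb is hypothetical.

```shell
# Succeed if the filesystem holding a directory has at least the requested
# number of kilobytes available. has_free_kb is a hypothetical helper name.
has_free_kb() {
    dir=$1
    need_kb=$2
    # df -P prints one line per filesystem; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: require about 320 GB free in the current directory before
# fetching a run as large as the 316GB one in the message above.
# has_free_kb . $((320 * 1024 * 1024)) && prefetch --max-size 999999999999 SRR6367155
```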
Converting SRA to FASTQ format
Single-end (SINGLE) sequencing data:
fastq-dump SRR2061752.sra
Paired-end (PAIRED) sequencing data:
fasterq-dump --split-files SRR2061752.sra
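When many runs have to be converted, the choice between the two commands above can be wrapped in a small function. The sketch below uses exactly the commands from this post; the names run_cmd and sra_to_fastq and the DRY_RUN variable are hypothetical conveniences, not toolkit features.

```shell
# Run a command, or just print it when DRY_RUN=1 (useful for checking
# what would be executed without starting a conversion).
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert one .sra file using the same commands as in this post:
# fastq-dump for SINGLE, fasterq-dump --split-files for PAIRED.
# sra_to_fastq is a hypothetical wrapper name.
sra_to_fastq() {
    layout=$1
    file=$2
    case "$layout" in
        SINGLE) run_cmd fastq-dump "$file" ;;
        PAIRED) run_cmd fasterq-dump --split-files "$file" ;;
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

# Example calls:
# sra_to_fastq SINGLE SRR2061752.sra
# sra_to_fastq PAIRED SRR2061752.sra
```

The layout of each run (SINGLE or PAIRED) is listed in the SRA Run Selector, so it can be passed in as the first argument.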