Paper reproduction: whole-exome sequencing (WES) data analysis notes, part 1: downloading the data

Tutorial: Tumor exome data processing series, part 1: reading the paper and downloading the sequencing data (qq.com)

The series also covers many of the downstream analyses.

Paper: Reliability of Whole-Exome Sequencing for Assessing Intratumor Genetic Heterogeneity (ScienceDirect)

Downloading the data

The data are in NCBI's Sequence Read Archive (SRA). Every project page uses the same URL format, https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRPXXX, so the page for this study is:
https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP070662
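
Because the URL only differs by the study accession, it can be built in a script. A minimal sketch, assuming you only ever pass SRP-style study accessions as described above; `sra_study_url` is a hypothetical helper name, not part of any toolkit:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the SRA Run Selector URL for a study accession.
sra_study_url() {
    local acc="$1"
    # Accept only "SRP" followed by digits, e.g. SRP070662
    if [[ ! "$acc" =~ ^SRP[0-9]+$ ]]; then
        echo "not an SRP study accession: $acc" >&2
        return 1
    fi
    printf 'https://www.ncbi.nlm.nih.gov/Traces/study/?acc=%s\n' "$acc"
}

# Usage:
# sra_study_url SRP070662
```

Calling `sra_study_url SRP070662` prints the study URL given above; a run accession such as an SRR ID is rejected.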

We only need to download the WES runs.
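
Here the WES runs were selected on the web page. If you would rather filter the downloaded RunInfo Table on the command line, a rough awk sketch is below. It assumes a comma-separated table whose header contains `Run` and `Assay Type` columns and that exome runs are labelled `WXS`; check these against your own file. The split is naive (a quoted field containing a comma would break it), and `wxs_runs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: print the Run accessions whose "Assay Type" is WXS from a
# comma-separated RunInfo table. Column names are assumptions; check the
# header of your own file. Naive split: quoted commas are not handled.
wxs_runs() {
    local table="$1"
    awk -F',' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                gsub(/\r|"/, "", $i)          # strip CR and quotes from header names
                if ($i == "Run") run_col = i
                if ($i == "Assay Type") assay_col = i
            }
            if (!run_col || !assay_col) {
                print "missing Run or Assay Type column" > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            assay = $assay_col; gsub(/\r|"/, "", assay)
            if (assay == "WXS") { run = $run_col; gsub(/\r|"/, "", run); print run }
        }
    ' "$table"
}

# Usage: wxs_runs SraRunTable.txt > wes_runs.txt
```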

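Once downloaded, I kept the files in a directory I created on my own computer, D:\work\肿瘤外显子数据分析学习. There are two files:

RunInfo Table: contains more metadata and can be used to rename the files after the data have been downloaded.
Accession List: a single column of run accessions; prefetch accepts this file and downloads every sample listed in it.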
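
Since I downloaded the Accession List on Windows, it may carry CRLF line endings once copied to the Linux server. A small sketch that strips them, drops blank lines, and flags anything that does not look like a run accession (assumed here to be `SRR`, `ERR` or `DRR` followed by digits); `clean_acc_list` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Sketch: normalise an accession list before handing it to prefetch.
# Assumes one run accession per line (SRR/ERR/DRR + digits). CRLF line
# endings are removed, blank lines dropped, and malformed lines reported.
clean_acc_list() {
    local src="$1" dst="$2"
    tr -d '\r' < "$src" | awk '
        NF == 0 { next }                               # skip blank lines
        $1 ~ /^[SED]RR[0-9]+$/ { print $1; next }      # keep valid accessions
        { print "bad accession on line " NR ": " $0 > "/dev/stderr"; bad = 1 }
        END { exit bad }                               # non-zero if anything was malformed
    ' > "$dst"
}

# Usage: clean_acc_list SRR_Acc_List.txt SRR_Acc_List.clean.txt && wc -l < SRR_Acc_List.clean.txt
```
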
The data are downloaded with prefetch, one of the tools in the SRA Toolkit. If you use conda, the package to install is sra-tools, not prefetch.
First, check whether the tool is already installed:

11:08:26 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
prefetch  -help
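
Instead of reading the help text, a script can test for the tool directly with the shell builtin `command -v`. A minimal sketch; `require_tools` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Sketch: check that required command-line tools are on PATH before a
# long-running job starts. Returns non-zero if any tool is missing.
require_tools() {
    local missing=0 tool path
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "found $tool: $path"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: require_tools prefetch || echo "install sra-tools first"
```

For example, `require_tools prefetch` prints prefetch's location if it is on PATH and returns non-zero otherwise.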

First, create a conda environment named wes:

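## Organize the project
mkdir 0.sra log
## Install conda (if it is not installed yet)
#wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh
#bash Miniconda2-latest-Linux-x86_64.sh
## Add the conda-forge channel
conda config --add channels conda-forge
## Create the wes environment
conda create -n wes python=2
conda info --envs
## Activate the environment after creating it
## (source is a bash builtin: this line fails if the script is run with sh)
source activate wes
## Everything below assumes the environment is active; activate it before installing any software used later
## sra-tools is distributed through the bioconda channel
#conda install -c bioconda sra-tools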

Adjust this script to your own needs.
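
Running the script with `sh` fails at `source activate wes` (the run below ends with `source: not found`), most likely because `sh` on this server is a minimal shell such as dash that has no `source` builtin. One workaround, sketched here, is to run the script with `bash` and load conda's shell hook before activating. `conda shell.bash hook` and `conda activate` are standard conda commands; the wrapper name `activate_env` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: activate a conda environment from inside a non-interactive bash
# script. Run the script with `bash script.sh`, not `sh script.sh`.
activate_env() {
    local env_name="$1"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    # Load conda's shell functions so that `conda activate` works here
    eval "$(conda shell.bash hook)" || return 1
    conda activate "$env_name"
}

# Usage inside the setup script, after `conda create -n wes ...`:
# activate_env wes && conda install -y -c bioconda sra-tools
```

Activation inside a script only affects that script's own process; in the interactive shell you still activate the environment yourself, as done in the transcript.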

11:15:48 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
mkdir run
11:15:58 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
cd run
11:16:01 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim creat-wes-envs.sh
11:17:26 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh creat-wes-envs.sh
Warning: 'conda-forge' already in 'channels' list, moving to the top
Collecting package metadata (current_repodata.json): |
Proceed ([y]/n)? y
Downloading and Extracting Packages
sqlite-3.37.1        | 1.5 MB    | ##################################### | 100%
pip-20.0.2           | 1.9 MB    | ##################################### | 100%
zlib-1.2.11          | 88 KB     | ##################################### | 100%
ncurses-6.3          | 1012 KB   | ##################################### | 100%
readline-8.1         | 295 KB    | ##################################### | 100%
python_abi-2.7       | 4 KB      | ##################################### | 100%
python-2.7.15        | 12.2 MB   | ##################################### | 100%
setuptools-36.4.0    | 557 KB    | ##################################### | 100%
libgcc-ng-11.2.0     | 906 KB    | ##################################### | 100%
ca-certificates-2021 | 139 KB    | ##################################### | 100%
libstdcxx-ng-11.2.0  | 4.2 MB    | ########################9             |  68%
…
certifi-2016.9.26    | 217 KB    | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate wes
#
# To deactivate an active environment, use
#
#     $ conda deactivate

# conda environments:
#
bioinfo                  /data1/jiarongf/anaconda3/envs/bioinfo
r-reticulate             /data1/jiarongf/anaconda3/envs/r-reticulate
base                  *  /data1/jiarongf/anaconda_se/anaconda3
jupyter_notebook         /home/jiarongf/.conda/envs/jupyter_notebook
celltalk                 /home/jiarongf/my-envs/celltalk
chipseq                  /home/jiarongf/my-envs/chipseq
d2l                      /home/jiarongf/my-envs/d2l
pyscenic                 /home/jiarongf/my-envs/pyscenic
wes                      /home/jiarongf/my-envs/wes

creat-wes-envs.sh: 12: creat-wes-envs.sh: source: not found
11:22:15 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
source activate wes
(/home/jiarongf/my-envs/wes) 11:23:05 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
prefetch -help

Usage: prefetch [ options ] [ accessions(s)... ]

Parameters:

  accessions(s)                    list of accessions to process


Options:

  -T|--type <file-type>            Specify file type to download. Default: sra
  -N|--min-size <size>             Minimum file size to download in KB
                                     (inclusive).
  -X|--max-size <size>             Maximum file size to download in KB
                                     (exclusive). Default: 20G
  -f|--force <no|yes|all|ALL>      Force object download - one of: no, yes,
                                     all, ALL. no [default]: skip download if
                                     the object if found and complete; yes:
                                     download it even if it is found and is
                                     complete; all: ignore lock files (stale
                                     locks or it is being downloaded by
                                     another process - use at your own
                                     risk!); ALL: ignore lock files, restart
                                     download from beginning
  -p|--progress                    Show progress
  -r|--resume <yes|no>             Resume partial downloads - one of: no, yes
                                     [default]
  -C|--verify <yes|no>             Verify after download - one of: no, yes
                                     [default]
  -c|--check-all                   Double-check all refseqs
  -o|--output-file <file>          Write file to <file> when downloading
                                     single file
  -O|--output-directory <directory>
                                   Save files to <directory>/
     --ngc <path>                  <path> to ngc file
     --perm <path>                 <path> to permission file
     --location <location>         location in cloud
     --cart <path>                 <path> to cart file
  -V|--version                     Display the version of the program
  -v|--verbose                     Increase the verbosity of the program
                                     status messages. Use multiple times for
                                     more verbosity.
  -L|--log-level <level>           Logging level as number or enum string.
                                     One of
                                     (fatal|sys|int|err|warn|info|debug) or
                                     (0-6) Current/default is warn
     --option-file file            Read more options and parameters from the
                                     file.
  -h|--help                        print this message

"prefetch" version 2.11.0

(/home/jiarongf/my-envs/wes) 11:25:16 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$

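By default prefetch downloads data over HTTPS, but the speed is not great.
Aspera is much faster, but it is not part of the SRA Toolkit and cannot be installed with conda, so you have to download its installer script: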
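
A PATH line written as `export PATH='$HOME/.aspera/connect/bin:$PATH'` does not work: inside single quotes neither `$HOME` nor `$PATH` is expanded, so PATH is replaced by a literal string. A minimal sketch of a helper that appends a correct PATH entry to `~/.bashrc` only once, so rerunning the setup does not pile up duplicate lines; `add_path_once` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Sketch: append a directory to PATH in a shell rc file exactly once.
# Pass the directory in single quotes if it contains $HOME, so the variable
# is expanded when the rc file is read rather than now.
add_path_once() {
    local dir="$1" rc="${2:-$HOME/.bashrc}"
    local line="export PATH=\"$dir:\$PATH\""
    # Append only if this exact line is not already present
    grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}

# Usage (ascp from Aspera Connect lives under ~/.aspera/connect/bin):
# add_path_once '$HOME/.aspera/connect/bin'
# source ~/.bashrc && command -v ascp
```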

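wget https://d3gcli72yxqn2z.cloudfront.net/connect_latest/v4/bin/ibm-aspera-connect_4.1.3.93_linux.tar.gz
tar -zxvf ibm-aspera-connect_4.1.3.93_linux.tar.gz
bash ibm-aspera-connect_4.1.3.93_linux.sh
## Add the Aspera bin directory to PATH by hand (the single quotes keep $HOME and $PATH unexpanded in .bashrc)
echo 'export PATH="$HOME/.aspera/connect/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
## The private key file is in $HOME/.aspera/connect/etc
## Intended to download through aspera; note that the prefetch 2.11.0 log below still reports "Downloading via HTTPS"
nohup prefetch --option-file ../data/SRR_Acc_List.txt -O ../0.sra -X 200G > ../log/0.download_sra.log 2>&1 &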
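
While the nohup job runs, `tail -f ../log/0.download_sra.log` shows its progress. To see which runs are still missing, a minimal sketch like this compares the accession list with the output directory. It assumes prefetch saves each run as `<outdir>/<accession>/<accession>.sra`, the layout I would expect from recent sra-tools releases, but check your own `0.sra` first. The function name `missing_runs` is hypothetical; for an integrity check rather than an existence check, the SRA Toolkit also ships `vdb-validate`.

```bash
#!/usr/bin/env bash
# Sketch: print accessions from the list that have no non-empty .sra file
# yet. Assumes the layout <outdir>/<accession>/<accession>.sra.
missing_runs() {
    local acc_list="$1" outdir="$2" acc
    # Strip CRs; `|| [ -n "$acc" ]` also handles a last line without newline
    tr -d '\r' < "$acc_list" | while read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue
        [ -s "$outdir/$acc/$acc.sra" ] || echo "$acc"
    done
}

# Usage: missing_runs ../data/SRR_Acc_List.txt ../0.sra
```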

My own run:


(/home/jiarongf/my-envs/wes) 11:45:36 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim download-aspera.sh
(/home/jiarongf/my-envs/wes) 11:52:32 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat download-aspera.sh
wget https://d3gcli72yxqn2z.cloudfront.net/connect_latest/v4/bin/ibm-aspera-connect_4.1.3.93_linux.tar.gz
tar -zxvf ibm-aspera-connect_4.1.3.93_linux.tar.gz
bash ibm-aspera-connect_4.1.3.93_linux.sh


(/home/jiarongf/my-envs/wes) 11:50:16 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh download-aspera.sh 2> download-aspera.log
ibm-aspera-connect_4.1.3.93_linux.sh

Installing IBM Aspera Connect

Deploying IBM Aspera Connect (/home/jiarongf/.aspera/connect) for the current user only.

Install complete.
(/home/jiarongf/my-envs/wes) 11:51:00 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim  ~/.bashrc
(/home/jiarongf/my-envs/wes) 11:55:28 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run

export PATH=/data1/jiarongf/learning/cancer-WES/run/:$PATH
(/home/jiarongf/my-envs/wes) 11:56:30 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
source ~/.bashrc
11:57:27 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
source activate wes
(/home/jiarongf/my-envs/wes) 11:58:09 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim prefetch.sh
(/home/jiarongf/my-envs/wes) 11:59:18 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat prefetch.sh
nohup prefetch --option-file ../data/SRR_Acc_List.txt -O ../0.sra -X 200G > ../log/0.download_sra.log 2>&1 &

(/home/jiarongf/my-envs/wes) 11:59:27 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh prefetch.sh
(/home/jiarongf/my-envs/wes) 11:59:34 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$


(/home/jiarongf/my-envs/wes) 12:05:06 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/log
$
cat 0.download_sra.log

2022-03-31T04:03:25 prefetch.2.11.0: 1) Downloading 'SRR3182418'...
2022-03-31T04:03:25 prefetch.2.11.0:  Downloading via HTTPS...
(/home/jiarongf/my-envs/wes) 12:05:11 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/log
$

A quick look at the end of the log once the downloads had finished:

2022-04-02T00:09:59 prefetch.2.11.0: 49) Downloading 'SRR3182442.vdbcache'...
2022-04-02T00:09:59 prefetch.2.11.0: 49) 'SRR3182442.vdbcache' was downloaded successfully

2022-04-02T00:10:01 prefetch.2.11.0: 50) Downloading 'SRR3182443'...
2022-04-02T00:10:01 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T00:59:08 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T00:59:08 prefetch.2.11.0: 50.2) Downloading 'SRR3182443.vdbcache'...
2022-04-02T00:59:08 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T00:59:24 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T00:59:24 prefetch.2.11.0:  'SRR3182443.vdbcache' is valid
2022-04-02T00:59:24 prefetch.2.11.0: 50.2) 'SRR3182443.vdbcache' was downloaded successfully
2022-04-02T00:59:36 prefetch.2.11.0:  'SRR3182443' is valid
2022-04-02T00:59:36 prefetch.2.11.0: 50) 'SRR3182443' was downloaded successfully
2022-04-02T01:00:01 prefetch.2.11.0: 'SRR3182443' has 0 unresolved dependencies
2022-04-02T01:00:01 prefetch.2.11.0: 50) Downloading 'SRR3182443.vdbcache'...
2022-04-02T01:00:01 prefetch.2.11.0: 50) 'SRR3182443.vdbcache' was downloaded successfully

2022-04-02T01:00:03 prefetch.2.11.0: 51) Downloading 'SRR3182444'...
2022-04-02T01:00:03 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T02:03:30 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T02:03:30 prefetch.2.11.0: 51.2) Downloading 'SRR3182444.vdbcache'...
2022-04-02T02:03:30 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T02:03:47 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T02:03:47 prefetch.2.11.0:  'SRR3182444.vdbcache' is valid
2022-04-02T02:03:47 prefetch.2.11.0: 51.2) 'SRR3182444.vdbcache' was downloaded successfully
2022-04-02T02:03:58 prefetch.2.11.0:  'SRR3182444' is valid
2022-04-02T02:03:58 prefetch.2.11.0: 51) 'SRR3182444' was downloaded successfully
2022-04-02T02:04:24 prefetch.2.11.0: 'SRR3182444' has 0 unresolved dependencies
2022-04-02T02:04:24 prefetch.2.11.0: 51) Downloading 'SRR3182444.vdbcache'...
2022-04-02T02:04:24 prefetch.2.11.0: 51) 'SRR3182444.vdbcache' was downloaded successfully

2022-04-02T02:04:26 prefetch.2.11.0: 52) Downloading 'SRR3182445'...
2022-04-02T02:04:26 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T02:56:41 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T02:56:41 prefetch.2.11.0: 52.2) Downloading 'SRR3182445.vdbcache'...
2022-04-02T02:56:41 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T02:57:04 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T02:57:04 prefetch.2.11.0:  'SRR3182445.vdbcache' is valid
2022-04-02T02:57:04 prefetch.2.11.0: 52.2) 'SRR3182445.vdbcache' was downloaded successfully
2022-04-02T02:57:13 prefetch.2.11.0:  'SRR3182445' is valid
2022-04-02T02:57:13 prefetch.2.11.0: 52) 'SRR3182445' was downloaded successfully
2022-04-02T02:57:37 prefetch.2.11.0: 'SRR3182445' has 0 unresolved dependencies
2022-04-02T02:57:37 prefetch.2.11.0: 52) Downloading 'SRR3182445.vdbcache'...
2022-04-02T02:57:37 prefetch.2.11.0: 52) 'SRR3182445.vdbcache' was downloaded successfully

2022-04-02T02:57:40 prefetch.2.11.0: 53) Downloading 'SRR3182446'...
2022-04-02T02:57:40 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T20:49:37 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T20:49:37 prefetch.2.11.0: 53.2) Downloading 'SRR3182446.vdbcache'...
2022-04-02T20:49:37 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T20:49:54 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T20:49:54 prefetch.2.11.0:  'SRR3182446.vdbcache' is valid
2022-04-02T20:49:54 prefetch.2.11.0: 53.2) 'SRR3182446.vdbcache' was downloaded successfully
2022-04-02T20:50:06 prefetch.2.11.0:  'SRR3182446' is valid
2022-04-02T20:50:06 prefetch.2.11.0: 53) 'SRR3182446' was downloaded successfully
2022-04-02T20:50:31 prefetch.2.11.0: 'SRR3182446' has 0 unresolved dependencies
2022-04-02T20:50:31 prefetch.2.11.0: 53) Downloading 'SRR3182446.vdbcache'...
2022-04-02T20:50:31 prefetch.2.11.0: 53) 'SRR3182446.vdbcache' was downloaded successfully

2022-04-02T20:50:33 prefetch.2.11.0: 54) Downloading 'SRR3182447'...
2022-04-02T20:50:33 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T21:03:02 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T21:03:02 prefetch.2.11.0: 54.2) Downloading 'SRR3182447.vdbcache'...
2022-04-02T21:03:02 prefetch.2.11.0:  Downloading via HTTPS...
2022-04-02T21:03:17 prefetch.2.11.0:  HTTPS download succeed
2022-04-02T21:03:17 prefetch.2.11.0:  'SRR3182447.vdbcache' is valid
2022-04-02T21:03:17 prefetch.2.11.0: 54.2) 'SRR3182447.vdbcache' was downloaded successfully
2022-04-02T21:03:21 prefetch.2.11.0:  'SRR3182447' is valid
2022-04-02T21:03:21 prefetch.2.11.0: 54) 'SRR3182447' was downloaded successfully
2022-04-02T21:03:46 prefetch.2.11.0: 'SRR3182447' has 0 unresolved dependencies
2022-04-02T21:03:46 prefetch.2.11.0: 54) Downloading 'SRR3182447.vdbcache'...
2022-04-02T21:03:46 prefetch.2.11.0: 54) 'SRR3182447.vdbcache' was downloaded successfully
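
To check the whole log rather than scrolling through it, the accessions reported as finished can be pulled out with grep. This sketch is tied to the prefetch 2.11.0 message format shown above (`'SRRxxxx' was downloaded successfully`); other versions may word it differently, and `downloaded_runs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: list run accessions that prefetch reported as downloaded.
# Matches lines like: 50) 'SRR3182443' was downloaded successfully
# The closing quote right after the digits excludes '.vdbcache' lines.
downloaded_runs() {
    grep -oE "'[SED]RR[0-9]+' was downloaded successfully" "$1" \
        | grep -oE '[SED]RR[0-9]+' \
        | sort -u
}

# Usage: downloaded_runs ../log/0.download_sra.log | wc -l
```

Comparing that count with `wc -l < ../data/SRR_Acc_List.txt` gives a quick completeness check.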

It took quite a while:
from 2022-03-31T04:03:25 to 2022-04-02T21:03:46,
about 2 days and 17 hours in total.
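
The duration can be computed directly from the two log timestamps. A sketch assuming GNU `date`, whose `-d` option parses the date string (the `T` is swapped for a space first); both stamps are read as UTC, which does not change the difference as long as they share a time zone. `elapsed` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: elapsed time between two YYYY-MM-DDTHH:MM:SS timestamps.
# Requires GNU date for -d parsing.
elapsed() {
    local start end secs
    start=$(TZ=UTC date -d "${1/T/ }" +%s) || return 1
    end=$(TZ=UTC date -d "${2/T/ }" +%s) || return 1
    secs=$(( end - start ))
    printf '%dd %dh %dm %ds\n' $(( secs / 86400 )) $(( secs % 86400 / 3600 )) \
        $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Usage: elapsed 2022-03-31T04:03:25 2022-04-02T21:03:46
```

For the two timestamps above it prints `2d 17h 0m 21s`.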

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.