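Written by 劉小澤 (Liu Xiaoze) on 2019-05-03
The data come from the September 2018 Nature Communications (NC) paper "Acquired cancer resistance to combination immunotherapy from transcriptional loss of class I HLA"
A walkthrough of the paper is at: http://www.reibang.com/p/b818e38f7e9c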
Experimental design
There are two patients in total:
- Patient 2586-4:
- The primary patient (2586-4) received hypofractionated radiation for HLA upregulation to some but not all disease sites
- Libraries were built on the 10X 3' Chromium v2.0 platform and sequenced on a HiSeq 2500 in "rapid run" mode (GEO: GSE117988)
- Discovery tumor samples: after sequence alignment and filtering, 7,431 tumor cells (2,243 cells before and 5,188 cells after T cell therapy)
- Discovery PBMC samples: after sequence alignment and filtering, a total of 12,874 cells were analyzed, covering four time points: pre-treatment (Pre), early post-treatment at day +27 (Early), response at day +37 (Resp), and relapse at day +614 (AR)
| ID | Description |
|---|---|
| GSM3330559 | Tumor Disc Pre |
| GSM3330560 | Tumor Disc AR |
| GSM3330561 | PBMC Pre |
| GSM3330562 | PBMC Disc Early |
| GSM3330563 | PBMC Disc Resp |
| GSM3330564 | PBMC Disc AR |

- Patient 9245-3:
- The second validation patient (9245-3) is a 59-year-old man with metastatic MCC that had initially presented as stage IIIB disease, now metastatic at multiple sites
- The 10X 5' V(D)J workflow was used for cell washing, barcoding and library prep, followed by sequencing on a NovaSeq 6000 (gene expression) and a HiSeq 4000 (V(D)J) (GEO: GSE118056)
| ID | Description |
|---|---|
| GSM3317833 | PBMC Relapse - L001 |
| GSM3317834 | PBMC Relapse - L002 |
| GSM3317835 | Tumor Relapse - L001 |
| GSM3317836 | Tumor Relapse - L002 |
Software environment
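Raw data are usually stored in .sra format (one file per SRR run accession), and each file is typically several GB, so ascp is the downloader of choice. Using ascp directly, however, requires configuring several parameters. For beginners it is easiest to supply just an ID and have the download start, which is what combining prefetch with ascp gives you.
prefetch is a small tool in sratoolkit (the SRA Toolkit), so it can simply be installed with conda: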
conda install -c daler sratoolkit
prefetch -h # if the help text is printed, the installation succeeded
# To download data such as an SRR run, just give the ID and an output directory
prefetch SRRxxxxxxx -O PATH
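By default, prefetch downloads raw data over https, just like downloading from a web page, which limits the speed. So we first install a download tool called "aspera", a commercial high-speed file transfer product from IBM that NCBI and EBI have agreements to use.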
wget http://download.asperasoft.com/download/sw/connect/3.7.4/aspera-connect-3.7.4.147727-linux-64.tar.gz
tar zxvf aspera-connect-3.7.4.147727-linux-64.tar.gz
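# Install
bash aspera-connect-3.7.4.147727-linux-64.sh
# Then cd to your home directory and check whether a .aspera folder now exists; if it does, the installation succeeded
cd && ls -a
# Add the aspera binaries to PATH and reload the shell config
echo 'export PATH=~/.aspera/connect/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
# Finally, check that ascp works
ascp --help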
Once ascp is installed, prefetch switches its default download method from https to fasp, meaning accelerated mode is on.
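The manual checks in this section (looking for ~/.aspera, then running ascp --help) can be bundled into one function. This is a minimal sketch that assumes the default per-user Aspera Connect 3.7 layout created by the installer above (binary in ~/.aspera/connect/bin, key in ~/.aspera/connect/etc); the function name check_aspera is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the Aspera Connect install used in this post.
# Assumes the default per-user layout under ~/.aspera/connect.
check_aspera() {
    local base="$HOME/.aspera/connect"
    local ascp_bin="$base/bin/ascp"
    local key="$base/etc/asperaweb_id_dsa.openssh"

    # The installer should have created an executable ascp binary
    if [ ! -x "$ascp_bin" ]; then
        echo "missing: $ascp_bin" >&2
        return 1
    fi
    # The key file is what ascp -i points to when downloading from EBI/NCBI
    if [ ! -f "$key" ]; then
        echo "missing: $key" >&2
        return 1
    fi
    # Same condition as the "ascp --help" check: ascp must resolve via PATH
    if ! command -v ascp >/dev/null 2>&1; then
        echo "ascp not on PATH; add $base/bin to PATH" >&2
        return 1
    fi
    echo "aspera OK: $(command -v ascp)"
}
```

Running `check_aspera` once before a large prefetch batch catches a missing PATH entry or key file early.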
ascp usually works without trouble; the most common problem is:
ascp: Failed to open TCP connection for SSH, exiting.
Session Stop (Error: Failed to open TCP connection for SSH)
# Aspera's official fix: https://support.asperasoft.com/hc/en-us/articles/216126918-Error-44-UDP-session-initiation-fatal-error
On many Linux systems the default firewall can be configured with iptables. You will have to allow all incoming and outgoing traffic on UDP port 33001 (or whatever your Aspera UDP port is), which you can do with the following commands:
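# Use the two commands below (root privileges required). The error above concerns the TCP (SSH) connection, so they open TCP port 33001; to open the UDP data port that the article mentions, use -p udp instead of -p tcp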
# iptables -I INPUT -p tcp --dport 33001 -j ACCEPT
# iptables -I OUTPUT -p tcp --dport 33001 -j ACCEPT
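Before editing iptables, it may help to confirm that the TCP port is really unreachable from your machine. The sketch below relies on bash's /dev/tcp redirection and coreutils timeout; the function name port_open is hypothetical, fasp.sra.ebi.ac.uk is the EBI host used later in this post, and a successful TCP connect says nothing about the UDP data port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port opens within
# the timeout (default 5 seconds), fail otherwise.
# Uses bash's /dev/tcp pseudo-device, so it needs bash and coreutils timeout.
port_open() {
    local host="$1" port="$2" secs="${3:-5}"
    # exec opens fd 3 on the socket; the child shell exits and closes it
    timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, `port_open fasp.sra.ebi.ac.uk 33001 || echo "TCP 33001 blocked"` tells you whether the firewall rules are worth changing at all.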
Data download
Taking patient 2586-4 as an example: all of the data are deposited in GEO.
Open https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117988
(Note that the link follows a pattern: changing only the ID at the end takes you to other GEO records.)
Under SRA, click SRP155988, then choose Send to => Run Selector => Go
Download the Accession List; this gives a text file listing the 6 SRR IDs
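The GEO link pattern noted above can be written as a one-line helper, handy for opening both series of this study (GSE117988 and GSE118056). A minimal sketch; the function name geo_url is hypothetical, and it only builds the URL from the acc.cgi?acc=<ID> pattern without querying GEO.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a GEO record URL from an accession,
# following the acc.cgi?acc=<ID> pattern of the GEO link above.
geo_url() {
    local acc="$1"
    case "$acc" in
        GSE[0-9]*|GSM[0-9]*|GPL[0-9]*)
            echo "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=${acc}" ;;
        *)
            echo "not a GEO accession: $acc" >&2
            return 1 ;;
    esac
}
```

For example, `geo_url GSE118056` prints the page for patient 9245-3.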
Download script
wkd=/home/project/single-cell/MCC
cd "$wkd/raw"
# for patient 2586-4: write the accession list
cat > SRR_Acc_List-2586-4.txt <<EOF
SRR7722937
SRR7722938
SRR7722939
SRR7722940
SRR7722941
SRR7722942
EOF
# download each run into the current directory
while read -r i
do
    prefetch "$i" -O "$(pwd)" && echo "** ${i}.sra done **"
done < SRR_Acc_List-2586-4.txt
# a ~2.6 GB file usually takes about 2 minutes
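Failures are easy to miss in the output of a long batch, so a quick check for missing files helps. A minimal sketch; the function name missing_sra is hypothetical, and because the output layout of prefetch -O may differ between sra-tools versions (an assumption: ID.sra directly in the directory for older versions, ID/ID.sra for newer ones), both locations are checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each accession in a list file that has no
# non-empty .sra file in the target directory (default: current directory).
# Checks both DIR/ID.sra and DIR/ID/ID.sra (assumed prefetch layouts).
missing_sra() {
    local list="$1" dir="${2:-.}" id
    while read -r id; do
        [ -z "$id" ] && continue   # skip blank lines
        if [ ! -s "$dir/$id.sra" ] && [ ! -s "$dir/$id/$id.sra" ]; then
            echo "$id"
        fi
    done < "$list"
}
```

For example, running `missing_sra SRR_Acc_List-2586-4.txt .` inside $wkd/raw prints the IDs that still need downloading.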
A successful download prints messages like:
2019-xxxxxxxx prefetch.2.9.1: fasp download succeed
2019-xxxxxxxx prefetch.2.9.1: 1) 'SRR7722937' was downloaded successfully
2019-xxxxxxxx prefetch.2.9.1: 'SRR7722937' has 0 unresolved dependencies
** SRR7722937.sra done **
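If the loop's output is saved to a log (for example by ending the loop with `done < SRR_Acc_List-2586-4.txt 2>&1 | tee prefetch.log`), the runs prefetch reported as successful can be pulled out by matching the message shown above. A sketch assuming the prefetch 2.9.1 wording above; other versions may phrase it differently, and the function name downloaded_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list accessions that a prefetch log reports as downloaded.
# Assumes lines like:  ... 1) 'SRR7722937' was downloaded successfully
downloaded_ok() {
    grep -o "'[SED]RR[0-9]*' was downloaded successfully" "$1" \
        | cut -d"'" -f2 \
        | sort -u
}
```

Comparing this list against the accession list shows which runs need a second attempt.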
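After downloading all ten samples from the two patients, I found that SRR7722939 and SRR7722942 had failed. Checking the data source showed that these two runs are stored at sra-sos.public rather than at ncbi.
So they can be downloaded from EBI instead: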
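- Go to https://www.ebi.ac.uk/ena and search for the SRA accession you want
- Select the SRR entry [or open https://www.ebi.ac.uk/ena/data/view/SRR7722939 directly, changing the ID]
One advantage of EBI is that FASTQ files can be downloaded directly from the run page; to get the .sra file instead, copy its download link from the same page.
- Then download it with: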
ascp -QT -l 300m -P33001 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/srr/SRR772/009/SRR7722939 ./
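The EBI path in the command above appears to follow ENA's directory convention: vol1/srr/, then the first six characters of the accession, then a subdirectory derived from the trailing digits (for a 7-digit run number like SRR7722939, "00" plus the last digit, giving 009). The sketch below encodes that rule as an assumption, so verify the result against the link ENA shows for your run, and reuses the ascp options from the command above; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the ENA "vol1/srr" path for a run accession.
# Assumed rule: vol1/srr/<first 6 chars>/<subdir>/<accession>, where subdir
# depends on how many digits follow the 3-letter prefix:
#   6 digits -> none, 7 -> "00" + last digit, 8 -> "0" + last 2, 9 -> last 3
ena_srr_path() {
    local acc="$1"
    local digits="${acc:3}"          # numeric part after SRR/ERR/DRR
    local dir="vol1/srr/${acc:0:6}"
    case "${#digits}" in
        6) ;;                         # no extra directory level
        7) dir="$dir/00${digits: -1}" ;;
        8) dir="$dir/0${digits: -2}" ;;
        9) dir="$dir/${digits: -3}" ;;
        *) echo "unexpected accession: $acc" >&2; return 1 ;;
    esac
    echo "$dir/$acc"
}

# Hypothetical wrapper: fetch one run from EBI with the same ascp options as
# the command above, assuming the default Aspera Connect key location.
ena_ascp_srr() {
    local acc="$1" out="${2:-.}" path
    path=$(ena_srr_path "$acc") || return 1
    ascp -QT -l 300m -P33001 \
        -i "$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh" \
        "era-fasp@fasp.sra.ebi.ac.uk:$path" "$out"
}
```

For the two runs that failed above, this would be `for id in SRR7722939 SRR7722942; do ena_ascp_srr "$id" .; done`.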