昨天老師發(fā)給我一篇生信女神Shirley Liu的文章薛耻,看了里面的內容之后感覺很興奮~它可以不做免疫組測序帕涌,直接從Bulk RNA-seq或者scRNA-seq里面重構得到免疫組的信息续挟。
中文翻譯
文章要點
- Although less sensitive than TCR-seq and BCR-seq, TRUST is able to identify the abundantly expressed and potentially more clonally expanded TCRs/BCRs in the RNA-seq data that are more likely to be involved in antigen binding
- Recent years have also seen other computational methods introduced for immune repertoire construction from RNA-seq data, including V’DJer, MiXCR, CATT and ImRep. These methods focus on reconstruction of complementary-determining region3 (CDR3), with limited ability to assemble full-length V(D)J receptor sequences, although CDR1 and CDR2 on the V sequence still contribute considerably to anti- gen recognition and binding.
TRUST4和其他重構算法相比,它的特點:
- 可利用FASTQ或BAM文件
- 可重構更長链方,甚至全長的TCR或BCR序列
- 更快更敏感
雖然TRUST4也可以從單細胞數(shù)據(jù)中重構,今天我主要想試一試從Bulk中重構
1. 安裝
git clone https://github.com/liulab-dfci/TRUST4.git
make
#我想添加環(huán)境變量修陡,但不知道問什么總是失敗
#所以決定再目標文件夾對run-trust4文件創(chuàng)建軟鏈接
ln -s /home/user/myh/install/TRUST4/run-trust4 /home/user/myh/**/TRUST4_outs
cd /home/user/myh/**/TRUST4_outs
./run-trust4
#可以使用
2.用法
官方Usage
Usage: ./run-trust4 [OPTIONS]
Required:
-b STRING: path to bam file
-1 STRING -2 STRING: path to paired-end read files
-u STRING: path to single-end read file
-f STRING: path to the fasta file coordinate and sequence of V/D/J/C genes
Optional:
--ref STRING: path to detailed V/D/J/C gene reference file, such as from IMGT database. (default: not used). (recommended)
-o STRING: prefix of output files. (default: inferred from file prefix)
--od STRING: the directory for output files. (default: ./)
-t INT: number of threads (default: 1)
--barcode STRING: if -b, bam field for barcode; if -1 -2/-u, file containing barcodes (defaul: not used)
--barcodeRange INT INT CHAR: start, end(-1 for lenght-1), strand in a barcode is the true barcode (default: 0 -1 +)
--barcodeWhitelist STRING: path to the barcode whitelist (default: not used)
--read1Range INT INT: start, end(-1 for length-1) in -1/-u files for genomic sequence (default: 0 -1)
--read2Range INT INT: start, end(-1 for length-1) in -2 files for genomic sequence (default: 0 -1)
--UMI STRING: if -b, bam field for UMI; if -1 -2/-u, file containing UMIs (default: not used)
--umiRange INT INT CHAR: start, end(-1 for lenght-1), strand in a UMI is the true UMI (default: 0 -1 +)
--mateIdSuffixLen INT: the suffix length in read id for mate. (default: not used)
--skipMateExtension: do not extend assemblies with mate information, useful for SMART-seq (default: not used)
--abnormalUnmapFlag: the flag in BAM for the unmapped read-pair is nonconcordant (default: not set)
--noExtraction: directly use the files from provided -1 -2/-u to assemble (default: extraction first)
--repseq: the data is from TCR-seq or BCR-seq (default: not set)
--outputReadAssignment: output read assignment results to the prefix_assign.out file (default: no output)
--stage INT: start TRUST4 on specified stage (default: 0)
0: start from beginning (candidate read extraction)
1: start from assembly
2: start from annotation
3: start from generating the report table
我的數(shù)據(jù)是小鼠的數(shù)據(jù)沧侥,先用一個Fastq文件試一試
./run-trust4 -f /home/user/myh/install/TRUST4/mouse/GRCm38_bcrtcr.fa --ref /home/user/myh/install/TRUST4/mouse/mouse_IMGT+C.fa -1 /home/user/myh/raw_data/AEKIBULK/inputs/clean_data/KI_T/KIT11_1.clean.fq.gz -2 /home/user/myh/raw_data/AEKIBULK/inputs/clean_data/KI_T/KIT11_2.clean.fq.gz -o KIT11
可以通過-t
調節(jié)可用的線程數(shù)
因為我的數(shù)據(jù)里面是分選了T細胞和B細胞的可霎,但我用T細胞的數(shù)據(jù)跑也能重構到BCR的結果,Emmm
注意一下TRUST4跑完是不會主動生成文件夾的宴杀,所有的結果都散在那里……
XX_report.tsv里面有如下信息:可直接用于immunarch
還會生成airr文件癣朗,也可用于immunarch分析
- "airr" - adaptive immune receptor repertoire (AIRR) data format. http://docs.airr-community.org/en/latest/datarep/overview.html
對于T細胞的結果,我把BCR鏈刪掉后旺罢,用immunarch進行后續(xù)分析
補充一點關于用VDJtools分析的內容
下載好VDJtools后
參考
1.Basic analysis
1.1 CalcBasicStats
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcBasicStats -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
# /path to vdjtools/: vdjtolls的安裝路徑
#output_prefix: 輸出路徑
VDJtools的格式
注意在CDR3aa里面旷余,要刪除out_of_frame的內容,不然vdjtools無法識別
1.2 CalcSegmentUsage
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "group" -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
#-p : 畫圖扁达,依賴于R包
#-f : 指定分組依據(jù),分組信息在metadata文件中
#--plot-type png 輸出png圖片
1.3 CalcSpectratype
Calculates spectratype, that is, histogram of read counts by CDR3 nucleotide length.
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcSpectratype -a -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
#-a :Will use CDR3 amino acid sequences for calculation instead of nucleotide ones
1.4 PlotFancySpectratype
Plots a spectratype that also displays CDR3 lengths for top N clonotypes in a given sample.This plot allows to detect the highly-expanded clonotypes.
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar PlotFancySpectratype -t 5 /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/AE_T_5.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
#-t:Number of top clonotypes to visualize. Should not exceed 20, default is 10
#單一樣本
下面這個不知道為啥沒跑出來
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcPairwiseDistances -p -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
#-p: plot
如果要看單細胞的數(shù)據(jù):
./run-trust4 -b /home/user/myh/raw_data/***/possorted_genome_bam.bam -f /home/user/myh/install/TRUST4/human/hg38_bcrtcr.fa --ref /home/user/myh/install/TRUST4/human/human_IMGT+C.fa --barcode CB -o XXX