Contents
1. Module 1 - Introduction to RNA sequencing
- Installation
- Reference Genomes
- Annotations
- Indexing
- RNA-seq Data
- Pre-Alignment QC
2. Module 2 - RNA-seq Alignment and Visualization
- Adapter Trim
- Alignment
- IGV
- Alignment Visualization
- Alignment QC
3. Module 3 - Expression and Differential Expression
- Expression
- Differential Expression
- DE Visualization
- Kallisto for Reference-Free Abundance Estimation
4. Module 4 - Isoform Discovery and Alternative Expression
- Reference Guided Transcript Assembly
- de novo Transcript Assembly
- Transcript Assembly Merge
- Differential Splicing
- Splicing Visualization
5. Module 5 - De novo transcript reconstruction
- De novo RNA-Seq Assembly and Analysis Using Trinity
6. Module 6 - Functional Annotation of Transcripts
- Functional Annotation of Assembled Transcripts Using Trinotate
2.3 IGV
1. Introduction
Description of the lab
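IGV (Integrative Genomics Viewer), one of the most popular tools for viewing high-throughput sequencing data
Files accompanying this tutorial
After completing this tutorial you will be able to:
- Visualize a variety of genomic data
- Navigate the genome quickly
- Visualize read alignments
- Validate SNPs/SNVs by eye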
Requirements
Ability to run Java
Note that while most tutorials in this course are performed on the cloud, IGV will always be run on your local machine.
Compatibility
This tutorial was prepared for IGV v2.3, which is available from the IGV download page. Using this version is strongly recommended.
Data Set for IGV
We use publicly available Illumina sequence data from the HCC1143 cell line. HCC1143 was derived from a 52-year-old Caucasian woman with breast cancer. Additional information on this cell line can be found here: HCC1143 (tumor, TNM stage IIA, grade 3, primary ductal carcinoma) and HCC1143/BL (matched normal EBV-transformed lymphoblast cell line).
Reads from the HCC1143 cell line aligned to this region:
Chromosome 21: 19,000,000-20,000,000
2. Getting familiar with IGV
Get familiar with the interface
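Load a genome:
By default, IGV loads human hg19. If you work with another version of the human genome, or with another species, you can change the genome from the drop-down menu in the upper-left corner. In this tutorial we will use human hg19.
You can also load the following tracks (File -> Load from Server...):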
- Ensembl genes (or your favourite source of gene annotations)
- GC Percentage
- dbSNP 1.3.1 or 1.3.7
Navigation:
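The chromosome drop-down lists the chromosomes of this reference genome; select chromosome 1.
Type chr1:10,000-11,000 into the location field (upper left of the interface) and click Go. This shows a 1,000 bp window on chromosome 1 starting at position 10,000.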
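The location box takes a locus of the form `chrom:start-end`, with or without thousands separators. Typing a gene name such as BRCA1 is resolved by IGV separately. If you want to prepare or check such loci in your own scripts, the sketch below shows one way to parse and format them. The helper names `parse_locus` and `format_locus` are hypothetical. The sketch assumes 1-based, inclusive coordinates as typed into IGV. For chr1:10,000-11,000, `end - start` gives 1,000, which matches the window width quoted above.

```python
import re
from typing import Tuple

# Matches IGV-style loci such as "chr1:10,000-11,000" (thousands separators optional).
LOCUS_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_locus(locus: str) -> Tuple[str, int, int]:
    """Split a locus string into (chromosome, start, end).

    Coordinates stay 1-based and inclusive; commas are stripped.
    """
    match = LOCUS_RE.match(locus.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if end < start:
        raise ValueError(f"end is before start in {locus!r}")
    return match["chrom"], start, end


def format_locus(chrom: str, start: int, end: int) -> str:
    """Build a locus string with thousands separators, e.g. chr1:10,000-11,000."""
    return f"{chrom}:{start:,}-{end:,}"


if __name__ == "__main__":
    chrom, start, end = parse_locus("chr1:10,000-11,000")
    print(chrom, start, end, "span:", end - start)
```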
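IGV displays the bases of the genome sequence as colors (e.g., A = green, C = blue, and so on). This makes repetitive sequences, such as those at the start of this region, easy to spot. Zoom in further with the + button to see the individual bases of the reference sequence.
You can also type a gene of interest into the box that holds the genomic coordinates and press Enter/Return. Try your favourite gene, or BRCA1.
Genes are drawn as lines and boxes. Lines represent introns and boxes represent exons. Arrows indicate the direction/strand of transcription. Where an exon box becomes narrower, that part is a UTR.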
Region Lists
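Sometimes it is very useful to save your current location or to load regions of interest. For this, IGV has a Region Navigator; open it with Regions > Region Navigator. While browsing the genome, you can press the Add button at any time to bookmark the current region.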
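Bookmarks can also be prepared as a file. In the IGV versions I am aware of, the Regions menu has Export Regions... and Import Regions... entries that read and write BED files. Check that your version has them. BED uses 0-based starts and exclusive ends, so a 1-based, inclusive locus needs its start decreased by one. The sketch below turns this tutorial's loci into BED lines. The helper names and region names are hypothetical.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int, str]  # (chrom, 1-based start, inclusive end, name)

# Loci from this tutorial; the region names are made up for illustration.
REGIONS = [
    ("chr21", 19_479_237, 19_479_814, "two_neighbouring_snps"),
    ("chr21", 19_518_412, 19_518_497, "homopolymer_indel"),
    ("chr21", 19_611_925, 19_631_555, "coverage_by_gc"),
    ("chr21", 19_666_833, 19_667_007, "het_snps_different_alleles"),
]


def locus_to_bed(chrom: str, start: int, end: int, name: str) -> str:
    """Convert a 1-based inclusive locus to a BED line (0-based start, exclusive end)."""
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def regions_to_bed(regions: Iterable[Region]) -> str:
    """Render all regions as BED text, one line per region."""
    return "".join(locus_to_bed(*region) + "\n" for region in regions)


if __name__ == "__main__":
    # Redirect to a file, e.g. `python regions.py > igv_regions.bed`.
    print(regions_to_bed(REGIONS), end="")
```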
Loading Read Alignments
We will use the breast cancer cell line HCC1143 to visualize alignments. For speed, only a small portion of chr21 (19M-20M) will be loaded.
HCC1143 Alignments to hg19:
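Copy the files to your local machine, then in IGV choose File > Load from File..., select the bam file, and click OK. Note that the bam file and its index file must be in the same directory for IGV to load them correctly.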
Visualizing read alignments
Navigate to chr21:19,480,041-19,480,386
To start our exploration, right click on the track-name, and select the following options:
- Sort alignments by start location
- Group alignments by pair orientation
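Experiment with the various settings by right-clicking the alignment track and toggling the options. Think about which settings are best suited to specific tasks (e.g., quality control, SNP calling, CNV finding).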
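Grouping or coloring by pair orientation classifies read pairs by how the two mates face each other. The sketch below assumes a standard Illumina paired-end library, in which the left mate normally aligns to the forward strand and the right mate to the reverse strand. The `Read` class, the `pair_orientation` function, and the label wording are hypothetical. The structural-variant hints are common rules of thumb, not calls. IGV draws these classes with its own color legend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    pos: int           # leftmost aligned position (1-based)
    is_reverse: bool   # True if the read aligned to the reverse strand


def pair_orientation(a: Read, b: Read) -> str:
    """Classify a pair by the strands of its left-most and right-most mates.

    Labels are descriptive names chosen for this sketch, not IGV's legend.
    """
    left, right = sorted((a, b), key=lambda r: r.pos)
    strands = ("R" if left.is_reverse else "F") + ("R" if right.is_reverse else "F")
    return {
        "FR": "inward (expected for standard Illumina paired-end)",
        "RF": "outward (possible tandem duplication)",
        "FF": "same strand, forward (possible inversion)",
        "RR": "same strand, reverse (possible inversion)",
    }[strands]


if __name__ == "__main__":
    print(pair_orientation(Read(100, False), Read(400, True)))
    print(pair_orientation(Read(100, True), Read(400, False)))
```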
3. Inspecting SNPs, SNVs, and SVs
Two neighbouring SNPs
- Navigate to region chr21:19,479,237-19,479,814
- Note the two heterozygous variants: one corresponds to a known dbSNP (G/T on the right), the other does not (C/T on the left)
- Zoom in and center on the C/T SNV on the left (chr21:19,479,321 is the SNV position)
- Sort alignments by base
- Color alignments by read strand
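Sorting by base and coloring by read strand is a visual version of a simple tally at the SNV position. At a heterozygous site, roughly half of the covering reads carry the alternative allele. A genuine variant is also usually seen on both strands; support from only one strand is a common sign of an artifact. The sketch below shows that tally. The `allele_summary` helper and the read counts are made up for illustration, and this is not how IGV or any variant caller computes its statistics.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_summary(
    observations: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, float], Dict[str, Counter]]:
    """Summarize (base, strand) observations at one position.

    strand is '+' or '-'. Returns (allele fraction per base, strand counts per base).
    """
    obs = list(observations)
    if not obs:
        raise ValueError("no reads cover this position")
    bases = Counter(base for base, _ in obs)
    total = sum(bases.values())
    fractions = {base: n / total for base, n in bases.items()}
    strands: Dict[str, Counter] = {base: Counter() for base in bases}
    for base, strand in obs:
        strands[base][strand] += 1
    return fractions, strands


if __name__ == "__main__":
    # Hypothetical pileup at a C/T SNV; these counts are invented.
    reads = [("C", "+")] * 6 + [("C", "-")] * 5 + [("T", "+")] * 5 + [("T", "-")] * 4
    fractions, strands = allele_summary(reads)
    for base in sorted(fractions):
        print(base, f"{fractions[base]:.2f}", dict(strands[base]))
```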
Homopolymer region with indel
Navigate to position chr21:19,518,412-19,518,497
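Homopolymers (runs of a single base) deserve attention here for two reasons. Sequencers tend to miscount their length, and the position of an indel inside a run is ambiguous, so different reads or tools may place it at different offsets. The sketch below locates runs in a reference string. The `homopolymer_runs` helper and the minimum run length of 5 are arbitrary, hypothetical choices.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 5) -> List[Tuple[int, int, str]]:
    """Return (start, end, base) for runs of one base of length >= min_len.

    start/end are 0-based, end-exclusive offsets into `seq`.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # A base followed by at least (min_len - 1) more copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequence, not the chr21 reference.
    print(homopolymer_runs("ACGTTTTTTGCAAAAAG"))
```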
Coverage by GC
Navigate to region chr21:19,611,925-19,631,555. Note that coverage drops to zero in a few places within this range.
**Example**
- Use Collapsed view
- Use Color alignments by -> insert size and pair orientation
- Load GC track
- See concordance of coverage with GC content
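Coverage dips like these often line up with extreme GC content. To compare the two outside IGV, you can compute GC percentage over fixed windows of the reference sequence. The sketch below does this with a hypothetical `gc_percent_windows` helper, which reports 0-based window offsets and excludes N bases from the denominator. This tutorial does not say what window size IGV's own GC Percentage track uses.

```python
from typing import List, Optional, Tuple


def gc_percent_windows(
    seq: str, window: int, step: Optional[int] = None
) -> List[Tuple[int, float]]:
    """Return (window_start, GC%) for windows of `seq`.

    Windows start every `step` bases (default: non-overlapping). N and other
    non-ACGT characters are left out of the denominator.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        called = sum(chunk.count(b) for b in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, 100.0 * gc / called if called else float("nan")))
    return result


if __name__ == "__main__":
    print(gc_percent_windows("GGCCAATTGCAT", 4))
```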
Heterozygous SNPs on different alleles
Navigate to region chr21:19,666,833-19,667,007
**Example**
- Sort by base (at position chr21:19,666,901)
Note: there is no linkage between the alleles of these two SNPs, because reads covering both positions contain only one or the other.
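This observation can be made explicit by tabulating the pair of bases each read shows at the two SNP positions, counting only reads that cover both. If the alternative alleles appear only on different reads (alternative with reference, reference with alternative) and never together, they lie on different chromosome copies. The `haplotype_counts` helper and the positions 100 and 150 in the example are hypothetical. In practice the per-read bases would come from the BAM file.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def haplotype_counts(
    reads: Iterable[Dict[int, str]], pos_a: int, pos_b: int
) -> Counter:
    """Count (base at pos_a, base at pos_b) pairs among reads covering both.

    Each read is a dict mapping 1-based position -> observed base.
    """
    counts: Counter = Counter()
    for read in reads:
        if pos_a in read and pos_b in read:
            counts[(read[pos_a], read[pos_b])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical SNPs: C>T at 100 and G>A at 150. The alt alleles (T, A) never co-occur.
    reads = [{100: "T", 150: "G"}] * 3 + [{100: "C", 150: "A"}] * 4 + [{100: "T"}]
    print(haplotype_counts(reads, 100, 150))
```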
4. Automating Tasks in IGV
We can run batch scripts from the Tools menu. Batch scripts are described on the IGV website:
Batch file requirements: https://www.broadinstitute.org/igv/batch
Commands recognized in a batch script: https://www.broadinstitute.org/software/igv/PortCommands
We also need to provide a sample attribute file, as described here: http://www.broadinstitute.org/software/igv/?q=SampleInformation
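As I understand the linked page, a sample attribute file is tab-delimited text. The first row holds column headings, the first column holds the sample/track name as IGV shows it, and the remaining columns are attributes that IGV can display, group, and sort by. The sketch below writes such a table with Python's `csv` module. The `attribute_table` helper, the column names, and the file names in the example are hypothetical and are not the contents of the provided attribute file.

```python
import csv
import io
from typing import Dict, Iterable, List


def attribute_table(rows: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Render a tab-delimited sample attribute table.

    columns[0] should be the column holding the track/sample name.
    Missing values are written as empty cells.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(col, "") for col in columns])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical samples and attributes, for illustration only.
    rows = [
        {"ID": "normal.bam", "Type": "normal"},
        {"ID": "tumour.bam", "Type": "tumour"},
    ]
    print(attribute_table(rows, ["ID", "Type"]), end="")
```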
Download the batch script and attribute file for this dataset:
- Batch script: Run_batch_IGV_snapshots.txt
- Attribute file: Igv_HCC1143_attributes.txt
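To see what a batch script of this kind looks like, the sketch below generates one that loads BAM files and saves one snapshot per locus. It uses only `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `sort` and `snapshot`, which appear in IGV's batch command documentation linked above. It is not the provided Run_batch_IGV_snapshots.txt. The `igv_batch_script` helper, the paths, and the snapshot names are hypothetical. Paths containing spaces are not handled.

```python
from typing import Iterable, List, Tuple


def igv_batch_script(
    bam_paths: Iterable[str],
    loci: Iterable[Tuple[str, str]],
    snapshot_dir: str,
    genome: str = "hg19",
) -> str:
    """Build IGV batch commands that take one snapshot per (name, locus) pair."""
    lines: List[str] = ["new", f"genome {genome}"]
    lines += [f"load {path}" for path in bam_paths]
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for name, locus in loci:
        lines.append(f"goto {locus}")
        lines.append("sort base")          # sort reads by base at the center of the view
        lines.append(f"snapshot {name}.png")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inputs; save the output and run it via the Tools menu.
    print(igv_batch_script(
        ["/path/to/sample.bam"],
        [("two_snps", "chr21:19479237-19479814"), ("gc_coverage", "chr21:19611925-19631555")],
        "/path/to/snapshots",
    ), end="")
```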