GSE120974多組學(xué)分析復(fù)現(xiàn)

0.數(shù)據(jù)介紹

這是一篇發(fā)表在Cancer Cell 2019 Sep的文章——Single-Cell Transcriptomics in Medulloblastoma Reveals Tumor-Initiating Progenitors and Oncogenic Cascades during Tumorigenesis and Relapse(PMID: 31474569)癣蟋。本文揭示了OLIG2+細(xì)胞在髓母細(xì)胞瘤的發(fā)生發(fā)展起到重要作用呵晨,并且會以靜息狀態(tài)宗侦,存在于成熟的髓母細(xì)胞瘤中检吆,并在復(fù)發(fā)時被激活,可能是髓母細(xì)胞瘤對治療不敏感的原因之一庄吼。OLIG2能夠通過表觀調(diào)控丘喻,激活hippo-YAP和AUTOTA-A/MYCN的網(wǎng)絡(luò)促進(jìn)MB的生長屡立。

Single-Cell Transcriptomics in Medulloblastoma Reveals Tumor-Initiating Progenitors and Oncogenic Cascades during Tumorigenesis and Relapse.png

在曾老師推送的基礎(chǔ)上褂痰,想進(jìn)一步對文章的數(shù)據(jù)進(jìn)行復(fù)現(xiàn)亩进。常規(guī)轉(zhuǎn)錄差異建議都加上一個轉(zhuǎn)錄因子數(shù)據(jù) (qq.com)
本數(shù)據(jù)集包含了ChIP-seq,ATAC-seq缩歪,RNA-seq归薛。是非常好的練手機會。
背景知識:關(guān)于Medulloblastoma (MB):主要是有4個分子亞型:Wingless (WNT), Sonic hedgehog (SHH), Group 3, and Group 4匪蝙。其中主籍,SHH亞型的信號通路激活主要緣于PTCH1的功能缺失以及SMO的激活,其中GNAS是一種抑癌基因逛球,其功能缺陷與不良預(yù)后相關(guān)

1.數(shù)據(jù)預(yù)處理

現(xiàn)在SRA網(wǎng)站中獲取實驗設(shè)計的情況千元。

Run BioProject  BioSample   Assay Type  AvgSpotLen  Bases   Bytes   Experiment  GEO_Accession   LibraryLayout   LibrarySelection    LibrarySource   Sample Name source_name SRA Study   Age geneotype   Antibody    tissue
1   SRR7984671  PRJNA495715 SAMN10220650    ATAC-seq    75  1.09 G  535.44 Mb   SRX4815869  GSM3423042  SINGLE  other   GENOMIC GSM3423042  medulloblastoma tumor cells SRP165176   Postnatal day 20    hGFAPcre;Pthc/c 
2   SRR7984672  PRJNA495715 SAMN10220649    ATAC-seq    75  569.80 M    286.09 Mb   SRX4815870  GSM3423043  SINGLE  other   GENOMIC GSM3423043  medulloblastoma tumor cells SRP165176   Postnatal day 20    hGFAPcre;Pthc/c;Olig2-GFP   
3   SRR7984673  PRJNA495714 SAMN10220667    ChIP-Seq    75  1.10 G  665.72 Mb   SRX4815871  GSM3423044  SINGLE  ChIP    GENOMIC GSM3423044  Cerebellum  SRP165177           Olig2 (Millipore, ab9610)
4   SRR7984674  PRJNA495714 SAMN10220666    ChIP-Seq    75  2.97 G  1.32 Gb SRX4815872  GSM3423045  SINGLE  ChIP    GENOMIC GSM3423045  Medulloblastoma tissue  SRP165177           Olig2 (Millipore, ab9610)
5   SRR7984675  PRJNA495714 SAMN10220665    ChIP-Seq    75  1.47 G  753.94 Mb   SRX4815873  GSM3423046  SINGLE  ChIP    GENOMIC GSM3423046  Medulloblastoma tissue  SRP165177           H3k27ac (Abcam, ab4729)
6   SRR7984676  PRJNA495713 SAMN10220664    RNA-Seq 81  9.18 G  4.98 Gb SRX4815874  GSM3423057  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423057  medullablastoma tumor cells SRP165179   Postnatal day 24 to30   hGFAPcre;Ptchc/c    
7   SRR7984677  PRJNA495712 SAMN10220663    RNA-Seq 150 5.46 G  1.96 Gb SRX4815875  GSM3423058  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423058  cerebellum  SRP165178   Postnatal day 18    Lats1/2f/f  
8   SRR7984678  PRJNA495712 SAMN10220662    RNA-Seq 150 4.95 G  1.79 Gb SRX4815876  GSM3423059  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423059  cerebellum  SRP165178   Postnatal day 18    Lats1/2f/f  
9   SRR7984679  PRJNA495712 SAMN10220661    RNA-Seq 150 4.82 G  1.77 Gb SRX4815877  GSM3423060  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423060  cerebellum  SRP165178   Postnatal day 18    Lats1/2f/f  
10  SRR7984680  PRJNA495712 SAMN10220660    RNA-Seq 150 4.38 G  1.59 Gb SRX4815878  GSM3423061  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423061  Medullobalstoma SRP165178   Postnatal day 18    Lats1/2f/f; Math1cre    
11  SRR7984681  PRJNA495712 SAMN10220646    RNA-Seq 150 4.17 G  1.50 Gb SRX4815879  GSM3423062  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423062  Medullobalstoma SRP165178   Postnatal day 18    Lats1/2f/f; Math1cre    
12  SRR7984682  PRJNA495712 SAMN10220645    RNA-Seq 150 5.06 G  1.84 Gb SRX4815880  GSM3423063  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423063  Medullobalstoma SRP165178   Postnatal day 18    Lats1/2f/f; Math1cre    
13  SRR7984683  PRJNA495712 SAMN10220644    RNA-Seq 76  2.24 G  720.74 Mb   SRX4815881  GSM3423064  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3423064  Medullobalstoma SRP165178   Postnatal day 20    Ptchf/f; hGFAPcre   
14  SRR7984684  PRJNA495712 SAMN10220643    RNA-Seq 76  2.17 G  700.20 Mb   SRX4815882  GSM3423065  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3423065  Medullobalstoma SRP165178   Postnatal day 20    Ptchf/f; hGFAPcre   
15  SRR7984685  PRJNA495712 SAMN10220642    RNA-Seq 76  2.16 G  692.55 Mb   SRX4815883  GSM3423066  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3423066  Medullobalstoma SRP165178   Postnatal day 20    Ptchf/f; Olig2c/c; hGFAPcre 
16  SRR7984686  PRJNA495712 SAMN10220641    RNA-Seq 76  2.21 G  710.51 Mb   SRX4815884  GSM3423067  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3423067  Medullobalstoma SRP165178   Postnatal day 20    Ptchf/f; Olig2c/c; hGFAPcre 
17  SRR8037270  PRJNA495592 SAMN10228087    RNA-Seq 132 3.16 G  1.87 Gb SRX4867885  GSM3423041  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423041  medullablastoma tumor cells SRP165234   Postnatal day 10    hGFAPcre;Ptchc/c    
18  SRR8037271  PRJNA495592 SAMN10228087    RNA-Seq 132 2.85 G  1.69 Gb SRX4867885  GSM3423041  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423041  medullablastoma tumor cells SRP165234   Postnatal day 10    hGFAPcre;Ptchc/c    
19  SRR8037272  PRJNA495592 SAMN10228087    RNA-Seq 132 4.23 G  2.48 Gb SRX4867885  GSM3423041  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423041  medullablastoma tumor cells SRP165234   Postnatal day 10    hGFAPcre;Ptchc/c    
20  SRR8037273  PRJNA495592 SAMN10228087    RNA-Seq 132 2.80 G  1.67 Gb SRX4867885  GSM3423041  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3423041  medullablastoma tumor cells SRP165234   Postnatal day 10    hGFAPcre;Ptchc/c    
21  SRR9667717  PRJNA495714 SAMN12249854    ChIP-Seq    75  2.34 G  1.08 Gb SRX6428312  GSM3936983  SINGLE  ChIP    GENOMIC GSM3936983  Cerebellum  SRP165177           Olig2 (Millipore, ab9610)
22  SRR9667718  PRJNA495714 SAMN12249853    ChIP-Seq    75  1.98 G  896.85 Mb   SRX6428313  GSM3936984  SINGLE  ChIP    GENOMIC GSM3936984  Medulloblastoma tissue  SRP165177           Olig2 (Millipore, ab9610)
23  SRR9667719  PRJNA495714 SAMN12249852    ChIP-Seq    75  1.82 G  838.77 Mb   SRX6428308  GSM3936985  SINGLE  ChIP    GENOMIC GSM3936985  Medulloblastoma tissue  SRP165177           Olig2 (Millipore, ab9610)
24  SRR9667720  PRJNA495714 SAMN12249850    ChIP-Seq    75  2.02 G  924.48 Mb   SRX6428309  GSM3936986  SINGLE  ChIP    GENOMIC GSM3936986  Medulloblastoma tissue  SRP165177           H3k27ac (Abcam, ab4729)
25  SRR9667721  PRJNA495714 SAMN12249849    ChIP-Seq    75  1.10 G  665.72 Mb   SRX6428310  GSM3936987  SINGLE  ChIP    GENOMIC GSM3936987  Cerebellum  SRP165177           none
26  SRR9667722  PRJNA495714 SAMN12249847    ChIP-Seq    75  2.64 G  1.16 Gb SRX6428311  GSM3936988  SINGLE  ChIP    GENOMIC GSM3936988  Medulloblastoma tissue  SRP165177           none
27  SRR9667723  PRJNA495713 SAMN12249846    RNA-Seq 83  9.24 G  3.78 Gb SRX6428314  GSM3936989  PAIRED  cDNA    TRANSCRIPTOMIC  GSM3936989  medullablastoma tumor cells SRP165179   Postnatal day 12    hGFAPcre;Ptchc/c    
28  SRR9667724  PRJNA495712 SAMN12249845    RNA-Seq 75  1.74 G  830.27 Mb   SRX6428315  GSM3936990  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3936990  GFP+ cells  SRP165178   Postnatal day 15    hGFAPcre;Ptchc/c;Olig2-GFP  
29  SRR9667725  PRJNA495712 SAMN12249844    RNA-Seq 75  1.72 G  826.82 Mb   SRX6428316  GSM3936991  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3936991  GFP+ cells  SRP165178   Postnatal day 15    hGFAPcre;Ptchc/c;Olig2-GFP  
30  SRR9667726  PRJNA495712 SAMN12249843    RNA-Seq 75  1.73 G  832.94 Mb   SRX6428317  GSM3936992  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3936992  GFP+ cells  SRP165178   Postnatal day 15    hGFAPcre;Ptchc/c;Olig2-GFP  
31  SRR9667727  PRJNA495712 SAMN12249841    RNA-Seq 75  1.63 G  787.67 Mb   SRX6428318  GSM3936993  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3936993  GFP- cells  SRP165178   Postnatal day 15    hGFAPcre;Ptchc/c;Olig2-GFP  
32  SRR9667728  PRJNA495712 SAMN12249840    RNA-Seq 75  1.63 G  782.61 Mb   SRX6428319  GSM3936994  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3936994  GFP- cells  SRP165178   Postnatal day 15    hGFAPcre;Ptchc/c;Olig2-GFP  
33  SRR9667729  PRJNA495712 SAMN12249838    RNA-Seq 75  1.79 G  861.33 Mb   SRX6428320  GSM3936995  SINGLE  cDNA    TRANSCRIPTOMIC  GSM3936995  GFP- cells  SRP165178   Postnatal day 15    hGFAPcre;Ptchc/c;Olig2-GFP  

1.1 原始文件下載

mkdir SRRfile
prefetch --option-file SRR_Acc_List.txt

1.2 使用fasterq進(jìn)行解壓縮

ls .| while read id;do(fasterq-dump $id);done
mkdir ../fastq 
mv *.fastq ../fastq

忘記設(shè)置output的地址了,這下得把所有的fastq文件重新整理一下

cd fastq
cat ../SRR_Acc_List.txt | while read id;do(mkdir $id);done

1.3 測序數(shù)據(jù)的質(zhì)控

cat SRR_Acc_List.txt | while read id;do(
  fastqc -o ./fastqc -t 20 ./fastq/$id/*.fastq \
  && multiqc ./fastqc -o ./fastqc);
done

1.4 測序數(shù)據(jù)分類整理

由于原始數(shù)據(jù)涉及了不同的文件類型需忿,所以需要相應(yīng)進(jìn)行整理诅炉。在總文件夾中寫入不同數(shù)據(jù)類型所對應(yīng)的SRR序號蜡歹。shell腳本不是特別熟練屋厘,因此就在單機上,使用excel進(jìn)行篩選了月而。

# vi chip.txt
SRR7984673
SRR7984674
SRR7984675
SRR9667717
SRR9667718
SRR9667719
SRR9667720
SRR9667721
SRR9667722

# vi rna.txt
SRR7984676
SRR7984677
SRR7984678
SRR7984679
SRR7984680
SRR7984681
SRR7984682
SRR7984683
SRR7984684
SRR7984685
SRR7984686
SRR8037270
SRR8037271
SRR8037272
SRR8037273
SRR9667723
SRR9667724
SRR9667725
SRR9667726
SRR9667727
SRR9667728
SRR9667729

# vi atac.txt
SRR7984671
SRR7984672

RNA-seq共有22個測序文件汗洒;ATAC-seq共有2個測序文件;ChIP-seq共有9個測序文件父款。

2.ChIP-seq數(shù)據(jù)處理

2.1 使用trim_galore自動去接頭

這個數(shù)據(jù)貌似已經(jīng)經(jīng)過質(zhì)控了溢谤,因此這步貌似可做可不做瞻凤。G沒有去接頭時,比對上的比例還是能夠達(dá)到95%以上世杀,還是不錯的阀参。

mkdir clean
cat chip.txt | while read id;do\
  trim_galore --quality 20 --phred33 \
  --length 20 -j 20 \
  -o ./clean/$id.fastq.gz \
 ./fastq/$id/$id.fastq;
done

2.2 使用bowtie2進(jìn)行比對

mkdir aligned
cat chip.txt | while read id;do \
  bowtie2 -p 20 -3 5 --local -x ~/mm10/mm10 -U ./fastq/$id/$id.fastq | \
  samtools sort -O bam -o ./aligned/$id.bam; \
done

比對過程需要花很多時間。

# SRR7984673
# 14675638 reads; of these:
#   14675638 (100.00%) were unpaired; of these:
#     339654 (2.31%) aligned 0 times
#     8731707 (59.50%) aligned exactly 1 time
#     5604277 (38.19%) aligned >1 times
# 97.69% overall alignment rate
# [bam_sort_core] merging from 4 files and 1 in-memory blocks...
......

2.3 使用sambamba去重

比對的效率總體還是不錯的瞻坝。比對之后進(jìn)行PCR去重蛛壳。一開始選擇了picard工具進(jìn)行去重。

mkdir deduplicate
cat atac.txt |while read id
do(picard MarkDuplicates --REMOVE_DUPLICATES true -I aligned/${id}.bam \
-O ./deduplicate/${id}_deduplicate.bam -M ${id}.log);  
done 

然而占用的線程過多所刀,為了避免被管理員拉黑衙荐,選擇其他的去重方法sambamba

mkdir deduplicate
cat chip.txt |while read id;do
  sambamba markdup -r aligned/${id}.bam deduplicate/${id}_deduplicate.bam;
done

2.4 使用MACS2 call peak

在call peak之前,需要對明確對照組和實驗組浮创,根據(jù)文章的介紹忧吟,對GSM序號和SRR序號進(jìn)行配對。具體的對應(yīng)關(guān)系如下所示:

SRR7984673 GSM3423044 ChIP_CB_Olig2
SRR7984674 GSM3423045 ChIP_MB_Olig2
SRR7984675 GSM3423046 ChIP_MB_H3k27ac
SRR9667717 GSM3936983 ChIP_CB_H3k27ac
SRR9667718 GSM3936984 ChIP_MB_Olig2_repeat 2
SRR9667719 GSM3936985 ChIP_MB_Olig2_repeat 3
SRR9667720 GSM3936986 ChIP_MB_H3k27ac_repeat 2
SRR9667721 GSM3936987 CB_Input
SRR9667722 GSM3936988 MB_Input

因為涉及比較的組數(shù)不多斩披,所以可以手動進(jìn)行設(shè)置溜族,如需批量操作,可以自行構(gòu)建比較文檔垦沉。
MACS2的命令比較簡單斩祭,主要就是確定好對照組和實驗組。

mkdir macs2
# MB_H3K27ac_1
macs2 callpeak -c ./deduplicate/SRR9667722_deduplicate.bam \
  -t  ./deduplicate/SRR7984675_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  -n MB_H3K27ac_1_ 

# MB_H3K27ac_2
macs2 callpeak -c ./deduplicate/SRR9667722_deduplicate.bam \
  -t  ./deduplicate/SRR9667720_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  -n MB_H3K27ac_2_ 

# MB_OLIG2_1
macs2 callpeak -c ./deduplicate/SRR9667722_deduplicate.bam \
  -t  ./deduplicate/SRR7984674_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  -n MB_OLIG2_1_ 

# MB_OLIG2_2
macs2 callpeak -c ./deduplicate/SRR9667722_deduplicate.bam \
  -t  ./deduplicate/SRR9667718_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  -n MB_OLIG_2_ 

# MB_OLIG2_3
macs2 callpeak -c ./deduplicate/SRR9667722_deduplicate.bam \
  -t  ./deduplicate/SRR9667719_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  -n MB_OLIG_3_ 

# CB_OLIG2_1
macs2 callpeak -c ./deduplicate/SRR9667721_deduplicate.bam \
  -t  ./deduplicate/SRR7984673_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  -n CB_OLIG_1_ 

# CB_H3K27ac_1
macs2 callpeak -c ./deduplicate/SRR9667721_deduplicate.bam \
  -t  ./deduplicate/SRR9667717_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  -n CB_H3K27ac_1_ 

每次比較都會得到四個文件:NAMEpeaks.xls乡话,NAMEsummits.bed摧玫,NAME_model.r,NAMEpeaks.narrowPeak

NAMEpeaks.xls:peak信息绑青,和bed格式類似诬像,但是以1為基,而bed文件是以0為基.也就是說xls的坐標(biāo)都要減一才是bed文件的坐標(biāo)
NAMEpeaks.narrowPeak:后面4列表示為闸婴, integer score for display坏挠, fold-change,-log10pvalue邪乍,-log10qvalue降狠,relative summit position to peak start。內(nèi)容和NAMEpeaks.xls基本一致庇楞,適合用于導(dǎo)入R進(jìn)行分析榜配。
NAMEsummits.bed:記錄每個peak的peak summits,也就是記錄極值點的位置吕晌。MACS建議用該文件尋找結(jié)合位點的motif蛋褥。
NAME_model.r:能通過NAME_model.r作圖,得到是基于你提供數(shù)據(jù)的peak模型

2.5 deeptools可視化

peak的可視化有多種軟件可供選擇睛驳,包括deeptools烙心。也可以將數(shù)據(jù)導(dǎo)入R環(huán)境中膜廊,用Y叔的CHIPseeker包進(jìn)行可視化。

Figure6F-H.png

這是原文中Figure6F-H的數(shù)據(jù)淫茵。F圖是通過熱圖展示OLIG2 binding的位點上爪瓜,也往往有高強度的H3K27ac的信號,提示著一種OLIG2在增強相關(guān)基因表達(dá)的機制匙瘪。G圖是顯示在MB腫瘤中钥勋,OLIG2和H3K27ac的結(jié)合情況。H圖則是針對結(jié)合的位點進(jìn)行motif的分析辆苔,預(yù)測最有可能對目標(biāo)基因進(jìn)行調(diào)控的轉(zhuǎn)錄因子算灸。
原文中對方法學(xué)的描述是這樣的:

All sequencing data were mapped to mouse genome assembly mm10, and ChIP-seq peak calling was performed as previously described using model-based analysis of ChIP-seq (MACS, version 1.4.2; http://liulab.dfci.harvard.edu/MACS) with default parameters to get primary binding regions. To ensure that our data were of high quality and reproducibility, we called peaks with enrichment R10-fold over control (p % 10-9) and compared the peak sets using the ENCODE overlap rules. The identified primary regions were further filtered using the following criteria, to define a more stringent protein-DNA interactome: (1) the p value cutoff was set to %10-9; and (2) we required an enrichment of six-fold and peak height >5. The genome-wide distribution of protein-binding regions was determined by HOMER (http://homer.salk.edu/homer/index.html) in reference to UCSC mm10. For all ChIP-seq data sets, WIG files were generated with MACS, which were subsequently visualized using Mochiview v1.46. Occupancy was analyzed with Pearson’s correlation and ToppCluster (https://toppcluster.cchmc.org/). ChIP-seq heat maps were ordered by strength of binding. The heat maps were drawn using the Heatmap tools provided by Cistrome (http://cistrome.org/ap).

以下開始對這些圖進(jìn)行復(fù)現(xiàn)。

#首先將bam轉(zhuǎn)換為bw文件
effective_genome_size=1870000000
cat chip.txt |while read id;do
bamCoverage -p 25 --bam ./deduplicate/${id}_deduplicate.bam -o ./bw/${id}.bw \
    --binSize 10 --normalizeUsing RPGC --effectiveGenomeSize ${effective_genome_size} 
done

F圖

# conda install deeptools
mkdir deeptools
computeMatrix reference-point --referencePoint TSS  -p 20\
 -b 5000 -a 5000 -R ~/mm10.tss.bed \
 -S  ./bw/SRR7984673.bw ./bw/SRR7984674.bw ./bw/SRR7984675.bw \
--skipZeros -o ./deeptools/F.gz \
--outFileSortedRegions ./deeptools/F_Heatmap1sortedRegions.bed
cd deeptools
plotHeatmap -m F.gz -out F.TSS.Heatmap.pdf --plotFileFormat pdf
plotProfile -m F.gz -out F.TSS.Profile.pdf --plotFileFormat pdf --perGroup
cd ..
F-1.png

G圖

computeMatrix reference-point --referencePoint TSS  -p 20\
 -b 5000 -a 5000 -R ~/mm10.tss.bed \
 -S ./bw/SRR7984674.bw ./bw/SRR7984675.bw \
--skipZeros -o ./deeptools/G.gz \
--outFileSortedRegions ./deeptools/G_Heatmap1sortedRegions.bed
cd deeptools
plotHeatmap -m G.gz -out G.TSS.Heatmap.pdf --plotFileFormat pdf
plotProfile -m G.gz -out G.TSS.Profile.pdf --plotFileFormat pdf --perGroup
G-1.png

仍然與原文中存在一些差異驻啤。這里TSS的范圍沒有按照原文-±400b的范圍進(jìn)行可視化菲驴,另外,需要對識別到的peak進(jìn)行相應(yīng)的篩選骑冗。

2.6 homer尋找motif

閱讀原文的方法學(xué)部分赊瞬,作者通過homer進(jìn)行motif的分析。下面對相關(guān)motif進(jìn)行復(fù)現(xiàn)
本文是針對MB中OLIG2結(jié)合位點進(jìn)行motif分析贼涩,因此巧涧,我們通過macs2獲得的peak,對特征性的motif進(jìn)行識別遥倦。

awk '{print $4"t"$1"t"$2"t"$3"t "}'  ./macs2/MB_OLIG2_1__peaks.narrowPeak > ./macs2/MB_OLIG2_1.homer_peaks.tmp
# 尋找motif
mkdir homer
findMotifsGenome.pl ./macs2/MB_OLIG2_1__summits.bed ~/mm10.fa  ./homer/MB_OLIG2_1.homer.motif  -size 200 -len 8,10,12

針對結(jié)合的DNA序列谤绳,會返回很多識別到的motif序列。但貌似跟原文展示的最顯著的都不太一樣袒哥。

2.7 DiffBind找差異peak

由于本文中涉及多組進(jìn)行比較缩筛,因此還可以通過DiffBind的方法尋找差異peak。

3.ATAC-seq數(shù)據(jù)處理

3.1 使用bowtie2進(jìn)行序列比對

ATAC與ChIP類似堡称,還是需要先將測序的片段比對到基因組上去

cat atac.txt | while read id;do \
  bowtie2 -p 25 -3 5 --local -x ~/mm10/mm10 -U ./fastq/$id/$id.fastq | \
  samtools sort -O bam -o ./aligned/$id.bam; \
done
#14541510 reads; of these:
#  14541510 (100.00%) were unpaired; of these:
#    171629 (1.18%) aligned 0 times
#    9336347 (64.20%) aligned exactly 1 time
#    5033534 (34.61%) aligned >1 times
#98.82% overall alignment rate
#[bam_sort_core] merging from 4 files and 1 in-memory blocks...
#7597315 reads; of these:
#  7597315 (100.00%) were unpaired; of these:
#    95358 (1.26%) aligned 0 times
#    4675659 (61.54%) aligned exactly 1 time
#    2826298 (37.20%) aligned >1 times
#98.74% overall alignment rate
[bam_sort_core] merging from 2 files and 1 in-memory blocks...

注意瞎抛,這里得到的bam文件是已經(jīng)排過序的bam文件

3.2 去除PCR重復(fù)

同ChIP。

# mkdir deduplicate
cat atac.txt |while read id;do
  sambamba markdup -r aligned/${id}.bam deduplicate/${id}_deduplicate.bam;
done

3.3 去重之后使用macs2進(jìn)行call peak

同ChIP却紧。

macs2 callpeak -c ./deduplicate/SRR7984671_deduplicate.bam \
  -t  ./deduplicate/SRR7984672_deduplicate.bam \
  -q 0.05 -f BAM -g mm --outdir macs2 \
  --nomodel --shift -100 --extsize 200 -n ATAC_ 

這步之后也可以獲得narrowPeak文件和bed文件桐臊。

3.4 獲取bw文件

前面的bam文件,narrowPeak文件晓殊,bed文件都有了断凶,現(xiàn)在還差一個bw文件
bw文件可以獲得測序的覆蓋度

# mkdir bw
effective_genome_size=1870000000
cat atac.txt |while read id;do
bamCoverage -p 20 --bam ./deduplicate/${id}_deduplicate.bam -o ./bw/${id}.bw \
    --binSize 10 --normalizeUsing RPGC --effectiveGenomeSize ${effective_genome_size} 
done

這是原文中根據(jù)ATAC-seq展現(xiàn)的不同基因位點染色質(zhì)開放程度的差異。


Figure 7B.png

導(dǎo)出bw文件挺物,在igv中挑選了幾個基因進(jìn)行了可視化懒浮,的確可以看到OLIG2+組中飘弧,染色質(zhì)的開放程度是更高的


igv可視化.png

3.5 使用deeptools可視化

# mkdir deeptools
computeMatrix reference-point --referencePoint TSS  -p 20\
 -b 5000 -a 5000 -R ~/mm10.tss.bed \
 -S ./bw/SRR7984671.bw ./bw/SRR7984672.bw \
--skipZeros -o ./deeptools/ATAC_TSS.gz \
--outFileSortedRegions ./deeptools/ATAC_Heatmap1sortedRegions.bed
cd deeptools
plotHeatmap -m ATAC_TSS.gz -out TSS.Heatmap.pdf --plotFileFormat pdf
plotProfile -m ATAC_TSS.gz -out TSS.Profile.pdf --plotFileFormat pdf --perGroup

不同的參數(shù)所映射的含義也不相同识藤,可以通過這張圖進(jìn)行了解(來源:plob.org)砚著。

image.png

從本數(shù)據(jù)中得到的Profile圖中,也可以明顯看到Olig2+的樣本中染色體的可及性更高痴昧,與上述結(jié)論一致稽穆。
Profile.png

富集到的最顯著的motif.png

4.RNA-seq數(shù)據(jù)處理

RNA-seq數(shù)據(jù)的處理和分析已經(jīng)比較普及了,在此先略過赶撰。

5.寫在后面

雖然能把整個pepline順過來舌镶,但是對于部分參數(shù)的理解還是不夠深入,仍然需要對相關(guān)生物學(xué)背景和軟件有更深的認(rèn)識豪娜。感謝生信技能樹搭建這個生信愛好者的平臺餐胀,以及Jimmy大神無私分享的這么多資料,讓生信這條學(xué)習(xí)路線不再陡峭瘤载。與大家一起進(jìn)步否灾。

參考資料:

  1. ChIP-seq詳細(xì)分析流程
  2. UCSC官網(wǎng)下載TSS.bed文件
  3. Plob.org
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市鸣奔,隨后出現(xiàn)的幾起案子墨技,更是在濱河造成了極大的恐慌,老刑警劉巖挎狸,帶你破解...
    沈念sama閱讀 206,013評論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件扣汪,死亡現(xiàn)場離奇詭異,居然都是意外死亡锨匆,警方通過查閱死者的電腦和手機崭别,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評論 2 382
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來恐锣,“玉大人紊遵,你說我怎么就攤上這事〗拿桑” “怎么了暗膜?”我有些...
    開封第一講書人閱讀 152,370評論 0 342
  • 文/不壞的土叔 我叫張陵,是天一觀的道長鞭衩。 經(jīng)常有香客問我学搜,道長,這世上最難降的妖魔是什么论衍? 我笑而不...
    開封第一講書人閱讀 55,168評論 1 278
  • 正文 為了忘掉前任瑞佩,我火速辦了婚禮,結(jié)果婚禮上坯台,老公的妹妹穿的比我還像新娘炬丸。我一直安慰自己,他們只是感情好,可當(dāng)我...
    茶點故事閱讀 64,153評論 5 371
  • 文/花漫 我一把揭開白布稠炬。 她就那樣靜靜地躺著焕阿,像睡著了一般。 火紅的嫁衣襯著肌膚如雪首启。 梳的紋絲不亂的頭發(fā)上暮屡,一...
    開封第一講書人閱讀 48,954評論 1 283
  • 那天,我揣著相機與錄音毅桃,去河邊找鬼褒纲。 笑死,一個胖子當(dāng)著我的面吹牛钥飞,可吹牛的內(nèi)容都是我干的莺掠。 我是一名探鬼主播,決...
    沈念sama閱讀 38,271評論 3 399
  • 文/蒼蘭香墨 我猛地睜開眼读宙,長吁一口氣:“原來是場噩夢啊……” “哼汁蝶!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起论悴,我...
    開封第一講書人閱讀 36,916評論 0 259
  • 序言:老撾萬榮一對情侶失蹤掖棉,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后膀估,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體幔亥,經(jīng)...
    沈念sama閱讀 43,382評論 1 300
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 35,877評論 2 323
  • 正文 我和宋清朗相戀三年察纯,在試婚紗的時候發(fā)現(xiàn)自己被綠了帕棉。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 37,989評論 1 333
  • 序言:一個原本活蹦亂跳的男人離奇死亡饼记,死狀恐怖香伴,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情具则,我是刑警寧澤即纲,帶...
    沈念sama閱讀 33,624評論 4 322
  • 正文 年R本政府宣布,位于F島的核電站博肋,受9級特大地震影響低斋,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜匪凡,卻給世界環(huán)境...
    茶點故事閱讀 39,209評論 3 307
  • 文/蒙蒙 一膊畴、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧病游,春花似錦唇跨、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,199評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽改橘。三九已至,卻和暖如春政勃,著一層夾襖步出監(jiān)牢的瞬間唧龄,已是汗流浹背兼砖。 一陣腳步聲響...
    開封第一講書人閱讀 31,418評論 1 260
  • 我被黑心中介騙來泰國打工奸远, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人讽挟。 一個月前我還...
    沈念sama閱讀 45,401評論 2 352
  • 正文 我出身青樓懒叛,卻偏偏與公主長得像,于是被迫代替她去往敵國和親耽梅。 傳聞我的和親對象是個殘疾皇子薛窥,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 42,700評論 2 345

推薦閱讀更多精彩內(nèi)容