ATAC-seq分析實(shí)操生信技能樹(shù)健明教程

放在最前面的參考鏈接：給學(xué)徒的ATAC-seq數(shù)據(jù)實(shí)戰(zhàn)（附上收費(fèi)視頻)

數(shù)據(jù)來(lái)源的文章：

The landscape of accessible chromatin in mammalian preimplantation embryos. Nature 2016 Jun 30;534(7609):652-7. PMID: 27309802

image.png

數(shù)據(jù)的GEO號(hào)：GSE66581

鏈接：https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66581
健明說(shuō): 在SRA數(shù)據(jù)庫(kù)可以下載原始測(cè)序數(shù)據(jù) , 從文章找到數(shù)據(jù)的ID： https://www.ncbi.nlm.nih.gov/sra?term=SRP055881 把下面的內(nèi)容保存到文件棚赔，命名為 srr.list 就可以使用prefetch這個(gè)函數(shù)來(lái)下載。

配置環(huán)境之軟件的安裝预皇，這里首推通過(guò)conda來(lái)創(chuàng)建一個(gè)project專屬的環(huán)境

可以無(wú)腦復(fù)制下面這段代碼

# https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/
# https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/ 
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc 
## 安裝好conda后需要設(shè)置鏡像。
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
conda config --set show_channel_urls yes

conda  create -n atac -y   python=2 bwa
conda info --envs
source activate atac
# 可以用search先進(jìn)行檢索
conda search trim_galore
## 保證所有的軟件都是安裝在 wes 這個(gè)環(huán)境下面
conda install -y sra-tools  
conda install -y trim-galore  samtools bedtools
conda install -y deeptools homer  meme
conda install -y macs2 bowtie bowtie2 
conda install -y  multiqc 
conda install -y  sambamba

數(shù)據(jù)的下載塞帐，由于原文數(shù)據(jù)太多纫溃，這里選取了四組數(shù)據(jù)來(lái)進(jìn)行練習(xí)

創(chuàng)建文件config.sra內(nèi)容如下(fastq-dump時(shí)候使用，以便將sra文件轉(zhuǎn)換成fastq文件時(shí)候加上我們所需要的樣品名稱):

2-cell-1 SRR2927015
2-cell-2 SRR2927016
2-cell-5 SRR3545580
2-cell-4 SRR2927018

創(chuàng)建srr.list文件（里面為我們所需要下載的數(shù)據(jù)的SRR號(hào)）

SRR2927015
SRR2927016
SRR3545580
SRR2927018

數(shù)據(jù)下載

source activate atac 
mkdir -p  ~/project/atac/
cd ~/project/atac/
mkdir {sra,raw,clean,align,peaks,motif,qc}
cd sra 
cat srr.list |while read id;do ( nohup  prefetch $id & );done
## 默認(rèn)下載目錄：~/ncbi/public/sra/

下載完成后數(shù)據(jù)大小如下：

-rw-r--r-- 1 4.2G Nov 20  2015 SRR2927015.sra
-rw-r--r-- 1 5.5G Nov 20  2015 SRR2927016.sra
-rw-r--r-- 1 2.0G Nov 20  2015 SRR2927018.sra
-rw-r--r-- 1 7.0G May 20  2016 SRR3545580.sra

第一步：將sra文件轉(zhuǎn)換成fastq文件

mv ~/ncbi/public/sra/SRR* sra/

創(chuàng)建sh文件01_fastq-dump.sh赁酝，內(nèi)容如下：

## 下面需要用循環(huán)
## cd ~/project/atac/
source activate atac
dump=fastq-dump
analysis_dir=raw
# mkdir -p $analysis_dir # 由于之前已經(jīng)創(chuàng)建過(guò)了犯祠，所以這里就無(wú)需創(chuàng)建了
## 下面用到的 config.sra 文件，就是上面自行制作的酌呆。

# $fastq-dump sra/SRR2927015.sra  --gzip --split-3  -A 2-cell-1 -O clean/
cat config.sra |while read id;
do echo $id
arr=($id) # 這里可以類似看成獲得矩陣
srr=${arr[1]} # 這里表示提取矩陣的第二列衡载，即SRR號(hào)
sample=${arr[0]} # 這里表示提取矩陣的第一列，即樣本名稱
#  測(cè)序數(shù)據(jù)的sra轉(zhuǎn)fasq
nohup $dump -A  $sample -O $analysis_dir  --gzip --split-3  sra/$srr.sra &
done

然后運(yùn)行, 有集群的切勿在登陸節(jié)點(diǎn)運(yùn)行隙袁，要切換到計(jì)算節(jié)點(diǎn)（不知道為什么我自己的不能通過(guò)提交任務(wù)qsub來(lái)運(yùn)行）

sh 01_fastq-dump.sh

第二步痰娱，測(cè)序數(shù)據(jù)的過(guò)濾

手動(dòng)創(chuàng)建一個(gè)包含fastq文件路徑的config.raw文本文件，第一列隨意填菩收，充數(shù)梨睁，第二列為fastq1的路徑，第二列為fastq2的路徑娜饵。

1 ~/project/atac/raw/2-cell-1_1.fastq.gz ~/project/atac/raw/2-cell-1_2.fastq.gz
2 ~/project/atac/raw/2-cell-2_1.fastq.gz ~/project/atac/raw/2-cell-2_2.fastq.gz
3 ~/project/atac/raw/2-cell-4_1.fastq.gz ~/project/atac/raw/2-cell-4_2.fastq.gz
4 ~/project/atac/raw/2-cell-5_1.fastq.gz ~/project/atac/raw/2-cell-5_2.fastq.gz

創(chuàng)建02_trim_galore.sh文件坡贺，內(nèi)容如下

cd ~/project/atac/
# mkdir -p clean 
source activate atac
# trim_galore -q 25 --phred33 --length 35 -e 0.1 --stringency 4 --paired -o clean/ raw/2-cell-1_1.fastq.gz raw/2-cell-1_2.fastq.gz
cat config.raw  |while read id;
do echo $id
arr=($id)
fq2=${arr[2]}
fq1=${arr[1]}
sample=${arr[0]}
nohup  trim_galore -q 25 --phred33 --length 35 -e 0.1 --stringency 4 --paired -o  clean  $fq1   $fq2  &
done
ps -ef |grep trim

第三步，數(shù)據(jù)質(zhì)量的檢測(cè)

創(chuàng)建03_fastqc_multiqc.sh文本文件箱舞，內(nèi)容如下

mkdir -p qc
cd ~/JMJ705_ChIP-seq/qc
mkdir -p clean
fastqc -t 5  ../clean/*gz -o clean/
mkdir -p raw
fastqc -t 5  ../raw/*gz -o raw/

使用multiqc進(jìn)行合并質(zhì)檢結(jié)果

cd raw/
multiqc ./*zip

cd clean/
multiqc ./*zip

第四步遍坟，比對(duì)

健明說(shuō)：比對(duì)需要的index，看清楚物種晴股，根據(jù)對(duì)應(yīng)的軟件來(lái)構(gòu)建愿伴，這里直接用bowtie2進(jìn)行比對(duì)和統(tǒng)計(jì)比對(duì)率, 需要提前下載參考基因組然后使用命令構(gòu)建索引，或者直接就下載索引文件：下載小鼠參考基因組的索引和注釋文件, 這里用常用的mm10
下載索引文件

# 索引大小為3.2GB队魏， 不建議自己下載基因組構(gòu)建公般，可以直接下載索引文件，代碼如下：
mkdir referece && cd reference
wget -4 -q ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip
unzip mm10.zip

解壓后文件大小

-rw-r--r-- 1  848M May  3  2012 mm10.rev.1.bt2
-rw-r--r-- 1  633M May  3  2012 mm10.rev.2.bt2
-rw-r--r-- 1  848M May  2  2012 mm10.1.bt2
-rw-r--r-- 1  633M May  2  2012 mm10.2.bt2
-rw-r--r-- 1  6.0K May  2  2012 mm10.3.bt2
-rw-r--r-- 1  633M May  2  2012 mm10.4.bt2

取少量數(shù)據(jù)進(jìn)行比對(duì)胡桨，測(cè)試流程

zcat clean/2-cell-1_1_val_1.fq.gz |head -10000 > test_file/test1.fq
zcat clean/2-cell-1_2_val_2.fq.gz |head -10000 > test_file/test2.fq

bowtie2 -x ~/project/atac/referece/mm10 -1 test1.fq  -2 test2.fq | samtools sort -@ 5 -O bam -o test.bam
# 三種不同的去重復(fù)軟件
# 這里選用sambamba來(lái)去重復(fù)
sambamba markdup -r test.bam  test.sambamba.rmdup.bam
samtools flagstat test.sambamba.rmdup.bam

samtools flagstat test.sambamba.rmdup.bam
samtools flagstat test.bam

## 接下來(lái)只保留兩條reads要比對(duì)到同一條染色體(Proper paired) 官帘，還有高質(zhì)量的比對(duì)結(jié)果(Mapping quality>=30)
## 順便過(guò)濾 線粒體reads
samtools view -h -f 2 -q 30  test.sambamba.rmdup.bam |grep -v chrM| samtools sort  -O bam  -@ 5 -o - > test.last.bam
bedtools bamtobed -i test.last.bam  > test.bed

測(cè)試無(wú)錯(cuò)誤，可進(jìn)行下一步分析, 創(chuàng)建04_bowtie2_align_sambamba_markdup.sh文件昧谊，基本copy健明的就行刽虹，改改文件路徑，內(nèi)容如下：

cd ~/project/atac/align

ls ~/project/atac/clean/*_1.fq.gz > 1
ls ~/project/atac/clean/*_2.fq.gz > 2
ls ~/project/atac/clean/*_2.fq.gz |cut -d"/" -f 8|cut -d"_" -f 1  > 0 ## 這里最好自己逐步分開(kāi)運(yùn)行一下檢測(cè)0里面的結(jié)果是否是你的sample名稱
paste 0 1 2  > config.clean ## 供mapping使用的配置文件

## 相對(duì)目錄需要理解
bowtie2_index=~/project/atac/referece/mm10
## 一定要搞清楚自己的bowtie2軟件安裝在哪里呢诬，以及自己的索引文件在什么地方Ｓ空堋Ｅ昼汀！
#source activate atac 
cat config.clean |while read id;
do echo $id
arr=($id)
fq2=${arr[2]}
fq1=${arr[1]}
sample=${arr[0]}
## 比對(duì)過(guò)程15分鐘一個(gè)樣本
bowtie2  -p 5  --very-sensitive -X 2000 -x  $bowtie2_index -1 $fq1 -2 $fq2 |samtools sort  -O bam  -@ 5 -o - > ${sample}.raw.bam
samtools index ${sample}.raw.bam
bedtools bamtobed -i ${sample}.raw.bam  > ${sample}.raw.bed
samtools flagstat ${sample}.raw.bam  > ${sample}.raw.stat
# https://github.com/biod/sambamba/issues/177
sambamba markdup --overflow-list-size 600000  --tmpdir='./'  -r ${sample}.raw.bam  ${sample}.rmdup.bam
samtools index   ${sample}.rmdup.bam

## ref:https://www.biostars.org/p/170294/ 
## Calculate %mtDNA:
mtReads=$(samtools idxstats  ${sample}.rmdup.bam | grep 'chrM' | cut -f 3)
totalReads=$(samtools idxstats  ${sample}.rmdup.bam | awk '{SUM += $3} END {print SUM}')
echo '==> mtDNA Content:' $(bc <<< "scale=2;100*$mtReads/$totalReads")'%'

samtools flagstat  ${sample}.rmdup.bam > ${sample}.rmdup.stat
samtools view  -h  -f 2 -q 30    ${sample}.rmdup.bam   |grep -v chrM |samtools sort  -O bam  -@ 5 -o - > ${sample}.last.bam
samtools index   ${sample}.last.bam
samtools flagstat  ${sample}.last.bam > ${sample}.last.stat
bedtools bamtobed -i ${sample}.last.bam  > ${sample}.bed
done

其中bowtie2比對(duì)加入了-X 2000 參數(shù)阀圾，是最大插入片段哪廓，寬泛的插入片段范圍(10-1000bp)
得到的bam文件如下：

-rw-r--r-- 1  523M Oct 12 07:15 ./2-cell-5.last.bam
-rw-r--r-- 1  899M Oct 12 07:14 ./2-cell-5.rmdup.bam
-rw-r--r-- 1  5.5G Oct 12 06:51 ./2-cell-5.raw.bam
-rw-r--r-- 1  427M Oct 12 03:23 ./2-cell-4.last.bam
-rw-r--r-- 1  586M Oct 12 03:23 ./2-cell-4.rmdup.bam
-rw-r--r-- 1  1.8G Oct 12 03:17 ./2-cell-4.raw.bam
-rw-r--r-- 1  678M Oct 12 02:22 ./2-cell-2.last.bam
-rw-r--r-- 1  1.1G Oct 12 02:20 ./2-cell-2.rmdup.bam
-rw-r--r-- 1  4.6G Oct 12 02:00 ./2-cell-2.raw.bam
-rw-r--r-- 1  490M Oct 11 23:02 ./2-cell-1.last.bam
-rw-r--r-- 1  776M Oct 11 23:01 ./2-cell-1.rmdup.bam
-rw-r--r-- 1  3.7G Oct 11 22:48 ./2-cell-1.raw.bam

上述腳本的步驟都可以拆分運(yùn)行，比如bam文件構(gòu)建index或者轉(zhuǎn)為bed的:

ls *.last.bam|xargs -i samtools index {} 
ls *.last.bam|while read id;do (bedtools bamtobed -i $id >${id%%.*}.bed) ;done
ls *.raw.bam|while read id;do (nohup bedtools bamtobed -i $id >${id%%.*}.raw.bed & ) ;done

最后得到的bed文件是

-rw-r--r-- 1  254M Oct 12 07:16 ./2-cell-5.bed
-rw-r--r-- 1  203M Oct 12 03:24 ./2-cell-4.bed
-rw-r--r-- 1  338M Oct 12 02:23 ./2-cell-2.bed
-rw-r--r-- 1  237M Oct 11 23:03 ./2-cell-1.bed

第五步初烘，使用macs2找peaks

# macs2 callpeak -t 2-cell-1.bed  -g mm --nomodel --shift -100 --extsize 200  -n 2-cell-1 --outdir ../peaks/
cd ~/project/atac/peaks/
ls *.bed | while read id ;do (macs2 callpeak -t $id  -g mm --nomodel --shift  -100 --extsize 200  -n ${id%%.*} --outdir ./) ;done

macs2軟件說(shuō)明書(shū)詳見(jiàn)：http://www.reibang.com/p/21e8c51fca23
得到如下結(jié)果

-rw-r--r-- 1  1.1M Oct 12 14:35 2-cell-5_peaks.narrowPeak
-rw-r--r-- 1  690K Oct 12 14:35 2-cell-5_summits.bed
-rw-r--r-- 1  1.2M Oct 12 14:35 2-cell-5_peaks.xls
-rw-r--r-- 1  368K Oct 12 14:35 2-cell-4_peaks.narrowPeak
-rw-r--r-- 1  418K Oct 12 14:35 2-cell-4_peaks.xls
-rw-r--r-- 1  247K Oct 12 14:35 2-cell-4_summits.bed
-rw-r--r-- 1  1.2M Oct 12 14:35 2-cell-2_peaks.narrowPeak
-rw-r--r-- 1  1.4M Oct 12 14:35 2-cell-2_peaks.xls
-rw-r--r-- 1  805K Oct 12 14:35 2-cell-2_summits.bed
-rw-r--r-- 1  634K Oct 12 14:34 2-cell-1_peaks.narrowPeak
-rw-r--r-- 1  720K Oct 12 14:34 2-cell-1_peaks.xls
-rw-r--r-- 1  425K Oct 12 14:34 2-cell-1_summits.bed

第六步涡真，計(jì)算插入片段長(zhǎng)度，F(xiàn)RiP值肾筐，IDR計(jì)算重復(fù)情況

非冗余非線粒體能夠比對(duì)的fragment哆料、比對(duì)率、NRF吗铐、PBC1东亦、PBC2、peak數(shù)唬渗、無(wú)核小體區(qū)NFR典阵、TSS富集、FRiP 镊逝、IDR重復(fù)的一致性萄喳！
名詞解釋：https://www.encodeproject.org/data-standards/terms/
參考：https://www.encodeproject.org/atac-seq/

統(tǒng)計(jì)indel插入長(zhǎng)度的分布

看 bam文件第9列，在R里面統(tǒng)計(jì)繪圖 bam文件詳解

image.png
提取bam文件的第九列, 創(chuàng)建一個(gè)config.last_bam文件蹋半，里面內(nèi)容包含bam文件的名稱

2-cell-1.last.bam 2-cell-1.last
2-cell-2.last.bam 2-cell-2.last
2-cell-4.last.bam 2-cell-4.last
2-cell-5.last.bam 2-cell-5.last

然后創(chuàng)建了提取bam文件的第九列indel插入長(zhǎng)度信息的sh文件 indel_length.sh，內(nèi)容如下：

cat config.last_bam |while read id;
do
arr=($id)
sample=${arr[0]}
sample_name=${arr[1]}
samtools view $sample |awk '{print $9}'  > ${sample_name}_length.txt
done

然后我們得到四個(gè)bam文件的indel插入長(zhǎng)度信息

-rw-r--r-- 1   24M Oct 12 15:27 2-cell-5.last_length.txt
-rw-r--r-- 1   19M Oct 12 15:27 2-cell-4.last_length.txt
-rw-r--r-- 1   32M Oct 12 15:26 2-cell-2.last_length.txt
-rw-r--r-- 1   22M Oct 12 15:26 2-cell-1.last_length.txt

cmd=commandArgs(trailingOnly=TRUE); 
input=cmd[1]; output=cmd[2]; 
a=abs(as.numeric(read.table(input)[,1])); 
png(file=output);
hist(a,
main="Insertion Size distribution",
ylab="Read Count",xlab="Insert Size",
xaxt="n",
breaks=seq(0,max(a),by=10)
); 

axis(side=1,
at=seq(0,max(a),by=100),
labels=seq(0,max(a),by=100)
);

dev.off()

準(zhǔn)備一個(gè)用于R語(yǔ)言批量繪制indel分布的文本輸入文件config.indel_length_distribution

2-cell-1.last_length.txt 2-cell-1.last_length
2-cell-2.last_length.txt 2-cell-2.last_length
2-cell-4.last_length.txt 2-cell-4.last_length
2-cell-5.last_length.txt 2-cell-5.last_length

有了上面的文件就可以批量檢驗(yàn)bam文件進(jìn)行出圖充坑。創(chuàng)建批量運(yùn)行的shell腳本indel_length_distribution.sh

cat config.indel_length_distribution  |while read id;
do
arr=($id)
input=${arr[0]}
output=${arr[1]}
Rscript indel_length_distribution.R $input $output
done

image.png

FRiP值的計(jì)算：fraction of reads in called peak regions

單個(gè)舉例

bedtools intersect -a 2-cell-1.bed -b 2-cell-1_peaks.narrowPeak |wc -l
148210
wc  -l 2-cell-1.bed
5105850 
# 故2-cell-1的FRiP為
148210/5105850 = 0.0292

批量計(jì)算 FRiP, 創(chuàng)建sh文件01_FRiP.sh

cd ~/project/atac/peaks
ls *narrowPeak|while  read id;
do
echo $id
bed=$(basename $id "_peaks.narrowPeak").bed
#ls  -lh $bed 
Reads=$(bedtools intersect -a $bed -b $id |wc -l|awk '{print $1}')
totalReads=$(wc -l $bed|awk '{print $1}')
echo $Reads  $totalReads 
echo '==> FRiP value:' $(bc <<< "scale=2;100*$Reads/$totalReads")'%'
done

運(yùn)行結(jié)果為

$ sh 01_FRiP.sh 
2-cell-1_peaks.narrowPeak
148210 5105850
==> FRiP value: 2.90%
2-cell-2_peaks.narrowPeak
320407 7292154
==> FRiP value: 4.39%
2-cell-4_peaks.narrowPeak
90850 4399720
==> FRiP value: 2.06%
2-cell-5_peaks.narrowPeak
258988 5466482
==> FRiP value: 4.73%

健明記錄：
- Fraction of reads in peaks (FRiP) - Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads. In general, FRiP scores correlate positively with the number of regions. (Landt et al, Genome Research Sept. 2012, 22(9): 1813–1831)
- 文章其它指標(biāo)：https://www.nature.com/articles/sdata2016109/tables/4

可以使用R包看不同peaks文件的overlap情況

注意這里由于R版本是3.5.1减江，所以需要GCC版本要大于等于4.8，由于本服務(wù)器系統(tǒng)版本為4.7捻爷，所以需更改GCC版本辈灼，通過(guò)module命令調(diào)用更高的GCC版本
裝包不成功，暫時(shí)不管

source /public/home/software/.bashrc
module load GCC/5.4.0-2.26
source activate atac

將narrowPeak文件傳入到本地也榄，使用本地R進(jìn)行可視化

options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") 
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
source("http://bioconductor.org/biocLite.R") 
library('BiocInstaller')
# biocLite("ChIPpeakAnno")
# biocLite("ChIPseeker")
library(ChIPseeker)
library(ChIPpeakAnno)
setwd("E://desktop/sept/ATAC-seq_practice/find_peaks_overlaping/")
list.files('./',"*.narrowPeak")
tmp = lapply(list.files('./',"*.narrowPeak"),function(x){
  return(readPeakFile(file.path('./', x)))
  })
tmp
ol <- findOverlapsOfPeaks(tmp[[1]],tmp[[2]]) # 這里選取的是第一個(gè)文件和第二個(gè)文件巡莹，即cell.1_peak_1和cell.2_peak
png('overlapVenn.png')
makeVennDiagram(ol)
dev.off()

image.png

也可以使用專業(yè)軟件，IDR 來(lái)進(jìn)行計(jì)算出來(lái)甜紫，同時(shí)考慮peaks間的overlap降宅，和富集倍數(shù)的一致性

conda  create -n py3 -y   python=3 idr
conda activate py3
idr -h 
idr --samples  2-cell-1_peaks.narrowPeak 2-cell-2_peaks.narrowPeak  --plot

第七步，deeptools的可視化

具體見(jiàn)：https://mp.weixin.qq.com/s/a4qAcKE1DoukpLVV_ybobA 在ChiP-seq 講解囚霸。
首先把bam文件轉(zhuǎn)為bw文件腰根，詳情：http://www.bio-info-trainee.com/1815.html

cd  ~/project/atac/deeptools_result
#source activate atac # 由于原本電腦存在deeptools所以就沒(méi)必要激活了
#ls  *.bam  |xargs -i samtools index {} 
ls *last.bam |while read id;do
nohup bamCoverage -p 5 --normalizeUsingRPKM -b $id -o ${id%%.*}.last.bw &
done

# cd dup 
# ls  *.bam  |xargs -i samtools index {} 
# ls *.bam |while read id;do
# nohup bamCoverage --normalizeUsing CPM -b $id -o ${id%%.*}.rm.bw & 
# done

mm10的Refgene文件的下載，
- 第一種參考鏈接：ChIP-seq基礎(chǔ)入門學(xué)習(xí)

curl 'http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=646311755_P0RcOBvAQnWZSzQz2fQfBiPPSBen&boolshad.hgta_printCustomTrackHeaders=0&hgta_ctName=tb_ncbiRefSeq&hgta_ctDesc=table+browser+query+on+ncbiRefSeq&hgta_ctVis=pack&hgta_ctUrl=&fbQual=whole&fbUpBases=200&fbExonBases=0&fbIntronBases=0&fbDownBases=200&hgta_doGetBed=get+BED' >mm10.refseq.bed

第二種：從http://genome.ucsc.edu/cgi-bin/hgTables下載

image.png
這里選取了第二種方法得到的bed文件進(jìn)行后續(xù)可視化

查看TSS附件信號(hào)強(qiáng)度：創(chuàng)建07_deeptools_TSS.sh

depptools 使用說(shuō)明

## both -R and -S can accept multiple files 
mkdir -p  ~/project/atac/tss
cd   ~/project/atac/tss 
# source activate atac # 由于我這里自己系統(tǒng)有就沒(méi)調(diào)用了
computeMatrix reference-point  --referencePoint TSS  -p 15  \
-b 10000 -a 10000    \
-R ~/project/atac/mm10_Refgene/Refseq.bed  \
-S ~/project/atac/deeptools_result/*.bw  \
--skipZeros  -o matrix1_test_TSS.gz  \
--outFileSortedRegions regions1_test_genes.bed

##     both plotHeatmap and plotProfile will use the output from   computeMatrix
plotHeatmap -m matrix1_test_TSS.gz  -out test_Heatmap.png
plotHeatmap -m matrix1_test_TSS.gz  -out test_Heatmap.pdf --plotFileFormat pdf  --dpi 720  
plotProfile -m matrix1_test_TSS.gz  -out test_Profile.png
plotProfile -m matrix1_test_TSS.gz  -out test_Profile.pdf --plotFileFormat pdf --perGroup --dpi 720

出圖展示

image.png

image.png

image.png
查看基因body的信號(hào)強(qiáng)度,創(chuàng)建07_deeptools_Body.sh

#source activate atac
mkdir Body
cd ~/project/atac/Body
computeMatrix scale-regions  -p 15  \
-R ~/project/atac/mm10_Refgene/Refseq.bed  \
-S ~/project/atac/deeptools_result/*.bw  \
-b 10000 -a 10000  \
--skipZeros -o matrix1_test_body.gz
# plotHeatmap -m matrix1_test_body.gz  -out ExampleHeatmap1.png
plotHeatmap -m matrix1_test_body.gz  -out test_body_Heatmap.png
plotProfile -m matrix1_test_body.gz  -out test_body_Profile.png
plotProfile -m matrix1_test_body.gz -out test_Body_Profile.pdf --plotFileFormat pdf --perGroup --dpi 720

出圖展示拓型，注意下圖可以看出body區(qū)明顯的要短于兩側(cè)额嘿，如果要調(diào)整寬度瘸恼，可自行調(diào)整以下參數(shù)
- --regionBodyLength
- --binSize
- 參考官網(wǎng)參數(shù)含義：https://deeptools.readthedocs.io/en/develop/content/tools/computeMatrix.html
  
  image.png
  
  image.png
  
  image.png
ngsplot也是一個(gè)畫(huà)profiler圖的利器。

第八步：peaks注釋

參考老版本教程鏈接：CS3: peak注釋
peaks區(qū)間注釋分布

options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") 
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
source("http://bioconductor.org/biocLite.R") 
library('BiocInstaller')
biocLite("ChIPpeakAnno")
library(ChIPpeakAnno)
setwd("E://desktop/sept/ATAC-seq_practice/peaks_annotaion/")
biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene")
biocLite("org.Mm.eg.db")
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
promoter <- getPromoters(TxDb=txdb, 
                         upstream=3000, downstream=3000)
files = list(cell_1_summits = "2-cell-1_summits.bed", cell_2_summits = "2-cell-2_summits.bed",
                   cell_4_summits = "2-cell-4_summits.bed", cell_5_summits = "2-cell-5_summits.bed")
peakAnno <- annotatePeak(files[[1]], # 分別改成2或者3或者4即可册养，分別對(duì)應(yīng)四個(gè)文件
                         tssRegion=c(-3000, 3000),
                         TxDb=txdb, annoDb="org.Hs.eg.db")
plotAnnoPie(peakAnno)

image.png

plotAnnoBar來(lái)展示

plotAnnoBar(peakAnno)

image.png

vennpie來(lái)展示

vennpie(peakAnno)

image.png

upsetplot來(lái)展示

upsetplot(peakAnno)
upsetplot(peakAnno, vennpie=TRUE)

image.png

vennpie=TRUE

ChIPseeker還注釋了最近的基因东帅，peak離最近基因的距離分布是什么樣子的？ChIPseeker提供了plotDistToTSS函數(shù)來(lái)畫(huà)這個(gè)分布：

plotDistToTSS(peakAnno,
              title="Distribution of transcription factor-binding loci\nrelative to TSS")

image.png

plotAnnoBar和plotDistToTSS這兩個(gè)柱狀圖都支持多個(gè)數(shù)據(jù)同時(shí)展示球拦，方便比較靠闭，比如：

peakAnnoList <- lapply(files, annotatePeak, 
                       TxDb=txdb,tssRegion=c(-3000, 3000))
plotAnnoBar(peakAnnoList)

image.png

批量繪制距離TSS的百分比圖

plotDistToTSS(peakAnnoList)

image.png

ChIPseeker還提供了一個(gè)vennplot函數(shù)，比如我想看注釋的最近基因在不同樣本中的overlap：

genes <- lapply(peakAnnoList, function(i) 
    as.data.frame(i)$geneId)
vennplot(genes[2:4], by='Vennerable')

ChIPseeker還提供了一個(gè)vennplot函數(shù)刘莹，比如我想看注釋的最近基因在不同樣本中的overlap：

genes <- lapply(peakAnnoList, function(i) 
    as.data.frame(i)$geneId)
vennplot(genes[2:4], by='Vennerable')

新版本ChIPpeakAnno()可視化后續(xù)再補(bǔ)上
- 參考鏈接：The ChIPpeakAnno user’s guide
  
  image.png

使用hommer進(jìn)行注釋

perl ~/miniconda3/envs/atac/share/homer-4.9.1-6/configureHomer.pl  -install mm10  # 此不能再計(jì)算節(jié)點(diǎn)運(yùn)行阎毅，需先在登陸節(jié)點(diǎn)運(yùn)行下載
## 保證數(shù)據(jù)庫(kù)下載是OK
# ls -lh  ~/miniconda3/envs/atac/share/homer-4.9.1-5/data/genomes  
# source activate atac
mkdir hommer_anno
cp peaks/*narrowPeak hommer_anno/
cd   ~/project/atac/hommer_anno  
ls *.narrowPeak |while read id;
do 
echo $id
awk '{print $4"\t"$1"\t"$2"\t"$3"\t+"}' $id >${id%%.*}.homer_peaks.tmp
annotatePeaks.pl  {id%%.*}.homer_peaks.tmp mm10  1>${id%%.*}.peakAnn.xls
  2>${id%%.*}.annLog.txt
done

然后將peakAnn.xls文件導(dǎo)入到本地，通過(guò)excel透視功能進(jìn)行可視化
用到的函數(shù)COUNTIF(), SUM()
這里還用到一個(gè)用來(lái)上傳gif文件点弯，生成鏈接的在線網(wǎng)頁(yè) http://thyrsi.com/
操作gif動(dòng)圖

image.png

第九步扇调，motif尋找及注釋

創(chuàng)建08_hommer_motif.sh 文件

# mkdir -p  ~/project/atac/motif
cd   ~/project/atac/motif
# source activate atac
ls ../peaks/*.narrowPeak |while read id;
do
file=$(basename $id )
sample=${file%%.*}
echo $sample 
awk '{print $4"\t"$1"\t"$2"\t"$3"\t+"}' $id > ${sample}.homer_peaks.tmp
nohup findMotifsGenome.pl ${sample}.homer_peaks.tmp  mm10 ${sample}_motifDir -len 8,10,12  &
done

homerResults

knownResults

第十步，差異peaks分析

diffbind
DESeq2等后續(xù)進(jìn)行分析

寫(xiě)在最后的話抢肛，通過(guò)此流程的實(shí)踐狼钮，再次學(xué)到很多新的知識(shí)點(diǎn)以及腳本操作。

批量運(yùn)行命令以及可視化
簡(jiǎn)單R腳本程序的編寫(xiě)

最后編輯于：2019.03.15 16:57:32

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末捡絮，一起剝皮案震驚了整個(gè)濱河市熬芜，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌福稳，老刑警劉巖涎拉，帶你破解...
沈念sama閱讀 219,427評(píng)論 6贊 508
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場(chǎng)離奇詭異的圆，居然都是意外死亡鼓拧，警方通過(guò)查閱死者的電腦和手機(jī)，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 93,551評(píng)論 3贊 395
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門越妈，熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)季俩，“玉大人，你說(shuō)我怎么就攤上這事梅掠∽米。” “怎么了？”我有些...
開(kāi)封第一講書(shū)人閱讀 165,747評(píng)論 0贊 356
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵阎抒，是天一觀的道長(zhǎng)酪我。經(jīng)常有香客問(wèn)我，道長(zhǎng)且叁，這世上最難降的妖魔是什么祭示？我笑而不...
開(kāi)封第一講書(shū)人閱讀 58,939評(píng)論 1贊 295
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮，結(jié)果婚禮上质涛，老公的妹妹穿的比我還像新娘稠歉。我一直安慰自己，他們只是感情好汇陆，可當(dāng)我...
茶點(diǎn)故事閱讀 67,955評(píng)論 6贊 392
惡毒庶女頂嫁案：這布局不是一般人想出來(lái)的
文/花漫我一把揭開(kāi)白布怒炸。她就那樣靜靜地躺著，像睡著了一般毡代。火紅的嫁衣襯著肌膚如雪阅羹。梳的紋絲不亂的頭發(fā)上，一...
開(kāi)封第一講書(shū)人閱讀 51,737評(píng)論 1贊 305
城市分裂傳說(shuō)
那天教寂，我揣著相機(jī)與錄音捏鱼，去河邊找鬼。笑死酪耕，一個(gè)胖子當(dāng)著我的面吹牛导梆，可吹牛的內(nèi)容都是我干的。我是一名探鬼主播迂烁，決...
沈念sama閱讀 40,448評(píng)論 3贊 420
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開(kāi)眼看尼，長(zhǎng)吁一口氣：“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼！你這毒婦竟也來(lái)了盟步？” 一聲冷哼從身側(cè)響起藏斩，我...
開(kāi)封第一講書(shū)人閱讀 39,352評(píng)論 0贊 276
萬(wàn)榮殺人案實(shí)錄
序言：老撾萬(wàn)榮一對(duì)情侶失蹤，失蹤者是張志新（化名）和其女友劉穎却盘，沒(méi)想到半個(gè)月后狰域，有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 45,834評(píng)論 1贊 317
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡黄橘，尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 37,992評(píng)論 3贊 338
?白月光啟示錄
正文我和宋清朗相戀三年北专，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片旬陡。...
茶點(diǎn)故事閱讀 40,133評(píng)論 1贊 351
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡，死狀恐怖语婴，靈堂內(nèi)的尸體忽然破棺而出描孟，到底是詐尸還是另有隱情，我是刑警寧澤砰左，帶...
沈念sama閱讀 35,815評(píng)論 5贊 346
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布匿醒，位于F島的核電站，受9級(jí)特大地震影響缠导，放射性物質(zhì)發(fā)生泄漏廉羔。R本人自食惡果不足惜，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,477評(píng)論 3贊 331
男人毒藥：我在死后第九天來(lái)索命
文/蒙蒙一僻造、第九天我趴在偏房一處隱蔽的房頂上張望憋他。院中可真熱鬧孩饼，春花似錦、人聲如沸竹挡。這莊子的主人今日做“春日...
開(kāi)封第一講書(shū)人閱讀 32,022評(píng)論 0贊 22
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽(yáng)揪罕。三九已至梯码，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間好啰，已是汗流浹背轩娶。一陣腳步聲響...
開(kāi)封第一講書(shū)人閱讀 33,147評(píng)論 1贊 272
情欲美人皮
我被黑心中介騙來(lái)泰國(guó)打工，沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留框往，地道東北人鳄抒。一個(gè)月前我還...
沈念sama閱讀 48,398評(píng)論 3贊 373
代替公主和親
正文我出身青樓，卻偏偏與公主長(zhǎng)得像搅窿，于是被迫代替她去往敵國(guó)和親嘁酿。傳聞我的和親對(duì)象是個(gè)殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 45,077評(píng)論 2贊 355

ATAC-seq分析實(shí)操生信技能樹(shù)健明教程

數(shù)據(jù)來(lái)源的文章：

數(shù)據(jù)的GEO號(hào)：GSE66581

配置環(huán)境之軟件的安裝预皇，這里首推通過(guò)conda來(lái)創(chuàng)建一個(gè)project專屬的環(huán)境

數(shù)據(jù)的下載塞帐，由于原文數(shù)據(jù)太多纫溃，這里選取了四組數(shù)據(jù)來(lái)進(jìn)行練習(xí)

數(shù)據(jù)下載

第一步：將sra文件轉(zhuǎn)換成fastq文件

第二步痰娱，測(cè)序數(shù)據(jù)的過(guò)濾

第三步，數(shù)據(jù)質(zhì)量的檢測(cè)

第四步遍坟，比對(duì)

第五步初烘，使用macs2找peaks

第六步涡真，計(jì)算插入片段長(zhǎng)度，F(xiàn)RiP值肾筐，IDR計(jì)算重復(fù)情況

第七步，deeptools的可視化

第八步：peaks注釋

第九步扇调，motif尋找及注釋

第十步，差異peaks分析

寫(xiě)在最后的話抢肛，通過(guò)此流程的實(shí)踐狼钮，再次學(xué)到很多新的知識(shí)點(diǎn)以及腳本操作。

推薦閱讀更多精彩內(nèi)容