encode eclip 數(shù)據(jù)分析流程

# 準(zhǔn)備:

測(cè)序數(shù)據(jù):fastq格式幌羞;RBFOX2 HepG2測(cè)序數(shù)據(jù) - RBFOX2

STAR的基因組索引:UCSC下載fasta格式

STAR的重復(fù)元件索引:RepBase下載fasta格式

barcodes文件:fasta格式

chrom.sizes文件:UCSC下載,tab分割硼一,第一列染色體名扯夭,第二列染色體長(zhǎng)度鳍贾,hg19 chrom.sizes

# 流程概述:

  1. eclipdemux用于PE測(cè)序的樣本拆分和提取UMI;umi_tools用于SE測(cè)序的UMI提取

  2. cutadapt修剪adapters

  3. STAR比對(duì)到重復(fù)元件和篩選

  4. 比對(duì)篩選之后的測(cè)序數(shù)據(jù)到基因組

  5. 去除PCR產(chǎn)生的重復(fù):SE測(cè)序使用umi_tools交洗,常規(guī)工具可以使用barcodecollapsepe.py

  6. (paired-end only) Merges multiple inline barcodes and filters R1 (uses only R2 for peak calling)

  7. Calls enriched peak regions (peak clusters) with CLIPPER

  8. Uses size-matched input sample to normalize and calculate fold-change enrichment within enriched peak regions with custom perl scripts (overlap_peakfi_with_bam_PE.pl, peakscompress.pl)

## 1. eclipdemux用于PE測(cè)序的樣本拆分和提取UMI;umi_tools用于SE測(cè)序的UMI提取

# SE 鑒定UMI
umi_tools extract \
--random-seed 1 \
--bc-pattern NNNNNNNNNN \
--log EXAMPLE_SE.rep1_clip.---.--.metrics \ --stdin file_R1.fastq.gz \
--stdout EXAMPLE_SE.rep1.umi.r1.fq

# PE 樣本拆分和鑒定UMI
eclipdemux \
--metrics EXAMPLE_PE.rep1_clip.---.--.metrics \ --expectedbarcodeida C01 \ --expectedbarcodeidb D8f \
--fastq_1 file_R1.fastq.gz \
--fastq_2 file_R2.fastq.gz \
--newname rep2_clip \
--dataset EXAMPLE_PE \
--barcodesfile yeolabbarcodes_20170101.fasta \ --length 5

## ## 2. Cutadapt 去除adapters

因?yàn)榭赡馨l(fā)生兩次adapter連接事件橡淑,所以進(jìn)行兩次Cutadapt剪切

Cutadapt round 1:

cutadapt \
-f fastq \
--match-read-wildcards \
--times 1 \
-e 0.1 \
-O 1 \
--quality-cutoff 6 \
-m 18 \
-a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \ -g CTTCCGATCTACAAGTT \
-g CTTCCGATCTTGGTCCT \
-A AACTTGTAGATCGGA \
-A AGGACCAAGATCGGA \
-A ACTTGTAGATCGGAA \
-A GGACCAAGATCGGAA \
-A CTTGTAGATCGGAAG \
-A GACCAAGATCGGAAG \
-A TTGTAGATCGGAAGA \
-A ACCAAGATCGGAAGA \
-A TGTAGATCGGAAGAG \
-A CCAAGATCGGAAGAG \
-A GTAGATCGGAAGAGC \
-A CAAGATCGGAAGAGC \
-A TAGATCGGAAGAGCG \
-A AAGATCGGAAGAGCG \
-A AGATCGGAAGAGCGT \
-A GATCGGAAGAGCGTC \
-A ATCGGAAGAGCGTCG \
-A TCGGAAGAGCGTCGT \
-A CGGAAGAGCGTCGTG \
-A GGAAGAGCGTCGTGT \
-o EXAMPLE_PE.rep2_clip.C01.r1.fqTr.fq \
-p EXAMPLE_PE.rep2_clip.C01.r2.fqTr.fq \ EXAMPLE_PE.rep2_clip.C01.r1.fq.gz \ EXAMPLE_PE.rep2_clip.C01.r2.fq.gz

Fastqc round 1

fastqc -t 2 --extract -k 7 EXAMPLE_PE.rep2_clip.C01.r1.fqTr.fq -o . fastqc -t 2 --extract -k 7 EXAMPLE_PE.rep2_clip.C01.r2.fqTr.fq –o .

Cutadapt round 2

cutadapt \
-f fastq \
--match-read-wildcards \
--times 1 \
-e 0.1 \
-O 5 \
--quality-cutoff 6 \
-m 18 \
-A AACTTGTAGATCGGA \
-A AGGACCAAGATCGGA \
-A ACTTGTAGATCGGAA \
-A GGACCAAGATCGGAA \
-A CTTGTAGATCGGAAG \
-A GACCAAGATCGGAAG \
-A TTGTAGATCGGAAGA \
-A ACCAAGATCGGAAGA \
-A TGTAGATCGGAAGAG \
-A CCAAGATCGGAAGAG \
-A GTAGATCGGAAGAGC \
-A CAAGATCGGAAGAGC \
-A TAGATCGGAAGAGCG \
-A AAGATCGGAAGAGCG \
-A AGATCGGAAGAGCGT \
-A GATCGGAAGAGCGTC \
-A ATCGGAAGAGCGTCG \
-A TCGGAAGAGCGTCGT \
-A CGGAAGAGCGTCGTG \
-A GGAAGAGCGTCGTGT \
-o EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.fq \ -p EXAMPLE_PE.rep2_clip.C01.r2.fqTrTr.fq \ EXAMPLE_PE.rep2_clip.C01.r1.fqTr.fq \ EXAMPLE_PE.rep2_clip.C01.r2.fqTr.fq

Fastqc round 2

fastqc -t 2 --extract -k 7 EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.fq -o . fastqc -t 2 --extract -k 7 EXAMPLE_PE.rep2_clip.C01.r2.fqTrTr.fq –o .

Fastq-sort

fastq-sort --id EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.fq > EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.sorted.fq
fastq-sort --id EXAMPLE_PE.rep2_clip.C01.r2.fqTrTr.fq > EXAMPLE_PE.rep2_clip.C01.r2.fqTrTr.sorted.fq

## 3. STAR比對(duì)到重復(fù)元件和篩選

#STAR rmRe
STAR \
--runMode alignReads \
--runThreadN 8 \
--genomeDir homo_sapiens_repbase_v2 \
--genomeLoad NoSharedMemory \
--alignEndsType EndToEnd \
--outSAMunmapped Within \
--outFilterMultimapNmax 30 \
--outFilterMultimapScoreRange 1 \
--outFileNamePrefix EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.sorted.STAR \ --outSAMtype BAM Unsorted \
--outFilterType BySJout \
--outBAMcompression 10 \
--outReadsUnmapped Fastx \
--outFilterScoreMin 10 \
--outSAMattrRGline ID:foo \
--outSAMattributes All \
--outSAMmode Full \
--outStd Log \
--readFilesIn EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.sorted.fq EXAMPLE_PE.rep2_clip.C01.r2.fqTrTr.sorted.fq

#Re-name files: re-name repeat-mapped outputs
mv EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.sorted.STARAligned.out.bam
EXAMPLE_PE.rep2_clip.C01.r1.fq.repeat-mapped.bam
mv EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.sorted.STARUnmapped.out.mate1 EXAMPLE_PE.rep2_clip.C01.r1.fq.repeat-unmapped.fq
mv EXAMPLE_PE.rep2_clip.C01.r1.fqTrTr.sorted.STARUnmapped.out.mate2 EXAMPLE_PE.rep2_clip.C01.r2.fq.repeat-unmapped.fq

# Fastq-sort
fastq-sort --id EXAMPLE_PE.rep2_clip.C01.r1.fq.repeat-unmapped.fq > EXAMPLE_PE.rep2_clip.C01.r1.fq.repeat-unmapped.sorted.fq fastq-sort --id EXAMPLE_PE.rep2_clip.C01.r2.fq.repeat-unmapped.fq > EXAMPLE_PE.rep2_clip.C01.r2.fq.repeat-unmapped.sorted.fq

## 4. 比對(duì)篩選之后的測(cè)序數(shù)據(jù)到基因組

#STAR genome mapping: Takes output from STAR rmRep. Maps unique reads to the human genome

STAR \
--runMode alignReads \
--runThreadN 8 \
--genomeDir /stage/hg19_star_sjdb \
--genomeLoad NoSharedMemory \
--readFilesIn \
EXAMPLE_PE.rep2_clip.C01.r1.fq.repeat-unmapped.sorted.fq \ EXAMPLE_PE.rep2_clip.C01.r2.fq.repeat-unmapped.sorted.fq \
--outSAMunmapped Within \
--outFilterMultimapNmax 1 \
--outFilterMultimapScoreRange 1 \
--outFileNamePrefix EXAMPLE_PE.rep2_clip.C01.r1.fq.repeat-unmapped.sorted.STAR \
--outSAMattributes All \
--outSAMtype BAM Unsorted \
--outFilterType BySJout \
--outReadsUnmapped Fastx \
--outFilterScoreMin 10 \
--outSAMattrRGline ID:foo \
--outStd Log \
--alignEndsType EndToEnd \
--outBAMcompression 10 \
--outSAMmode Full

# Re-name BAM: rename genome-mapped outputs
mv EXAMPLE_PE.rep2_clip.C01.r1.fq.repeat-unmapped.sorted.STARAligned.out.bam EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mapped.bam

# Name sort BAM: sort output from STAR by name to ensure read pairs are adjacent. samtools sort -n -o EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.bam
EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mapped.bam

## 5. 去除PCR產(chǎn)生的重復(fù):SE測(cè)序使用umi_tools构拳,常規(guī)工具可以使用barcodecollapsepe.py

# Barcode_collapse_pe (PE): takes output from STAR genome mapping. 
barcodecollapsepe.py \
-o EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.rmDup.bam \
-m EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.rmDup.metrics \ -b EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.bam

# Position sort BAM: Takes output from barcode collapse PE (or from SE namesort bam). Sorts resulting bam file for use downstream.
samtools sort -o EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.bam EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDup.bam

# Barcode_collapse_se (SE): takes output from STAR genome mapping. Use umi_tools dedup to identify the extracted random-mer from the previous step and perform PCR duplicate removal.
umi_tools dedup \
--random-seed 1 \
---I EXAMPLE_SE.rep1_clip.umi.r1.fq.genome-mappedSoSo.bam \
--method unique \
--output-stats EXAMPLE_SE.rep1_clip.umi.r1.fq.genome-mappedSoSo.txt \ -S EXAMPLE_SE.rep1_clip.umi.r1.fq.genome-mappedSoSo.rmDup.bam

#Samtools index: Takes output from sortSam, makes bam index for use downstream.
samtools index EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.rmDupSo.bam

## 6. (paired-end only) Merges multiple inline barcodes and filters R1 (uses only R2 for peak calling)

# Samtools merge (PE only): Takes inputs from multiple final bam files. Merges the two technical replicates for further downstream analysis.
samtools merge EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.bam EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.bam EXAMPLE_PE.rep2_clip.D8f.r1.fq.genome- mappedSo.rmDupSo.bam

# Samtools index: Takes output from sortSam, makes bam index for use downstream. samtools index EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.bam

## 7.Calls enriched peak regions (peak clusters) with CLIPPER

# Samtools view (PE only): Takes output from samtools merge. Only outputs the second read in each pair for use with a single stranded peak caller. This is the final bam file to perform analysis on.
# -f 128: 提取read pair的read 2
samtools view -f 128 -b -o EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.bam EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.bam

# Make normalized read density bigwig files. Use --direction f for SE clip as reads are not reversed.
makebigwigfiles \
--bw_pos EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.norm.pos.bw \
--bw_neg EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.norm.neg.bw \
--bam EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.rmDupSo.merged.r2.bam \ --genome hg19.chrom.sizes \
--direction r

# Clipper: Takes results from samtools view. Calls peaks on those files.
clipper \
--species hg19 \
--bam EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.rmDupSo.merged.r2.bam \ --save-pickle \
--outfile EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.peakClusters.bed

## 8. Uses size-matched input sample to normalize and calculate fold-change enrichment within enriched peak regions with custom perl scripts (overlap_peakfi_with_bam_PE.pl, peakscompress.pl)

# Input normalization: Compares the number of reads within the IP sample to the number of reads within the size-matched INPUT sample across Clipper-called peak clusters. This step is performed both within this pipeline as well as within the merge_peaks pipeline using the same perl scripts.

samtools view -cF 4 EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.bam > ip_mapped_readnum.txt samtools view -cF 4 EXAMPLE_PE.rep2_input.NIL.r1.fq.genome- mappedSo.rmDupSo.r2.bam > input_mapped_readnum.txt

overlap_peakfi_with_bam_PE.pl \ EXAMPLE_PE.rep2_clip.C01.r1.fq.genome-mappedSo.rmDupSo.merged.r2.bam \ EXAMPLE_PE.rep2_input.NIL.r1.fq.genome-mappedSo.rmDupSo.r2.bam \ EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.peakClusters.bed \
ip_mapped_readnum.txt \
input_mapped_readnum.txt \
EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.peakClusters.normed.bed

perl compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \ EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.peakClusters.normed.bed \ EXAMPLE_PE.rep2_clip.C01.r1.fq.genome- mappedSo.rmDupSo.merged.r2.peakClusters.normed.compressed.bed

原文

eCLIP pipeline
eCLIP_analysisSOP_v2.2

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市,隨后出現(xiàn)的幾起案子置森,更是在濱河造成了極大的恐慌斗埂,老刑警劉巖,帶你破解...
    沈念sama閱讀 217,826評(píng)論 6 506
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件凫海,死亡現(xiàn)場(chǎng)離奇詭異呛凶,居然都是意外死亡,警方通過查閱死者的電腦和手機(jī)行贪,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,968評(píng)論 3 395
  • 文/潘曉璐 我一進(jìn)店門漾稀,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人建瘫,你說我怎么就攤上這事崭捍。” “怎么了啰脚?”我有些...
    開封第一講書人閱讀 164,234評(píng)論 0 354
  • 文/不壞的土叔 我叫張陵殷蛇,是天一觀的道長(zhǎng)。 經(jīng)常有香客問我橄浓,道長(zhǎng)粒梦,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,562評(píng)論 1 293
  • 正文 為了忘掉前任荸实,我火速辦了婚禮谍倦,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘泪勒。我一直安慰自己昼蛀,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,611評(píng)論 6 392
  • 文/花漫 我一把揭開白布圆存。 她就那樣靜靜地躺著叼旋,像睡著了一般。 火紅的嫁衣襯著肌膚如雪沦辙。 梳的紋絲不亂的頭發(fā)上夫植,一...
    開封第一講書人閱讀 51,482評(píng)論 1 302
  • 那天,我揣著相機(jī)與錄音油讯,去河邊找鬼详民。 笑死,一個(gè)胖子當(dāng)著我的面吹牛陌兑,可吹牛的內(nèi)容都是我干的沈跨。 我是一名探鬼主播,決...
    沈念sama閱讀 40,271評(píng)論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼兔综,長(zhǎng)吁一口氣:“原來是場(chǎng)噩夢(mèng)啊……” “哼饿凛!你這毒婦竟也來了狞玛?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 39,166評(píng)論 0 276
  • 序言:老撾萬榮一對(duì)情侶失蹤涧窒,失蹤者是張志新(化名)和其女友劉穎心肪,沒想到半個(gè)月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體纠吴,經(jīng)...
    沈念sama閱讀 45,608評(píng)論 1 314
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡硬鞍,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,814評(píng)論 3 336
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了戴已。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片固该。...
    茶點(diǎn)故事閱讀 39,926評(píng)論 1 348
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖恭陡,靈堂內(nèi)的尸體忽然破棺而出蹬音,到底是詐尸還是另有隱情,我是刑警寧澤休玩,帶...
    沈念sama閱讀 35,644評(píng)論 5 346
  • 正文 年R本政府宣布著淆,位于F島的核電站,受9級(jí)特大地震影響拴疤,放射性物質(zhì)發(fā)生泄漏永部。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,249評(píng)論 3 329
  • 文/蒙蒙 一呐矾、第九天 我趴在偏房一處隱蔽的房頂上張望苔埋。 院中可真熱鬧,春花似錦蜒犯、人聲如沸组橄。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,866評(píng)論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽玉工。三九已至,卻和暖如春淘菩,著一層夾襖步出監(jiān)牢的瞬間遵班,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 32,991評(píng)論 1 269
  • 我被黑心中介騙來泰國(guó)打工潮改, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留狭郑,地道東北人。 一個(gè)月前我還...
    沈念sama閱讀 48,063評(píng)論 3 370
  • 正文 我出身青樓汇在,卻偏偏與公主長(zhǎng)得像翰萨,于是被迫代替她去往敵國(guó)和親。 傳聞我的和親對(duì)象是個(gè)殘疾皇子趾疚,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 44,871評(píng)論 2 354

推薦閱讀更多精彩內(nèi)容