Common Bioinformatics Tools and Websites

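I have recently supervised quite a few interns and rotation students in a row, and I expect that for the next few years most interns or rotation students in the lab will end up with me.
This post lists commonly used bioinformatics tools and a few wonderful websites. For almost every tool I give a one- or two-sentence summary of what it does and its official website.
Junior labmate, everything you asked for is here.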

Common bioinformatics tools

FASTQ format

SRAtoolkit
- Tool for downloading public data from the SRA database
- https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc
fastx toolkit
- a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
- Offers a variety of small utilities, such as extracting reverse-complement sequences.
- http://hannonlab.cshl.edu/fastx_toolkit/
fastqc
- A quality control tool for high throughput sequence data
- Assesses sequencing data quality
- https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
MultiQC
- Aggregate results from bioinformatics analyses across many samples into a single report
- Combines many data-quality reports into one, which saves time and makes comparison easy; supports fastqc
- https://github.com/ewels/MultiQC; http://multiqc.info/docs/
Trim Galore
- A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files,
- with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries
- From the same group as fastqc; can be combined with fastqc to clean raw data.
- https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
Trimmomatic
- A flexible read trimming tool for Illumina NGS data
- A tool dedicated to cleaning Illumina sequencing data
- http://www.usadellab.org/cms/index.php?page=trimmomatic
khmer
- Tools for working with DNA shotgun sequencing data from genomes, transcriptomes, metagenomes, and single cells.
- Can filter raw sequencing data, among other things
- http://khmer.readthedocs.io/en/v2.1.1/user/scripts.htm
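The reverse-complement operation mentioned for the fastx toolkit can be sketched in a few lines of Python. This is only an illustration of the idea, not how the toolkit itself is implemented; the function names and the four-field record tuple are assumptions made for the example.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of the fastx toolkit.

# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_complement_fastq(record):
    """Reverse-complement one FASTQ record given as (header, seq, plus, qual).

    The quality string is reversed so each score stays with its base.
    """
    header, seq, plus, qual = record
    return header, reverse_complement(seq), plus, qual[::-1]


record = ("@read1", "ACGTTN", "+", "IIIIH#")
print(reverse_complement_fastq(record))
```

Note that the quality string has to be reversed together with the sequence; otherwise the scores no longer line up with their bases.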

BED format

bedops
- For working with BED files; faster than bedtools
- the fast, highly scalable and easily-parallelizable genome analysis toolkit
- https://bedops.readthedocs.io/en/latest/index.html
bedtools
- The best-known toolkit for BED files; despite the similar name, it does not come from the same team as samtools
- a powerful toolset for genome arithmetic
- http://bedtools.readthedocs.io/en/latest/index.html
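To give a feel for what "genome arithmetic" on BED files means, here is a minimal sketch of intersecting two intervals. It assumes the usual BED convention of 0-based, half-open coordinates; the `Interval` type and both functions are hypothetical names for this example and do not correspond to the bedtools or bedops APIs.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """One BED interval: 0-based start, half-open end (assumed convention)."""
    chrom: str
    start: int
    end: int


def parse_bed_line(line: str) -> Interval:
    """Read the first three tab-separated BED columns."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return Interval(chrom, int(start), int(end))


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlapping part of two intervals, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    # With half-open coordinates, intervals that merely touch share no bases.
    return Interval(a.chrom, start, end) if start < end else None


a = parse_bed_line("chr1\t100\t200\tpeakA\n")
b = parse_bed_line("chr1\t150\t250\tpeakB\n")
print(intersect(a, b))
```

The real tools work on sorted files and use far more efficient algorithms; this only shows the per-pair logic.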

SAM/BAM

samtools
- This one tool is all you need
- Utilities for the Sequence Alignment/Map (SAM) format
- http://www.htslib.org/doc/samtools.html
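One thing newcomers to SAM/BAM often trip over is the bitwise FLAG column. The sketch below decodes a FLAG value into readable names using the bit meanings from the SAM specification; the dictionary labels and the `decode_flag` function are names chosen for this example and are not part of samtools.

```python
# Bit values follow the SAM specification; the label strings are our own.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(99))
print(decode_flag(4))
```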

SNP (VCF/BCF) formats

GATK
- Probably the most widely used software in this category
- https://software.broadinstitute.org/gatk/documentation/
bcftools
- Performs various operations on VCF files
- utilities for variant calling and manipulating VCFs and BCFs
- http://www.htslib.org/doc/bcftools.html
vcftools
- Similar to bcftools
- https://vcftools.github.io/man_latest.html
SnpEff
- Genetic variant annotation and effect prediction toolbox
- Well suited to SNP annotation
- Usage: http://snpeff.sourceforge.net/SnpEff_manual.html
- http://snpeff.sourceforge.net/
- Can also annotate ChIP-seq results
- Supports non-coding annotation, such as histone modifications
samtools mpileup
- Utilities for the Sequence Alignment/Map (SAM) format
- http://www.htslib.org/doc/samtools.html

ChIP-seq/motif

peak calling

MACS
- Model-based Analysis of ChIP-Seq
- Mainly used for narrow peaks from histone modifications (H3K4me3 and H3K9/27ac)
- and for transcription factors, which are usually associated with sharp and isolated peaks
- http://liulab.dfci.harvard.edu/MACS/README.html
MACS2
- An upgraded version of MACS; it can also call broad peaks
- https://github.com/taoliu/MACS
SICER
- Positioned as an alternative to MACS; mainly used to find broader peaks such as H3K9me3 and H3K36me3.
- highly recommended for a practical ChIP-seq experiment design and can be used to account for local biases resulting from read mappability, DNA repeats, local GC content
- https://www.genomatix.de/online_help/help_regionminer/sicer.html
Tools that may be useful for downstream analysis

MAnorm
- http://bioinfo.sibs.ac.cn/shaolab/MAnorm/MAnorm.htm

large sequences alignment

Several commonly used programs for aligning long sequences

MUMmer
- rapid alignment of very large DNA and amino acid sequences
- http://mummer.sourceforge.net/examples/
- http://mummer.sourceforge.net/manual/
GMAP
- GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences
- http://research-pub.gene.com/gmap/
BLAT
- Blat produces two major classes of alignments: at the DNA level between two sequences that are of 95% or greater identity, but which may include large inserts; at the protein or translated DNA level between sequences that are of 80% or greater identity and may also include large inserts.
- https://genome.ucsc.edu/goldenpath/help/blatSpec.html

short reads alignment

Short-read alignment, i.e. aligning next-generation sequencing data

BWA
- Burrows-Wheeler Alignment Tool
- mapping low-divergent sequences against a large reference genome
- It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
- http://bio-bwa.sourceforge.net/bwa.shtml
- https://github.com/lh3/bwa
GSNAP
- Genomic Short-read Nucleotide Alignment Program
- http://research-pub.gene.com/gmap/
Bowtie
- works best when aligning short reads to large genomes
- does not report gapped alignments
- http://bowtie-bio.sourceforge.net/manual.shtml
Bowtie2
- Differs from the previous generation in that it supports gapped alignments
- ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences
- supports gapped, local, and paired-end alignment modes
- http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#reporting
HISAT2
- Successor to TopHat, based on HISAT and Bowtie2
- HISAT2 is somewhat faster than STAR
- http://ccb.jhu.edu/software/hisat2/manual.shtml
STAR
- Spliced Transcripts Alignment to a Reference
- https://github.com/alexdobin/STAR
- https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Genome-guided assembly

stringtie
- highly efficient assembler of RNA-Seq alignments into potential transcripts
- Relatively accurate at discovering alternative splicing
- https://ccb.jhu.edu/software/stringtie/
Cufflinks
- Rarely used anymore
IDP
- Isoform Detection and Prediction tool
- gmap+hisat2, i.e. it combines long- and short-read alignment; works well
- https://www.healthcare.uiowa.edu/labs/au/IDP/IDP_manual.asp

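De novo assembly/gene prediction

Taken together, the tools below form a workflow that runs from assembly to annotation to assessing assembly quality.

Assembly

Trinity
- Tends to predict long alternatively spliced isoforms
- Newer versions have moved away from earlier over-prediction and become more conservative
- Fairly resource-hungry; it is generally best to allocate 6G-10G of memory per CPU
- Supports both genome-guided and de novo transcriptome assembly
- https://github.com/trinityrnaseq/trinityrnaseq/wiki
oases
- Usually yields a relatively high N50
- Has some advantage in detecting lowly expressed genes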
- De novo transcriptome assembler for very short reads
- https://github.com/dzerbino/oases
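Since N50 comes up when comparing assemblers, here is a minimal sketch of how it is commonly computed: sort the contig (or transcript) lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled bases. The function name is an assumption for this example; it is not taken from Trinity or oases.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A high N50 alone does not mean a good transcriptome assembly, which is one reason dedicated quality-assessment tools exist.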

Annotation

PASA (bundles BLAT and GMAP)
- Once you have the assembled FASTA file, PASA can be used to predict gene structures
- Gene Structure Annotation and Analysis Using PASA
- http://pasapipeline.github.io/
Maker
- Gene prediction
- can be used for de novo annotation of newly sequenced genomes, for updating existing annotations to reflect new evidence, or just to combine annotations, evidence, and quality control statistics
- http://www.yandell-lab.org/software/maker.html

Quality assessment

TransRate
- Dedicated software for assessing assembly quality, with three evaluation modes.
- reference free quality assessment of de novo transcriptome assemblies
- http://hibberdlab.com/transrate/
DETONATE
- DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation
- https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0553-5
BUSCO
- Its evaluation approach differs from the two above
- based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs

Estimating transcript abundance

These tools fall into two groups, alignment-based and alignment-free: RSEM and eXpress are alignment-based, while the other two (kallisto and salmon) are alignment-free.

RSEM
- RNA-Seq by Expectation-Maximization
- https://deweylab.github.io/RSEM/README.html
eXpress
- quantifying the abundances of a set of target sequences from sampled subsequences
- https://pachterlab.github.io/eXpress/overview.html
kallisto
- Blazingly fast
- Abundance estimates show low sample-specific and read-length bias
- quantifying abundances of transcripts from RNA-Seq data
- https://pachterlab.github.io/kallisto/
salmon
- Also very fast
- quantifying the expression of transcripts using RNA-seq data
- https://combine-lab.github.io/salmon/

Read count

htseq-count
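- Counts reads; this one tool is enough
- http://htseq.readthedocs.io/en/release_0.9.1/

Differential expression

Corresponding to the earlier steps, the tools here also fall into three classes: read-count-based, assembly-based, and alignment-free.

limma
- Used for analyzing microarray data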
- Linear Models for Microarray Data
- http://bioconductor.org/packages/release/bioc/html/limma.html
DEseq
- http://bioconductor.org/packages/release/bioc/html/DESeq.html
DEseq2
- Performs relatively well among these tools
- http://bioconductor.org/packages/release/bioc/html/DESeq2.html
DEGseq
- Identify Differentially Expressed Genes from RNA-seq data
- http://www.bioconductor.org/packages/2.6/bioc/html/DEGseq.html
edgeR
- Empirical Analysis of Digital Gene Expression Data in R
- http://www.bioconductor.org/packages/release/bioc/html/edgeR.html
Ballgown
- Accuracy is sometimes not very good
- facilitate flexible differential expression analysis of RNA-Seq data
- organize, visualize, and analyze the expression measurements for your transcriptome assembly.
- https://github.com/alyssafrazee/ballgown
sleuth
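- Designed to be used together with kallisto
- https://pachterlab.github.io/sleuth/about

Data visualization

Visualization tools can be divided into local and online versions

IGV
- The go-to choice for viewing analysis results locally
- Integrative Genomics Viewer
- http://software.broadinstitute.org/software/igv/
jbrowse
- The go-to choice for presenting data publicly or sharing it with collaborators; fast and good-looking.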
- http://jbrowse.org/code/JBrowse-1.10.2/docs/tutorial/
DEIVA
- Online tool for visualizing differential expression
- Interactive Visual Analysis of differential gene expression test results
- http://hypercubed.github.io/DEIVA/
Heatmapper
- Online tool for drawing all kinds of heat maps
- expression-based heat maps
- pairwise distance maps
- correlation maps
- http://www.heatmapper.ca/
START
- A Shiny-based suite of RNA-seq data visualization tools
- visualize RNA-seq data starting with count data
- https://kcvi.shinyapps.io/START/

A few wonderful websites

biostars
- https://www.biostars.org
R book
- http://r4ds.had.co.nz/
python guide
- http://docs.python-guide.org/en/latest/
Biopython
- http://biopython.org/DIST/docs/tutorial/Tutorial.html
Rosalind
- http://rosalind.info/problems/list-view/
bioinformatics tools
- https://omictools.com/
- https://bioinformatics.ca/links_directory/
data visualisation catalogue
- http://datavizcatalogue.com/index.html

That's all for now. I have left out tools that I rarely use myself so as not to burden anyone; I will add more later.

Last edited: 2018.07.08 21:38:31
