Compatible Software
The following software packages are known to be compatible with PacBio® data, in addition to PacBio's own SMRT® Analysis suite. All packages are believed to be open source or freely available for non-commercial use. See the individual project sites for up-to-date license information. A separate page lists commercial software.
Know of any other open source software for PacBio data? Email us.
Software directory:
- De novo assembly
- Structural variant detection
- Reference-based alignment
- Consensus and variant calling
- RNA analysis
- Epigenetic base modifications and methylation
- Genome browsers
De novo assembly
Detailed information is available on the separate page "Large Genome Assembly with PacBio Long Reads".
- Falcon: An experimental diploid assembler, tested on ~100 Mb genomes
- Canu: A fork of the Celera Assembler designed for high-noise single-molecule sequencing
- wtdbg2: A fuzzy Bruijn graph approach to assembling long, noisy reads
- MHAP: A reference implementation of a probabilistic sequence-overlapping algorithm, designed to efficiently detect all overlaps between noisy long reads. It estimates Jaccard similarity by compressing sequences into representative fingerprints composed of min-mers (minimum k-mers).
- HGAP: Hierarchical Genome Assembly Process, for PacBio long reads only. Bundled in SMRT Analysis since v1.4.
- HBAR-DTK: Hierarchical-Based AssembleR Development ToolKit, recommended for advanced users only
- ALLORA: A long-read assembler for PacBio reads alone. Available only in SMRT Analysis, since v1.0.
- Celera Assembler: Version 8.1 can assemble subreads directly.
- Sprai: A preassembly-based assembler that aims to generate longer contigs
- PBcR self-correction: A mode within PBcR (also known as pacBioToCA) that performs HGAP-style self-correction. Celera Assembler 8.2 uses the MHAP algorithm for faster overlap computation during the self-correction phase.
- pacBioToCA + Celera Assembler: A scalable hybrid assembly approach that combines PacBio long reads with Illumina, 454, Sanger, Ion Torrent, or CCS reads. Bundled in SMRT Analysis since v1.3.3.
- ECTools: A set of tools for hybrid assembly. It uses contigs instead of short reads for correction.
- SPAdes: A true hybrid assembler combining PacBio reads with Illumina or Ion Torrent data; suitable for smaller genomes only.
- Cerulean: A hybrid assembler. It starts from an assembly graph produced by ABySS and extends contigs by resolving bubbles in the graph using PacBio long reads. It has been run successfully on genomes smaller than 100 Mb.
- dbg2olc: A hybrid assembler that uses Illumina contigs as anchors to build an overlap graph of PacBio reads, allowing very fast performance.
- ALLPATHS-LG: A hybrid assembler combining PacBio long reads with Illumina mate-pair and jumping libraries.
- AHA: A hybrid assembler that scaffolds existing contigs and fills gaps. Available only in SMRT Analysis, since v1.0.
- PBJelly 2: Gap filling and scaffolding for large genomes
- MIRA: de novo assembler
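The MHAP entry above estimates Jaccard similarity from min-mer fingerprints. The following minimal MinHash sketch illustrates that idea in Python. It is not MHAP's implementation: the seeded BLAKE2b hash family, the k-mer size of 16, and the 128 hash functions are assumptions chosen for illustration, the function names are hypothetical, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
from __future__ import annotations

import hashlib
import random


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer: str, seed: int) -> int:
    """64-bit hash of a k-mer; each seed acts as one independent hash function (assumption)."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def min_mer_sketch(seq: str, k: int = 16, num_hashes: int = 128) -> list[int]:
    """Fingerprint: for each hash function, keep the minimum hash over all k-mers (the min-mer)."""
    all_kmers = kmers(seq, k)
    if not all_kmers:
        raise ValueError("sequence is shorter than k")
    return [min(seeded_hash(km, seed) for km in all_kmers) for seed in range(num_hashes)]


def estimated_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """Fraction of hash functions whose min-mers agree; approximates the Jaccard similarity."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must use the same hash functions")
    return sum(x == y for x, y in zip(sketch_a, sketch_b)) / len(sketch_a)


def exact_jaccard(seq_a: str, seq_b: str, k: int = 16) -> float:
    """|A & B| / |A | B| over the k-mer sets, computed directly for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    rng = random.Random(7)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    read_a, read_b = genome[:1200], genome[800:]  # two "reads" overlapping by 400 bp
    print("exact    :", round(exact_jaccard(read_a, read_b), 3))
    print("estimated:", round(estimated_jaccard(min_mer_sketch(read_a), min_mer_sketch(read_b)), 3))
```

Each read is sketched once, and any pair of sketches is then compared in time proportional to the number of hash functions rather than the read lengths, which is what makes all-versus-all overlap screening cheap. This sketch covers only the similarity estimate, not how MHAP locates the overlap itself.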
Structural Variant Calling
- Sniffles: Calls all types of structural variants using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
- SMRT-SV: Calls insertions, deletions, and inversions using a local assembly approach.
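Split-read evidence, one of the signals Sniffles uses, can be illustrated with a toy classifier: when a long read aligns as two pieces, comparing the gap between the pieces on the read with the gap on the reference suggests the SV type. This is not Sniffles' algorithm. The `Segment` record, the `classify_split` function, and the 50 bp minimum size are hypothetical, only a single pair of segments from one read is considered, and duplications are not distinguished from insertions; real callers cluster such signals across many reads and combine them with coverage and mismatch evidence.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One local alignment of a read (hypothetical simplified record, not a BAM/pysam type)."""
    chrom: str
    strand: str      # "+" or "-"
    read_start: int  # 0-based, half-open, in the read's original orientation
    read_end: int
    ref_start: int   # 0-based, half-open reference coordinates
    ref_end: int


def classify_split(a: Segment, b: Segment, min_size: int = 50) -> Optional[Tuple[str, Optional[int]]]:
    """Return (sv_type, size) implied by two segments of the same read, or None if no SV signal."""
    if a.read_start > b.read_start:
        a, b = b, a  # order the segments along the read
    if a.chrom != b.chrom:
        return ("translocation", None)
    if a.strand != b.strand:
        return ("inversion", None)  # breakpoints need both ends, so no size here
    read_gap = b.read_start - a.read_end
    if a.strand == "+":
        ref_gap = b.ref_start - a.ref_end  # reference advances along the read
    else:
        ref_gap = a.ref_start - b.ref_end  # minus strand: reference runs backwards along the read
    diff = ref_gap - read_gap
    if diff >= min_size:
        return ("deletion", diff)    # reference bases skipped by the read
    if -diff >= min_size:
        return ("insertion", -diff)  # read bases with no reference counterpart
    return None


if __name__ == "__main__":
    left = Segment("chr1", "+", 0, 1000, 10_000, 11_000)
    right = Segment("chr1", "+", 1000, 2000, 11_500, 12_500)
    print(classify_split(left, right))
```

Here the two pieces are contiguous on the read but 500 bp apart on the reference, so the pair is reported as a 500 bp deletion.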
RNA Analysis
- IsoCon: For targeted Iso-Seq only. A tool for deriving finished transcripts from Iso-Seq reads. The input is a set of full-length non-chimeric reads in FASTA format plus the CCS base-call values as a BAM file; the output is a set of predicted transcripts.
- Cupcake: Companion scripts for analyzing the output of the official Iso-Seq1, Iso-Seq2, and Iso-Seq3 pipelines.
- TAMA: A suite of downstream analysis scripts, including tools for collapsing and merging transcript data. See the TAMA wiki for more details.
- SQANTI: Iso-Seq QC and analysis software that takes long-read output from Iso-Seq, IDP, TAPIS, or similar pipelines and combines it with short reads, a reference genome, and annotations to give a comprehensive description of the dataset. A preprint is available.
- TAPPAS: Isoform analysis and visualization, to be used after the data have been cleaned up with SQANTI.
- lncRNA Discovery Pipeline: Python scripts that use two non-coding RNA classifiers (CPAT and PLEK) to discover long ncRNAs in Iso-Seq data.
- ANGEL: A Python library for both error-free and error-tolerant open reading frame (ORF) prediction.
- Cogent: Genome reconstruction from Iso-Seq data alone, without a reference genome.
- SpliceMap-LSC-IDP pipeline: Hybrid (long + short read) error-correction and quantification software for transcriptome data, developed by Kin Fai Au's lab.
- IDP-fusion: A hybrid fusion finder that uses both long and short reads.
Reference-based alignment
- bwa-sw: Burrows-Wheeler aligner with Smith-Waterman
Consensus and variant calling
- GATK: Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping
- DeepVariant: An analysis pipeline developed by Google that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
- LoFreq: A low-frequency variant caller. Switching off BAQ computation with -B is recommended. On the HBV amplicon dataset, it calls all known mutations without false positives at allele frequencies down to 1.15%.
Epigenetic base modifications and methylation
- R-kinetics: R package for kinetic analysis
- MotifMaker: bundled in SMRT Analysis since v1.3.3
- motif-finding: R code for motif analysis
- kineticsTools: Python code for kinetic analysis
Genome Browsers
- IGV: Integrative Genomics Viewer from the Broad Institute
- SMRT View: PacBio's genome browser for SMRT Sequencing data, for exploring and interacting with Resequencing, De novo, Base Modification and Identification, Motif Analysis, cDNA, Single Molecule, and Barcoding experiment results
- Tablet: Next Generation Sequence Assembly Visualization
Visit the PacBio Developer's Network website for the most up-to-date links to downloads, documentation, and more.