Most common uses of htseq-count
htseq-count: counting reads within features — HTSeq 2.0.3 documentation
In the case of RNA-Seq, the features are typically genes, where each gene is considered here as the union of all its exons. One may also consider each exon as a feature, e.g., in order to check for alternative splicing. For comparative ChIP-Seq, the features might be binding regions from a pre-determined list.
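To make "each gene is the union of all its exons" concrete, here is a minimal sketch that merges overlapping exon intervals per gene. The gene names, coordinates, and the half-open interval convention are assumptions for illustration; this is not HTSeq's internal code.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def gene_models(exons):
    """Group (gene_id, start, end) exon records by gene and take the union."""
    by_gene = defaultdict(list)
    for gene_id, start, end in exons:
        by_gene[gene_id].append((start, end))
    return {gene: merge_intervals(ivs) for gene, ivs in by_gene.items()}


# Hypothetical exons: two overlapping exons of geneA from different
# transcripts, one separate geneA exon, and a single geneB exon.
exons = [
    ("geneA", 100, 200),
    ("geneA", 150, 300),
    ("geneA", 500, 600),
    ("geneB", 1000, 1100),
]
models = gene_models(exons)
print(models)
```

Treating each exon as its own feature instead would simply skip the merge step and keep one entry per exon.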
Important: The default for strandedness is yes. If your RNA-Seq data has not been made with a strand-specific protocol, this causes half of the reads to be lost. Hence, make sure to set the option --stranded=no unless you have strand-specific data!
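The effect of the strandedness setting can be illustrated with a toy model. The sketch below assumes single-end reads and reduces each read to the strand it aligned to; `count_reads` is a hypothetical helper, not part of HTSeq. With an unstranded library, reads land on either strand about equally often, so requiring a strand match discards roughly half of them.

```python
def count_reads(read_strands, gene_strand, stranded):
    """Count reads assigned to a gene under a simplified strandedness rule.

    stranded="yes":     the read must lie on the gene's strand.
    stranded="reverse": the read must lie on the opposite strand.
    stranded="no":      the strand is ignored.
    """
    if stranded == "no":
        return len(read_strands)
    if stranded == "yes":
        return sum(1 for s in read_strands if s == gene_strand)
    if stranded == "reverse":
        return sum(1 for s in read_strands if s != gene_strand)
    raise ValueError(f"unknown strandedness setting: {stranded!r}")


# Hypothetical unstranded library: reads over a "+" gene map to both strands.
reads = ["+", "-", "+", "-", "-", "+"]
counted_yes = count_reads(reads, "+", "yes")
counted_no = count_reads(reads, "+", "no")
print(counted_yes, counted_no)
```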
Important: For paired-end reads, although position-sorted BAM files are supported, unsorted BAM files (i.e. in which the two reads of the pair are in consecutive lines of the BAM file) are highly recommended for htseq-count. If you are having trouble or unexpected results, sort your BAM file by name and try again.
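If name sorting is needed, one common route is `samtools sort -n`. The sketch below only assembles the command; the file names are placeholders, and it assumes samtools is installed and on the PATH.

```python
import shlex


def name_sort_command(in_bam, out_bam):
    """Build a samtools command that sorts a BAM file by read name (-n)."""
    return ["samtools", "sort", "-n", "-o", out_bam, in_bam]


# Placeholder file names, chosen for illustration only.
cmd = name_sort_command("sample.bam", "sample.namesorted.bam")
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. The name-sorted file can then be given to htseq-count with `--order=name`.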
Note the -t parameter
Feature type (3rd column in GTF file) to be used; all features of other types are ignored (default, suitable for RNA-Seq analysis using an Ensembl GTF file: exon).
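As an illustration of what selecting on the 3rd GTF column means, this sketch keeps only records of one feature type. The GTF lines are invented, and `features_of_type` is a hypothetical helper rather than an HTSeq function.

```python
def features_of_type(gtf_lines, feature_type="exon"):
    """Yield the tab-split fields of GTF records whose 3rd column matches."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == feature_type:
            yield fields


# Invented records for a single gene; only the "exon" lines should survive.
gtf_lines = [
    "#!genome-build hypothetical",
    '1\thavana\tgene\t100\t600\t.\t+\t.\tgene_id "geneA";',
    '1\thavana\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; exon_number "1";',
    '1\thavana\tCDS\t120\t200\t.\t+\t0\tgene_id "geneA";',
    '1\thavana\texon\t500\t600\t.\t+\t.\tgene_id "geneA"; exon_number "2";',
]
exons = list(features_of_type(gtf_lines))
print([(f[3], f[4]) for f in exons])
```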
--add-chromosome-info
Note the -i parameter
GTF attribute to be used as feature ID. The default, suitable for RNA-Seq analysis using an Ensembl GTF file, is gene_id.
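The attribute named here lives in the 9th GTF column as `key "value";` pairs. A minimal sketch of pulling it out follows. It assumes quoted values only, and both the example column and the helper names are hypothetical.

```python
import re

# Matches one `key "value";` pair; unquoted values are not handled here.
_ATTR = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(column9):
    """Parse the GTF attribute column into a dict of key -> value."""
    return {key: value for key, value in _ATTR.findall(column9)}


def feature_id(column9, id_attr="gene_id"):
    """Return the attribute used as feature ID (gene_id by default)."""
    attrs = parse_attributes(column9)
    if id_attr not in attrs:
        raise KeyError(f"attribute {id_attr!r} not found in: {column9}")
    return attrs[id_attr]


# Invented attribute column for illustration.
col = 'gene_id "geneA"; gene_name "HYPO1"; exon_number "2";'
print(feature_id(col))
print(feature_id(col, id_attr="gene_name"))
```

Records that share the same feature ID are what get grouped into one feature, so choosing a different attribute changes what the counts are reported for.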
Quality Assessment with htseq-qa
Given a FASTQ or SAM file, this script produces a PDF file with plots depicting the base calls and base-call qualities by position in the read. This is useful to assess the technical quality of a sequencing run.
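To give a sense of the kind of per-position summary such a plot is built from, here is a small sketch that averages base-call qualities by read position. It is not htseq-qa's implementation. It assumes four-line FASTQ records with Phred+33 quality encoding, and the reads are invented.

```python
def mean_quality_by_position(fastq_lines, offset=33):
    """Average Phred quality at each read position (Phred+33 assumed)."""
    totals, counts = [], []
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    # In four-line FASTQ, every 4th line (index 3, 7, ...) is the quality string.
    for i in range(3, len(lines), 4):
        for pos, ch in enumerate(lines[i]):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two invented reads of different lengths.
fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACG", "+", "I5!",
]
means = mean_quality_by_position(fastq)
print(means)
```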
htseq-count: counting reads within features
Given one/multiple SAM/BAM/CRAM files with alignments and a GTF file with genomic features, this script counts how many reads map to each feature. This script is especially popular for bulk and single-cell RNA-Seq analysis.
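A typical invocation can be assembled as below. This is a sketch under stated assumptions: the file names are placeholders, the defaults in the helper are choices made for this example (in particular `stranded="no"`, which differs from htseq-count's own default of yes), and `htseq_count_command` is a hypothetical wrapper, not part of HTSeq.

```python
import shlex


def htseq_count_command(alignment_files, gtf_file, stranded="no", order="name",
                        feature_type="exon", id_attr="gene_id",
                        file_format="bam"):
    """Build an htseq-count command line as a list of arguments."""
    if stranded not in {"yes", "no", "reverse"}:
        raise ValueError(f"invalid stranded value: {stranded!r}")
    if order not in {"name", "pos"}:
        raise ValueError(f"invalid order value: {order!r}")
    return [
        "htseq-count",
        f"--format={file_format}",
        f"--order={order}",
        f"--stranded={stranded}",
        f"--type={feature_type}",
        f"--idattr={id_attr}",
        *alignment_files,  # one or more alignment files
        gtf_file,          # the GTF file comes last
    ]


# Placeholder inputs for illustration.
cmd = htseq_count_command(["a.bam", "b.bam"], "genes.gtf")
print(shlex.join(cmd))
```

Running it with `subprocess.run(cmd, check=True, capture_output=True, text=True)` would return the count table on standard output, assuming htseq-count is installed.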
htseq-count-barcodes: counting reads with cell barcodes and UMIs
Similar to htseq-count, but for a single SAM/BAM/CRAM file containing reads with cell and molecular barcodes (e.g. 10X Genomics cellranger output). This script enables customization of single-cell RNA-Seq pipelines, e.g. to quantify exon-level expression or simply to obtain a count matrix that contains chromosome information and additional feature metadata.