Processing NGS data with HTSeq
HTSeq (https://htseq.readthedocs.io) is an alternative library for processing NGS data. Most of the functionality that HTSeq provides is also available in other libraries covered in this book, but you should be aware of it as another way of processing NGS data. HTSeq supports, among others, the FASTA, FASTQ, SAM (via pysam), VCF, General Feature Format (GFF), and Browser Extensible Data (BED) file formats. It also includes a set of abstractions for processing (mapped) genomic data, covering concepts such as genomic positions, intervals, and alignments. A complete examination of the library's features is beyond our scope, so we will concentrate on a small subset. We will also take this opportunity to introduce the BED file format.
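To make the idea of genomic positions and intervals more concrete, here is a minimal sketch in plain Python. The Interval class and its methods are hypothetical names introduced only for illustration; this is not HTSeq's API, just a model of the concept under the assumption of 0-based, half-open coordinates (the convention used by Python slices and by BED).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval (illustration only, not HTSeq's class)."""
    chrom: str
    start: int         # 0-based, inclusive
    end: int           # 0-based, exclusive (half-open)
    strand: str = "."  # "+", "-", or "." when the strand is unknown

    def __post_init__(self):
        # Reject coordinates that cannot describe a valid interval.
        if self.start < 0 or self.end < self.start:
            raise ValueError(f"invalid coordinates: {self.start}-{self.end}")

    @property
    def length(self) -> int:
        # With half-open coordinates the length is a plain subtraction.
        return self.end - self.start

    def contains_position(self, chrom: str, pos: int) -> bool:
        # A position is inside if it is on the same chromosome and
        # start <= pos < end.
        return chrom == self.chrom and self.start <= pos < self.end

    def overlaps(self, other: "Interval") -> bool:
        # Two half-open intervals overlap if each starts before the other ends.
        return (
            self.chrom == other.chrom
            and self.start < other.end
            and other.start < self.end
        )


exon = Interval("chr1", 100, 200, "+")
read = Interval("chr1", 190, 240, "+")
print(exon.length)          # 100
print(exon.overlaps(read))  # True
```

Half-open coordinates mean that two adjacent intervals, such as 100-200 and 200-250, share no position, which keeps overlap and length calculations free of off-by-one adjustments.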
The BED format lets you specify features for annotation tracks. It has many uses, but it’s common to load BED files into genome browsers to...