Working with alignment data
After you receive your data from the sequencer, you will normally use a tool such as the Burrows-Wheeler Aligner (bwa) to align your sequences to a reference genome. Most users will have a reference genome for their species. You can read more on reference genomes in Chapter 5, Working with Genomes.
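As a rough sketch of that alignment step, the following Python snippet only assembles the bwa command lines; it does not run them. The file names (ref.fa, reads_1.fq, reads_2.fq, sample.sam) and the helper name bwa_commands are hypothetical, and the sketch assumes that bwa is installed and that you are using its mem algorithm, which writes SAM output to standard output.

```python
def bwa_commands(reference, reads, sam_path):
    """Build (argv, stdout_path) pairs for a basic bwa run.

    Nothing is executed here. stdout_path is the file that the command's
    standard output should be redirected to, or None if there is none.
    """
    return [
        # Index the reference genome once before aligning.
        (["bwa", "index", reference], None),
        # Align the reads; bwa mem prints SAM records to standard output.
        (["bwa", "mem", reference, *reads], sam_path),
    ]


# Hypothetical file names, used for illustration only.
commands = bwa_commands("ref.fa", ["reads_1.fq", "reads_2.fq"], "sample.sam")
for argv, stdout_path in commands:
    redirect = f" > {stdout_path}" if stdout_path else ""
    print(" ".join(argv) + redirect)
```

To actually run a command, each argument list could be passed to subprocess.run, with the stdout argument set to an open file handle when a redirect target is given.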
The most common representation for aligned data is the Sequence Alignment/Map (SAM) format. Because most of these files are very large, you will probably work with the compressed binary version (BAM). A BAM file can be indexed for extremely fast random access (for example, to quickly find alignments to a certain part of a chromosome). Note that you will need an index for your BAM file, which is normally created with the index command of SAMtools (samtools index) and requires the file to be sorted by coordinate. SAMtools is probably the most widely used tool for manipulating SAM/BAM files.
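To make the SAM format more concrete, here is a minimal sketch that parses one alignment line using only the Python standard library. Each SAM alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL). The names SamRecord, parse_sam_line, is_unmapped, and is_reverse are hypothetical, and the example record is invented for illustration. In practice, you would normally read SAM/BAM files with a dedicated library rather than by hand.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory fields of a SAM alignment line."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(record: SamRecord) -> bool:
    # Bit 0x4 of FLAG marks an unmapped read.
    return bool(record.flag & 0x4)


def is_reverse(record: SamRecord) -> bool:
    # Bit 0x10 of FLAG marks a read aligned to the reverse strand.
    return bool(record.flag & 0x10)


# An invented record, used for illustration only.
example = "read1\t16\tchr20\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, is_reverse(record), is_unmapped(record))
```

Any optional fields after the eleventh column are ignored by this sketch.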
Getting ready
As discussed in the previous recipe, we will use data from the 1000 Genomes Project. We will use the exome...