Aligning genomic-length sequences with DECIPHER
Aligning sequences longer than genes and proteins, such as contigs from assembly projects, chromosomes, or whole genomes, is a tricky task that calls for different techniques from those used on short sequences. The longer the sequences get, the harder they are to align, and the main cost is computational. The dynamic programming algorithms that work well on short sequences need time and memory that grow with the product of the sequence lengths, so they quickly become impractical at genome scale. Long alignments therefore generally start by finding short anchor alignments and then working the full alignment out from those anchors. We typically end up with syntenic blocks: regions that match well, and in the same order, between the different sequences.
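To make the anchoring idea concrete, here is a minimal sketch in base R. It is a toy illustration only and is not how DECIPHER works internally: the helper find_anchors(), the two example sequences, and the choice of exact, unique k-mers as anchors are all assumptions made for this example.

```r
# Toy illustration of anchoring, not DECIPHER's actual algorithm.
# find_anchors() is a hypothetical helper written for this example.
find_anchors <- function(a, b, k = 8) {
  # No k-mers exist if either sequence is shorter than k
  if (nchar(a) < k || nchar(b) < k) {
    return(data.frame(pos_a = integer(0), pos_b = integer(0)))
  }

  # Extract every overlapping k-mer from each sequence
  starts_a <- seq_len(nchar(a) - k + 1)
  starts_b <- seq_len(nchar(b) - k + 1)
  kmers_a <- substring(a, starts_a, starts_a + k - 1)
  kmers_b <- substring(b, starts_b, starts_b + k - 1)

  # Keep only k-mers that occur exactly once within their own sequence,
  # so that each anchor maps to a single, unambiguous position
  once <- function(x) x[!(duplicated(x) | duplicated(x, fromLast = TRUE))]
  shared <- intersect(once(kmers_a), once(kmers_b))

  # Report the start position of each shared k-mer in both sequences
  anchors <- data.frame(
    pos_a = match(shared, kmers_a),
    pos_b = match(shared, kmers_b)
  )
  anchors[order(anchors$pos_a), , drop = FALSE]
}

# Two made-up sequences that share a short conserved stretch
a <- "AAAACGTGCATT"
b <- "CCCGTGCAGG"
print(find_anchors(a, b, k = 5))
```

Anchors that lie on the same diagonal (a constant difference between pos_a and pos_b) and sit next to each other can be merged into a longer block, and only the gaps between blocks then need a full alignment. Real tools use far more tolerant seeding and chaining than this exact-match sketch.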
In this recipe, we’ll look at the DECIPHER package for genome-length alignments. As our sequences of interest, we’ll use some chloroplast genomes, which are small organelle genomes of about 150 kbp that are pretty well conserved.