Aligning sequences longer than genes and proteins, such as contigs from assembly projects, chromosomes, or whole genomes, is a tricky task, and one for which we need different techniques from those used for short sequences. The longer sequences get, the harder they are to align. Long alignments are especially costly in computational time, since the dynamic programming algorithms that are effective on short sequences need time and memory that grow with the product of the sequence lengths. Performing longer alignments generally starts with finding short anchor alignments and working the alignment out from there. We typically end up with blocks of synteny: regions that match well between the aligned genomes.
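To make the anchoring idea concrete, here is a minimal Python sketch, not DECIPHER's actual method. It assumes that anchors are exact k-mers occurring exactly once in each sequence and that neighbouring anchors on the same diagonal can be merged into a block. The function names `kmer_positions`, `find_anchors`, and `merge_into_blocks` are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer to the list of positions where it starts in seq."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def find_anchors(a, b, k):
    """Return sorted (pos_in_a, pos_in_b) pairs for k-mers unique in both sequences."""
    pos_a = kmer_positions(a, k)
    pos_b = kmer_positions(b, k)
    anchors = []
    for kmer, hits_a in pos_a.items():
        hits_b = pos_b.get(kmer, [])
        # Assumption: only unambiguous k-mers (one hit on each side) are used as anchors.
        if len(hits_a) == 1 and len(hits_b) == 1:
            anchors.append((hits_a[0], hits_b[0]))
    return sorted(anchors)


def merge_into_blocks(anchors, k):
    """Grow (start_a, start_b, length) blocks by extending the most recent block."""
    blocks = []
    for i, j in anchors:
        if blocks:
            start_a, start_b, length = blocks[-1]
            same_diagonal = (i - start_a) == (j - start_b)
            touches_block = i <= start_a + length
            if same_diagonal and touches_block:
                blocks[-1] = (start_a, start_b, max(length, i - start_a + k))
                continue
        blocks.append((i, j, k))
    return blocks


# Toy sequences sharing the substring "ACGTGCA" at different offsets.
anchors = find_anchors("AAACGTGCATTT", "CACGTGCAGG", k=4)
blocks = merge_into_blocks(anchors, k=4)
# blocks == [(2, 1, 7)]: a 7-base match starting at index 2 in the first
# sequence and index 1 in the second.
```

A real genome aligner would also have to handle mismatches, repeats, and rearrangements, and would then align the gaps between blocks. This sketch covers only the anchoring and merging steps.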
In this recipe, we'll look at the DECIPHER package for genome-length alignments. We'll use some chloroplast genomes...