Working with high-quality reference genomes
In this recipe, you will learn about a few general techniques for manipulating reference genomes. As an illustrative example, we will study the GC content (the fraction of bases in the genome that are guanine or cytosine) of Plasmodium falciparum, the most important of the parasite species that cause malaria. Reference genomes are normally made available as FASTA files.
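To make the definition of GC content concrete, here is a minimal sketch that reads FASTA-formatted text and computes the GC fraction of each record. It is dependency-free for illustration only; in practice you would more likely use a library such as Biopython's `Bio.SeqIO` for parsing. The helper names `read_fasta` and `gc_fraction` are hypothetical. One assumption worth flagging: the fraction is computed over unambiguous A/C/G/T bases only, so ambiguous positions such as `N` are excluded from the denominator, and lowercase (soft-masked) bases are counted like uppercase ones.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: the record ID is taken as the first
    whitespace-delimited token after '>', and sequence lines are joined.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # emit the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)  # emit the last record


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (case-insensitive).

    Assumption: ambiguous symbols such as N are ignored entirely.
    Returns 0.0 if the sequence has no A/C/G/T bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


if __name__ == "__main__":
    # Tiny made-up example, not real P. falciparum data
    example = ">chr_a test\nACGTGGCC\nnnac\n>chr_b\nATAT\n".splitlines()
    for record_id, seq in read_fasta(example):
        print(record_id, len(seq), round(gc_fraction(seq), 3))
    # prints:
    # chr_a 12 0.7
    # chr_b 4 0.0
```

Excluding `N` from the denominator matters for real assemblies, where runs of unknown bases would otherwise pull the GC fraction down artificially.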
Getting ready
Organism genomes come in widely different sizes: from viruses such as HIV, at 9.7 kbp; to bacteria such as E. coli; to protozoans such as Plasmodium falciparum, with a genome of 22 Mbp spread across 14 chromosomes, a mitochondrion, and an apicoplast; to the fruit fly, with three autosomes, a mitochondrion, and X/Y sex chromosomes; to humans, with roughly 3 Gbp spread across 22 autosomes, X/Y chromosomes, and a mitochondrion; all the way up to Paris japonica, a plant with a genome of about 150 Gbp. Along the way, you will encounter different ploidy and sex chromosome...