Extracting genes from a reference using annotations
In this recipe, we will learn how to extract a gene sequence from a reference FASTA file, using an annotation file to obtain the gene's coordinates. We will use the Anopheles gambiae genome and its annotation file (as in the previous two recipes). We will start by extracting the voltage-gated sodium channel (VGSC) gene, which is involved in insecticide resistance.
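The core operation behind this recipe can be illustrated without any database. The following is a minimal sketch in plain Python, assuming a tiny hypothetical reference (`chrToy`) and a single hypothetical GFF line rather than the real Anopheles gambiae files. The helper names `parse_fasta`, `reverse_complement`, and `extract_feature` are illustrative only. The sketch shows the two details that matter when slicing a reference with annotation coordinates: GFF start and end positions are 1-based and inclusive, so they must be converted to Python's 0-based, half-open slices, and features on the minus strand must be reverse-complemented.

```python
# Minimal sketch: extract a feature's sequence from a reference using
# GFF-style coordinates (1-based, inclusive) and strand.
# The toy FASTA and GFF line below are hypothetical, not real A. gambiae data.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCAntgcan")


def parse_fasta(text):
    """Return a dict mapping sequence IDs to sequences from FASTA text."""
    seqs = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]  # ID is the first word of the header
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature(reference, seqid, start, end, strand):
    """Slice [start, end] (1-based, inclusive, as in GFF) from the reference."""
    # Convert to Python's 0-based, half-open slicing.
    sub = reference[seqid][start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub


toy_fasta = """>chrToy
ACGTACGTTT
GGGCCCAAAT
"""
# Hypothetical GFF line: seqid, source, type, start, end, score, strand, phase, attributes
toy_gff = "chrToy\ttoy\tgene\t3\t8\t.\t+\t.\tID=geneA"

reference = parse_fasta(toy_fasta)
fields = toy_gff.split("\t")
seqid, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
print(extract_feature(reference, seqid, start, end, strand))  # GTACGT
```

Loading whole chromosomes into a dictionary is fine for a toy example; for a full genome, an indexed FASTA reader would avoid holding every sequence in memory at once.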
Getting ready
If you have followed the previous two recipes, you are ready. If not, download the Anopheles gambiae FASTA and GFF files. You also need to prepare the gffutils database:
import gffutils
import sqlite3
try:
    db = gffutils.create_db('gambiae.gff.gz', 'ag.db'...