Extracting genes from a reference using annotations
In this recipe, we will extract a gene sequence from a reference FASTA file, using an annotation file to obtain the gene's coordinates. We will use the Anopheles gambiae genome, along with its annotation file (as in the previous two recipes). First, we will extract the voltage-gated sodium channel (VGSC) gene, which is involved in insecticide resistance.
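The core idea, turning annotation coordinates into a slice of the reference, can be sketched without any libraries. The following is a minimal illustration on a toy string, not the recipe's own code: the function name `extract_feature` and the toy sequence are hypothetical, and it assumes GFF-style coordinates (1-based, inclusive on both ends) and a sequence containing only A, C, G, T, or N.

```python
# Complement table for unambiguous DNA bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def extract_feature(chrom_seq, start, end, strand='+'):
    """Return the sequence of a feature given GFF-style coordinates.

    GFF coordinates are 1-based and inclusive on both ends, so they are
    converted here to Python's 0-based, half-open slice convention.
    (Hypothetical helper written for illustration only.)
    """
    if start < 1 or end < start or end > len(chrom_seq):
        raise ValueError('coordinates fall outside the sequence')
    region = chrom_seq[start - 1:end]
    if strand == '-':
        # Features on the minus strand are read as the reverse complement.
        region = region.translate(_COMPLEMENT)[::-1]
    return region


# Toy 12-base "chromosome" standing in for a real reference record.
toy_chrom = 'AACCGGTTACGT'
plus_gene = extract_feature(toy_chrom, 5, 9)        # bases 5..9, plus strand
minus_gene = extract_feature(toy_chrom, 5, 9, '-')  # same span, minus strand
print(plus_gene, minus_gene)
```

With a real genome, the string would come from the FASTA record of the relevant chromosome, and the start, end, and strand values from the annotation database.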
Getting ready
If you have followed the previous two recipes, you will be ready. If not, download the Anopheles gambiae FASTA file, along with its GFF annotation file. You also need to prepare the gffutils database:
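import gffutils
import sqlite3

try:
    # Build the database from the annotation file on the first run
    db = gffutils.create_db('gambiae.gff.gz', 'ag.db')
except sqlite3.OperationalError:
    # Creation failed because ag.db already exists, so open it instead
    db = gffutils.FeatureDB('ag.db')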
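The try/except above is a create-or-reuse pattern: attempt to build the database and, if SQLite raises an error, fall back to opening the existing file. The sketch below reproduces that pattern using only the standard library's sqlite3 module, so it can be run without gffutils. The helper `create_or_open`, the table name, and the file name `toy.db` are hypothetical and chosen purely for illustration.

```python
import os
import sqlite3
import tempfile


def create_or_open(path):
    """Create a one-table database, or reuse it if the table already exists.

    Returns the open connection and a flag telling whether the table was
    created by this call. (Hypothetical helper, for illustration only.)
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute('CREATE TABLE features (id TEXT PRIMARY KEY)')
        created = True
    except sqlite3.OperationalError:
        # SQLite refuses to create a table that already exists.
        created = False
    return conn, created


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, 'toy.db')

    # First call: the file is new, so the table is created.
    conn, first_run = create_or_open(db_path)
    conn.commit()
    conn.close()

    # Second call: the table is already there, so the error path is taken.
    conn, second_run = create_or_open(db_path)
    conn.close()

print(first_run, second_run)
```

One consequence of this design is that an interrupted first run can leave a partial database behind, which later runs would then silently reuse; deleting the database file forces a clean rebuild.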
As usual, you will find all of this in the Chapter05/Getting_Gene.py notebook file.
How to do it...
Follow these steps...