Analyzing Gene Annotations
Large-scale sequencing projects, such as the Human Genome Project or the 1001 Genomes Project for the model plant Arabidopsis thaliana, have made a huge amount of genomics data publicly available. Likewise, open-access data sharing by individual laboratories has made raw sequencing data for genomes and transcriptomes widely available. Working with this data programmatically can mean downloading and parsing some very large or complicated files. Much effort has gone into making these resources as accessible as possible through APIs and other queryable interfaces, such as BioMart. In this chapter, we will look at some recipes that let us search annotations without downloading whole genome files, and find relevant information across databases. We’ll also look at how to analyze those annotations for biologically meaningful patterns in gene or protein sets, such as those derived from differential expression and proteomics analyses...