Finding protein domains with PFAM and bio3d
Discovering the function of a protein sequence is a key task. We can approach it in many ways, including whole-sequence similarity searches against databases of known proteins using tools such as BLAST. For more informative, finer-grained results, we can instead look for individual functional domains within a sequence. Databases such as PFAM and tools such as hmmer make this possible. PFAM encodes protein domains as profile Hidden Markov Models, which hmmer uses to scan sequences and report any likely occurrences of the domains. Often, genome annotation projects will carry out the searches for us, meaning that finding the PFAM domains in our sequence is a question of searching a database. Bioconductor does a great job of packaging up the data from these databases in dedicated annotation packages, usually with names ending in .db. In this recipe, we’ll look at how to work out whether a package contains PFAM domain information, how to...