Interfacing with R via rpy2
If you need some functionality and cannot find it in a Python library, your first port of call should be to check whether it has been implemented in R. For statistical methods, R is still the most complete framework; moreover, some bioinformatics functionality is available only in R, often as a package belonging to the Bioconductor project.
rpy2 provides a declarative interface from Python to R. As you will see, it lets you write very elegant Python code to perform the interfacing. To show the interface (and to try out one of the most common R data structures, the DataFrame, and one of the most popular R libraries, ggplot2), we will download metadata from the Human 1,000 Genomes Project (http://www.1000genomes.org/). This is not a book on R, but we want to provide interesting and functional examples.
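To give a first feel for what calling R from Python looks like, here is a minimal sketch. It is not part of the recipe itself. It assumes that both R and rpy2 are installed, and the helper name r_mean is hypothetical, introduced only for illustration. It looks up R's built-in mean function through rpy2.robjects and applies it to a Python list converted to an R numeric vector.

```python
def r_mean(values):
    """Return the mean of `values` as computed by R.

    Hypothetical helper for illustration only. Returns None when
    rpy2 (or the underlying R installation) cannot be loaded.
    """
    try:
        # Importing rpy2.robjects starts an embedded R session.
        import rpy2.robjects as robjects
    except Exception:
        # Assumption: any failure here means rpy2 or R is not usable.
        return None

    # Look up the R function named "mean" in the R global environment.
    mean_in_r = robjects.r["mean"]

    # Convert the Python list to an R numeric vector and call R.
    result = mean_in_r(robjects.FloatVector(values))

    # R returns a vector of length one; extract the Python float.
    return result[0]


if __name__ == "__main__":
    print(r_mean([1.0, 2.0, 3.0]))
```

The same pattern (fetch an R object by name, convert the arguments, call it like a Python function) is what the rest of this recipe builds on when working with data frames and ggplot2.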
Getting ready
You will need to get the metadata file from the 1,000 Genomes sequence index...