R package maintainers
Another similarly straightforward data source might be the list of R package maintainers. We can download the names and e-mail addresses of the package maintainers from a public page of CRAN, where this data is stored in a nicely structured HTML table that is extremely easy to parse:
> library(XML)
> packages <- readHTMLTable(paste0('http://cran.r-project.org',
+   '/web/checks/check_summary.html'), which = 2)
Extracting the names from the Maintainer column takes some quick data cleansing and transformation, mainly with regular expressions. Please note that the column name starts with a space, which is why we quoted it:
> maintainers <- sub('(.*) <(.*)>', '\\1', packages$' Maintainer')
> # replace non-breaking spaces with regular spaces
> maintainers <- gsub('\u00a0', ' ', maintainers)
> str(maintainers)
 chr [1:6994] "Scott Fortmann-Roe" "Gaurav Sood" "Blum Michael" ...
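To see what the regular expression does without downloading anything, the following minimal sketch applies the same pattern to two made-up entries. The `sample_maintainers` vector and its values are hypothetical and only mimic the "Name <e-mail>" layout of the Maintainer column. The second backreference, which the example above does not use, is assumed here to pick out the e-mail address:

```r
# Hypothetical sample entries mimicking the "Name <email>" layout of the
# Maintainer column; these are made-up values, not real CRAN data.
sample_maintainers <- c(
  'Jane Doe <jane.doe@example.org>',
  'John\u00a0Smith <john@example.com>'  # contains a non-breaking space
)

pattern <- '(.*) <(.*)>'

# Backreference \\1 keeps the name, \\2 keeps the e-mail address
sample_names  <- sub(pattern, '\\1', sample_maintainers)
sample_emails <- sub(pattern, '\\2', sample_maintainers)

# Normalize non-breaking spaces to regular spaces
sample_names <- gsub('\u00a0', ' ', sample_names)

print(sample_names)
print(sample_emails)
```

Keeping the pattern in one variable means the name and the address are always extracted by the same rule, so the two resulting vectors stay aligned element by element.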