In a blog post I wrote comparing the performance of this task under various optimization methods, I took it for granted that calculating the distances on the full dataset with the unparallelized/un-Rcpp-ed code would be a multi-hour affair. I was seriously mistaken.
Shortly after I published the post, a clever R programmer commented that they had slightly reworked the code so that the serial, pure-R version took less than 20 seconds to complete on all 13,429 observations. How? Vectorization. The following code illustrates the technique:
single.core.improved <- function(airport.locs){
  numrows <- nrow(airport.locs)
  running.sum <- 0
  for (i in 1:(numrows-1)) {
    this.dist <- sum(haversine(airport.locs[i,2],
                               airport.locs[i, 3],
                               airport...
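To make the idea concrete, here is a minimal, self-contained sketch of the same trick on made-up coordinates. It assumes a haversine function that takes latitudes and longitudes in degrees and is built entirely from vectorized arithmetic. The definition below is a hypothetical stand-in, not necessarily the one used in the post, and the names pairwise.loop and pairwise.vectorized are likewise invented for illustration.

```r
# Hypothetical haversine (an assumption; the post's own definition may differ).
# Inputs are in degrees; the result is in kilometers.
haversine <- function(lat1, long1, lat2, long2, radius = 6371) {
  to.rad <- pi / 180
  delta.phi <- (lat2 - lat1) * to.rad
  delta.lambda <- (long2 - long1) * to.rad
  phi1 <- lat1 * to.rad
  phi2 <- lat2 * to.rad
  a <- sin(delta.phi / 2)^2 +
    cos(phi1) * cos(phi2) * sin(delta.lambda / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Made-up coordinates standing in for the airport data.
set.seed(1)
n <- 200
lat <- runif(n, -60, 60)
long <- runif(n, -180, 180)

# Scalar version: one haversine call per pair of points.
pairwise.loop <- function(lat, long) {
  n <- length(lat)
  total <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      total <- total + haversine(lat[i], long[i], lat[j], long[j])
    }
  }
  total
}

# Vectorized version: one haversine call per point, which compares
# that point against all later points at once via recycling.
pairwise.vectorized <- function(lat, long) {
  n <- length(lat)
  total <- 0
  for (i in 1:(n - 1)) {
    rest <- (i + 1):n
    total <- total + sum(haversine(lat[i], long[i], lat[rest], long[rest]))
  }
  total
}

loop.total <- pairwise.loop(lat, long)
vectorized.total <- pairwise.vectorized(lat, long)

# Both approaches should agree up to floating-point rounding.
print(all.equal(loop.total, vectorized.total))
```

The two functions do the same arithmetic, but the vectorized one calls haversine once per point rather than once per pair. With 13,429 observations, that is 13,428 calls instead of roughly 90 million, so far less time goes to interpreter overhead. Wrapping each call in system.time() is an easy way to see the gap on your own machine.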