We began this chapter by explaining some of the reasons why large datasets can present a problem for unoptimized R code, such as R's lack of automatic parallelization and its lack of native support for out-of-memory data. The rest of the chapter discussed specific routes to optimizing R code in order to tackle large data.
First, we learned of the dangers of optimizing code too early. Next, we saw (much to the relief of slackers everywhere) that taking the lazy way out, and simply buying or renting a more powerful machine, is often more cost-effective than spending time optimizing code.
After that, we saw that a little knowledge about how R handles memory allocation, and about vectorization, can often go a long way toward improving performance.
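As a concrete reminder of what that looks like in practice, here is a minimal sketch (not taken from the chapter; the function names are purely illustrative) contrasting a loop that grows its result, a loop that preallocates it, and a fully vectorized expression:

```r
n <- 1e5

# Growing a vector inside a loop forces repeated reallocation and copying
grow_loop <- function(n) {
  out <- c()
  for (i in 1:n) out <- c(out, i^2)
  out
}

# Preallocating the result vector avoids those repeated copies
prealloc_loop <- function(n) {
  out <- numeric(n)
  for (i in 1:n) out[i] <- i^2
  out
}

# Vectorized arithmetic pushes the loop down into compiled code
vectorized <- function(n) (1:n)^2

# Rough timing comparison; the vectorized version is typically
# orders of magnitude faster than growing the vector element by element
system.time(grow_loop(n))
system.time(prealloc_loop(n))
system.time(vectorized(n))
```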
The next two sections focused less on changing our R code and more on changing how we use it. Specifically, we discovered that there are often performance gains to be...