Learning about big data
The expression "big data" does not mean a specific amount of data, neither in the number of examples nor in the number of gigabytes, terabytes, or petabytes taken up by the data. It means the following:
We have had data growing faster than the processing power
Some of the methods and techniques that worked well in the past now need to be redone, as they do not scale well
Your algorithms cannot assume that the entire data is in RAM
Managing data becomes a major task in itself
Using computer clusters or multicore machines becomes a necessity and not a luxury
This chapter will focus on this last piece of the puzzle: how to use multiple cores (either on the same machine or on separate machines) to speed up and organize your computations. This will also be useful in other medium-sized data tasks.