The amount of data stored in the world is growing almost exponentially. Nowadays, for a data scientist, having to process a few terabytes of data a day is no longer an unusual request and, to make things even more complex, that data usually comes from many heterogeneous systems. In addition, regardless of the size of the data you have to deal with, the business constantly expects you to produce a model within a short time, as if you were simply operating on a toy dataset.
To conclude our journey around the essentials of data science, we cannot avoid such a key requirement. Therefore, we are going to introduce you to a new way of processing large amounts of data, scaling across multiple computers in order to acquire data, process it, and build effective machine learning algorithms. Dealing...