Dataset reorganizing
In this section, we will cover dataset reorganization techniques. We will then discuss some of Spark's special features for data reorganizing, as well as some of R's special methods for data reorganizing that can be used with the Spark notebook.
After this section, we will be able to reorganize datasets for various machine learning needs.
Dataset reorganizing tasks
Reorganizing datasets sounds easy, but it can be very challenging and is often very time-consuming.
Two data reorganizing tasks are especially common: first, obtaining a subset of the data for modeling and, second, aggregating data to a higher level. For example, we may have student-level data but need a dataset at the classroom level. For this, we will need to calculate some attributes from the student records and then reorganize the results into a new dataset.
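To make the two tasks concrete, the following is a minimal sketch in plain Python rather than Spark, SQL, or R. The student records, the field names (student, classroom, grade, score), and the helper functions subset and aggregate_by_classroom are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student-level records; field names and values are illustrative.
students = [
    {"student": "s1", "classroom": "A", "grade": 5, "score": 80.0},
    {"student": "s2", "classroom": "A", "grade": 5, "score": 90.0},
    {"student": "s3", "classroom": "B", "grade": 5, "score": 70.0},
    {"student": "s4", "classroom": "B", "grade": 6, "score": 60.0},
]


def subset(rows, predicate):
    """Task 1: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def aggregate_by_classroom(rows):
    """Task 2: roll student rows up to one row per classroom."""
    scores_by_classroom = defaultdict(list)
    for row in rows:
        scores_by_classroom[row["classroom"]].append(row["score"])
    # Calculate classroom-level attributes from the student-level scores.
    return [
        {"classroom": name, "n_students": len(scores), "mean_score": mean(scores)}
        for name, scores in sorted(scores_by_classroom.items())
    ]


# A subset for modeling: only the grade 5 students.
grade5 = subset(students, lambda row: row["grade"] == 5)

# A higher-level dataset: one record per classroom.
classrooms = aggregate_by_classroom(students)
```

Here, grade5 holds only the student rows that pass the filter, while classrooms is a new dataset with one row per classroom. On larger data, the same two steps would typically be expressed as a filter and a group-by aggregation in whichever tool is in use.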
To reorganize data, data scientists and machine learning professionals often use the SQL or R programming tools they are already familiar with. Fortunately within...