Chapter 3. Data Wrangling
I assume that by now you are at ease with importing datasets from various sources and exploring the look and feel of the data. Handling missing values, creating dummy variables and plots are some tasks that an analyst (predictive modeller) does with almost all the datasets to make them model-worthy. So, for an aspiring analyst it will be better to master these tasks, as well.
Next in the line of items to master in order to juggle data like a pro is data wrangling. Put simply, it is just a fancy word for the slicing and dicing of data. If you compare the entire predictive modelling process to a complex operation/surgery to be performed on a patient, then the preliminary analysis with a stethoscope and diagnostic checks on the patient is the data cleaning and exploration process, zeroing down on the ailing area and deciding which body part to operate on is data wrangling, and performing the surgery/operation is the modelling process.
Surgery/operation | ... |
---|