Real-world data is rarely clean: it typically contains missing values, errors, inconsistent formats, and other problems that prevent it from being fed to a machine learning model without preprocessing. Data cleansing means correcting these problems before starting the real analysis.
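To make these problem types concrete, here is a minimal Python sketch, separate from the Excel workflow used in this chapter. The records, column names, and helper functions (`parse_number`, `clean_row`) are invented for illustration and are not taken from the Titanic dataset. Treating a comma as a decimal separator is also an assumption that would not suit every dataset.

```python
# Hypothetical records, invented for illustration (not real Titanic data).
raw_rows = [
    {"name": " Alice Smith ", "age": "29", "fare": "7.25"},   # stray whitespace
    {"name": "Bob Jones", "age": "", "fare": "71,28"},        # missing age, comma decimal
    {"name": "Carol White", "age": "abc", "fare": "8.05"},    # non-numeric age
]


def parse_number(text):
    """Return text as a float, or None if it is empty or not numeric.

    Assumption: a comma is a decimal separator, not a thousands separator.
    """
    cleaned = text.strip().replace(",", ".")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_row(row):
    """Trim whitespace and convert numeric fields; unusable values become None."""
    return {
        "name": row["name"].strip(),
        "age": parse_number(row["age"]),
        "fare": parse_number(row["fare"]),
    }


cleaned_rows = [clean_row(row) for row in raw_rows]

# Count the rows whose age could not be recovered.
missing_ages = sum(1 for row in cleaned_rows if row["age"] is None)

for row in cleaned_rows:
    print(row)
print("Rows with missing or invalid age:", missing_ages)
```

The sketch only flags unusable values as `None`. Deciding whether to drop those rows or fill them in is a separate step that depends on the analysis.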
As an example of how to clean a dataset, we will use real data about the Titanic passengers and demonstrate how you can prepare it for analysis. To import the data from an Excel workbook, we will repeat the procedure described in the Importing data from another Excel workbook section of the previous chapter.
To clean the dataset, follow these steps:
- Navigate to Data | From File | From Workbook, as shown in the following screenshot:
- After selecting the titanic.xlsx file and the Passenger data worksheet...