Preparing and Exploring Our Data
Data preparation is a common theme in data science that extends well beyond the machine learning pipeline. It goes by several names, including data wrangling, data cleaning, and data preprocessing for feature engineering.
Here, we emphasize that significant time goes into data cleaning, feature engineering, and exploratory analysis, and that robust preprocessing improves outcomes, whether the result is a presentation to business stakeholders or the input to a machine learning model.
Data cleaning encompasses tasks focused on identifying and rectifying data issues, particularly errors and artifacts. Errors result from data loss in the acquisition pipeline, while artifacts arise from the system that generates the data. Cleaning involves addressing missing data, handling outliers, removing duplicates, and performing necessary translations for data readability and conversion.
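The cleaning tasks listed above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the records, the field names (`id`, `city`, `temp_f`, `temp_c`), the function name `clean`, and the `mad_factor` threshold are all hypothetical. The median-absolute-deviation rule for outliers and the median fill for missing values are one reasonable choice among many, not a prescribed method.

```python
from statistics import median

# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"id": 1, "city": "Oslo", "temp_f": "50.0"},
    {"id": 2, "city": "Oslo", "temp_f": "53.6"},
    {"id": 2, "city": "Oslo", "temp_f": "53.6"},  # exact duplicate
    {"id": 3, "city": "Rome", "temp_f": None},    # missing reading (error)
    {"id": 4, "city": "Rome", "temp_f": "68.0"},
    {"id": 5, "city": "Rome", "temp_f": "9999"},  # implausible value (artifact)
    {"id": 6, "city": "Rome", "temp_f": "59.0"},
]


def clean(records, mad_factor=5.0):
    """Deduplicate, parse, screen outliers, impute, and convert units."""
    # 1. Remove exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # 2. Translate text readings into floats; missing values stay None.
    for rec in unique:
        raw = rec["temp_f"]
        rec["temp_f"] = None if raw is None else float(raw)

    # 3. Treat readings far from the median as missing.
    #    Assumed rule: |value - median| > mad_factor * MAD.
    observed = [r["temp_f"] for r in unique if r["temp_f"] is not None]
    center = median(observed)
    mad = median([abs(v - center) for v in observed])
    for rec in unique:
        value = rec["temp_f"]
        if value is not None and mad > 0 and abs(value - center) > mad_factor * mad:
            rec["temp_f"] = None

    # 4. Fill missing readings with the median of the remaining valid ones.
    valid = [r["temp_f"] for r in unique if r["temp_f"] is not None]
    fill = median(valid)
    for rec in unique:
        if rec["temp_f"] is None:
            rec["temp_f"] = fill

    # 5. Convert Fahrenheit to Celsius for readability.
    for rec in unique:
        rec["temp_c"] = (rec["temp_f"] - 32) * 5 / 9
    return unique


cleaned = clean(raw_records)
for rec in cleaned:
    print(rec["id"], rec["city"], round(rec["temp_c"], 1))
```

The order of the steps matters in this sketch: outliers are screened before imputation so that an implausible reading does not distort the fill value. In practice each choice (what counts as a duplicate, which outlier rule to use, how to impute) depends on the dataset and on how the data will be used downstream.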
Data preparation spans...