Collecting data from the real world is challenging. Raw data usually carries enough problems that it has to be sanitized before it is suitable for use in further studies.
Data preprocessing
Why process raw data?
Raw data collected in the field is riddled with human error, and data entry is a major source of it. Automated collection is not spared either: inaccurate device readings, faulty equipment, and changing environmental conditions can all introduce significant margins of error as data is collected.
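To make these error sources concrete, here is a minimal sketch of how mistyped entries and implausible device readings might be flagged before analysis. The temperature range, the sample readings, and the function names are all assumptions made for illustration; they do not come from any particular dataset or library.

```python
from typing import Optional

# Hypothetical plausible range for an outdoor temperature sensor, in Celsius.
PLAUSIBLE_RANGE = (-50.0, 60.0)


def parse_reading(raw: str) -> Optional[float]:
    """Return the reading as a float, or None if it cannot be parsed."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def classify(raw: str) -> str:
    """Label a raw reading as 'ok', 'unparseable', or 'out_of_range'."""
    value = parse_reading(raw)
    if value is None:
        return "unparseable"  # e.g. a typo made during data entry
    low, high = PLAUSIBLE_RANGE
    if not (low <= value <= high):
        return "out_of_range"  # e.g. a faulty or miscalibrated device
    return "ok"


# Made-up sample: a typo ("2l.9" with a letter l), a blank entry, and a spike.
raw_readings = ["21.5", " 22.1 ", "2l.9", "", "215.0", "-3.4"]
labels = [classify(r) for r in raw_readings]

for raw, label in zip(raw_readings, labels):
    print(f"{raw!r:>10} -> {label}")
```

A check like this only flags suspicious values. Whether to correct, drop, or keep them is a separate decision that depends on the study.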
The data collected may also be inconsistent with other records collected over time. The existence of duplicate entries...