Introduction to the case studies and datasets
Data cleaning and preparation are commonly estimated to take up to 80% of the time in a data analytics life cycle. Transactional datasets can fail in many ways; the most prominent problems include missing data points, incompatible formats, inconsistent data types, misspellings, and unwanted characters and white space.
These are just some examples of how data can be messy. The success of a data analyst depends on how well they can work through messy data and transform it into the required format. A reliable way to become adept at this all-important process is to get hands-on experience with multiple real-world datasets. In this chapter, you will analyze four different datasets, with each analysis focusing on a different facet of data wrangling. The following list offers a snapshot of the datasets we will be dealing with in this chapter and the different techniques we will be applying to...