Chapter 5: Cleaning and Visualizing Data
According to Anaconda's latest State of Data Science Report (https://bit.ly/3F2D8YM), 39% of your time as a data scientist will be spent on either data preparation or data cleaning. That may come as no surprise, since setting up a problem correctly is vital to getting good answers from your data.
Data will rarely come to you in perfect form, and even when it does, you might want to manipulate it to answer different questions. You will need to be able to quickly find general statistics, discover and remove bad columns, and alter fields in place.
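As a rough preview of those three tasks, here is a minimal sketch. It assumes the pandas library is installed, and the DataFrame, its column names, and the choice of the median as a fill value are all invented for illustration rather than taken from a real dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are made up for this sketch.
df = pd.DataFrame(
    {
        "city": [" Austin", "Boston ", "Chicago", "Denver"],
        "temp_f": [95.0, 71.0, None, 88.0],
        "notes": [None, None, None, None],  # a "bad" column: every value is missing
    }
)

# 1. General statistics (count, mean, std, quartiles) for the numeric columns.
summary = df.describe()

# 2. Discover columns that contain no usable values, then remove them.
bad_columns = [col for col in df.columns if df[col].isna().all()]
df = df.drop(columns=bad_columns)

# 3. Alter fields: trim stray whitespace and fill the missing reading
#    with the column median (one possible choice among many).
df["city"] = df["city"].str.strip()
df["temp_f"] = df["temp_f"].fillna(df["temp_f"].median())

print(summary)
print(df)
```

Treating a column as bad only when it is entirely empty is one simple rule; in practice you might also drop columns that are mostly missing or that hold a single constant value.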
Once the data is in the right form, visualization is a key tool, not only for presenting your findings to those who might care about them but also as a guide for yourself during data exploration. Cleaning and visualization go hand in hand, and you'll often find that certain aspects of the data need to be adjusted only after you have seen them plotted. This chapter...