Summary
In this chapter, we learned that EDA is an important stage in a data science project. It provides the means to understand the characteristics and limitations of the data, and to find insightful patterns within it, before machine learning or statistical models are developed based on it.
This initial analysis also allows teams to present results and train models with more confidence, since they have a deeper understanding of the data they are working with and the issues it may present.
We also covered a wide range of methods that can be used for EDA. Not all of them are necessary in every project, but these tools should allow you to analyze data for yourself and give you the knowledge to interpret such visualizations and analyses when they are presented to you.
In the next chapter, we’ll learn how to test business hypotheses with statistics. This technique, known as significance testing, is critical in validating the findings from...