Summary
In this chapter, we learned how to get a quick view of the key metrics in a dataset by generating summary statistics with the data profiling tools in the Power Query Editor: Column quality, Column distribution, and Column profile. We then discussed the visualizations that can be used to explore data, such as line, bar, column, and scatter charts. Finally, we used a Python visual to create histograms and box plots with the matplotlib library. Together, these tools help you understand your data and judge whether the dataset is representative enough to keep using. As you draw your first insights from the data, you will also be able to judge how it can be used for different AI projects.
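As a brief reminder of the Python visual technique, the following is a minimal sketch of a histogram and a box plot drawn with matplotlib. It is not the chapter's exact script: the function name plot_distribution and the synthetic sample values are assumptions made for illustration. Inside a Power BI Python visual, you would instead read a column from the pandas DataFrame that Power BI supplies as dataset and end the script with plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove this line when plotting interactively
import matplotlib.pyplot as plt
import numpy as np


def plot_distribution(values, bins=20):
    """Draw a histogram and a box plot of the same values side by side.

    Hypothetical helper for illustration only. Returns the figure and the
    per-bin counts of the histogram.
    """
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram: shows the shape of the distribution.
    counts, _edges, _patches = ax_hist.hist(values, bins=bins)
    ax_hist.set_title("Histogram")
    ax_hist.set_xlabel("Value")
    ax_hist.set_ylabel("Frequency")

    # Box plot: shows the median, quartiles, and potential outliers.
    ax_box.boxplot(values)
    ax_box.set_title("Box plot")
    ax_box.set_ylabel("Value")

    fig.tight_layout()
    return fig, counts


# Synthetic sample (an assumption): 200 roughly normal values plus two
# extreme values that should stand out in both charts.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(loc=50, scale=5, size=200), [95.0, 110.0]])

fig, counts = plot_distribution(sample)
```

The histogram makes skew and gaps visible, while the box plot makes extreme values easy to spot, which is why the two are often read together.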
Now that we understand the content of our data, we know that there are some problems to fix before we move on to AI. To ensure data quality, we need to address outliers, missing data, and imbalanced data, which can negatively influence...