Data Quality Assurance and Exploration
So far, we have remedied two data quality issues just by asking basic questions or by looking at the .info() summary. Let's now take a look at the first few columns of data. Before we get to the historical bill payments, we have the credit limits of the accounts in LIMIT_BAL, and the demographic features SEX, EDUCATION, MARRIAGE, and AGE. Our business partner has reached out to let us know that gender should not be used to predict credit-worthiness, as this is unethical by their standards. We will keep this in mind for future reference. Now we'll explore the rest of these columns, making any corrections that are necessary.
In order to further explore the data, we will use histograms. Histograms are a good way to visualize data that is on a continuous scale, such as currency amounts and ages. A histogram groups similar values into bins and shows the number of data points in these bins as a bar graph.
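To make the idea of binning concrete, here is a minimal sketch in plain Python of the counting that sits behind a histogram. The ages and the choice of four equal-width bins are made up for illustration and are not taken from the case study data. Plotting libraries normally do this step for us.

```python
# Hypothetical ages, made up for illustration (not from the case study data)
ages = [22, 25, 27, 31, 34, 35, 38, 41, 45, 52, 58, 63]

n_bins = 4  # assumed number of equal-width bins
low, high = min(ages), max(ages)
width = (high - low) / n_bins

# Count how many values fall into each bin
counts = [0] * n_bins
for age in ages:
    # Index of the bin this value falls into; the maximum goes in the last bin
    index = min(int((age - low) / width), n_bins - 1)
    counts[index] += 1

# Print a text version of the bar graph, one row per bin
for i, count in enumerate(counts):
    left = low + i * width
    right = left + width
    print(f"{left:.2f} to {right:.2f}: {'#' * count} ({count})")
```

Each printed row corresponds to one bar of the histogram: the bin's range plays the role of the position on the x axis, and the count is the bar's height.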
To plot histograms, we will start...