The U.S. census data problem
The U.S. census data problem uses features of the U.S. population to predict whether a person earns more than USD 50K a year.
In this chapter, we will use WIT_Model_Comparison_Ethical.ipynb, a notebook derived from Google's notebook.
The initial dataset, adult.data, was extracted from the U.S. Census Bureau database: https://web.archive.org/web/20021205224002/https://www.census.gov/DES/www/welcome.html
Each record contains the features of a person. Several ML programs were designed to predict that person's income. The goal was to classify the population into two groups: those earning more than USD 50K and those earning USD 50K or less.
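To make the record layout and the two-class goal concrete, the following is a minimal sketch rather than the notebook's own code. It assumes the comma-separated, header-less layout of adult.data and the column names listed in adult.names. The rows in SAMPLE are made up for illustration and are not taken from the file, and the helper names (load_records, earns_more_than_50k, share_above_50k) are hypothetical.

```python
import csv
import io

# Column names as listed in adult.names; the last column is the income label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Made-up rows in the adult.data layout (illustration only, not real records).
SAMPLE = """\
39, State-gov, 70000, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
45, Private, 120000, Masters, 14, Married-civ-spouse, Exec-managerial, Wife, Black, Female, 0, 0, 50, United-States, >50K
28, Private, 95000, HS-grad, 9, Never-married, Sales, Own-child, White, Female, 0, 0, 35, United-States, <=50K
52, Self-emp-not-inc, 150000, Some-college, 10, Divorced, Craft-repair, Not-in-family, White, Male, 0, 0, 45, United-States, <=50K
"""


def load_records(text):
    """Parse adult.data-style text into a list of dicts keyed by column name."""
    # skipinitialspace drops the blank that follows each comma in the file.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def earns_more_than_50k(record):
    """Return True when the record carries the '>50K' label."""
    return record["income"] == ">50K"


def share_above_50k(records):
    """Fraction of records labeled '>50K'."""
    positives = sum(earns_more_than_50k(r) for r in records)
    return positives / len(records)


records = load_records(SAMPLE)
labels = [earns_more_than_50k(r) for r in records]
print(labels)
print(f"Share labeled >50K: {share_above_50k(records):.2%}")
```

To run the same sketch on the real file, the text of adult.data could be read from disk and passed to load_records in place of SAMPLE. The share it prints is only the frequency of the '>50K' label in the records supplied, not a model prediction.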
The adult.names file contains more information on this data and the methods.
The adult.names file reports the following class probabilities:
Class probabilities for adult.all file
| Probability for the label '>50K' : 23.93% / 24.78% (without...