The dataset
The dataset used in the next exercises comes from the UCI dataset repository, where it is listed under the popular datasets tab. It is named Adult, but it is also known as the Census Income dataset (https://archive.ics.uci.edu/ml/datasets/Adult).
The variables we will be dealing with are listed next. I also invite you to read the adult.names file provided with the dataset in the UCI repository or on the GitHub page for this chapter's code (https://tinyurl.com/ywpjj329).
- Demographics: age, sex, race, marital-status, relationship status, native country.
- Education: education level and education-num (years of study).
- Work-related: work class, occupation, hours per week.
- Financial: capital gain and capital loss.
- fnlwgt: This means final weight, which is a scoring calculation from the Census Bureau based on socio-economic and demographic data. People with similar demographic information should have similar weights...
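
To make the grouping above concrete, here is a minimal Python sketch that parses rows in the layout described in adult.names and picks out one group of variables. Several things here are assumptions rather than something stated in this section: the raw file has no header row, fields are separated by a comma and a space, missing values are written as `?`, and the last column is the income label (named `income` here for illustration). The helper `read_adult`, the `GROUPS` dictionary, and the two sample rows are hypothetical and only mimic the file format.

```python
import csv
import io

# Column order as documented in adult.names (the raw file has no header row).
# "income" is an illustrative name for the final label column.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Columns that hold whole numbers; everything else stays as text.
NUMERIC = {"age", "fnlwgt", "education-num",
           "capital-gain", "capital-loss", "hours-per-week"}

# The variable groups from the list above, mapped to column names.
GROUPS = {
    "demographics": ["age", "sex", "race", "marital-status",
                     "relationship", "native-country"],
    "education": ["education", "education-num"],
    "work": ["workclass", "occupation", "hours-per-week"],
    "financial": ["capital-gain", "capital-loss"],
    "weight": ["fnlwgt"],
}


def read_adult(lines):
    """Parse lines in the adult.data layout into a list of dicts."""
    records = []
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        record = {}
        for name, value in zip(COLUMNS, row):
            value = value.strip()
            if value == "?":
                record[name] = None  # assumed missing-value marker
            elif name in NUMERIC:
                record[name] = int(value)
            else:
                record[name] = value
        records.append(record)
    return records


# Two made-up rows in the file's format, used instead of a download.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, ?, 120000, HS-grad, 9, Married-civ-spouse, ?, "
    "Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)

records = read_adult(sample)

# Keep only the financial variables of each record.
financial = [{c: r[c] for c in GROUPS["financial"]} for r in records]
print(financial)
```

To run the same sketch on the real data, pass an open file handle for a local copy of the dataset to `read_adult` in place of `sample`.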