Classifying income data using Support Vector Machines
We will build a Support Vector Machine classifier to predict the income bracket of a given person based on 14 attributes. Our goal is to determine whether the income is above or below $50,000 per year, so this is a binary classification problem. We will be using the census income dataset available at https://archive.ics.uci.edu/ml/datasets/Census+Income. One thing to note about this dataset is that each datapoint is a mixture of words and numbers. We cannot use the data in its raw format, because the algorithms don't know how to deal with words. Nor can we convert everything using a label encoder, because that would replace the actual magnitudes of the numerical attributes with category indices and discard useful information. Hence, we need to use a combination of label-encoded text attributes and raw numerical data to build an effective classifier.
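To make the idea of mixing label encoders with raw numerical data concrete, here is a minimal sketch. The four sample rows and their column choice (age, work class, education, hours per week) are made up for illustration and are not read from the dataset file; the names X_encoded and label_encoders are also just illustrative. Each column is checked: purely numeric columns are kept as integers, and every other column gets its own LabelEncoder.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical sample rows: age, work class, education, hours per week.
# These values are illustrative assumptions, not loaded from the dataset.
rows = [
    ['39', 'State-gov', 'Bachelors', '40'],
    ['50', 'Self-emp-not-inc', 'Bachelors', '13'],
    ['38', 'Private', 'HS-grad', '40'],
    ['53', 'Private', '11th', '40'],
]
X = np.array(rows)

label_encoders = []  # one fitted encoder per text column, in column order
X_encoded = np.empty(X.shape, dtype=int)

for i in range(X.shape[1]):
    column = X[:, i]
    if all(item.isdigit() for item in column):
        # Numerical column: keep the raw values
        X_encoded[:, i] = column.astype(int)
    else:
        # Text column: map each distinct word to an integer label
        encoder = preprocessing.LabelEncoder()
        X_encoded[:, i] = encoder.fit_transform(column)
        label_encoders.append(encoder)

print(X_encoded)
```

Keeping the fitted encoders around matters because the same mapping has to be applied to any new datapoint before prediction, and inverse_transform can turn the integer labels back into the original words.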
Create a new Python file and import the following packages:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.svm import...