4. Supervised Learning Algorithms: Predicting Annual Income
Activity 4.01: Training a Naïve Bayes Model for Our Census Income Dataset
Solution:
- In a Jupyter Notebook, import all the required elements to load and split the dataset, as well as to train a Naïve Bayes algorithm:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
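The following minimal sketch shows what the imported GaussianNB class does. It uses a tiny, made-up dataset rather than the Census data. Gaussian Naïve Bayes estimates a mean and variance for each feature within each class, treats features as conditionally independent given the class, and predicts the class with the highest posterior probability. The arrays `X_toy` and `y_toy` are hypothetical illustration data.

```python
# Toy illustration of Gaussian Naive Bayes (hypothetical data, not the Census dataset)
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical data: two well-separated clusters, one per class
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
                  [5.0, 6.0], [5.2, 5.8], [4.9, 6.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB()
model.fit(X_toy, y_toy)

# Class priors estimated from label frequencies (3 of 6 samples each)
print(model.class_prior_)  # -> [0.5 0.5]
# Per-class mean of each feature, learned during fitting
print(model.theta_)
# New points near each cluster are assigned to that cluster's class
print(model.predict([[1.1, 2.0], [5.1, 5.9]]))  # -> [0 1]
```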
- Load the pre-processed Census Income dataset. Next, separate the features from the target by creating two variables, X and Y:
data = pd.read_csv("census_income_dataset_preprocessed.csv")
X = data.drop("target", axis=1)
Y = data["target"]
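The CSV file is not reproduced here, so the sketch below runs the same separation on a small, made-up DataFrame. It also shows one alternative, positional slicing with `iloc`. The column names other than `target`, the values, and the assumption that `target` is the last column are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the pre-processed Census Income data;
# columns other than "target" and all values are made up for illustration.
data = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "hours-per-week": [40, 13, 40, 40],
    "target": [0, 0, 1, 1],
})

# Approach 1: drop the label column to keep only the features
X = data.drop("target", axis=1)
Y = data["target"]

# Approach 2: positional slicing, assuming "target" is the last column
X_alt = data.iloc[:, :-1]
Y_alt = data.iloc[:, -1]

print(X.shape, Y.shape)  # -> (4, 2) (4,)
print(X.equals(X_alt), Y.equals(Y_alt))  # -> True True
```

The `drop` approach does not depend on column order, which makes it the more robust choice if the label column is ever moved.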
Note that there are several ways to achieve the separation of X and Y. Use the one that you feel most comfortable with. However, take into account that X should contain the features of all instances, while Y should contain the class labels of all instances.
- Divide the dataset into training, validation, and...
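The excerpt cuts off this step, so the proportions the original solution uses are not known here. The sketch below shows one common way to obtain three subsets with `train_test_split`: call it twice. The 60/20/20 proportions, the `random_state` value, and the synthetic `X`/`Y` arrays are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features and labels standing in for X and Y (100 instances)
X = np.arange(200).reshape(100, 2)
Y = np.array([0, 1] * 50)

# First split off a test set (20% here, an assumed proportion)
X_rest, X_test, Y_rest, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

# Then carve a validation set out of the remainder:
# 25% of the remaining 80% is 20% of the whole, giving a 60/20/20 split
X_train, X_val, Y_train, Y_val = train_test_split(
    X_rest, Y_rest, test_size=0.25, random_state=0)

print(X_train.shape, X_val.shape, X_test.shape)  # -> (60, 2) (20, 2) (20, 2)
```

Splitting in two stages keeps the three subsets disjoint. Fixing `random_state` makes the split reproducible across notebook runs.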