Data preparation
In the previous chapters, we discussed Python libraries such as NumPy, Pandas, Matplotlib, and Seaborn for processing and visualizing data. Let’s start by importing the libraries we need:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
We will use a simple dataset that has only 4 columns and 10 rows.
Notice that some of the columns are categorical and others are numerical, and that some contain missing values that we need to fix. The dataset's .csv file is uploaded to Google Colab.
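As a rough illustration of how one might confirm which columns are numerical and where values are missing, here is a minimal sketch. It does not read the chapter's actual file: the frame `sample`, its values, and the column name `Target` are hypothetical placeholders, and only the three feature names follow the text.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the chapter's dataset. The values are made up
# for illustration; "Target" is a placeholder name for the last column.
sample = pd.DataFrame({
    "Country": ["A", "B", "C", "A"],
    "Age": [30.0, np.nan, 41.0, 25.0],
    "Salary": [50000.0, 42000.0, np.nan, 61000.0],
    "Target": ["No", "Yes", "No", "Yes"],
})

# Columns with a numeric dtype are the numerical ones.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical in this sketch.
categorical_cols = [c for c in sample.columns if c not in numeric_cols]

# Count the missing entries (NaN) in each column.
missing_counts = sample.isna().sum()

print("Numerical:", numeric_cols)
print("Categorical:", categorical_cols)
print(missing_counts)
```

Running the same three checks on the real dataset, once it is loaded, should show which columns need their missing values handled before modeling.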
Using the pandas library's read_csv function, we read the data into a variable named dataset. The first three columns (Country, Age, and Salary) are the features, which we assign to X; the last column is the value to be predicted, which we assign to y:
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(X)
[['France...
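To make the two iloc slices easier to follow, here is a small sketch of the same split on a tiny frame built in memory. The frame `sample`, its values, the column name `Target`, and the variable names `X_sample` and `y_sample` are all hypothetical and stand in for the real Data.csv contents.

```python
import pandas as pd

# Hypothetical 3-row frame; values and the "Target" column name are made up.
sample = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Age": [30, 35, 41],
    "Salary": [50000, 42000, 58000],
    "Target": ["No", "Yes", "No"],
})

# [:, :-1] selects every row and every column except the last one.
X_sample = sample.iloc[:, :-1].values

# [:, -1] selects every row and only the last column.
y_sample = sample.iloc[:, -1].values

print(X_sample.shape)  # (3, 3): three rows, three feature columns
print(y_sample.shape)  # (3,): one target value per row
```

The first index in iloc addresses rows and the second addresses columns, so the bare colon keeps all rows in both cases and only the column selection differs.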