2. Unsupervised Learning – Real-Life Applications
Activity 2.01: Using Data Visualization to Aid the Pre-processing Process
Solution:
- Import all the required libraries to load and pre-process the dataset:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
- Load the previously downloaded dataset using pandas' read_csv() function, and store it in a pandas DataFrame named data:
data = pd.read_csv("wholesale_customers_data.csv")
- Check for missing values in your DataFrame. Chaining the isnull() and sum() functions counts the missing values in every column of the dataset at once:
data.isnull().sum()
The output is as follows:
Channel             0
Region              0
Fresh               0
Milk                0
Grocery             0
Frozen              0
Detergents_Paper    0
Delicassen          0
dtype: int64
As the preceding output shows, there are no missing values in the dataset.
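Because the real dataset has no gaps, the check above only ever prints zeros. To see what isnull().sum() reports when values are missing, here is a minimal sketch that builds a small, hypothetical DataFrame in memory. It reuses the column names from the output above, but reads from io.StringIO instead of the real wholesale_customers_data.csv, and every number in it is invented for illustration. Two cells are deliberately left empty.

```python
import io

import pandas as pd

# Hypothetical CSV text: the column names match the wholesale dataset, but the
# values are invented and NOT taken from wholesale_customers_data.csv.
# The Milk cell in row 2 and the Delicassen cell in row 3 are left empty.
csv_text = """Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicassen
1,3,10,20,30,40,50,60
2,3,30,,50,60,70,80
1,3,60,70,80,90,100,
"""

# read_csv accepts any file-like object; empty fields are parsed as NaN.
data = pd.read_csv(io.StringIO(csv_text))

# Per-column count of missing values (the same call used in the step above).
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Summing the per-column counts gives a single total for the whole DataFrame.
total_missing = int(missing_per_column.sum())
print("Total missing values:", total_missing)

# Rows that contain at least one missing value.
rows_with_missing = data[data.isnull().any(axis=1)]
print(rows_with_missing)
```

isnull() returns a Boolean DataFrame, and sum() adds up the True values column by column, so a second sum() collapses those counts into one number for the dataset, while any(axis=1) instead flags which rows are affected.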
- Check for outliers...