Now that you understand how the k-means algorithm works internally, we can implement it in scikit-learn. We will work with the same fraud detection dataset that we used in the previous chapters. The key difference is that we drop the target feature, which contains the labels, and let k-means identify two clusters in the remaining features that we can then use to detect fraud.
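The workflow described above (set the labels aside, drop them, and ask k-means for two clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's own code: it uses a small synthetic DataFrame instead of fraud_prediction.csv, and the column names `amount`, `balance`, and `isFraud` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for fraud_prediction.csv: two numeric features plus a
# made-up label column "isFraud" (an assumption, not the real dataset schema).
rng = np.random.default_rng(0)
legit = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
fraud = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
df = pd.DataFrame(np.vstack([legit, fraud]), columns=["amount", "balance"])
df["isFraud"] = [0] * 50 + [1] * 50

# Keep the labels aside, then drop them so k-means never sees them.
labels = df["isFraud"]
features = df.drop(columns=["isFraud"])

# Ask k-means for two clusters, one per class we hope to recover.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(features)

# Cluster ids are arbitrary (0 and 1 may be swapped relative to the labels),
# so compare them with the held-out labels using a crosstab.
print(pd.crosstab(labels, kmeans.labels_))
```

Because k-means numbers its clusters arbitrarily, cluster 0 need not correspond to the non-fraud class; a crosstab against the held-out labels is one simple way to see how the clusters line up with the true classes.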
Implementing the k-means algorithm in scikit-learn
Creating the base k-means model
To load the dataset into our workspace and drop the target feature containing the labels, we use the following code:
import pandas as pd
# Reading in the dataset
df = pd.read_csv('fraud_prediction.csv')
# Dropping the target feature & the index
df = df.drop(['...