In this recipe, we'll apply clustering methods to find groups of customers for marketing purposes. We'll look at the German Credit Risk dataset and try to identify distinct customer segments; ideally, we want segments that are both profitable and clearly different from one another, so that we can target them with advertising.
Getting ready
For this recipe, we'll be using a credit risk dataset usually referred to as the German Credit Risk dataset. Each row describes a person who took out a loan, lists a few attributes of that person, and records whether the loan was paid back (that is, whether the credit was a good or a bad risk).
We'll download and load the German credit data as follows:
import pandas as pd
# Jupyter shell escape: fetch the raw data file into the working directory
!wget http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data
names = ['existingchecking', 'duration', 'credithistory'...
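To make the loading step concrete, here is a minimal sketch of how the downloaded file could be read into a DataFrame. It assumes that german.data sits in the working directory and that it is whitespace-delimited with no header row. The helper name load_german_credit is our own invention for illustration, not a pandas function. If no list of column names is passed, pandas falls back to integer column labels.

```python
import pandas as pd


def load_german_credit(source, names=None):
    """Read the German credit data into a DataFrame.

    `source` can be a file path or a file-like object.
    `names` is an optional list of column names; if omitted,
    pandas labels the columns 0, 1, 2, ...
    (Hypothetical helper, written for illustration only.)
    """
    # The raw file has no header row and separates fields with whitespace
    return pd.read_csv(source, sep=r"\s+", header=None, names=names)


# Example usage, assuming the download above succeeded:
# df = load_german_credit("german.data")
# df.head()
```

Once the data is loaded, a quick look at df.shape and df.dtypes is a cheap check that every row was parsed into the same number of columns and that numeric attributes were not read as strings.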