Data Preprocessing
Before proceeding to univariate analysis, let's look at the unique values in the columns. The motive behind looking at the unique values in a column is to identify the subcategories it contains. Once we know the subcategories in each column, we can see which subcategory has a higher count and which has a lower one. For example, take the EDUCATION column. We are interested in finding the different subcategories in the EDUCATION column and which of them has the highest count; that is, is our customers' highest level of education more often College or University?
This step is a precursor to building a profile of our customers.
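The idea of listing a column's subcategories and comparing their counts can be sketched with pandas on a small DataFrame. The data below is hypothetical: the EDUCATION codes are made up for illustration and are not the real dataset's values. `value_counts()` returns the number of rows per subcategory, ordered from most to least frequent.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset; these codes are
# illustrative only and do not come from the original data.
df = pd.DataFrame({'EDUCATION': [2, 1, 2, 3, 2, 1, 3, 2]})

# Distinct subcategories (codes) present in the column, in ascending order
edu_codes = sorted(df['EDUCATION'].unique().tolist())
print('EDUCATION ' + str(edu_codes))

# Number of rows in each subcategory, most frequent first
counts = df['EDUCATION'].value_counts()
print(counts)

# The subcategory with the highest count
print('Most common:', counts.idxmax())
```

On the real data, the same two calls would show both which codes exist and which one dominates, which is the question posed for the EDUCATION column.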
Let's now find the unique values in the SEX column.
We'll print the unique values in the SEX column and sort them in ascending order:
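# .tolist() converts NumPy integers to plain Python ints, so the list prints as [1, 2] on all NumPy versions
print('SEX ' + str(sorted(df['SEX'].unique().tolist())))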
The output will be as follows:
SEX [1, 2]
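The call to `sorted()` matters because pandas' `unique()` returns values in the order in which they first appear in the column, not in sorted order. A small sketch on a hypothetical column whose first rows happen to be coded 2 shows the difference:

```python
import pandas as pd

# Hypothetical SEX column; the first value that appears is 2
sex = pd.Series([2, 2, 1, 2, 1], name='SEX')

# unique() keeps order of first appearance, so 2 comes before 1
in_appearance_order = sex.unique().tolist()
print(in_appearance_order)

# sorted() gives the ascending order used when printing the SEX column
print(sorted(in_appearance_order))
```

Without the sort, the printed list would depend on how the rows happen to be ordered in the file.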
The following code prints...