15.4 Feature selection and reduction
Recall that our df DataFrame has 5 features/columns/dimensions. Do we need all these features for our analysis, or are any of them redundant?
In our df DataFrame, we do not need both the F and M columns. When a column entry in one is 0, the entry in the other is 1, and vice versa. This is easy to see, but other, more subtle relationships may exist among columns.
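As a minimal sketch of how we might confirm this kind of redundancy in code, the following builds a small hypothetical DataFrame named demo (a stand-in, not the chapter's df) with complementary F and M columns, checks that each row has exactly one of the two set to 1, and drops M if so. The names demo, complementary, and reduced are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's df; only the F and M columns matter here.
demo = pd.DataFrame({"F": [1, 0, 0, 1, 1], "M": [0, 1, 1, 0, 0]})

# True when every row has exactly one of F and M set to 1.
complementary = bool(((demo["F"] + demo["M"]) == 1).all())

# If the columns are complementary, one of them carries no extra information.
reduced = demo.drop(columns="M") if complementary else demo

print(complementary)
print(list(reduced.columns))
```

Which of the two columns we drop is arbitrary, since each can be recovered from the other.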
For example, if a, b, and c are floating-point numbers and X, Y, and Z are columns, we might have z = a x + b y + c for each value x in X, y in Y, and z in Z in the same row. In column notation, Z = a X + b Y + c.
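One way we might look for such a linear relationship is a least-squares fit of Z against X and Y. The sketch below is only an illustration on made-up data: the DataFrame demo, its columns, and the coefficients 2, -3, and 5 are assumptions, chosen so that Z is redundant by construction. If the largest residual is essentially zero, Z can be computed from X and Y and adds no new information.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Z is built as 2 X - 3 Y + 5, so it is redundant by construction.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"X": rng.normal(size=50), "Y": rng.normal(size=50)})
demo["Z"] = 2.0 * demo["X"] - 3.0 * demo["Y"] + 5.0

# Design matrix with columns X, Y, and a constant 1 for the intercept c.
A = np.column_stack([demo["X"], demo["Y"], np.ones(len(demo))])

# Least-squares estimate of (a, b, c) in Z = a X + b Y + c.
coeffs, *_ = np.linalg.lstsq(A, demo["Z"].to_numpy(), rcond=None)
a, b, c = coeffs

# The largest absolute residual is near zero when the relationship is exact.
max_residual = float(np.abs(A @ coeffs - demo["Z"].to_numpy()).max())

print(a, b, c, max_residual)
```

With real data, the residual will rarely be exactly zero, so we would need to decide how small is small enough before treating a column as redundant.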
Exercise 15.11
Show that F = -M + 1.
Exercise 15.12
Interpret this correlation coefficient matrix for the F and M columns:
df[['F', 'M']].corr(method="pearson")
F M
F...