Comparing categorical values with categorical values
In this section, we will focus on comparing multiple categorical columns. Keep in mind that a continuous column can be converted into a categorical column by binning its values.
We will look at the make and vehicle class columns.
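As a brief illustration of the binning idea mentioned above, here is a minimal sketch that uses pd.cut to turn a continuous column into a categorical one. The values, bin edges, and labels are made up for the example and are not taken from the fueleco dataset.

```python
import pandas as pd

# Hypothetical mileage values, for illustration only.
city_mpg = pd.Series([12, 18, 25, 31, 48], name="city_mpg")

# pd.cut assigns each value to a labeled interval.
# By default the right edge of each bin is inclusive: (0, 15], (15, 30], (30, 60].
mpg_class = pd.cut(
    city_mpg,
    bins=[0, 15, 30, 60],
    labels=["low", "medium", "high"],
)

print(mpg_class.tolist())  # ['low', 'medium', 'medium', 'high', 'high']
```

The result has the category dtype, so it can be treated like any other categorical column in the comparisons that follow.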
How to do it…
- Lower the cardinality. Limit the VClass column to six values, in a simple class column, SClass. Only use Ford, Tesla, BMW, and Toyota:

  >>> def generalize(ser, match_name, default):
  ...     seen = None
  ...     for match, name in match_name:
  ...         mask = ser.str.contains(match)
  ...         if seen is None:
  ...             seen = mask
  ...         else:
  ...             seen |= mask
  ...         ser = ser.where(~mask, name)
  ...     ser = ser.where(seen, default)
  ...     return ser
  >>> makes = ["Ford", "Tesla", "BMW", "Toyota"]
  >>> data = fueleco[fueleco.make.isin(makes...
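To see what generalize does in isolation, the following sketch applies it to a small hand-made series. The class names and the (pattern, label) pairs here are assumptions chosen for the example; they are not the recipe's actual mapping.

```python
import pandas as pd


def generalize(ser, match_name, default):
    """Replace values matching each pattern with a label; unmatched values get default."""
    seen = None  # accumulates which rows matched at least one pattern
    for match, name in match_name:
        mask = ser.str.contains(match)
        if seen is None:
            seen = mask
        else:
            seen |= mask
        # Keep values where the pattern did not match; write the label elsewhere.
        ser = ser.where(~mask, name)
    # Rows that never matched any pattern fall back to the default label.
    ser = ser.where(seen, default)
    return ser


# Hypothetical vehicle class values, for illustration only.
vclass = pd.Series(
    [
        "Compact Cars",
        "Sport Utility Vehicle - 4WD",
        "Standard Pickup Trucks 2WD",
        "Vans, Cargo Type",
    ]
)

# Hypothetical (pattern, label) pairs.
simple = generalize(
    vclass,
    [("Pickup", "Truck"), ("Sport", "SUV"), ("Compact", "Car")],
    "Other",
)

print(simple.tolist())  # ['Car', 'SUV', 'Truck', 'Other']
```

Note that str.contains interprets each pattern as a regular expression by default, and that later patterns are tested against values already replaced by earlier ones, so the labels should be chosen so that they do not match a later pattern.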