Feature splitting
Feature splitting is a technique that consists of splitting the values in one column to create new columns. A good example is splitting first names and last names that have been saved in a single column into two separate ones, or splitting a date into three columns with separate values for the day of the month, the month, and the year. The main goal of splitting a feature is to give a machine learning algorithm data in smaller pieces that it can interpret more easily and, ultimately, to improve the machine learning model's performance.
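To make the idea concrete before turning to Optimus, here is a minimal sketch in plain Python of the two splits just described. It does not use Optimus; the helper names split_full_name and split_date are hypothetical, and it assumes names are separated by a single space, dates are ISO strings (YYYY-MM-DD), and missing values are represented as None.

```python
from datetime import date


def split_full_name(full_name):
    """Split a full name into (first, last) on the first space.

    Hypothetical helper for illustration; missing values (None) stay missing.
    """
    if full_name is None:
        return (None, None)
    first, _, last = full_name.partition(" ")
    # A name without a space yields an empty last part; treat it as missing.
    return (first, last or None)


def split_date(iso_date):
    """Split an ISO date string (YYYY-MM-DD) into (day, month, year)."""
    parsed = date.fromisoformat(iso_date)
    return (parsed.day, parsed.month, parsed.year)


# One source column of full names becomes two features per row.
names = ["Argenis Leon", "Luis Aguirre", "Favio Vasquez", None]
name_parts = [split_full_name(name) for name in names]

# One date value becomes three numeric features.
date_parts = split_date("2021-03-15")
```

Each original value is replaced by several narrower values, and each of those can then be stored in its own column and passed to a model as a separate feature.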
For feature splitting, we can use the unnest method, which we looked at in Chapter 3. Here, however, we will focus on how it can be used to produce features to feed our machine learning algorithm.
First, let's start with a dataframe that contains some string values:
df = op.create.dataframe({"A": ["Argenis Leon", "Luis Aguirre", "Favio Vasquez", np.nan]})
print(df.cols.unnest("A", " ", drop...