Machine learning (ML) is a field of study that uses machines (computers) to model real-world phenomena and predict their behavior. To build an ML model, all of our data needs to be numeric. Since almost all of our features are categorical, we need to transform them first. In this recipe, we will learn how to use the hashing trick and dummy encoding.
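To make the two ideas concrete before turning to Spark, here is a minimal sketch in plain Python. It is not the Spark API and not the code this recipe uses. The function names (hash_encode, dummy_encode), the bucket count, and the sample category values are all hypothetical and chosen only for illustration. The hashing trick maps each category to a position in a fixed-length vector using a hash function, while dummy encoding creates one 0/1 column per category and drops one category as the baseline (here, the last one).

```python
import zlib


def hash_encode(value, num_buckets=8):
    """Hashing trick: map a category string to a fixed-length 0/1 vector."""
    # zlib.crc32 is deterministic across runs, unlike the built-in hash() on str.
    index = zlib.crc32(value.encode("utf-8")) % num_buckets
    vector = [0] * num_buckets
    vector[index] = 1
    return vector


def dummy_encode(value, categories):
    """Dummy encoding: one 0/1 column per category, dropping the last as baseline."""
    return [1 if value == category else 0 for category in categories[:-1]]


# Hypothetical categorical values, not taken from the census data.
categories = ["Private", "Self-employed", "Government"]

for value in categories:
    print(value, hash_encode(value), dummy_encode(value, categories))
```

Note the trade-off: the hashed vector always has num_buckets entries no matter how many distinct categories exist, so two categories can collide in the same bucket, whereas dummy encoding needs the full list of categories up front and grows with it.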
Transforming the data
Getting ready
To execute this recipe, you need to have a working Spark environment. You should have already gone through the Loading the data recipe, where we loaded the census data into a DataFrame.
No other prerequisites are required.