Exploring the data
Jumping straight into modeling the data is a misstep almost every new data scientist makes. We are so eager to reach the rewarding part that we forget most of the time is actually spent on the less glamorous work: cleaning up the data and getting familiar with it. In this recipe, we will explore the census dataset.
Getting ready
To execute this recipe, you need a working Spark environment. You should also have completed the previous recipe, in which we loaded the census data into a DataFrame.
No other prerequisites are required.