Handling missing observations
Missing observations are arguably the second most common issue in datasets. They arise for many reasons, as we mentioned in the introduction. In this recipe, we will learn how to deal with them.
Getting ready
To execute this recipe, you need a working Spark environment. We will also be working with the new_id DataFrame we created in the previous recipe, so we assume you have followed the steps to remove the duplicated records.
No other prerequisites are required.
How to do it...
Since our data has two dimensions...