Using Spark built-in functions
Spark ships with built-in functions that augment, and in some cases replace, manual data-profiling queries. This recipe introduces two basic built-in functions that, used alongside manual profiling, give both data modelers and data scientists useful information about a dataset. We will reuse the DataFrame created in the first recipe and call Spark's built-in functions on it.
Getting ready
This recipe uses Azure Databricks. If you are on a trial Azure subscription, you will need to upgrade it to a Pay-As-You-Go subscription, because Azure Databricks requires eight compute cores while a trial subscription provides only four. An Enterprise or MSDN Azure subscription should include sufficient resources for Azure Databricks.
How to do it…
Now let's see how to profile data with Spark's built-in functions. We will add the code for this recipe...