Manipulating DataFrames
In the previous recipe, we saw how to create a DataFrame. The natural next step is to work with the data inside it. Besides the many functions for manipulating the data, there are also functions that let us sample the data, print its schema, and so on. We'll look at them one by one in this recipe.
Note
The code and the sample file for this recipe can be found at https://github.com/arunma/ScalaDataAnalysisCookbook/blob/master/chapter1-spark-csv/src/main/scala/com/packt/scaladata/spark/csv/DataFrameCSV.scala.
How to do it...
Let's see how to manipulate DataFrames through the following subrecipes:
- Printing the schema of the DataFrame
- Sampling data in the DataFrame
- Selecting specific columns in the DataFrame
- Filtering data by condition
- Sorting data in the DataFrame
- Renaming columns
- Treating the DataFrame as a relational table to execute SQL queries
- Saving the DataFrame as a file