Writing Output from Spark DataFrames
Spark lets us collect the data stored in a Spark DataFrame into a local pandas DataFrame, or write it to external structured file formats such as CSV. However, before converting a Spark DataFrame into a local pandas DataFrame, make sure that the data fits in the driver's memory, because the conversion collects every row onto the driver.
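The memory caveat above can be enforced with a simple row-count guard before collecting. The following is a minimal sketch and not part of the original exercise: the helper name to_pandas_if_small and the max_rows threshold are hypothetical, and a row count is only a rough proxy for memory use because it ignores the number of columns and the size of each value.

```python
def to_pandas_if_small(spark_df, max_rows=100_000):
    """Convert a Spark DataFrame to pandas only if it has at most max_rows rows.

    Hypothetical helper: max_rows is an illustrative threshold, not a Spark setting.
    """
    n_rows = spark_df.count()  # runs a Spark job to count the rows
    if n_rows > max_rows:
        raise ValueError(
            f"DataFrame has {n_rows} rows, above the limit of {max_rows}; "
            "sample or aggregate it before calling toPandas()."
        )
    return spark_df.toPandas()  # collects every row into driver memory


# Usage, assuming df is an existing Spark DataFrame:
# local_pdf = to_pandas_if_small(df)
```

The guard costs one extra Spark job for the count, which is usually cheap compared with an out-of-memory failure on the driver.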
In the following exercise, we will explore how to convert a Spark DataFrame to a pandas DataFrame.
Exercise 27: Converting a Spark DataFrame to a Pandas DataFrame
In this exercise, we will use the Spark DataFrame of the Iris dataset created in the previous exercise and convert it into a local pandas DataFrame. We will then store this DataFrame in a CSV file. Perform the following steps:
Convert the Spark DataFrame into a pandas DataFrame using the following command:
import pandas as pd
df.toPandas()
Now use the following command to write the pandas DataFrame to a CSV file:
df.toPandas().to_csv('iris.csv')
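Both steps above call toPandas(), so the data is collected to the driver twice. A minimal sketch that collects once and reuses the result might look like the following. The helper name spark_to_local_csv is hypothetical, and passing index=False (which leaves the pandas row index out of the file) is a choice made for this sketch rather than something the exercise specifies.

```python
def spark_to_local_csv(spark_df, path):
    """Collect a Spark DataFrame once and write it to a single local CSV file.

    Hypothetical helper for illustration; returns the pandas DataFrame so it
    can be reused without collecting again.
    """
    pandas_df = spark_df.toPandas()      # single collect to the driver
    pandas_df.to_csv(path, index=False)  # index=False omits the row index column
    return pandas_df


# Usage, assuming df is the Spark DataFrame of the Iris dataset:
# iris_pdf = spark_to_local_csv(df, 'iris.csv')
```

Without index=False, pandas writes its row index as an extra unnamed first column, which is what the command in the exercise does.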
Note
Writing the contents of a Spark DataFrame...