Saving your data
While distributed computational jobs are a lot of fun, they are much more useful when their results are stored somewhere you can get at them. While the methods for loading an RDD are largely found in the SparkContext class, the methods for saving an RDD are defined on the RDD classes. In Scala, implicit conversions exist so that an RDD that can be saved as a sequence file is converted to the appropriate type automatically; in Java, explicit conversions must be used.
Here are the different ways to save an RDD.
Here's the code for Scala:
rddOfStrings.saveAsTextFile("out.txt")
keyValueRdd.saveAsObjectFile("sequenceOut")
Here's the code for Java:
rddOfStrings.saveAsTextFile("out.txt");
keyValueRdd.saveAsObjectFile("sequenceOut");
Here's the code for Python:
rddOfStrings.saveAsTextFile("out.txt")
Tip
In addition, you can save an RDD as a compressed text file using the following overload of saveAsTextFile (shown with its Scala signature):
saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec])