Interaction with external storage systems
As we know, Spark is a processing engine that can handle very large amounts of data; to process that data, however, Spark must first read it from an external system. In this section, we will learn how to read data into Spark from different storage systems and how to write results back to them.
We will start with the local filesystem and then use Spark with some popular storage systems from the big data world.
Interaction with local filesystem
Reading data from a local filesystem in Spark is straightforward. Let's walk through it with an example:
First, create (or reuse) the Maven project described in the previous chapter and create a Java class (with a main method) for our application. We will start by creating a JavaSparkContext:
SparkConf conf = new SparkConf().setMaster("local").setAppName("Local File system Example");
JavaSparkContext jsc = new JavaSparkContext(conf);
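For reference, the two statements above might sit inside a complete class roughly as follows. This is a minimal sketch, assuming the spark-core dependency from the Maven project is on the classpath; the class name LocalFileSystemExample and the helper method buildConf are hypothetical names chosen for illustration and do not come from the original text.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical class name, used only for this illustration.
public class LocalFileSystemExample {

    // Hypothetical helper: builds the same configuration as in the text,
    // running Spark locally with a single worker thread.
    static SparkConf buildConf() {
        return new SparkConf()
                .setMaster("local")
                .setAppName("Local File system Example");
    }

    public static void main(String[] args) {
        // JavaSparkContext implements Closeable, so try-with-resources
        // stops the context when the block exits.
        try (JavaSparkContext jsc = new JavaSparkContext(buildConf())) {
            System.out.println("Application name: " + jsc.appName());
            System.out.println("Master: " + jsc.master());
        }
    }
}
```

Wrapping the context in try-with-resources is a design choice of this sketch rather than something the text prescribes; calling jsc.close() explicitly at the end of main would work as well.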
To read a text file in Spark, the textFile method...