Some basic exercises using Spark shell
Note that the Spark shell is available only in the Scala language. However, we have kept the examples easy for Java developers to understand.
Checking Spark version
Execute the following command to check the Spark version using spark-shell:

scala> sc.version
res0: String = 2.1.1
It is shown in the following screenshot:
Creating and filtering RDD
Let's start by creating an RDD of strings:
scala> val stringRdd = sc.parallelize(Array("Java","Scala","Python","Ruby","JavaScript","Java"))
stringRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24
Now, we will filter this RDD to keep only those strings that start with the letter J:

scala> val filteredRdd = stringRdd.filter(s => s.startsWith("J"))
filteredRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:26
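To verify the result, we can run the collect() action, which brings the RDD's elements back to the driver as a local array. This is a sketch of a spark-shell session; the exact resN counter depends on how many commands you have already run:

scala> filteredRdd.collect()
res1: Array[String] = Array(Java, JavaScript, Java)

Keep in mind that collect() should only be used on small RDDs, because it pulls the entire dataset into the driver's memory.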
In the first chapter, we learned that if an operation on an RDD returns another RDD, it is a transformation; otherwise, it is an action.
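For instance, filter is a transformation: it returns an RDD and is evaluated lazily. By contrast, count is an action: it returns a plain Long and triggers actual execution. A quick illustration using the filteredRdd created above (the resN counter will vary with your session):

scala> filteredRdd.count()
res2: Long = 3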
The...