Creating a Spark application using Scala
To create data engineering pipelines in Scala, we need to leverage the Spark framework and create a Spark application. Spark provides several APIs for working with data, each with its own pros and cons. Regardless of which API we choose, our code must be encapsulated in a Spark application. Let's create one now.
Each Spark application written in Scala needs a SparkSession. The SparkSession is an object that provides the entry point to the Spark APIs.
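To make this concrete, a SparkSession is typically obtained through its builder. The following is a minimal sketch; the application name and the local master URL are illustrative assumptions rather than values required by this chapter:

val spark: SparkSession = SparkSession
  .builder()
  .appName("ScalaPlayground")  // placeholder application name
  .master("local[*]")          // run locally, using all available cores
  .getOrCreate()               // reuses an existing session if one is already active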
In order to use the SparkSession, we need to create a Scala object. In Scala, an object is a built-in implementation of the singleton pattern, which is exactly what we need here: each Spark application should hold a single instance of Spark, and an object guarantees this. Let's create a Scala object with some commonly used imports for our first Spark application:
package com.packt.descala.scalaplayground

import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.functions...
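The import list above is abbreviated; what follows is a rough sketch of how the complete object might look once we add a main method, build the SparkSession, and run a trivial query to confirm everything works. The object name, application name, and sample data are assumptions made for illustration only:

package com.packt.descala.scalaplayground

import org.apache.spark.sql.{DataFrame, SparkSession}

object ScalaPlayground {
  def main(args: Array[String]): Unit = {
    // A single SparkSession shared by the whole application
    val spark: SparkSession = SparkSession
      .builder()
      .appName("ScalaPlayground") // placeholder application name
      .master("local[*]")         // local mode, convenient for experimentation
      .getOrCreate()

    import spark.implicits._ // enables toDF on local Scala collections

    // A tiny DataFrame just to verify that the session is working
    val df: DataFrame = Seq(("Alice", 1), ("Bob", 2)).toDF("name", "id")
    df.show()

    spark.stop() // release resources once the application is done
  }
}

Running the object with spark-submit, or directly from an IDE with Spark on the classpath, should print the two-row DataFrame to the console.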