Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Data Engineering with Scala and Spark

You're reading from   Data Engineering with Scala and Spark Build streaming and batch pipelines that process massive amounts of data using Scala

Arrow left icon
Product type Paperback
Published in Jan 2024
Publisher Packt
ISBN-13 9781804612583
Length 300 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Rupam Bhattacharjee Rupam Bhattacharjee
Author Profile Icon Rupam Bhattacharjee
Rupam Bhattacharjee
David Radford David Radford
Author Profile Icon David Radford
David Radford
Eric Tome Eric Tome
Author Profile Icon Eric Tome
Eric Tome
Arrow right icon
View More author details
Toc

Table of Contents (21) Chapters Close

Preface 1. Part 1 – Introduction to Data Engineering, Scala, and an Environment Setup
2. Chapter 1: Scala Essentials for Data Engineers FREE CHAPTER 3. Chapter 2: Environment Setup 4. Part 2 – Data Ingestion, Transformation, Cleansing, and Profiling Using Scala and Spark
5. Chapter 3: An Introduction to Apache Spark and Its APIs – DataFrame, Dataset, and Spark SQL 6. Chapter 4: Working with Databases 7. Chapter 5: Object Stores and Data Lakes 8. Chapter 6: Understanding Data Transformation 9. Chapter 7: Data Profiling and Data Quality 10. Part 3 – Software Engineering Best Practices for Data Engineering in Scala
11. Chapter 8: Test-Driven Development, Code Health, and Maintainability 12. Chapter 9: CI/CD with GitHub 13. Part 4 – Productionalizing Data Engineering Pipelines – Orchestration and Tuning
14. Chapter 10: Data Pipeline Orchestration 15. Chapter 11: Performance Tuning 16. Part 5 – End-to-End Data Pipelines
17. Chapter 12: Building Batch Pipelines Using Spark and Scala 18. Chapter 13: Building Streaming Pipelines Using Spark and Scala 19. Index 20. Other Books You May Enjoy

Understanding objects, classes, and traits

In this section, we are going to look at classes, traits, and objects. If you have used Java before, then some of the topics covered in this section will look familiar. However, there are several differences too. For example, Scala provides singleton objects, which automatically create a class and a single instance of that class in one go. Another example is Scala has case classes, which provide great support for pattern matching, allow you to create instances without the new keyword, and provide a default toString implementation that is quite handy when printing to the console.

We will first look at classes, followed by objects, and then wrap this section up with a quick tour of traits.

Classes

A class is a blueprint for objects, which are instances of that class. For example, we can create a Point class using the following code:

class Point(val x: Int, val y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}

Example 1.5

The Point class has four members—two immutable variables, x and y, as well as two methods, add and toString. We can create instances of the Point class as follows:

scala> val p1 = new Point(1,1)
p1: Point = (1, 1)
scala> val p2 = new Point(2,3)
p2: Point = (2, 3)

Example 1.6

We can then create a new instance, p3, by adding p1 and p2, as follows:

scala> val p3 = p1 add p2
p3: Point = (3, 4)

Example 1.7

Scala supports the infix notation, characterized by the placement of operators between operands, and automatically converts p1 add p2 to p1.add(p2). Another way to define the Point class is using a case class, as shown here:

case class Point(x: Int, y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
}

Example 1.8

A case class automatically adds a factory method with the name of the class, which enables us to leave out the new keyword when creating an instance. A factory method is used to create instances of a class without requiring us to explicitly call the constructor method. Refer to the following example:

scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,3)
p2: Point = Point(2,3)

Example 1.9

The compiler also adds default implementations of various methods such as toString and hashCode, which the regular class definition lacks. So, we did not have to override the toString method, as was done earlier, and yet both p1 and p2 were printed neatly on the console (Example 1.9).

All arguments in the parameter list of a case class automatically get a val prefix, which makes them parametric fields. A parametric field is a shorthand that defines a parameter and a field with the same name.

To better understand the difference, let’s look at the following example:

scala> case class Point1(x: Int, y: Int) //x and y are parametric fields
defined class Point1
scala> class Point2(x: Int, y: Int) //x and y are regular parameters
defined class Point2
scala> val p1 = Point1(1, 2)
p1: Point1 = Point1(1,2)
scala> val p2 = new Point2(3, 4)
p2: Point2 = Point2@203ced18

Example 1.10

If we now try to access p1.x, it will work because x is a parametric field, whereas trying to access p2.x will result in an error. Example 1.11 illustrates this:

scala> println(p1.x)
1
scala> println(p2.x)
<console>:13: error: value x is not a member of Point2
       println(p2.x)
                  ^

Example 1.11

Trying to access p2.x will result in a compile error, value x is not a member of Point2. Case classes also have excellent support for pattern matching, as we will see in the Understanding pattern matching section.

Scala also provides an abstract class, which, unlike a regular class, can contain abstract methods. For example, we can define the following hierarchy:

abstract class Animal
abstract class Pet extends Animal {
  def name: String
}
class Dog(val name: String) extends Pet {
  override def toString = s"Dog($name)"
}
scala> val pluto = new Dog("Pluto")
pluto: Dog = Dog(Pluto)

Example 1.12

Animal is the base class. Pet extends Animal and declares an abstract method, name. Dog extends Pet and uses a parametric field, name (it is both a parameter as well as a field). Because Scala uses the same namespace for fields and methods, this allows the field name in the Dog class to provide a concrete implementation of the abstract method name in Pet.

Object

Unlike Java, Scala does not support static members in classes; instead, it has singleton objects. A singleton object is defined using the object keyword, as shown here:

class Point(val x: Int, val y: Int) {
  // new keyword is not required to create a Point object
  // apply method from companion object is invoked
  def add(that: Point): Point = Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}
object Point {
  def apply(x: Int, y: Int) = new Point(x, y)
}

Example 1.13

In this example, the Point singleton object shares the same name with the class and is called that class’s companion object. The class is called the companion class of the singleton object. For an object to qualify as a companion object of a given class, it needs to be in the same source file as the class itself.

Please note that the add method does not use the new keyword on the right-hand side. Point(x1, y1) is de-sugared into Point.apply(x1, y1), which returns a Point instance.

Singleton objects are also used to write an entrypoint for Scala applications. One option is to provide an explicit main method within the singleton object, as shown here:

object SampleScalaApplication {
  def main(args: Array[String]): Unit = {
    println(s"This is a sample Scala application")
  }
}

Example 1.14

The other option is to extend the App trait, which provides a main method implementation. We will cover traits in the next section. You can also refer to the Further reading section (the third point) for more information:

 object SampleScalaApplication extends App {
  println(s"This is a sample Scala application")
}

Example 1.15

Trait

Scala also has traits, which are used to define rich interfaces as well as stackable modifications. You can read more stackable modifications in the Further reading section (the fourth point) Unlike class inheritance, where each class inherits from just one super class, a class can mix in any number of traits. A trait can have abstract as well as concrete members. Here is a simplified example of the Ordered trait from the Scala standard library:

trait Ordered[T] {
  // compares receiver (this) with argument of the same type
  def compare(that: T): Int
  def <(that: T): Boolean = (this compare that) < 0
  def >(that: T): Boolean = (this compare that) > 0
  def <=(that: T): Boolean = (this compare that) <= 0
  def >=(that: T): Boolean = (this compare that) >= 0
}

Example 1.16

The Ordered trait takes a type parameter, T, and has an abstract method, compare. All of the other methods are defined in terms of that method. A class can add the functionalities defined by <, >, and so on, just by defining the compare method. The compare method should return a negative integer if the receiver is less than the argument, positive if the receiver is greater than the argument, and 0 if both objects are the same.

Going back to our Point example, we can define a rule to say that a point, p1, is greater than p2 if the distance of p1 from the origin is greater than that of p2:

case class Point(x: Int, y: Int) extends Ordered[Point] {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  def compare(that: Point) = (x ^ 2 + y ^ 2) ^ 1 / 2 - (that.x ^ 2 + that.y ^ 2) ^ 1 / 2
}

Example 1.17

With the definition of compare now in place, we can perform a comparison between two arbitrary points, as follows:

scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,2)
p2: Point = Point(2,2)
scala> println(s"p1 is greater than p2: ${p1 > p2}")
p1 is greater than p2: false
example 1.18

In this section, we looked at objects, classes, and traits. In the next section, we are going to look at HOFs.

You have been reading a chapter from
Data Engineering with Scala and Spark
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781804612583
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image