Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases now! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Data Engineering with Scala and Spark

You're reading from   Data Engineering with Scala and Spark Build streaming and batch pipelines that process massive amounts of data using Scala

Arrow left icon
Product type Paperback
Published in Jan 2024
Publisher Packt
ISBN-13 9781804612583
Length 300 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Rupam Bhattacharjee Rupam Bhattacharjee
Author Profile Icon Rupam Bhattacharjee
Rupam Bhattacharjee
David Radford David Radford
Author Profile Icon David Radford
David Radford
Eric Tome Eric Tome
Author Profile Icon Eric Tome
Eric Tome
Arrow right icon
View More author details
Toc

Table of Contents (21) Chapters Close

Preface 1. Part 1 – Introduction to Data Engineering, Scala, and an Environment Setup
2. Chapter 1: Scala Essentials for Data Engineers FREE CHAPTER 3. Chapter 2: Environment Setup 4. Part 2 – Data Ingestion, Transformation, Cleansing, and Profiling Using Scala and Spark
5. Chapter 3: An Introduction to Apache Spark and Its APIs – DataFrame, Dataset, and Spark SQL 6. Chapter 4: Working with Databases 7. Chapter 5: Object Stores and Data Lakes 8. Chapter 6: Understanding Data Transformation 9. Chapter 7: Data Profiling and Data Quality 10. Part 3 – Software Engineering Best Practices for Data Engineering in Scala
11. Chapter 8: Test-Driven Development, Code Health, and Maintainability 12. Chapter 9: CI/CD with GitHub 13. Part 4 – Productionalizing Data Engineering Pipelines – Orchestration and Tuning
14. Chapter 10: Data Pipeline Orchestration 15. Chapter 11: Performance Tuning 16. Part 5 – End-to-End Data Pipelines
17. Chapter 12: Building Batch Pipelines Using Spark and Scala 18. Chapter 13: Building Streaming Pipelines Using Spark and Scala 19. Index 20. Other Books You May Enjoy

Understanding pattern matching

Scala has excellent support for pattern matching. The most prominent use is the match expression, which takes the following form:

selector match { alternatives }

selector is the expression that the alternatives will be tried against. Each alternative starts with the case keyword and includes a pattern, an arrow symbol =>, and one or more expressions, which will be evaluated if the pattern matches. The patterns can be of various types, such as the following:

  • Wildcard patterns
  • Constant patterns
  • Variable patterns
  • Constructor patterns
  • Sequence patterns
  • Tuple patterns
  • Typed patterns

Before going through each of these pattern types, let’s define our own custom List:

trait List[+A]
case class Cons[+A](head: A, tail: List[A]) extends List[A]
case object Nil extends List[Nothing]
object List {
  def apply[A](as: A*): List[A] = if (as.isEmpty) Nil else Cons(as.head, apply(as.tail: _*))
}

Example 1.56

Wildcard patterns

The wildcard pattern (_) matches any object and is used as a default, catch-all alternative. Consider the following example:

scala> def emptyList[A](l: List[A]): Boolean = l match {
     |   case Nil => true
     |   case _   => false
     | }
emptyList: [A](l: List[A])Boolean
scala> emptyList(List(1, 2))
res8: Boolean = false

Example 1.57

A wildcard can also be used to ignore parts of an object that we do not care about. Refer to the following code:

scala> def threeElements[A](l: List[A]): Boolean = l match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => true
     |   case _                            => false
     | }
threeElements: [A](l: List[A])Boolean
scala> threeElements(List(true, false))
res11: Boolean = false
scala> threeElements(Nil)
res12: Boolean = false
scala> threeElements(List(1, 2, 3))
res13: Boolean = true
scala> threeElements(List("a", "b", "c", "d"))
res14: Boolean = false

Example 1.58

In the preceding example, the threeElements method checks whether a given list has exactly three elements. The values themselves are not needed and are thus discarded in the pattern match.

Constant patterns

A constant pattern matches only itself. Any literal can be used as a constant – 1, true, and hi are all constant patterns. Any val or singleton object can also be used as a constant. The emptyList method from the previous example uses Nil to check whether the list is empty.

Variable patterns

Like a wildcard, a variable pattern matches any object and is bound to it. We can then use this variable to refer to the object:

scala> val ints = List(1, 2, 3, 4)
ints: List[Int] = Cons(1,Cons(2,Cons(3,Cons(4,Nil))))
scala> ints match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => println("A three element list")
     |   case l => println(s"$l is not a three element list")
     | }
Cons(1,Cons(2,Cons(3,Cons(4,Nil)))) is not a three element list

Example 1.59

In the preceding example, l is bound to the entire list, which then is printed to the console.

Constructor patterns

A constructor pattern looks like Cons(_, Cons(_, Cons(_, Nil))). It consists of the name of a case class (Cons), followed by a number of patterns in parentheses. These extra patterns can themselves be constructor patterns, and we can use them to check arbitrarily deep into an object. In this case, checks are performed at four levels.

Sequence patterns

Scala allows us to match against sequence types such as Seq, List, and Array among others. It looks similar to a constructor pattern. Refer to the following:

scala> def thirdElement[A](s: Seq[A]): Option[A] = s match {
     |   case Seq(_, _, a, _*) => Some(a)
     |   case _            => None
     | }
thirdElement: [A](s: Seq[A])Option[A]
scala> val intSeq = Seq(1, 2, 3, 4)
intSeq: Seq[Int] = List(1, 2, 3, 4)
scala> thirdElement(intSeq)
res16: Option[Int] = Some(3)
scala> thirdElement(Seq.empty[String])
res17: Option[String] = None

Example 1.60

As the example illustrates, thirdElement returns a value of type Option[A]. If a sequence has three or more elements, it will return the third element, whereas for any sequence with less than three elements, it will return None. Seq(_, _, a, _*) binds a to the third element if present. The _* pattern matches any number of elements.

Tuple patterns

We can pattern match against tuples too:

scala> val tuple3 = (1, 2, 3)
tuple3: (Int, Int, Int) = (1,2,3)
scala> def printTuple(a: Any): Unit = a match {
     |   case (a, b, c) => println(s"Tuple has $a, $b, $c")
     |   case _     =>
     | }
printTuple: (a: Any)Unit
scala> printTuple(tuple3)
Tuple has 1, 2, 3

Example 1.61

Running the preceding program will print Tuple has 1, 2, 3 to the console.

Typed patterns

A typed pattern allows us to check types in the pattern match and can be used for type tests and type casts:

scala> def getLength(a: Any): Int =
     |   a match {
     |     case s: String    => s.length
     |     case l: List[_]   => l.length //this is List from Scala collection library
     |     case m: Map[_, _] => m.size
     |     case _            => -1
     |   }
getLength: (a: Any)Int
scala> getLength("hello, world")
res3: Int = 12
scala> getLength(List(1, 2, 3, 4))
res4: Int = 4
scala> getLength(Map.empty[Int, String])
res5: Int = 0

Example 1.62

Please note that the argument a of type Any does not support methods such as length or size in the result expression. Scala automatically applies a type test and a type cast to match the target type. For example, case s: String => s.length is equivalent to the following snippet:

if (s.isInstanceOf[String]) {
  val x = s.asInstanceOf[String]
  x.length
}

Example 1.63

One important thing to note, though, is that Scala does not maintain type arguments during runtime. So, there is no way to check whether list has all integer elements or not. For example, the following will print A list of String to the console. The compiler will emit a warning to alert about the runtime behavior. Arrays are the only exception because the element type is stored with the array value:

scala> List.fill(5)(0) match {
     |   case _: List[String] => println("A list of String")
     |   case _           =>
     | }
<console>:13: warning: fruitless type test: a value of type List[Int] cannot also be a List[String] (the underlying of List[String]) (but still might match its erasure)
         case _: List[String] => println("A list of String")
                 ^
A list of String

Example 1.64

You have been reading a chapter from
Data Engineering with Scala and Spark
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781804612583
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime