Data Engineering with Scala and Spark

Data Engineering with Scala and Spark: Build streaming and batch pipelines that process massive amounts of data using Scala

By Eric Tome, David Radford, and Rupam Bhattacharjee
eBook, Jan 2024, 300 pages, 1st Edition

Scala Essentials for Data Engineers

Welcome to the world of data engineering with Scala. But why Scala? The following are some of the reasons for learning Scala:

  • Scala provides type safety
  • Big corporations such as Netflix and Airbnb have a lot of data pipelines written in Scala
  • Scala is native to Spark
  • Scala allows data engineers to adopt a software engineering mindset

Scala is a high-level, general-purpose programming language that runs on the Java Virtual Machine (JVM). Martin Odersky began designing it in 2001. The name Scala stands for scalable language, and it provides excellent support for both object-oriented and functional programming styles.

This chapter is meant as a quick introduction to concepts that the subsequent chapters build upon. Specifically, this chapter covers the following topics:

  • Understanding functional programming
  • Understanding objects, classes, and traits
  • Higher-order functions (HOFs)
  • Examples of HOFs from the Scala collection library
  • Understanding polymorphic functions
  • Variance
  • Option types
  • Collections
  • Pattern matching
  • Implicits in Scala

Technical requirements

This chapter is long and contains lots of examples to explain the concepts that are introduced. All of the examples are self-contained, and we encourage you to try them yourself as you move through the chapter. You will need a working Scala environment to run these examples.

You can choose to configure it by following the steps outlined in Chapter 2 or use an online Scala playground such as Scastie (https://scastie.scala-lang.org/). We will use Scala 2.12 as the language version.

Understanding functional programming

Functional programming is based on the principle that programs are constructed using only pure functions. A pure function has no side effects; it only computes and returns a result. Examples of side effects include modifying a variable, modifying a data structure in place, and performing I/O. In this sense, a pure function behaves like a function in algebra: the same inputs always produce the same output.

An example of a pure function is the length function on a string object. It only returns the length of the string and does nothing else, such as mutating a variable. Similarly, an integer addition function that takes two integers and returns an integer is a pure function.
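To make the distinction concrete, here is a small sketch that contrasts a pure function with an impure one. The names add, counter, and addAndCount are hypothetical and are chosen only for illustration:

```scala
// Pure: the result depends only on the inputs, and nothing else is touched
def add(x: Int, y: Int): Int = x + y

// Impure: besides computing a result, it mutates a variable outside its scope
var counter = 0
def addAndCount(x: Int, y: Int): Int = {
  counter += 1 // side effect: updates external mutable state
  x + y + counter
}

val a1 = add(1, 2) // always 3
val a2 = add(1, 2) // still 3
val b1 = addAndCount(1, 2) // 4 on the first call
val b2 = addAndCount(1, 2) // 5 on the second call: same inputs, different result
```

Because addAndCount reads and writes counter, we cannot reason about a call in isolation. We also have to know how many times it was called before.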

Two important aspects of functional programming are referential transparency (RT) and the substitution model, which lets us reason about a program by replacing expressions with their values. An expression is referentially transparent if all of its occurrences can be substituted by the result of the expression without altering the meaning of the program.

In the following example, Example 1.1, we set x and then use it to set r1 and r2, both of which have the same value:

scala> val x: String = "hello"
x: String = hello
scala> val r1 = x + " world!"
r1: String = hello world!
scala> val r2 = x + " world!"
r2: String = hello world!

Example 1.1

Now, if we replace x with the expression it refers to, "hello", r1 and r2 are still the same. In other words, the expression "hello" is referentially transparent.

Example 1.2 shows the output from a Scala interpreter:

scala> val r1 = "hello" + " world!"
r1: String = hello world!
scala> val r2 = "hello" + " world!"
r2: String = hello world!

Example 1.2

Let’s now look at the following example, Example 1.3, where x is an instance of StringBuilder instead of String:

scala> val x = new StringBuilder("who")
x: StringBuilder = who
scala> val y = x.append(" am i?")
y: StringBuilder = who am i?
scala> val r1 = y.toString
r1: String = who am i?
scala> val r2 = y.toString
r2: String = who am i?

Example 1.3

If we substitute y with the expression it refers to, x.append(" am i?"), r1 and r2 will no longer be equal:

scala> val x = new StringBuilder("who")
x: StringBuilder = who
scala> val r1 = x.append(" am i?").toString
r1: String = who am i?
scala> val r2 = x.append(" am i?").toString
r2: String = who am i? am i?

Example 1.4

So, the expression x.append(" am i?") is not referentially transparent.

One of the advantages of the functional programming style is that it allows you to reason locally about a piece of code without having to worry about whether it updates any globally accessible mutable state. Also, since no shared variable is ever updated, building a multi-threaded application becomes considerably simpler.

Another advantage is that pure functions are easier to test: they do not depend on any state apart from the inputs supplied, and they always produce the same output for the same input values.

We won’t delve deep into functional programming as it is outside of the scope of this book. Please refer to the Further reading section for additional material on functional programming. In the rest of this chapter, we will provide a high-level tour of some of the important language features that the subsequent chapters build upon.

In this section, we took a very high-level look at functional programming. Starting with the next section, we will look at Scala language features that enable both functional and object-oriented programming styles.

Understanding objects, classes, and traits

In this section, we are going to look at classes, traits, and objects. If you have used Java before, then some of the topics covered in this section will look familiar. However, there are several differences too. For example, Scala provides singleton objects, which automatically create a class and a single instance of that class in one go. Another example is that Scala has case classes, which provide great support for pattern matching, allow you to create instances without the new keyword, and provide a default toString implementation that is quite handy when printing to the console.

We will first look at classes, followed by objects, and then wrap this section up with a quick tour of traits.

Classes

A class is a blueprint for objects, which are instances of that class. For example, we can create a Point class using the following code:

class Point(val x: Int, val y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}

Example 1.5

The Point class has four members: two immutable fields, x and y, and two methods, add and toString. We can create instances of the Point class as follows:

scala> val p1 = new Point(1,1)
p1: Point = (1, 1)
scala> val p2 = new Point(2,3)
p2: Point = (2, 3)

Example 1.6

We can then create a new instance, p3, by adding p1 and p2, as follows:

scala> val p3 = p1 add p2
p3: Point = (3, 4)

Example 1.7

Scala supports infix notation, in which a method that takes a single argument is placed between its two operands like an operator, so p1 add p2 is equivalent to p1.add(p2). Another way to define the Point class is using a case class, as shown here:

case class Point(x: Int, y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
}

Example 1.8

For a case class, the compiler automatically adds a factory method named apply to the class's companion object (companion objects are covered later in this section). This enables us to leave out the new keyword when creating an instance. A factory method creates instances of a class without requiring us to explicitly call the constructor. Refer to the following example:

scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,3)
p2: Point = Point(2,3)

Example 1.9

The compiler also generates implementations of methods such as toString, equals, and hashCode that are based on the class's fields. A regular class only inherits the default versions from AnyRef, which are based on object identity. So, we did not have to override the toString method, as was done earlier, and yet both p1 and p2 were printed neatly on the console (Example 1.9).
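The difference between the generated methods and the inherited ones is easiest to see with equality. In the following sketch, RegularPoint and CasePoint are hypothetical names. A regular class compares instances by reference, while a case class compares them by value. The sketch also uses the copy method that the compiler generates for case classes:

```scala
class RegularPoint(val x: Int, val y: Int)
case class CasePoint(x: Int, y: Int)

// Regular class: equals is inherited from AnyRef, so it compares references
val sameRegular = new RegularPoint(1, 2) == new RegularPoint(1, 2) // false

// Case class: equals and hashCode compare the fields
val sameCase = CasePoint(1, 2) == CasePoint(1, 2) // true
val sameHash = CasePoint(1, 2).hashCode == CasePoint(1, 2).hashCode // true

// copy creates a new instance, changing only the named fields
val moved = CasePoint(1, 2).copy(y = 5) // CasePoint(1,5)
```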

Every parameter in the first parameter list of a case class implicitly gets a val prefix, which makes it a parametric field. A parametric field is a shorthand that defines a parameter and a field with the same name.

To better understand the difference, let’s look at the following example:

scala> case class Point1(x: Int, y: Int) //x and y are parametric fields
defined class Point1
scala> class Point2(x: Int, y: Int) //x and y are regular parameters
defined class Point2
scala> val p1 = Point1(1, 2)
p1: Point1 = Point1(1,2)
scala> val p2 = new Point2(3, 4)
p2: Point2 = Point2@203ced18

Example 1.10

If we now try to access p1.x, it will work because x is a parametric field, whereas trying to access p2.x will result in an error. Example 1.11 illustrates this:

scala> println(p1.x)
1
scala> println(p2.x)
<console>:13: error: value x is not a member of Point2
       println(p2.x)
                  ^

Example 1.11

The compiler rejects p2.x because, in Point2, x is only a constructor parameter and not a field. Case classes also have excellent support for pattern matching, as we will see in the Understanding pattern matching section.

Scala also supports abstract classes, which, unlike regular classes, can declare abstract members and cannot be instantiated directly. For example, we can define the following hierarchy:

abstract class Animal
abstract class Pet extends Animal {
  def name: String
}
class Dog(val name: String) extends Pet {
  override def toString = s"Dog($name)"
}
scala> val pluto = new Dog("Pluto")
pluto: Dog = Dog(Pluto)

Example 1.12

Animal is the base class. Pet extends Animal and declares an abstract method, name. Dog extends Pet and uses a parametric field, name, which is both a parameter and a field. Because Scala uses the same namespace for fields and methods, the field name in the Dog class provides a concrete implementation of the abstract method name in Pet.

Objects

Unlike Java, Scala does not support static members in classes; instead, it has singleton objects. A singleton object is defined using the object keyword, as shown here:

class Point(val x: Int, val y: Int) {
  // new keyword is not required to create a Point object
  // apply method from companion object is invoked
  def add(that: Point): Point = Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}
object Point {
  def apply(x: Int, y: Int) = new Point(x, y)
}

Example 1.13

In this example, the Point singleton object shares the same name with the class and is called that class’s companion object. The class is called the companion class of the singleton object. For an object to qualify as a companion object of a given class, it needs to be in the same source file as the class itself.

Please note that the add method does not use the new keyword on the right-hand side. Point(x + that.x, y + that.y) is desugared into Point.apply(x + that.x, y + that.y), which returns a Point instance.
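Singleton objects take the place of Java's static members, so constants and factory methods that would be static in Java usually live in the companion object. A companion object can also call a private constructor of its companion class, which lets a factory validate its input before creating an instance. The following is a sketch of that pattern; the Temperature class, AbsoluteZero, and the Option-returning apply are hypothetical:

```scala
// The primary constructor is private: only Temperature and its companion can call it
class Temperature private (val celsius: Double)

object Temperature {
  // A "static-like" constant shared by all callers
  val AbsoluteZero: Double = -273.15

  // Factory method: Temperature(x) desugars to Temperature.apply(x)
  def apply(celsius: Double): Option[Temperature] =
    if (celsius >= AbsoluteZero) Some(new Temperature(celsius)) else None
}

val room = Temperature(21.5)      // Some(...)
val invalid = Temperature(-300.0) // None: below absolute zero
```

As with Example 1.13, the class and its companion object must be defined together (in the same source file, or in a single :paste block in the REPL).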

Singleton objects are also used to write an entry point for Scala applications. One option is to provide an explicit main method within the singleton object, as shown here:

object SampleScalaApplication {
  def main(args: Array[String]): Unit = {
    println(s"This is a sample Scala application")
  }
}

Example 1.14

The other option is to extend the App trait, which provides a main method implementation. We will cover traits in the next section. You can also refer to the Further reading section (the third point) for more information:

object SampleScalaApplication extends App {
  println(s"This is a sample Scala application")
}

Example 1.15

Traits

Scala also has traits, which are used to define rich interfaces as well as stackable modifications. You can read more about stackable modifications in the Further reading section (the fourth point). Unlike class inheritance, where each class inherits from just one superclass, a class can mix in any number of traits. A trait can have abstract as well as concrete members. Here is a simplified example of the Ordered trait from the Scala standard library:

trait Ordered[T] {
  // compares receiver (this) with argument of the same type
  def compare(that: T): Int
  def <(that: T): Boolean = (this compare that) < 0
  def >(that: T): Boolean = (this compare that) > 0
  def <=(that: T): Boolean = (this compare that) <= 0
  def >=(that: T): Boolean = (this compare that) >= 0
}

Example 1.16

The Ordered trait takes a type parameter, T, and has an abstract method, compare. All of the other methods are defined in terms of that method, so a class gets <, >, <=, and >= just by mixing in Ordered and defining compare. The compare method should return a negative integer if the receiver is less than the argument, a positive integer if the receiver is greater than the argument, and 0 if the two are equal.
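For instance, a hypothetical Version class that mixes in the standard library's scala.math.Ordered needs to define only compare to get all four comparison operators. The ordering rule here (major version first, then minor version) is our own choice for this sketch:

```scala
// Compare major versions first, then minor versions
case class Version(major: Int, minor: Int) extends scala.math.Ordered[Version] {
  def compare(that: Version): Int =
    if (major != that.major) Integer.compare(major, that.major)
    else Integer.compare(minor, that.minor)
}

val older = Version(1, 2)
val newer = Version(1, 10)

// <, >, <= and >= all come from Ordered and delegate to compare
val lt = older < newer  // true
val ge = older >= newer // false
```

Note that Version(1, 10) correctly compares as greater than Version(1, 2), which a plain string comparison of "1.10" and "1.2" would get wrong.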

Going back to our Point example, we can define a rule to say that a point, p1, is greater than p2 if the distance of p1 from the origin is greater than that of p2:

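case class Point(x: Int, y: Int) extends Ordered[Point] {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  // Comparing squared distances from the origin gives the same ordering as
  // comparing the distances themselves, without computing square roots
  def compare(that: Point): Int =
    Integer.compare(x * x + y * y, that.x * that.x + that.y * that.y)
}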

Example 1.17

With the definition of compare now in place, we can perform a comparison between two arbitrary points, as follows:

scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,2)
p2: Point = Point(2,2)
scala> println(s"p1 is greater than p2: ${p1 > p2}")
p1 is greater than p2: false

Example 1.18
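The stackable modifications mentioned at the start of the Traits discussion can be sketched as follows. This is a minimal sketch in the style of the well-known IntQueue example; IntQueue, BasicIntQueue, Doubling, and Incrementing are illustrative names, not library types. Each trait overrides put with abstract override and delegates to super. As a result, the trait mixed in furthest to the right runs first:

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative abstract queue of integers
abstract class IntQueue {
  def get(): Int
  def put(x: Int): Unit
}

// A concrete queue backed by a mutable buffer
class BasicIntQueue extends IntQueue {
  private val buf = new ArrayBuffer[Int]
  def get(): Int = buf.remove(0)
  def put(x: Int): Unit = buf += x
}

// Stackable modification: doubles the value, then delegates to the next put
trait Doubling extends IntQueue {
  abstract override def put(x: Int): Unit = super.put(2 * x)
}

// Stackable modification: increments the value, then delegates to the next put
trait Incrementing extends IntQueue {
  abstract override def put(x: Int): Unit = super.put(x + 1)
}

// Doubling is rightmost, so it runs first: 10 -> 20 -> 21
val queue1 = new BasicIntQueue with Incrementing with Doubling
queue1.put(10)
val first = queue1.get()

// Incrementing is rightmost, so it runs first: 10 -> 11 -> 22
val queue2 = new BasicIntQueue with Doubling with Incrementing
queue2.put(10)
val second = queue2.get()
```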

In this section, we looked at objects, classes, and traits. In the next section, we are going to look at HOFs.

Working with higher-order functions (HOFs)

In Scala, functions are first-class citizens, which means function values can be assigned to variables, passed to functions as arguments, or returned by a function as a value. HOFs take one or more functions as arguments or return a function as a value.

A method can also be passed as an argument to an HOF because the Scala compiler will coerce a method into a function of the required type, a conversion known as eta-expansion. For example, let’s define a function literal and a method, both of which take a pair of integers, perform an operation, and then return an integer:

//function literal
val add: (Int, Int) => Int = (x, y) => x + y
//a method
def multiply(x: Int, y: Int): Int = x * y

Example 1.19

Let’s now define a method, op, that takes two integer arguments and applies a supplied function, f, to them:

def op(x: Int, y: Int) (f: (Int, Int) => Int): Int = f(x,y)

Example 1.20

We can pass any function (or method) of type (Int, Int) => Int to op, as the following example illustrates:

scala> op(1,2)(add)
res15: Int = 3
scala> op(2,3)(multiply)
res16: Int = 6

Example 1.21

This ability to pass functions as parameters is extremely powerful as it allows us to write generic code that can execute arbitrary user-supplied functions. In fact, many of the methods defined in the Scala collection library require functions as arguments, as we will see in the next section.
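The examples so far pass functions in. An HOF can also return a function, as mentioned at the beginning of this section. In the following sketch, multiplier is a hypothetical helper that builds and returns a new function. andThen, a method available on every Function1, chains two functions together:

```scala
// multiplier is an HOF: it returns a function of type Int => Int
def multiplier(factor: Int): Int => Int = (x: Int) => x * factor

val doubler = multiplier(2)
val tripler = multiplier(3)

// andThen composes two functions: first doubler, then tripler
val sixTimes = doubler andThen tripler

val results = List(1, 2, 3).map(sixTimes) // List(6, 12, 18)
```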

Examples of HOFs from the Scala collection library

Scala collections provide transformers that take a base collection, run some transformations over each of the collection’s elements, and return a new collection. For example, we can transform a list of integers by doubling each of its elements using the map method, which we will cover in a bit:

scala> List(1,2,3,4).map(_ * 2)
res17: List[Int] = List(2, 4, 6, 8)

Example 1.22

The Traversable trait, which in Scala 2.12 is the base trait for all kinds of Scala collections, implements the behaviors common to all collections in terms of a foreach method with the following signature:

def foreach[U](f: A => U): Unit

Example 1.23

The argument f is a function of type A => U, which is shorthand for Function1[A,U], and thus foreach is an HOF. This is an abstract method that needs to be implemented by all classes that mix in Traversable. The return type is Unit, which means this method does not return any meaningful value and is primarily used for side effects.
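To see why a single abstract foreach is enough to support many other operations, here is a sketch of how counting and summing could be written using nothing but foreach. countWhere and sumAll are hypothetical helpers for illustration, not the library's actual implementations. They take an Iterable so that the sketch also compiles on Scala 2.13, where Traversable is deprecated:

```scala
// Count the elements that satisfy a predicate, using only foreach
def countWhere[A](xs: Iterable[A])(p: A => Boolean): Int = {
  var n = 0
  xs.foreach(x => if (p(x)) n += 1)
  n
}

// Sum integers, using only foreach
def sumAll(xs: Iterable[Int]): Int = {
  var total = 0
  xs.foreach(x => total += x)
  total
}
```

Because both helpers rely only on foreach, they work unchanged for a List, a Set, or any other collection.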

Here is an example that prints the elements of a List:

scala> /** let's start with a foreach call that prints the numbers in a list
     |   * List(1,2,3,4).foreach((i: Int) => println(i))
     |   * we can omit the parameter's type annotation and let Scala infer it
     |   * List(1,2,3,4).foreach(i => println(i))
     |   * Scala provides the _ placeholder as a shorthand for a parameter
     |   * that is used only once on the right-hand side
     |   * List(1,2,3,4).foreach(println(_))
     |   * finally, since the function only passes its argument on to println,
     |   * we can pass println itself and leave out the argument altogether
     |   */
     | List(1,2,3,4).foreach(println)
1
2
3
4

Example 1.24

For the rest of the examples, we will continue to use the List collection type, but these methods are also available on other collection types, such as Array, Map, and Set.

map is similar to foreach, but instead of returning Unit, it returns a new collection built by applying the function f to each element of the base collection. Here is the signature for List[A]:

final def map[B](f: (A) ⇒ B): List[B]

Example 1.25

Using the list from the previous example, we can double each element and get a list of Doubles instead of Ints by multiplying by 2.0:

scala> List(1,2,3,4).map(_ * 2.0)
res22: List[Double] = List(2.0, 4.0, 6.0, 8.0)

Example 1.26

The preceding expression returns a list of Double and can be chained with foreach to print the values contained in the list:

scala> List(1,2,3,4).map(_ * 2.0).foreach(println)
2.0
4.0
6.0
8.0

Example 1.27

A close cousin of map is flatMap, which combines two steps: map followed by flatten. Before looking into flatMap, let’s look at flatten:

//converts a list of traversable collections into a list
//formed by the elements of the traversable collections
def flatten[B]: List[B]

Example 1.28

As the name suggests, it flattens the inner collections:

scala> List(Set(1,2,3), Set(4,5,6)).flatten
res24: List[Int] = List(1, 2, 3, 4, 5, 6)

Example 1.29

Now that we have seen what flatten does, let’s go back to flatMap.

Let’s say that for each element of List(1,2,3,4), we want to create a sequence of the numbers from 0 to that element (both inclusive) and then combine all of those individual sequences into a single list. Our first pass at it would look like the following:

scala> List(1,2,3,4).map(0 to _).flatten
res25: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4)

Example 1.30

With flatMap, we can achieve the same result in one step:

scala> List(1,2,3,4).flatMap(0 to _)
res26: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4)

Example 1.31

Scala collections also provide filter, which takes a predicate (a function that returns a Boolean) and returns a new collection containing only the elements for which the predicate is true:

def filter(p: (A) ⇒ Boolean): List[A]

Example 1.32

For example, to keep only the even integers from a list of the numbers 1 to 100, try the following:

scala> List.tabulate(100)(_ + 1).filter(_ % 2 == 0)
res27: List[Int] = List(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100)

Example 1.33

There is also withFilter, which can perform better than filter because it does not build an intermediate collection. Instead, it applies the predicate lazily, only when a subsequent map, flatMap, or foreach runs. It is part of the TraversableLike trait, with the FilterMonadic trait providing the abstract definition:

trait FilterMonadic[+A, +Repr] extends Any {
  //includes map, flatMap and foreach but are skipped here
  def withFilter(p: A => Boolean): FilterMonadic[A, Repr]
}

Example 1.34

TraversableLike defines the withFilter method through a member class, WithFilter, that extends FilterMonadic:

def withFilter(p: A => Boolean): FilterMonadic[A, Repr] = new WithFilter(p)
class WithFilter(p: A => Boolean) extends FilterMonadic[A, Repr] {
  // implementation of map, flatMap and foreach skipped here
  def withFilter(q: A => Boolean): WithFilter =
    new WithFilter(x => p(x) && q(x))
}

Example 1.35

Please note that withFilter returns an object of type FilterMonadic, which only has map, flatMap, foreach, and withFilter. These are the only methods that can be chained after a call to withFilter. For example, the following will not compile:

List.tabulate(50)(_ + 1).withFilter(_ % 2 == 0).forall(_ % 2 == 0)

Example 1.36
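We can observe the lazy behavior of withFilter by logging each call. In the following sketch, isEven, twice, and the log buffer are hypothetical helpers. With filter, every element is tested before any element is mapped. With withFilter, calls to the predicate and the mapping function are interleaved, and no intermediate filtered list is built:

```scala
import scala.collection.mutable.ListBuffer

val log = ListBuffer.empty[String]

// Both helpers record each call so that the evaluation order becomes visible
def isEven(i: Int): Boolean = { log += s"test $i"; i % 2 == 0 }
def twice(i: Int): Int = { log += s"map $i"; i * 2 }

// filter builds a complete intermediate list before map runs
val strict = List(1, 2, 3, 4).filter(isEven).map(twice)
val strictLog = log.toList
log.clear()

// withFilter tests each element only as map asks for it
val viaWithFilter = List(1, 2, 3, 4).withFilter(isEven).map(twice)
val lazyLog = log.toList
```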

It is quite common to chain a sequence of flatMap, filter, and map calls, and Scala provides syntactic sugar for that through for comprehensions. To see it in action, let’s consider the following Person class and its instances:

case class Person(firstName: String, isFemale: Boolean, children: Person*)
val bob = Person("Bob", false)
val jennette = Person("Jennette", true)
val laura = Person("Laura", true)
val jean = Person("Jean", true, bob, laura)
val persons = List(bob, jennette, laura, jean)

Example 1.37

Person* denotes a repeated parameter (also called a variable argument, or varargs) of type Person. A repeated parameter of type T must be the last parameter in its parameter list, and it accepts zero, one, or more arguments of type T.
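Inside the method, a repeated parameter behaves like a Seq. An existing sequence can be passed to it with the : _* type ascription. Here is a sketch with a hypothetical total method:

```scala
// xs is a repeated parameter; inside the method it behaves like a Seq[Int]
def total(xs: Int*): Int = xs.sum

val none = total()             // 0: zero arguments are allowed
val some = total(1, 2, 3)      // 6
val nums = List(4, 5, 6)
val fromList = total(nums: _*) // 15: expands the list into individual arguments
```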

Now say we want to get pairs of mother and child, which would be (Jean, Bob) and (Jean, Laura). Using flatMap, filter, and map we can write it as follows:

scala> persons.filter(_.isFemale).flatMap(p => p.children.map(c => (p.firstName, c.firstName)))
res32: List[(String, String)] = List((Jean,Bob), (Jean,Laura))

Example 1.38

The preceding expression does its job, but it is not easy to see at a glance what is happening. This is where a for comprehension comes to the rescue:

scala> for {
     |   p <- persons
     |   if p.isFemale
     |   c <- p.children
     | } yield (p.firstName, c.firstName)
res33: List[(String, String)] = List((Jean,Bob), (Jean,Laura))

Example 1.39

It is much easier to understand what this snippet of code does. Behind the scenes, the Scala compiler converts this expression into the first one, the only difference being that filter is replaced with withFilter.
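The following sketch writes that translation out by hand and checks that both forms produce the same result. It repeats the Person data from Example 1.37 so that the block is self-contained:

```scala
case class Person(firstName: String, isFemale: Boolean, children: Person*)

val bob = Person("Bob", false)
val jennette = Person("Jennette", true)
val laura = Person("Laura", true)
val jean = Person("Jean", true, bob, laura)
val persons = List(bob, jennette, laura, jean)

val sugared = for {
  p <- persons
  if p.isFemale
  c <- p.children
} yield (p.firstName, c.firstName)

// What the compiler generates: the guard becomes withFilter,
// the first generator becomes flatMap, and the last generator becomes map
val desugared = persons
  .withFilter(p => p.isFemale)
  .flatMap(p => p.children.map(c => (p.firstName, c.firstName)))
```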

Scala also provides methods to combine the elements of a collection using the fold and reduce families of functions. The primary difference between the two can be understood by comparing the signatures of foldLeft and reduceLeft:

def foldLeft[B](z: B)(op: (B, A) ⇒ B): B
def reduceLeft[A1 >: A](op: (A1, A1) ⇒ A1): A1

Example 1.40

Both of these methods take a binary operator to combine the elements from left to right. However, foldLeft also takes a start value, z, of type B. That value is returned if the list is empty, and the result type B can differ from the element type of the list. reduceLeft, on the other hand, has no start value, so it throws an UnsupportedOperationException when called on an empty collection. Its result type, A1, must be a supertype of A (>: signifies a lower bound). So, we can sum up a List[Int] and return the value as a Double using foldLeft, as follows:

scala> List(1,2,3,4).foldLeft[Double](0) ( _ + _ )
res34: Double = 10.0

Example 1.41

We cannot do the same with reduceLeft, or with reduce, which has the same type bound, since Double is not a supertype of Int. Trying to do so raises the compile-time error type arguments [Double] do not conform to method reduce's type parameter bounds [A1 >: Int]:

scala> List(1,2,3,4).reduce[Double] ( _ + _ )
<console>:12: error: type arguments [Double] do not conform to method reduce's type parameter bounds [A1 >: Int]
       List(1,2,3,4).reduce[Double] ( _ + _ )
                           ^

Example 1.42

foldRight and reduceRight combine the elements of a collection from right to left. There are also fold and reduce. For both, the order in which the elements are combined is unspecified and may be nondeterministic, so the binary operator should be associative. For fold, z should also be a neutral element for that operator.
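We can check a few of these differences directly. In the following sketch, subtraction is used precisely because it is not associative, so the direction of folding changes the result. Try captures the exception that reduceLeft throws on an empty list:

```scala
import scala.util.Try

val xs = List(1, 2, 3)

// foldLeft: ((0 - 1) - 2) - 3 = -6
val left = xs.foldLeft(0)(_ - _)
// foldRight: 1 - (2 - (3 - 0)) = 2
val right = xs.foldRight(0)(_ - _)

// With a start value, an empty list is not a problem...
val emptyFold = List.empty[Int].foldLeft(0)(_ + _) // 0
// ...but reduceLeft has no start value, so it throws on an empty list
val emptyReduce = Try(List.empty[Int].reduceLeft(_ + _))

// fold with an associative operator (+) and its neutral element (0)
val total = xs.fold(0)(_ + _) // 6
```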

In this section, we have seen several examples of HOFs from the Scala collection library. By now, you should have noticed that each of these functions uses type parameters. These are called polymorphic functions, which is what we will cover next.

Understanding polymorphic functions

A function that works with arguments of multiple types, or can return values of different types, is called a polymorphic function. When writing a polymorphic function, we provide a comma-separated list of type parameters, surrounded by square brackets, after the name of the function. For example, we can write a function that returns the index of the first element of a List that satisfies a predicate:

scala> def findFirstIn[A](as: List[A], p: A => Boolean): Option[Int] =
     |   as.zipWithIndex.collect { case (e, i) if p(e) => i }.headOption
findFirstIn: [A](as: List[A], p: A => Boolean)Option[Int]

Example 1.43

This function will work for any type of list: List[Int], List[String], and so on. For example, we can search for the index of element 5 in a shuffled list of the integers from 1 to 20. Because the list is shuffled randomly, the index you get will likely differ:

scala> import scala.util.Random
import scala.util.Random
scala> val ints = Random.shuffle((1 to 20).toList)
ints: List[Int] = List(7, 9, 3, 8, 6, 13, 12, 18, 14, 15, 1, 11, 10, 16, 2, 5, 20, 17, 4, 19)
scala> findFirstIn[Int](ints, _ == 5)
res38: Option[Int] = Some(15)

Example 1.44
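findFirstIn has a single type parameter. As a sketch of the comma-separated form, the following hypothetical zipWith function takes three type parameters and combines two lists element by element. The compiler infers A, B, and C from the arguments:

```scala
// A, B and C are independent type parameters
def zipWith[A, B, C](as: List[A], bs: List[B])(f: (A, B) => C): List[C] =
  as.zip(bs).map { case (a, b) => f(a, b) }

// A = String, B = Int, C = String ("b" * 2 repeats the string: "bb")
val labels = zipWith(List("a", "b", "c"), List(1, 2, 3))((s, n) => s * n)
// A = Int, B = Int, C = Int; zip drops the extra element of the longer list
val sums = zipWith(List(1, 2, 3), List(10, 20))(_ + _)
```

Placing f in a second parameter list lets the compiler fix A and B from the first list before it checks the function literal, so no type annotations are needed on s and n.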

In the next section, we are going to look at variance, another property of type parameters. Variance determines how subtyping relationships between type arguments carry over to the types that use them.

Variance

As mentioned earlier, functions are first-class objects in Scala. Scala automatically converts function literals into objects of the FunctionN type (N = 0 to 22). For example, consider the following anonymous function:

val f: Int => Any = (x: Int) => x

Example 1.45

This function will be converted automatically to the following:

val f = new Function1[Int, Any] {def apply(x: Int) = x}

Example 1.46

Please note that the preceding syntax represents an object of an anonymous class that extends Function1[Int, Any] and implements its abstract apply method. In other words, it is equivalent to the following:

class AnonymousClass extends Function1[Int, Any] {
  def apply(x: Int): Any = x
}
val f = new AnonymousClass

Example 1.47

If we refer to the type signature of the Function1 trait, we would see the following:

Function1[-T1, +R]

Example 1.48

T1 represents the argument type and R represents the return type. T1 is contravariant and R is covariant. In general, covariance, denoted by +, means that if a class or trait is covariant in its type parameter, that is, C[+T], then subtyping between type arguments carries over to C: if A is a subtype of B, then C[A] is a subtype of C[B]. For example, since Any is a supertype of Int, C[Any] will be a supertype of C[Int].

The order is reversed for contravariance. So, if we have C[-T], then C[Int] will be a supertype of C[Any].

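Since we have Function1[-T1, +R], Function1[Int, Any] is a supertype of, say, Function1[Any, String]. The argument type is contravariant and Int is a subtype of Any, while the return type is covariant and String is a subtype of Any.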

To see it in action, let’s define a method that takes a function of type Int => Any, applies it to the numbers 1 to 5, prints the results, and returns Unit:

def caller(op: Int => Any): Unit = List
  .tabulate(5)(i => i + 1)
  .foreach(i => print(s"${op(i)} "))

Example 1.49

Let’s now define two functions:

scala> val f1: Int => Any = (x: Int) => x
f1: Int => Any = $Lambda$9151/1234201645@34f561c8
scala> val f2 : Any => String = (x: Any) => x.toString
f2: Any => String = $Lambda$9152/1734317897@699fe6f6

Example 1.50

A function (or method) with a parameter of type T can be invoked with an argument of type T or of any subtype of T. And since Int => Any is a supertype of Any => String, we should be able to pass both of these functions as arguments. As can be seen, both of them indeed work:

scala> caller(f1)
1 2 3 4 5
scala> caller(f2)
1 2 3 4 5

Example 1.51
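The same rules apply to type constructors that we define ourselves. In the following sketch, Box and Printer are hypothetical types. The assignments that change the type argument compile only because of the + and - annotations:

```scala
// Covariant in A: a Box[Int] can be used where a Box[Any] is expected
class Box[+A](val value: A)

// Contravariant in A: a Printer[Any] can be used where a Printer[Int] is expected
trait Printer[-A] {
  def render(a: A): String
}

val intBox: Box[Int] = new Box(42)
val anyBox: Box[Any] = intBox // compiles because Int <: Any and Box is covariant

val anyPrinter: Printer[Any] = new Printer[Any] {
  def render(a: Any): String = s"value: $a"
}
val intPrinter: Printer[Int] = anyPrinter // compiles because Printer is contravariant
```

Intuitively, a Printer that can render any value can certainly render an Int, which is why the subtyping direction is reversed for the contravariant parameter.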

Option type

Scala’s Option type represents optional values. An Option[T] takes one of two forms: Some(x), where x is the actual value, or None, which represents a missing value. Many of the Scala collection library methods return a value of the Option[T] type. The following are a few examples:

scala> List(1, 2, 3, 4).headOption
res45: Option[Int] = Some(1)
scala> List(1, 2, 3, 4).lastOption
res46: Option[Int] = Some(4)
scala> List("hello,", "world").find(_ == "world")
res47: Option[String] = Some(world)
scala> Map(1 -> "a", 2 -> "b").get(3)
res48: Option[String] = None

Example 1.52

Option also has a rich API and provides many of the functions from the collection library API through an implicit conversion function, option2Iterable, in the companion object. The following are a few examples of methods supported by the Option type:

scala> Some("hello, world!").headOption
res49: Option[String] = Some(hello, world!)
scala> None.getOrElse("Empty")
res50: String = Empty
scala> Some("hello, world!").map(_.replace("!", ".."))
res51: Option[String] = Some(hello, world..)
scala> Some(List.tabulate(5)(_ + 1)).flatMap(_.headOption)
res52: Option[Int] = Some(1)

Example 1.53
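In practice, Option values are often chained, so that a missing value short-circuits the rest of the computation. The following sketch looks up a setting in a hypothetical config map, parses it, and falls back to a default. Try(...).toOption turns a parsing failure into None:

```scala
import scala.util.Try

val config = Map("port" -> "8080", "timeout" -> "abc")

// None if the key is missing or the value is not a valid integer
def intSetting(key: String): Option[Int] =
  config.get(key).flatMap(v => Try(v.toInt).toOption)

val port = intSetting("port").getOrElse(80)       // 8080
val timeout = intSetting("timeout").getOrElse(30) // 30: "abc" is not a number
val retries = intSetting("retries").getOrElse(3)  // 3: the key is missing
```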

Collections

Scala comes with a powerful collection library. Collections are classified into mutable and immutable collections. A mutable collection can be updated in place, whereas an immutable collection never changes. When we add, remove, or update elements of an immutable collection, a new collection is created and returned, keeping the old collection unchanged.

All collection classes are found in the scala.collection package or one of its subpackages: mutable, immutable, and generic. However, for most of our programming needs, we refer to collections in either the mutable or immutable package.

A collection in the scala.collection.immutable package is guaranteed to be immutable and will never change after it is created. So, we will not have to make any defensive copies of an immutable collection, since accessing a collection multiple times will always yield the same set of elements.

On the other hand, collections in the scala.collection.mutable package provide methods that update a collection in place. Since these collections are mutable, we need to guard against inadvertent updates by other parts of the code base.
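The difference shows up as soon as we "update" a collection. In the following sketch, appending to an immutable Vector returns a new collection and leaves the original untouched. Appending to a mutable ArrayBuffer changes it in place, so every reference to that buffer sees the change:

```scala
import scala.collection.mutable.ArrayBuffer

// Immutable: :+ returns a new Vector; the original is unchanged
val original = Vector(1, 2, 3)
val extended = original :+ 4

// Mutable: += modifies the buffer in place
val buffer = ArrayBuffer(1, 2, 3)
val alias = buffer // a second reference to the same buffer
buffer += 4        // alias now also contains 4
```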

By default, Scala uses immutable collections. This is arranged by the Predef object, which is implicitly imported into every Scala source file and aliases names such as Map and Set to their immutable versions. Refer to the following example:

object Predef {
  type Set[A] = immutable.Set[A]
  type Map[A, +B] = immutable.Map[A, B]
  val Map = immutable.Map
  val Set = immutable.Set
  // ...
}

Example 1.54

The Traversable trait is the base trait for all of the collection types. This is followed by Iterable, which is divided into three subtypes: Seq, Set, and Map. Both Set and Map provide sorted and unsorted variants. Seq, on the other hand, has IndexedSeq and LinearSeq. There is quite a bit of similarity among all these classes. For instance, an instance of any collection can be created by the same uniform syntax, writing the collection class name followed by its elements:

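import scala.collection.LinearSeq
import scala.collection.immutable.SortedSet

Traversable(1, 2, 3) // deprecated alias for Iterable in Scala 2.13
Map("x" -> 24, "y" -> 25, "z" -> 26)
Set("red", "green", "blue")
SortedSet("hello", "world")
IndexedSeq(1.0, 2.0)
LinearSeq('a', 'b', 'c')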

Example 1.55
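The uniform syntax calls a factory on the companion object, and that factory picks a default concrete implementation. The sketch below (Scala 2.12 or 2.13) checks which classes come back. Treat the specific classes as the standard library's current defaults, not as a guarantee:

```scala
import scala.collection.LinearSeq

val seq = Seq(1, 2, 3)
val indexed = IndexedSeq(1.0, 2.0)
val linear = LinearSeq('a', 'b', 'c')

// The default Seq and LinearSeq are Lists; the default IndexedSeq is a Vector.
val seqIsList = seq.isInstanceOf[List[_]]
val indexedIsVector = indexed.isInstanceOf[Vector[_]]
val linearIsList = linear.isInstanceOf[List[_]]

println(s"$seqIsList $indexedIsVector $linearIsList") // true true true
```

Vector offers fast indexed access, while List offers fast access to its head, which matches the IndexedSeq and LinearSeq branches of the hierarchy.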

The following is the hierarchy for scala.collection.immutable collections taken from the docs.scala-lang.org website.

Figure 1.1 – Scala collection hierarchy

The Scala collection library is very rich and has various collection types suited to specific programming needs. If you want to delve deep into the Scala collection library, please refer to the Further reading section (the fifth point).

In this section, we looked at the Scala collection hierarchy. In the next section, we will gain a high-level understanding of pattern matching.

Understanding pattern matching

Scala has excellent support for pattern matching. The most prominent use is the match expression, which takes the following form:

selector match { alternatives }

selector is the expression that the alternatives will be tried against. Each alternative starts with the case keyword and includes a pattern, an arrow symbol =>, and one or more expressions, which will be evaluated if the pattern matches. The patterns can be of various types, such as the following:

  • Wildcard patterns
  • Constant patterns
  • Variable patterns
  • Constructor patterns
  • Sequence patterns
  • Tuple patterns
  • Typed patterns

Before going through each of these pattern types, let’s define our own custom List:

trait List[+A]
case class Cons[+A](head: A, tail: List[A]) extends List[A]
case object Nil extends List[Nothing]
object List {
  def apply[A](as: A*): List[A] = if (as.isEmpty) Nil else Cons(as.head, apply(as.tail: _*))
}

Example 1.56
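To see what the apply method builds, here is the same definition wrapped in an object named CustomList (a name chosen only for this sketch, so that the types do not clash with scala.List). The trait is also marked sealed, which the original does not do; sealing lets the compiler warn when a match on the list misses a case:

```scala
object CustomList {
  sealed trait List[+A]
  case class Cons[+A](head: A, tail: List[A]) extends List[A]
  case object Nil extends List[Nothing]
  object List {
    // Recursively wraps each element in a Cons, ending with Nil.
    def apply[A](as: A*): List[A] =
      if (as.isEmpty) Nil else Cons(as.head, apply(as.tail: _*))
  }
}
import CustomList._

val built = List(1, 2, 3)
println(built) // Cons(1,Cons(2,Cons(3,Nil)))
```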

Wildcard patterns

The wildcard pattern (_) matches any object and is used as a default, catch-all alternative. Consider the following example:

scala> def emptyList[A](l: List[A]): Boolean = l match {
     |   case Nil => true
     |   case _   => false
     | }
emptyList: [A](l: List[A])Boolean
scala> emptyList(List(1, 2))
res8: Boolean = false

Example 1.57

A wildcard can also be used to ignore parts of an object that we do not care about. Refer to the following code:

scala> def threeElements[A](l: List[A]): Boolean = l match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => true
     |   case _                            => false
     | }
threeElements: [A](l: List[A])Boolean
scala> threeElements(List(true, false))
res11: Boolean = false
scala> threeElements(Nil)
res12: Boolean = false
scala> threeElements(List(1, 2, 3))
res13: Boolean = true
scala> threeElements(List("a", "b", "c", "d"))
res14: Boolean = false

Example 1.58

In the preceding example, the threeElements method checks whether a given list has exactly three elements. The values themselves are not needed and are thus discarded in the pattern match.

Constant patterns

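A constant pattern matches only itself. Any literal can be used as a constant: 1, true, and "hi" are all constant patterns. Any val or singleton object can also be used as a constant, provided its name starts with an uppercase letter or is enclosed in backticks; a plain lowercase name is treated as a variable pattern instead. The emptyList method from the previous example uses the singleton object Nil to check whether the list is empty.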
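A short sketch, assuming Scala 2.13, of the kinds of constants described above. Threshold, limit, and describe are hypothetical names. Note that limit needs backticks, because a lowercase name on its own would be a variable pattern that matches anything:

```scala
val Threshold = 10 // uppercase val: usable directly as a constant pattern
val limit = 20     // lowercase val: must be written in backticks

def describe(x: Any): String = x match {
  case 1         => "the literal one"
  case true      => "the literal true"
  case "hi"      => "a greeting"
  case Threshold => "equal to Threshold"
  case `limit`   => "equal to limit"
  case Nil       => "an empty list" // Nil is a singleton object
  case _         => "something else"
}
```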

Variable patterns

Like a wildcard, a variable pattern matches any object, but it also binds a variable to the matched object. We can then use this variable to refer to the object:

scala> val ints = List(1, 2, 3, 4)
ints: List[Int] = Cons(1,Cons(2,Cons(3,Cons(4,Nil))))
scala> ints match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => println("A three element list")
     |   case l => println(s"$l is not a three element list")
     | }
Cons(1,Cons(2,Cons(3,Cons(4,Nil)))) is not a three element list

Example 1.59

In the preceding example, l is bound to the entire list, which is then printed to the console.

Constructor patterns

A constructor pattern looks like Cons(_, Cons(_, Cons(_, Nil))). It consists of the name of a case class (Cons), followed by a number of patterns in parentheses. These extra patterns can themselves be constructor patterns, and we can use them to check arbitrarily deep into an object. In this case, checks are performed at four levels.
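Constructor patterns can also bind the values they extract instead of discarding them with _. The sketch below repeats the custom Cons and Nil definitions inside an object (named ConsList only for this example) so that it is self-contained; sumFirstTwo is a hypothetical helper:

```scala
object ConsList {
  sealed trait List[+A]
  case class Cons[+A](head: A, tail: List[A]) extends List[A]
  case object Nil extends List[Nothing]
}
import ConsList._

// Nested constructor patterns bind the first two heads and ignore the rest.
def sumFirstTwo(l: List[Int]): Option[Int] = l match {
  case Cons(a, Cons(b, _)) => Some(a + b)
  case _                   => None
}
```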

Sequence patterns

Scala allows us to match against sequence types such as Seq, List, and Array among others. It looks similar to a constructor pattern. Refer to the following:

scala> def thirdElement[A](s: Seq[A]): Option[A] = s match {
     |   case Seq(_, _, a, _*) => Some(a)
     |   case _            => None
     | }
thirdElement: [A](s: Seq[A])Option[A]
scala> val intSeq = Seq(1, 2, 3, 4)
intSeq: Seq[Int] = List(1, 2, 3, 4)
scala> thirdElement(intSeq)
res16: Option[Int] = Some(3)
scala> thirdElement(Seq.empty[String])
res17: Option[String] = None

Example 1.60

As the example illustrates, thirdElement returns a value of type Option[A]. If a sequence has three or more elements, it returns the third element wrapped in Some; for any sequence with fewer than three elements, it returns None. Seq(_, _, a, _*) binds a to the third element if present. The _* pattern matches any number of remaining elements, including none.
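The remaining elements can also be bound to a name instead of being discarded. A minimal sketch using the Scala 2 syntax rest @ _* (Scala 3 also offers the shorter rest* form); headAndRest is a hypothetical helper:

```scala
// rest @ _* binds the remaining elements (possibly none) to rest.
def headAndRest[A](s: Seq[A]): String = s match {
  case Seq()                 => "empty"
  case Seq(first, rest @ _*) => s"first = $first, rest = ${rest.mkString(",")}"
}
```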

Tuple patterns

We can pattern match against tuples too:

scala> val tuple3 = (1, 2, 3)
tuple3: (Int, Int, Int) = (1,2,3)
scala> def printTuple(a: Any): Unit = a match {
     |   case (a, b, c) => println(s"Tuple has $a, $b, $c")
     |   case _     =>
     | }
printTuple: (a: Any)Unit
scala> printTuple(tuple3)
Tuple has 1, 2, 3

Example 1.61

Running the preceding program will print Tuple has 1, 2, 3 to the console.
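Tuple patterns are not limited to match expressions. A small sketch (Scala 2.13; the names word, count, scores, and labels are illustrative) that uses them in a val definition and in a for expression over a Map, whose entries are (key, value) tuples:

```scala
// A tuple pattern on the left of a val destructures the tuple.
val (word, count) = ("spark", 3)

// Each Map entry is a (key, value) tuple, so a tuple pattern can
// take it apart inside a for expression.
val scores = Map("x" -> 24, "y" -> 25)
val labels = for ((key, value) <- scores) yield s"$key=$value"
```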

Typed patterns

A typed pattern allows us to check types in the pattern match and can be used for type tests and type casts:

scala> def getLength(a: Any): Int =
     |   a match {
     |     case s: String    => s.length
     |     case l: List[_]   => l.length //this is List from Scala collection library
     |     case m: Map[_, _] => m.size
     |     case _            => -1
     |   }
getLength: (a: Any)Int
scala> getLength("hello, world")
res3: Int = 12
scala> getLength(List(1, 2, 3, 4))
res4: Int = 4
scala> getLength(Map.empty[Int, String])
res5: Int = 0

Example 1.62

Note that the argument a, being of type Any, does not support methods such as length or size. In each alternative, Scala applies a type test and a type cast so that the bound variable has the target type. For example, case s: String => s.length is roughly equivalent to the following snippet:

if (a.isInstanceOf[String]) {
  val s = a.asInstanceOf[String]
  s.length
}

Example 1.63

One important thing to note, though, is that Scala does not keep type arguments at runtime (they are erased). So there is no way to check at runtime whether a list contains only integers. For example, the following prints A list of String to the console even though the list holds Int values; the compiler emits a warning to flag this runtime behavior. Arrays are the only exception, because the element type is stored with the array value:

scala> List.fill(5)(0) match {
     |   case _: List[String] => println("A list of String")
     |   case _           =>
     | }
<console>:13: warning: fruitless type test: a value of type List[Int] cannot also be a List[String] (the underlying of List[String]) (but still might match its erasure)
         case _: List[String] => println("A list of String")
                 ^
A list of String

Example 1.64
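Because arrays keep their element type at runtime, the same kind of type test does work for them. A minimal sketch (Scala 2.13; arrayKind is a hypothetical helper):

```scala
// Arrays keep their element type at runtime, so these type tests are
// checked precisely (unlike the List[String] test above).
def arrayKind(a: Any): String = a match {
  case _: Array[String] => "array of String"
  case _: Array[Int]    => "array of Int"
  case _                => "something else"
}
```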

Implicits in Scala

Scala provides implicit conversions and implicit parameters. The first place the compiler applies an implicit conversion is when an expression's type does not match the type expected in its context. For example, the following compiles:

scala> val d: Double = 2
d: Double = 2.0

Example 1.65

This works because of the following implicit method definition in the Int companion object (it was part of Predef prior to 2.10.x):

implicit def int2double(x: Int): Double = x.toDouble

Example 1.66
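The same mechanism applies to user-defined types. In this sketch (Scala 2.13), Meters, intToMeters, and describeDistance are hypothetical names invented for the illustration; the import of scala.language.implicitConversions silences the feature warning the compiler otherwise emits for an implicit conversion:

```scala
import scala.language.implicitConversions

// Hypothetical wrapper type used only for this illustration.
case class Meters(value: Double)

// Lets an Int appear wherever a Meters is expected.
implicit def intToMeters(n: Int): Meters = Meters(n.toDouble)

def describeDistance(d: Meters): String = s"${d.value} m"

val fromInt: Meters = 5         // the compiler inserts intToMeters(5)
val text = describeDistance(12) // and intToMeters(12) here
```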

Implicit conversions are also applied to the receiver of a method call. To see this, let’s define a Rational class:

scala> class Rational(n: Int, d: Int) extends Ordered[Rational] {
     |
     |   require(d != 0)
     |   private val g = gcd(n.abs, d.abs)
     |   private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
     |   val numer = n / g
     |   val denom = d / g
     |   def this(n: Int) = this(n, 1)
     |   // a/b + c/d = (a*d + c*b) / (b*d)
     |   def +(that: Rational) = new Rational(
     |     this.numer * that.denom + that.numer * this.denom,
     |     this.denom * that.denom
     |   )
     |   // negative, zero, or positive as this < that, this == that, or this > that (assumes positive denominators)
     |   def compare(that: Rational) = this.numer * that.denom - that.numer * this.denom
     |   override def toString = if (denom == 1) numer.toString else s"$numer/$denom"
     | }
defined class Rational

Example 1.67

Next, create a value of type Rational and try to add it to an Int:

scala> val r1 = new Rational(1)
r1: Rational = 1
scala> 1 + r1
<console>:14: error: overloaded method value + with alternatives:
  (x: Double)Double <and>
  (x: Float)Float <and>
  (x: Long)Long <and>
  (x: Int)Int <and>
  (x: Char)Int <and>
  (x: Short)Int <and>
  (x: Byte)Int <and>
  (x: String)String
cannot be applied to (Rational)
       1 + r1
         ^

Example 1.68

Adding r1 to 1 fails at compile time because none of the overloaded + methods on Int accepts an argument of type Rational. To make it work, we can define an implicit conversion from Int to Rational; the compiler then rewrites 1 + r1 as intToRational(1) + r1:

scala> implicit def intToRational(n: Int): Rational = new Rational(n)
intToRational: (n: Int)Rational
scala> val r1 = new Rational(1)
r1: Rational = 1
scala> 1 + r1
res11: Rational = 2

Example 1.69
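Putting the pieces together, here is a self-contained sketch (Scala 2.13) of the class with the conversion in scope. The values half, third, sum, mixed, and ordered are illustrative; the comparison relies on the < method that Ordered derives from compare:

```scala
import scala.language.implicitConversions

class Rational(n: Int, d: Int) extends Ordered[Rational] {
  require(d != 0)
  private val g = gcd(n.abs, d.abs)
  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
  val numer = n / g
  val denom = d / g
  def this(n: Int) = this(n, 1)
  // a/b + c/d = (a*d + c*b) / (b*d), reduced by the constructor
  def +(that: Rational): Rational =
    new Rational(numer * that.denom + that.numer * denom, denom * that.denom)
  // Sign of a/b - c/d; assumes positive denominators.
  def compare(that: Rational): Int = numer * that.denom - that.numer * denom
  override def toString = if (denom == 1) numer.toString else s"$numer/$denom"
}

implicit def intToRational(n: Int): Rational = new Rational(n)

val half = new Rational(1, 2)
val third = new Rational(1, 3)
val sum = half + third     // 5/6
val mixed = 1 + half       // 3/2, via intToRational(1)
val ordered = third < half // true
```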

Summary

This was a long chapter and we covered a lot of topics. We started this chapter with a brief introduction to functional programming, looked at why it is useful, and reviewed examples of RT. We then looked at various language features and constructs, starting with classes, objects, and traits. We looked at HOFs, which are one of the fundamental building blocks of functional programming. We looked at polymorphic functions and saw how they enable us to write reusable code. Then, we looked at variance, which defines subtyping relationships between parameterized types, took a detailed tour of pattern matching, and finally ended with implicit conversions. Implicits are a powerful language feature used in design patterns such as type classes.

In the next chapter, we are going to focus on setting up the environment, which will allow you to follow along with the rest of the chapters.

Further reading


Key benefits

  • Transform data into a clean and trusted source of information for your organization using Scala
  • Build streaming and batch-processing pipelines with step-by-step explanations
  • Implement and orchestrate your pipelines by following CI/CD best practices and test-driven development (TDD)
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

Most data engineers know that performance problems in a distributed computing environment can easily undermine the overall efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount. This book will teach you how to leverage the Scala programming language on the Spark framework and use the latest cloud technologies to build continuous and triggered data pipelines. You’ll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You’ll also get to grips with the DataFrame, Dataset, and Spark SQL APIs and their use. Data profiling and quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users. By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.

Who is this book for?

This book is for data engineers who have experience in working with data and want to understand how to transform raw data into a clean, trusted, and valuable source of information for their organization using Scala and the latest cloud technologies.

What you will learn

  • Set up your development environment to build pipelines in Scala
  • Get to grips with polymorphic functions, type parameterization, and Scala implicits
  • Use Spark DataFrames, Datasets, and Spark SQL with Scala
  • Read and write data to object stores
  • Profile and clean your data using Deequ
  • Performance tune your data pipelines using Scala

Product Details

Publication date: Jan 31, 2024
Length: 300 pages
Edition: 1st
Language: English
ISBN-13: 9781804614327


Table of Contents

20 Chapters
Part 1 – Introduction to Data Engineering, Scala, and an Environment Setup
Chapter 1: Scala Essentials for Data Engineers
Chapter 2: Environment Setup
Part 2 – Data Ingestion, Transformation, Cleansing, and Profiling Using Scala and Spark
Chapter 3: An Introduction to Apache Spark and Its APIs – DataFrame, Dataset, and Spark SQL
Chapter 4: Working with Databases
Chapter 5: Object Stores and Data Lakes
Chapter 6: Understanding Data Transformation
Chapter 7: Data Profiling and Data Quality
Part 3 – Software Engineering Best Practices for Data Engineering in Scala
Chapter 8: Test-Driven Development, Code Health, and Maintainability
Chapter 9: CI/CD with GitHub
Part 4 – Productionalizing Data Engineering Pipelines – Orchestration and Tuning
Chapter 10: Data Pipeline Orchestration
Chapter 11: Performance Tuning
Part 5 – End-to-End Data Pipelines
Chapter 12: Building Batch Pipelines Using Spark and Scala
Chapter 13: Building Streaming Pipelines Using Spark and Scala
Index
Other Books You May Enjoy

Customer reviews

Rating distribution
Average rating: 4.2 out of 5 (5 ratings)
5 star 40%
4 star 40%
3 star 20%
2 star 0%
1 star 0%
Loni Apr 04, 2024
Rating: 4/5
"Data Engineering with Scala and Spark" offers a comprehensive guide to navigating the complexities of Apache Spark and modern data engineering practices. From fundamental concepts to advanced optimization techniques, each chapter provides clear explanations and practical insights for building efficient data pipelines. With a focus on real-world applications and best practices, this book is essential reading for data engineers and professionals seeking to harness the full potential of Apache Spark in their projects.
Amazon verified review
H2N Apr 01, 2024
Rating: 4/5
A good resource for who looking to master Scala, Spark, and cloud computing for data engineering. The book covers essential concepts and best practices, it guides readers through setting up environments, developing pipelines, and applying test-driven development and CI/CD and also advanced topics like data transformation, quality checks, and performance tuning with practical examples. Overall, it's a highly valuable resource for anyone aspiring to excel in data engineering.
Amazon verified review
Om S Mar 25, 2024
Rating: 5/5
In "Data Engineering with Scala and Spark," you'll embark on a journey to enhance your data engineering skills using Scala and functional programming techniques. The book focuses on creating continuous and scheduled pipelines for data ingestion, transformation, and aggregation. Key Features: Use Scala to transform data reliably. Learn to build streaming and batch-processing pipelines with clear explanations. Implement CI/CD best practices and test-driven development (TDD). The book covers essential topics like setting up development environments, working with Spark APIs (DataFrame, Dataset, and Spark SQL), data profiling, quality assurance, and pipeline orchestration. It also includes insights into performance tuning and best practices for building robust data pipelines.
Amazon verified review
Zheng Zhu Feb 24, 2024
Rating: 5/5
"Data Engineering with Scala and Spark" is a fantastic survey of the key concepts and practices in modern data engineering with Apache Spark and data lake architectures. I'm a data professional in the software industry and have been working with Apache Spark for close to a decade now, which is even prior to cloud data lakes and platforms like Databricks becoming mainstream. This book does a great job of establishing the foundational concepts with Scala and Spark in its first few chapters, which gives the reader the necessary tools to experiment and extend their knowledge. The progression of the book is easy to follow, which goes toward advanced transformations, data quality, and finally to best practice data engineering patterns. I very much respect its coverage of Spark with the Scala language, as it continues to be the native programming language of Spark itself, and one that has the deepest level of integration and best performance characteristics when it comes to data engineering. One concept I really appreciate from the author in this book is its coverage, albeit somewhat brief, of Test Driven Development and CI/CD. The data engineering industry, in my opinion, has yet to fully adopt and institute the degree of rigor and engineering disciplines that are now pervasive with general software engineering in both backend and frontend settings. As a result, data pipelines of any real complexity for large organizations eventually become very brittle, difficult to manage, and costly to operate. This book plants a great seed in the mind of its readers that these concepts around unit and integration testing via CI/CD with data pipelines are best practices for data engineering and a necessary knowledge area for data engineers in our current environment.
I would loved to have seen some concrete samples of full integration tests that tests the logic of Spark transformations, which is an essential practice that typical Spark engineers lack familiarity with. In the concluding parts of the book, the author covers areas on orchestration, performance tuning, and end-to-end pipelines for both batch and streaming modalities. These are deep and advanced concepts, and there certainly can be full books written on each of these topics just by themselves. I like the broad coverage of several orchestration frameworks, giving the users an unbiased perspective on how tools like Airflow, Databricks Workflows, and ADF can be used with Spark. I also support the judicious coverage of some of the key concepts in Spark performance tuning, including data skew, partitioning, and right-sizing compute, which are generally the most important concepts to understand when tuning pipelines. Overall, I recommend this book for readers seeking to gain a deeper level of understanding of what data engineering is about and how to best achieve that with Apache Spark, in addition to the current set of companion platforms and tooling in the data engineering ecosystem. The reader should expect to be able to construct and support cloud-based or local data pipelines from various source modalities with Apache Spark in an end-to-end fashion, which I think makes this book a worthwhile journey.
Amazon verified review
fernando Feb 09, 2024
Rating: 3/5
This is a book for a newbie. If you have experience you won’t learn much from it.
Amazon verified review