Scala and Spark for Big Data Analytics

Scala and Spark for Big Data Analytics: Explore the concepts of functional programming, data streaming, and machine learning

By Karim and Sridhar Alla
eBook, Jul 2017, 796 pages, 1st Edition

Introduction to Scala

"I'm Scala. I'm a scalable, functional and object-oriented programming language. I can grow with you and you can play with me by typing one-line expressions and observing the results instantly"

- Scala Quote

In the last few years, Scala has seen a steady rise and wide adoption by developers and practitioners, especially in the fields of data science and analytics. Apache Spark, which is written in Scala, is a fast and general engine for large-scale data processing. Spark's success is due to many factors: an easy-to-use API, a clean programming model, performance, and so on. Naturally, Spark has more support for Scala: more APIs are available for Scala than for Python or Java, and new APIs usually become available for Scala before they do for Java, Python, and R.

Before we start writing data analytics programs using Spark and Scala (part II), we will first get familiar with Scala's functional programming concepts, object-oriented features, and the Scala collection APIs in detail (part I). As a starting point, we will provide a brief introduction to Scala in this chapter. We will cover some basic aspects of Scala, including its history and purposes. Then we will see how to install Scala on different platforms, including Windows, Linux, and Mac OS, so that you can write your data analytics programs in your favourite editors and IDEs. Later in this chapter, we will provide a comparative analysis of Java and Scala. Finally, we will dive into Scala programming with some examples.

In a nutshell, the following topics will be covered:

  • History and purposes of Scala
  • Platforms and editors
  • Installing and setting up Scala
  • Scala: the scalable language
  • Scala for Java programmers
  • Scala for the beginners
  • Summary

History and purposes of Scala

Scala is a general-purpose programming language that comes with support for functional programming and a strong static type system. Scala source code is intended to be compiled into Java bytecode, so that the resulting executable code can run on the Java Virtual Machine (JVM).

Martin Odersky started the design of Scala back in 2001 at the École Polytechnique Fédérale de Lausanne (EPFL). It was an extension of his work on Funnel, a programming language that uses functional programming and Petri nets. The first public release appeared in 2004, with support for the Java platform only. A release for the .NET framework followed in June 2004.

Scala has become very popular and has been widely adopted because it not only supports the object-oriented programming paradigm but also embraces functional programming concepts. In addition, although Scala's symbolic operators are not always easy to read, most Scala code is comparatively concise and readable, whereas the equivalent Java code tends to be verbose.

Like any other programming language, Scala was proposed and developed for specific purposes. Now, the question is, why was Scala created and what problems does it solve? To answer these questions, Odersky said in his blog:

"The work on Scala stems from a research effort to develop better language support for component software. There are two hypotheses that we would like to validate with the Scala experiment. First, we postulate that a programming language for component software needs to be scalable in the sense that the same concepts can describe small as well as large parts. Therefore, we concentrate on mechanisms for abstraction, composition, and decomposition, rather than adding a large set of primitives, which might be useful for components at some level of scale but not at other levels. Second, we postulate that scalable support for components can be provided by a programming language which unifies and generalizes object-oriented and functional programming. For statically typed languages, of which Scala is an instance, these two paradigms were up to now largely separate."

Pattern matching, higher-order functions, and so on are also provided in Scala, not to fill the gap between FP and OOP, but because they are typical features of functional programming. Scala has some incredibly powerful pattern-matching features, and actor-based concurrency frameworks are available for it. Moreover, it supports first-class functions and higher-order functions. In summary, the name "Scala" is a portmanteau of scalable language, signifying that it is designed to grow with the demands of its users.
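To make pattern matching and higher-order functions a little more concrete, here is a minimal sketch. The names PatternAndFunctions, describe, and applyTwice are our own illustrative choices, not part of any library:

```scala
object PatternAndFunctions {
  // Pattern matching on the shape of a list (hypothetical helper)
  def describe(xs: List[Int]): String = xs match {
    case Nil         => "empty"
    case head :: Nil => s"one element: $head"
    case head :: _   => s"starts with $head"
  }

  // A higher-order function: it takes another function as a parameter
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Doubles 3 twice: (3 * 2) * 2
  val doubledTwice: Int = applyTwice(_ * 2, 3)
}
```

Both concepts are covered properly in later chapters; the sketch only shows what the syntax looks like.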

Platforms and editors

Scala runs on the Java Virtual Machine (JVM), which makes it a good choice for Java programmers who would like a functional programming flavor in their code. There are lots of options when it comes to editors. It's worth spending some time comparing the available editors, because being comfortable with an IDE is one of the key factors for a successful programming experience. The following are some options to choose from:

  • Scala IDE
  • Scala plugin for Eclipse
  • IntelliJ IDEA
  • Emacs
  • VIM

Scala programming on Eclipse is supported through numerous plugins, several of them still in beta, and has several advantages. Eclipse provides some exciting features such as local, remote, and high-level debugging facilities with semantic highlighting and code completion for Scala. You can use Eclipse for Java as well as Scala application development with equal ease. However, I would also suggest Scala IDE (http://scala-ide.org/); it's a full-fledged Scala editor based on Eclipse and customized with a set of interesting features (for example, Scala worksheets, ScalaTest support, Scala refactoring, and so on).

The second best option, in my view, is IntelliJ IDEA. The first release came in 2001 as one of the first available Java IDEs with advanced code navigation and refactoring capabilities integrated. According to the InfoWorld report (see http://www.infoworld.com/article/2683534/development-environments/infoworld-review--top-java-programming-tools.html), out of the four top Java programming IDEs (that is, Eclipse, IntelliJ IDEA, NetBeans, and JDeveloper), IntelliJ received the highest test center score of 8.5 out of 10.

The corresponding scoring is shown in the following figure:

Figure 1: Best IDEs for Scala/Java developers

From the preceding figure, you may be interested in using other IDEs such as NetBeans and JDeveloper too. Ultimately, the choice is an everlasting debate among the developers, which means the final choice is yours.

Installing and setting up Scala

As we have already mentioned, Scala runs on the JVM, so make sure you have Java installed on your machine. If you don't, the next subsection shows how to install Java 8 on Ubuntu. After that, we will see how to install Scala on Windows, Mac OS, and Linux.

Installing Java

For simplicity, we will show how to install Java 8 on an Ubuntu 14.04 LTS 64-bit machine. For Windows and Mac OS, it is best to look up the installation instructions for your platform. As a starting point for Windows users, refer to https://java.com/en/download/help/windows_manual_download.xml for details.

Now, let's see how to install Java 8 on Ubuntu with step-by-step commands and instructions. First, check whether Java is already installed:

$ java -version

If it returns The program java cannot be found in the following packages, Java hasn't been installed yet. In that case, execute the following command to install it:

$ sudo apt-get install default-jre

This will install the Java Runtime Environment (JRE). However, you may instead need the Java Development Kit (JDK), which is usually needed to compile Java applications with Apache Ant, Apache Maven, Eclipse, and IntelliJ IDEA.

The Oracle JDK is the official JDK; however, it is no longer provided by Oracle as a default installation for Ubuntu. You can still install it using apt-get. To do so, first execute the following commands:

$ sudo apt-get install python-software-properties
$ sudo apt-get update
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update

Then, to install Java 8, execute the following command:

$ sudo apt-get install oracle-java8-installer

After installing, don't forget to set the JAVA_HOME environment variable. Just apply the following commands (for simplicity, we assume that Java is installed at /usr/lib/jvm/java-8-oracle). Use single quotes so that $PATH and $JAVA_HOME are written literally and only expanded when ~/.bashrc is read:

$ echo 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle' >> ~/.bashrc
$ echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc

Now, let's check the value of JAVA_HOME as follows:

$ echo $JAVA_HOME

You should observe the following result on Terminal:

 /usr/lib/jvm/java-8-oracle

Now, let's check that Java has been installed successfully by issuing the following command (you might see a newer version):

$ java -version

You will get the following output:

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Excellent! Now you have Java installed on your machine, so you're ready to run Scala code once Scala is installed. Let's do this in the next few subsections.

Windows

This part will focus on installing Scala on a PC with Windows 7, but in the end, it won't matter which version of Windows you are running at the moment:

  1. The first step is to download a zipped file of Scala from the official site. You will find it at https://www.Scala-lang.org/download/all.html. Under the other resources section of this page, you will find a list of the archive files from which you can install Scala. We will choose to download the zipped file for Scala 2.11.8, as shown in the following figure:
Figure 2: Scala installer for Windows
  2. After the download has finished, unzip the file and place it in your favorite folder. You can also rename the folder to Scala for easier navigation. Finally, a PATH variable needs to be created for Scala to be globally seen on your OS. For this, navigate to Computer | Properties, as shown in the following figure:
Figure 3: Environmental variable tab on Windows
  3. Select Environment Variables from there and get the location of the bin folder of Scala; then, append it to the PATH environment variable. Apply the changes and then press OK, as shown in the following screenshot:
Figure 4: Adding environmental variables for Scala
  4. Now, you are ready to go for the Windows installation. Open the CMD and just type scala. If the installation was successful, you should see an output similar to the following screenshot:
Figure 5: Accessing Scala from "Scala shell"

Mac OS

It's time now to install Scala on your Mac. There are lots of ways in which you can install Scala on your Mac, and here, we are going to mention two of them:

Using Homebrew installer

  1. First, check whether your system has Xcode installed, because it's required in this step. You can install it from the Apple App Store free of charge.
  2. Next, install Homebrew by running the following command in your terminal:
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Note: The preceding command is changed by the Homebrew maintainers from time to time. If the command doesn't seem to be working, check the Homebrew website for the latest incantation: http://brew.sh/.

  3. Now, you are ready to install Scala by typing the command brew install scala in the terminal.
  4. Finally, start the Scala shell by simply typing scala in your terminal, and you will observe the following on your terminal:
Figure 6: Scala shell on macOS

Installing manually

Before installing Scala manually, choose your preferred version of Scala and download the corresponding .tgz file for that version (scala-version.tgz) from http://www.Scala-lang.org/download/. After downloading it, extract it as follows:

$ tar xvf scala-2.11.8.tgz

Then, move it to /usr/local/share as follows:

$ sudo mv scala-2.11.8 /usr/local/share

Now, to make the installation permanent, execute the following commands:

$ echo 'export SCALA_HOME=/usr/local/share/scala-2.11.8' >> ~/.bash_profile
$ echo 'export PATH=$PATH:$SCALA_HOME/bin' >> ~/.bash_profile

That's it. Now, let's see how it can be done on Linux distributions like Ubuntu in the next subsection.

Linux

In this subsection, we will show you how to install Scala on the Ubuntu distribution of Linux. Before starting, let's check whether Scala is already installed. Checking this is straightforward using the following command:

$ scala -version

If Scala is already installed on your system, you should get the following message on your terminal:

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Note that, at the time of writing, we used the latest version of Scala, that is, 2.11.8. If you do not have Scala installed on your system, install it before proceeding any further. You can download the latest version of Scala from the Scala website at http://www.scala-lang.org/download/ (for a clearer view, refer to Figure 2). For ease, let's download Scala 2.11.8, as follows:

$ cd Downloads/
$ wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz

After the download has finished, you should find the Scala tar file in the download folder.

You should first go into the downloads directory with the following command: $ cd ~/Downloads/. Note that the name of the downloads folder may differ depending on the system's selected language.

To extract the Scala tar file from the terminal, type the following command:

$ tar -xvzf scala-2.11.8.tgz

Now, move the Scala distribution to a shared location (for example, /usr/local/share) by typing the following command or doing it manually:

$ sudo mv scala-2.11.8 /usr/local/share/

Move to your home directory using the following command:

$ cd ~

Then, set the Scala home using the following commands:

$ echo 'export SCALA_HOME=/usr/local/share/scala-2.11.8' >> ~/.bashrc
$ echo 'export PATH=$PATH:$SCALA_HOME/bin' >> ~/.bashrc

Then, apply the change to the current session by using the following command:

$ source ~/.bashrc

After the installation has been completed, you should verify it using the following command:

$ scala -version

If Scala has successfully been configured on your system, you should get the following message on your terminal:

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Well done! Now, let's enter into the Scala shell by typing the scala command on the terminal, as shown in the following figure:

Figure 7: Scala shell on Linux (Ubuntu distribution)
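Once you are in the Scala shell, you can also confirm from code which Scala library and Java runtime are actually in use. The following is a small sketch; the object name VersionCheck is our own, and the values depend entirely on your installation:

```scala
object VersionCheck {
  // Version number of the Scala library on the classpath
  val scalaVersion: String = scala.util.Properties.versionNumberString
  // Version of the Java runtime that is executing the code
  val javaVersion: String = System.getProperty("java.version")
}
```

Evaluating VersionCheck.scalaVersion in the shell should agree with what scala -version reported earlier.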

Finally, you can also install Scala using the apt-get command, as follows:

$ sudo apt-get install scala

This command will download the latest version of Scala (that is, 2.12.x). However, Spark does not have support for Scala 2.12 yet (at least when we wrote this chapter). Therefore, we would recommend the manual installation described earlier.

Scala: the scalable language

The name Scala comes from "scalable language", because Scala's concepts scale well to large programs. Programs that take tens of lines in other languages can often be expressed in Scala in a concise and effective manner, using general patterns and concepts of programming. In this section, we will describe some exciting features of Scala that Odersky has created for us:

Scala is object-oriented

Scala is a very good example of an object-oriented language. To define a type or behavior for your objects, you use classes and traits, which will be explained in the next chapter. Scala doesn't support direct multiple inheritance, but you can achieve a similar structure with Scala's extension of subclassing and mixin-based composition. This will be discussed in later chapters.
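As a first taste of mixin-based composition, consider the following minimal sketch. The traits Greeter and Shouter and the class Announcer are hypothetical names chosen only for this illustration:

```scala
object MixinDemo {
  trait Greeter { def greet(name: String): String = s"Hello, $name" }
  trait Shouter { def shout(text: String): String = text.toUpperCase + "!" }

  // A class has at most one superclass, but it can mix in several traits
  class Announcer extends Greeter with Shouter {
    def announce(name: String): String = shout(greet(name))
  }

  val message: String = new Announcer().announce("Scala")
}
```

Here, Announcer picks up behavior from both traits without inheriting from two classes.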

Scala is functional

Functional programming treats functions like first-class citizens. In Scala, this is achieved with syntactic sugar and objects that extend traits (like Function2). Scala also provides a simple and easy way to define anonymous functions (functions without names), supports higher-order functions, and allows nested functions. The syntax of these concepts will be explained in more detail in the coming chapters.

Also, it helps you to code in an immutable way, which makes it easier to apply parallelism and concurrency, since immutable data can be shared without synchronization.
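The relationship between function literals and the Function2 trait can be sketched as follows. This is only an illustration under our own naming (FunctionalDemo, add, addExplicit, and squares are not from any library):

```scala
object FunctionalDemo {
  // An anonymous function; its type (Int, Int) => Int is sugar for Function2[Int, Int, Int]
  val add: (Int, Int) => Int = (a, b) => a + b

  // The same function written out explicitly as an object extending Function2
  val addExplicit: Function2[Int, Int, Int] = new Function2[Int, Int, Int] {
    def apply(a: Int, b: Int): Int = a + b
  }

  // map is a higher-order function: it takes a function as its argument
  val squares: List[Int] = List(1, 2, 3).map(n => n * n)
}
```

Calling add(2, 3) and addExplicit(2, 3) is equivalent; both are ordinary objects with an apply method.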

Scala is statically typed

Unlike some other statically typed languages, such as Pascal, Scala does not expect you to provide redundant type information. You don't have to specify the type in most cases and, most importantly, you don't need to repeat it.

A programming language is called statically typed if the type of a variable is known at compile time. In many such languages, this means that you, as the programmer, must specify the type of each variable; in Scala, the compiler can infer most of them. Examples of statically typed languages include Scala, Java, C, C++, OCaml, and Haskell. On the other hand, Perl, Ruby, Python, and so on are dynamically typed languages, where the type is not associated with the variables or fields, but with the runtime values.

The statically typed nature of Scala ensures that all kinds of checking are done by the compiler. This extremely powerful feature of Scala helps you catch most trivial bugs and errors at a very early stage, before the code is executed.

Scala runs on the JVM

Just like Java, Scala is compiled into bytecode, which can be executed by the JVM. This means that the runtime platforms of Scala and Java are the same, because both generate bytecode as the compilation output. So, you can easily switch from Java to Scala, easily integrate both, or even use Scala in your Android application to add a functional flavor.

Note that, while using Java code in a Scala program is quite easy, the opposite is very difficult, mostly because of Scala's syntactic sugar.

Also, just like the javac command, which compiles Java code into bytecode, Scala has the scalac command, which compiles Scala code into bytecode.

Scala can execute Java code

As mentioned earlier, Scala can also be used to execute your Java code. Beyond running your Java code, it also enables you to use all the available classes from the Java SDK, and even your own predefined classes, projects, and packages, right in the Scala environment.
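For instance, standard Java classes can be instantiated and called from Scala without any wrapper. The following sketch uses java.util.ArrayList and java.time.LocalDate (the latter requires Java 8); the object name JavaInterop and the sample values are our own:

```scala
import java.util.{ArrayList => JArrayList}
import java.time.LocalDate

object JavaInterop {
  // A plain Java collection, created and used from Scala
  val names = new JArrayList[String]()
  names.add("Spark")
  names.add("Scala")

  // A Java SDK class with a static factory method, called like any Scala method
  val release: LocalDate = LocalDate.of(2017, 7, 1)
  val nextYear: Int = release.plusYears(1).getYear
}
```

Your own Java classes can be used in the same way, as long as they are on the classpath.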

Scala can do concurrent and synchronized processing

As mentioned earlier, Scala helps you to code in an immutable way. Because immutable data can be shared safely between threads, this makes it easier to apply parallelism and concurrency with little or no explicit synchronization.
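One way to see this in practice is with Futures from the Scala standard library. The following is a minimal sketch, not the only approach; ConcurrencyDemo and sumInParallel are hypothetical names, and the 5-second timeout is an arbitrary assumption:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcurrencyDemo {
  // Immutable input: safe to share between threads without locks
  val numbers: Vector[Int] = (1 to 100).toVector

  def sumInParallel(): Int = {
    val (left, right) = numbers.splitAt(50)
    // Each half is summed in its own Future on the global thread pool
    val leftSum  = Future(left.sum)
    val rightSum = Future(right.sum)
    // Combine the two results once both are available
    val total = for (a <- leftSum; b <- rightSum) yield a + b
    Await.result(total, 5.seconds)
  }
}
```

Because numbers is never modified, neither Future needs a lock to read its half of the data.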

Scala for Java programmers

Scala has a set of features that differ completely from Java's. In this section, we will discuss some of these features. This section will be helpful for those who are from a Java background or are at least familiar with basic Java syntax and semantics.

All types are objects

As mentioned earlier, every value in Scala looks like an object. This means that everything looks like an object, although some values are not actually objects under the hood; you will see the interpretation of this in the coming chapters (for example, the difference between reference types and primitive types still exists in Scala, but Scala hides it for the most part). For example, in Scala, strings are implicitly converted to collections of characters, but not in Java!
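A few one-liners illustrate the point. This is only a sketch with names of our own choosing (EverythingIsAnObject and its members):

```scala
object EverythingIsAnObject {
  // Methods can be called directly on what Java treats as a primitive
  val asText: String = 42.toString
  val larger: Int    = 3.max(7)
  val sum: Int       = (1).+(2) // the + operator is an ordinary method call

  // A String can be used like a collection of characters
  val reversed: String = "hello".reverse
  val vowels: Int      = "hello".count(c => "aeiou".indexOf(c) >= 0)
}
```

Methods such as reverse and count are not defined on java.lang.String; they become available through the implicit conversion mentioned above.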

Type inference

If you are not familiar with the term, it is nothing but the deduction of types at compile time. Hold on, isn't that what dynamic typing means? Well, no. Notice that I said deduction of types; this is drastically different from what dynamically typed languages do, and another thing is, it is done at compile time and not runtime. Many languages have this built in, but the implementation varies from one language to another. This might be confusing at the beginning, but it will become clearer with code examples. Let's jump into the Scala REPL for some experimentation.

Scala REPL

The Scala REPL is a powerful feature that makes it straightforward to try out Scala code in the Scala shell. REPL stands for Read-Eval-Print-Loop; it is also called the interactive interpreter. This means it is a program for:

  1. Reading the expressions you type in.
  2. Evaluating the expression in step 1 using the Scala compiler.
  3. Printing out the result of the evaluation in step 2.
  4. Waiting (looping) for you to enter further expressions.
Figure 8: Scala REPL example 1

From the figure, it is evident that there is no magic: the types of the variables are inferred automatically at compile time, and the compiler picks the best type it deems fit. Looking more carefully, when I tried to declare the following:

val i:Int = "hello"

Then, the Scala shell throws an error saying the following:

<console>:11: error: type mismatch;
found : String("hello")
required: Int
val i:Int = "hello"
^
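The same behavior can be sketched in a small piece of code. The object name InferenceDemo and the values are our own; the commented-out line is the one that the compiler rejects with the type mismatch error shown above:

```scala
object InferenceDemo {
  val count = 42             // inferred as Int
  val ratio = 0.5            // inferred as Double
  val words = List("a", "b") // inferred as List[String]

  // Explicit annotations that agree with the inferred types compile fine
  val sameCount: Int = count
  val sameWords: List[String] = words

  // This line would be rejected at compile time with a type mismatch:
  // val i: Int = "hello"
}
```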

According to Odersky, "Mapping a character to the character map over a RichString should again yield a RichString, as in the following interaction with the Scala REPL". The preceding statement can be demonstrated using the following line of code:

scala> "abc" map (x => (x + 1).toChar) 
res0: String = bcd

However, what happens if someone applies a function from Char to Int to a String? In that case, Scala returns the result as a vector of integers (Vector is an immutable collection in the Scala collection library), as shown in Figure 9. We will look at the details of the Scala collection API in Chapter 4, Collections APIs.

scala> "abc" map (x => (x + 1))
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(98, 99, 100)

Both static and instance methods of objects are also available. For example, if you declare x as the string hello and then try to access the static and instance methods of the object x, they are available. In the Scala shell, type x, then ., and then press <tab>, and you will find the available methods:

scala> val x = "hello"
x: java.lang.String = hello
scala> x.re<tab>
reduce reduceRight replaceAll reverse
reduceLeft reduceRightOption replaceAllLiterally reverseIterator
reduceLeftOption regionMatches replaceFirst reverseMap
reduceOption replace repr
scala>

Since this is all accomplished on the fly via reflection, even anonymous classes you've only just defined are equally accessible:

scala> val x = new AnyRef{def helloWord = "Hello, world!"}
x: AnyRef{def helloWord: String} = $anon$1@58065f0c
scala> x.helloWord
def helloWord: String
scala> x.helloWord
warning: there was one feature warning; re-run with -feature for details
res0: String = Hello, world!

The preceding two examples can be shown on the Scala shell, as follows:

Figure 9: Scala REPL example 2
"So it turns out that map yields different types depending on what the result type of the passed function argument is!"

- Odersky

Nested functions

Why would you require nested function support in your programming language? Most of the time, we want to keep our methods to a few lines and avoid overly large functions. A typical solution for this in Java would be to define all these small functions at the class level, but then any other method could easily refer to and access them even though they are only helper methods. The situation is different in Scala: you can define functions inside each other and, this way, prevent any external access to these functions:

def sum(vector: List[Int]): Int = {
  // Nested helper method (won't be accessed from outside this function)
  def helper(acc: Int, remaining: List[Int]): Int = remaining match {
    case Nil => acc
    case _   => helper(acc + remaining.head, remaining.tail)
  }
  // Call the nested method
  helper(0, vector)
}

We don't expect you to fully understand these code snippets yet; they are only meant to show the difference between Scala and Java.

Import statements

In Java, you can only import packages at the top of your code file, right after the package statement. The situation is not the same in Scala; you can write your import statements almost anywhere inside your source file (for example, you can even write your import statements inside a class or a method). You just need to pay attention to the scope of your import statement, because it has the same scope as the members of your class or the local variables inside your method. The _ (underscore) in Scala is used for wildcard imports, which is similar to the * (asterisk) that you would use in Java:

// Import everything from the package math 
import math._

You may also use braces, { }, to import a set of members from the same parent package in just one line of code. In Java, you would use multiple lines of code to do so:

// Import math.sin and math.cos
import math.{sin, cos}

Unlike Java, Scala does not have the concept of static imports. In other words, the concept of static doesn't exist in Scala. However, as a developer, you can import one or more members of an object using a regular import statement. The preceding example already shows this, where we import the methods sin and cos from the package object named math. For comparison, a Java programmer would write the preceding code snippet as follows:

import static java.lang.Math.sin;
import static java.lang.Math.cos;

Another beauty of Scala is that you can rename your imported members as well. For example, you can rename an imported type to avoid conflicts with packages that have similarly named members. The following statement is valid in Scala:

// Import scala.collection.mutable.Map as MutableMap
import scala.collection.mutable.{Map => MutableMap}

Finally, you may want to exclude a member of a package to avoid collisions or for other purposes. For this, you can rename the member to _ and combine that with a wildcard:

// Import everything from math, but hide cos 
import math.{cos => _, _}
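To see scoped and renamed imports working together, here is a small sketch. The names ImportDemo, circleArea, and scores are our own and serve only as an illustration:

```scala
object ImportDemo {
  def circleArea(radius: Double): Double = {
    // An import inside a method is visible only within this method
    import scala.math.{Pi, pow}
    Pi * pow(radius, 2)
  }

  // Renaming on import avoids a clash with the default immutable Map
  import scala.collection.mutable.{Map => MutableMap}
  val scores: MutableMap[String, Int] = MutableMap("scala" -> 1)
  scores("spark") = 2
}
```

Outside circleArea, the names Pi and pow are not in scope unless they are imported again.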

Operators as methods

It's worth mentioning that Scala doesn't support operator overloading as a separate language feature. In fact, you might say that there are no operators at all in Scala, only methods.

An alternative syntax for calling a method that takes a single parameter is the infix syntax. The infix syntax gives you the flavor of operator overloading, like what you may have used in C++. For example:

val x = 45
val y = 75

In the following case, + is a method of the class Int. The following code calls it with the regular method-call syntax, which looks unconventional for an operator:

val add1 = x.+(y)

More conventionally, the same can be done using the infix syntax, as follows:

val add2 = x + y

Moreover, you can use the infix syntax with any method that takes a single parameter, as follows:

val my_result = List(3, 6, 15, 34, 76) contains 5

There's one special case when using the infix syntax. That is, if the method name ends with a : (colon), then the invocation or call will be right associative. This means that the method is called on the right argument with the expression on the left as the argument, instead of the other way around. For example, suppose you have the following list:

val my_list = List(3, 6, 15, 34, 76)

Then, the following statement signifies my_list.+:(5) rather than 5.+:(my_list):

val my_result = 5 +: my_list

Now, let's look at the preceding examples on Scala REPL:

scala> val my_list = 5 +: List(3, 6, 15, 34, 76)
my_list: List[Int] = List(5, 3, 6, 15, 34, 76)
scala> val my_result2 = 5+:my_list
my_result2: List[Int] = List(5, 5, 3, 6, 15, 34, 76)
scala> println(my_result2)
List(5, 5, 3, 6, 15, 34, 76)
scala>

In addition, because operators are just methods, they can simply be overridden like any other method.
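Because operators are just methods, you can also define them on your own types. The following sketch assumes a hypothetical Vec2 class; the method names + and *: are ordinary method names that we chose for the example:

```scala
object OperatorDemo {
  // A hypothetical 2D vector type with a method named +
  case class Vec2(x: Int, y: Int) {
    def +(other: Vec2): Vec2 = Vec2(x + other.x, y + other.y)
    // The name ends with a colon, so `n *: v` is invoked as `v.*:(n)`
    def *:(factor: Int): Vec2 = Vec2(x * factor, y * factor)
  }

  val sum: Vec2    = Vec2(1, 2) + Vec2(3, 4) // same as Vec2(1, 2).+(Vec2(3, 4))
  val scaled: Vec2 = 2 *: Vec2(1, 2)         // right associative call
}
```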

Methods and parameter lists

In Scala, a method can have multiple parameter lists or even no parameter list at all. On the other hand, in Java, a method always has one parameter list, with zero or more parameters. For example, in Scala, the following is a valid method definition (written in curried notation) where a method has two parameter lists:

def sum(x: Int)(y: Int) = x + y     

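Note that this is not the same as the following method, which takes both parameters in a single parameter list and therefore cannot be partially applied in the same way: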

def sum(x: Int, y: Int) = x + y

A method, let's say sum2, can have no parameter list at all, as follows:

def sum2 = sum(2) _

Now, you can call the method sum2, which returns a function taking one parameter, and then call that function with the argument 5, as follows:

val result = sum2(5)
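Putting the pieces of this section together, a complete sketch might look as follows. The wrapper object CurryDemo and the value names direct and viaSum2 are our own additions:

```scala
object CurryDemo {
  // Two parameter lists (curried form)
  def sum(x: Int)(y: Int): Int = x + y

  // No parameter list: partially applies sum and returns a function Int => Int
  def sum2: Int => Int = sum(2) _

  val direct: Int  = sum(2)(5)
  val viaSum2: Int = sum2(5)
}
```

Both direct and viaSum2 compute 2 + 5; the second simply fixes the first argument in advance.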

Methods inside methods

Sometimes, you would like to make your application code modular by avoiding overly long and complex methods. Scala lets you split such methods into several smaller methods, which can be nested inside the methods that use them.

On the other hand, Java allows you only to have the methods defined at class level. For example, suppose you have the following method definition, which is not complete yet because nothing calls the nested method:

def main_method(xs: List[Int]): Int = {
  // This is the nested helper/auxiliary method
  def auxiliary_method(accu: Int, rest: List[Int]): Int = rest match {
    case Nil => accu
    case _   => auxiliary_method(accu + rest.head, rest.tail)
  }
}

Now, you can call the nested helper/auxiliary method from inside main_method as follows:

auxiliary_method(0, xs)

Considering the above, here's the complete code segment which is valid:

def main_method(xs: List[Int]): Int = {
  // This is the nested helper/auxiliary method
  def auxiliary_method(accu: Int, rest: List[Int]): Int = rest match {
    case Nil => accu
    case _   => auxiliary_method(accu + rest.head, rest.tail)
  }
  auxiliary_method(0, xs)
}

Constructor in Scala

One surprising thing about Scala is that the body of a Scala class is itself a constructor: the statements in the class body are executed whenever a new instance of that class is created. Moreover, you can specify the arguments of the constructor in the class declaration line.

Consequently, the constructor arguments are accessible from all of the methods defined in that class. For example, the following class and constructor definition is valid in Scala:

class Hello(name: String) {
// Statement executed as part of the constructor
println("New instance with name: " + name)
// Method which accesses the constructor argument
def sayHello = println("Hello, " + name + "!")
}

The equivalent Java class would look like this:

public class Hello {
private final String name;
public Hello(String name) {
System.out.println("New instance with name: " + name);
this.name = name;
}
public void sayHello() {
System.out.println("Hello, " + name + "!");
}
}
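
To see the class body acting as the constructor, a small sketch such as the following can be run. The wrapper object ConstructorSketch and the greeting method are assumptions added for illustration:

```scala
object ConstructorSketch {
  class Hello(name: String) {
    // Executed once for every `new Hello(...)`
    println("New instance with name: " + name)

    // Hypothetical helper so the text can be inspected without printing
    def greeting: String = "Hello, " + name + "!"

    def sayHello(): Unit = println(greeting)
  }

  def main(args: Array[String]): Unit = {
    val hello = new Hello("Dublin") // prints: New instance with name: Dublin
    hello.sayHello()                // prints: Hello, Dublin!
  }
}
```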

Objects instead of static methods

As mentioned earlier, static does not exist in Scala. You cannot do static imports, nor can you add static methods to classes. In Scala, when you define an object with the same name as a class and in the same source file, the object is said to be the companion of that class. Methods that you define in the companion object of a class are like static methods of a class in Java:

class HelloCity(CityName: String) {
def sayHelloToCity = println("Hello, " + CityName + "!")
}

This is how you can define a companion object for the class HelloCity:

object HelloCity { 
// Factory method
def apply(CityName: String) = new HelloCity(CityName)
}

The equivalent class in Java would look like this:

public class HelloCity { 
private final String CityName;
public HelloCity(String CityName) {
this.CityName = CityName;
}
public void sayHelloToCity() {
System.out.println("Hello, " + CityName + "!");
}
public static HelloCity apply(String CityName) {
return new HelloCity(CityName);
}
}

So, there's a lot of verbosity in this simple Java class, isn't there? Scala treats the apply method in a special way, providing a shortcut syntax to call it. This is the familiar way of calling the method:

val hello1 = HelloCity.apply("Dublin")

Here's the shortcut syntax that is equivalent to the one earlier:

val hello2 = HelloCity("Dublin")

Note that this shortcut only works because the method is named apply; Scala gives methods with this name special treatment.
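
Putting the class and its companion together, a self-contained sketch might look like this. The wrapper object CompanionSketch and the greeting method are hypothetical additions that let the snippet compile and be checked on its own:

```scala
object CompanionSketch {
  class HelloCity(cityName: String) {
    // Hypothetical helper returning the text instead of printing it
    def greeting: String = "Hello, " + cityName + "!"
    def sayHelloToCity(): Unit = println(greeting)
  }

  // Companion object: same name and same scope as the class
  object HelloCity {
    // Factory method
    def apply(cityName: String): HelloCity = new HelloCity(cityName)
  }

  def main(args: Array[String]): Unit = {
    val hello1 = HelloCity.apply("Dublin") // explicit call
    val hello2 = HelloCity("Dublin")       // shortcut, rewritten to apply(...)
    hello1.sayHelloToCity()
    hello2.sayHelloToCity()
  }
}
```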

Traits

Scala provides traits, a powerful feature for extending and enriching your classes' behavior. Traits are similar to interfaces, in which you define function prototypes or signatures, but they can also contain concrete implementations. With this, you can mix in functionality coming from different traits and, in this way, enrich your classes' behavior. So, what's so good about traits in Scala? They enable the composition of classes from traits, with traits being the building blocks.

Note that, like Java, Scala does not support multiple inheritance of classes: in both languages, a subclass can only extend a single superclass. However, a Scala class can mix in any number of traits. As always, let's look at an example. This is how a conventional logging routine is set up in Java:

class SomeClass {
//First, to log from a class, you must initialize a logger for it
final static Logger log = LoggerFactory.getLogger(SomeClass.class);
...
//For logging to be efficient, you must always check, if logging level for current message is enabled
//BAD, you will waste execution time if the log level is an error, fatal, etc.
log.debug("Some debug message");
...
//GOOD, it saves execution time for something more useful
if (log.isDebugEnabled()) { log.debug("Some debug message"); }
//BUT looks clunky, and it's tiresome to write this construct every time you want to log something.
}

For a more detailed discussion, refer to this URL https://stackoverflow.com/questions/963492/in-log4j-does-checking-isdebugenabled-before-logging-improve-performance/963681#963681.

It's very tiresome to always check whether the log level is enabled. It would be good if you could write this routine once and reuse it anywhere, in any class, right away. Traits in Scala make this possible. For example:

trait Logging {
lazy val log = LoggerFactory.getLogger(this.getClass.getName)
//Let's start with info level...
...
//Debug level here...
def debug(msg: => Any): Unit = {
if (log.isDebugEnabled) log.debug(s"${msg}")
}
def debug(msg: => Any, throwable: => Throwable): Unit = {
if (log.isDebugEnabled) log.debug(s"${msg}", throwable)
}
...
//Repeat it for all log levels you want to use
}

If you look at the preceding code, you will see strings that start with s. This is string interpolation, Scala's mechanism for creating strings from your data.

String interpolation allows you to embed variable references directly in processed string literals. For example:
scala> val name = "John Breslin"
scala> println(s"Hello, $name") // Hello, John Breslin
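
Beyond plain variable references, the s interpolator also accepts arbitrary expressions inside ${...}. The following sketch illustrates both forms; the object InterpolationSketch and its methods are hypothetical names used only for this example:

```scala
object InterpolationSketch {
  // $name embeds a simple variable reference
  def greet(name: String): String = s"Hello, $name"

  // ${...} embeds the result of an arbitrary expression
  def total(items: Int, price: Double): String = s"Total: ${items * price}"

  def main(args: Array[String]): Unit = {
    println(greet("John Breslin")) // Hello, John Breslin
    println(total(3, 2.5))         // Total: 7.5
  }
}
```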

Now, we can get an efficient logging routine in a more conventional style as a reusable block. To enable logging for any class, we just mix in our Logging trait! Fantastic! Now that's all it takes to add a logging feature to your class:

class SomeClass extends Logging {
...
//With the Logging trait, there is no need to declare a logger manually for every class
//And now, your logging routine is both efficient and doesn't litter the code!

debug("Some debug message")
...
}
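
The efficiency of this pattern comes from the by-name parameter (msg: => Any): the message expression is evaluated only if the log level is enabled. The sketch below demonstrates this without any logging library; it assumes a simple debugEnabled flag as a stand-in for log.isDebugEnabled and uses println instead of a real logger, so all names here are illustrative:

```scala
object LoggingSketch {
  // Hypothetical stand-in for a trait backed by a real logging library
  trait Logging {
    // Assumption: this flag plays the role of log.isDebugEnabled
    def debugEnabled: Boolean = false

    // msg is passed by name, so it is evaluated only when it is used
    def debug(msg: => Any): Unit =
      if (debugEnabled) println(s"DEBUG ${getClass.getName}: $msg")
  }

  class Quiet extends Logging
  class Verbose extends Logging {
    override def debugEnabled: Boolean = true
  }

  def main(args: Array[String]): Unit = {
    var evaluations = 0
    def expensiveMessage(): String = { evaluations += 1; "expensive message" }

    new Quiet().debug(expensiveMessage())   // level disabled: message never built
    new Verbose().debug(expensiveMessage()) // level enabled: message built once
    println(s"Message built $evaluations time(s)") // Message built 1 time(s)
  }
}
```

This gives the same saving as the explicit isDebugEnabled check in the Java version, but the check is written only once, inside the trait.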

It is even possible to mix in multiple traits. For example, starting from the preceding trait (that is, Logging, redefined here for brevity), you can keep extending in the following order:

trait Logging  {
override def toString = "Logging "
}
class A extends Logging {
override def toString = "A->" + super.toString
}
trait B extends Logging {
override def toString = "B->" + super.toString
}
trait C extends Logging {
override def toString = "C->" + super.toString
}
class D extends A with B with C {
override def toString = "D->" + super.toString
}

Note, however, that although a Scala class can mix in multiple traits at once, it can extend only one parent class, as is the case for any JVM class.

Now, to invoke the above traits and classes, use new D() from the Scala REPL, as shown in the following figure:

Figure 10: Mixing multiple traits
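
For readers without access to the figure, the following self-contained sketch reproduces the experiment. The wrapper object LinearizationSketch is a hypothetical name; the traits and classes are the ones defined above:

```scala
object LinearizationSketch {
  trait Logging {
    override def toString = "Logging "
  }
  class A extends Logging {
    override def toString = "A->" + super.toString
  }
  trait B extends Logging {
    override def toString = "B->" + super.toString
  }
  trait C extends Logging {
    override def toString = "C->" + super.toString
  }
  class D extends A with B with C {
    override def toString = "D->" + super.toString
  }

  def main(args: Array[String]): Unit = {
    // Each super call resolves to the next type to the left in "A with B with C"
    println(new D()) // D->C->B->A->Logging
  }
}
```

The output shows that traits are linearized from right to left: D delegates to C, then B, then A, and finally Logging.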

Everything has gone smoothly so far in this chapter. Now, let's move to a new section where we will discuss some topics for beginners who want to dive into the realm of Scala programming.

Scala for the beginners

In this part, we assume that you have a basic understanding of at least one other programming language. If Scala is your first entry into the coding world, you will find a large set of materials online, including tutorials, videos, and courses, that explain Scala for beginners.

There is a whole Scala Specialization on Coursera: https://www.coursera.org/specializations/scala. Taught by the creator of Scala, Martin Odersky, it takes a somewhat academic approach to teaching the fundamentals of functional programming. You will learn a lot about Scala by solving the programming assignments. Moreover, this specialization includes a course on Apache Spark. Furthermore, Kojo (http://www.kogics.net/sf:kojo) is an interactive learning environment that uses Scala programming to explore and play with math, art, music, animations, and games.

Your first line of code

As a first example, we will use the pretty common Hello, world! program in order to show you how to use Scala and its tools without knowing much about it. Let's open your favorite editor (this example runs on Windows 7, but can be run similarly on Ubuntu or macOS), say Notepad++, and type the following lines of code:

object HelloWorld {
def main(args: Array[String]){
println("Hello, world!")
}
}

Now, save the code with a name, say HelloWorld.scala, as shown in the following figure:

Figure 11: Saving your first Scala source code using Notepad++

Let's compile and run the source file as follows:

C:\>scalac HelloWorld.scala
C:\>scala HelloWorld
Hello, world!
C:\>

I'm the hello world program, explain me well!

The program should be familiar to anyone who has some programming experience. It has a main method which prints the string Hello, world! to your console. To define the main method, we used the def main() syntax, which may look strange at first. def is a Scala keyword to declare/define a method, and we will be covering more about methods and different ways of writing them in the next chapter. The method takes an Array[String] as its argument, which is an array of strings that can be used for the initial configuration of your program; passing an empty array is also valid. Then, we use the common println() method, which takes a string (or a formatted one) and prints it to the console. A simple hello world has opened up many topics to learn; three in particular:

● Methods (covered in a later chapter)
● Objects and classes (covered in a later chapter)
● Type inference, which keeps statically typed Scala code concise (explained earlier)
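
As a small illustration of the args parameter and of type inference, here is a hedged variation of the program that greets the first command-line argument if one is given. The object HelloArgs and the greeting method are hypothetical names introduced for this sketch:

```scala
object HelloArgs {
  def greeting(args: Array[String]): String = {
    // The type of who is inferred as String; no annotation is needed
    val who = args.headOption.getOrElse("world")
    s"Hello, $who!"
  }

  def main(args: Array[String]): Unit =
    println(greeting(args))
}
```

Running it without arguments prints Hello, world!, while passing a name as the first argument greets that name instead.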

Run Scala interactively!

The scala command starts the interactive shell for you, where you can interpret Scala expressions interactively:

> scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.
scala>
scala> object HelloWorld {
| def main(args: Array[String]){
| println("Hello, world!")
| }
| }
defined object HelloWorld
scala> HelloWorld.main(Array())
Hello, world!
scala>
The shortcut :q stands for the internal shell command :quit, used to exit the interpreter.

Compile it!

The scalac command, which is similar to the javac command, compiles one or more Scala source files and generates bytecode as output, which can then be executed on any Java Virtual Machine. To compile your hello world object, use the following:

> scalac HelloWorld.scala

By default, scalac generates the class files into the current working directory. You may specify a different output directory using the -d option:

> scalac -d classes HelloWorld.scala

However, note that the directory called classes must be created before executing this command.

Execute it with Scala command

The scala command executes the bytecode that is generated by the compiler:

$ scala HelloWorld

Scala allows us to specify command options, such as the -classpath (alias -cp) option:

$ scala -cp classes HelloWorld

Before using the scala command to execute your program, you should have a main method that acts as an entry point for your application. Alternatively, you can have an object that extends the scala.App trait; all the code inside this object will then be executed by the command. The following is the same Hello, world! example, written as a script using the App trait:

#!/usr/bin/env scala
object HelloWorld extends App {
println("Hello, world!")
}
HelloWorld.main(args)

The preceding script can be run directly from the command shell:

./script.sh

Note: we assume here that the file script.sh has the execute permission:

$ chmod +x script.sh

We also assume that the scala command can be found on the search path specified in the $PATH environment variable.

Summary

Throughout this chapter, you have learned the basics of the Scala programming language, its features, and the available editors. We have also briefly discussed Scala and its syntax. We demonstrated the installation and setup guidelines for beginners who are new to Scala programming. Later in the chapter, you learned how to write, compile, and execute sample Scala code. Moreover, a comparative discussion about Scala and Java was provided for those who come from a Java background. Here's a short comparison between Scala and Python:

Scala is statically typed, but Python is dynamically typed. Scala (mostly) embraces the functional programming paradigm, while Python doesn't. Python has a unique syntax that lacks most of the parentheses, while Scala (almost) always requires them. In Scala, almost everything is an expression, while this isn't true in Python. Scala can seem convoluted at first, but there are a few points on the upside. Firstly, the type complexity is mostly optional. Secondly, according to the discussion at https://stackoverflow.com/questions/1065720/what-is-the-purpose-of-scala-programming-language/5828684#5828684, the Scala compiler acts like free testing and documentation as cyclomatic complexity and lines of code escalate. When aptly implemented, Scala can perform otherwise all but impossible operations behind consistent and coherent APIs.

In the next chapter, we will build on these basics to see how Scala implements the object-oriented paradigm to allow building modular software systems.


Key benefits

  • Learn Scala’s sophisticated type system that combines Functional Programming and object-oriented concepts
  • Work on a wide array of applications, from simple batch jobs to stream processing and machine learning
  • Explore the most common as well as some complex use-cases to perform large-scale data analysis with Spark

Description

Scala has been observing wide adoption over the past few years, especially in the field of data science and analytics. Spark, built on Scala, has gained a lot of recognition and is being used widely in productions. Thus, if you want to leverage the power of Scala and Spark to make sense of big data, this book is for you. The first part introduces you to Scala, helping you understand the object-oriented and functional programming concepts needed for Spark application development. It then moves on to Spark to cover the basic abstractions using RDD and DataFrame. This will help you develop scalable and fault-tolerant streaming applications by analyzing structured and unstructured data using SparkSQL, GraphX, and Spark structured streaming. Finally, the book moves on to some advanced topics, such as monitoring, configuration, debugging, testing, and deployment. You will also learn how to develop Spark applications using SparkR and PySpark APIs, interactive data analytics using Zeppelin, and in-memory data processing with Alluxio. By the end of this book, you will have a thorough understanding of Spark, and you will be able to perform full-stack data analytics with a feel that no amount of data is too big.

Who is this book for?

Anyone who wishes to learn how to perform data analysis by harnessing the power of Spark will find this book extremely useful. No knowledge of Spark or Scala is assumed, although prior programming experience (especially with other JVM languages) will be useful to pick up concepts quicker.

What you will learn

  • Understand object-oriented & functional programming concepts of Scala
  • In-depth understanding of Scala collection APIs
  • Work with RDD and DataFrame to learn Spark's core abstractions
  • Analysing structured and unstructured data using SparkSQL and GraphX
  • Scalable and fault-tolerant streaming application development using Spark structured streaming
  • Learn machine-learning best practices for classification, regression, dimensionality reduction, and recommendation system to build predictive models with widely used algorithms in Spark MLlib & ML
  • Build clustering models to cluster a vast amount of data
  • Understand tuning, debugging, and monitoring Spark applications
  • Deploy Spark applications on real clusters in Standalone, Mesos, and YARN

Product Details

Publication date : Jul 25, 2017
Length: 796 pages
Edition : 1st
Language : English
ISBN-13 : 9781783550500
Vendor :
Apache



Table of Contents

18 Chapters
Introduction to Scala
Object-Oriented Scala
Functional Programming Concepts
Collection APIs
Tackle Big Data – Spark Comes to the Party
Start Working with Spark – REPL and RDDs
Special RDD Operations
Introduce a Little Structure - Spark SQL
Stream Me Up, Scotty - Spark Streaming
Everything is Connected - GraphX
Learning Machine Learning - Spark MLlib and Spark ML
My Name is Bayes, Naive Bayes
Time to Put Some Order - Cluster Your Data with Spark MLlib
Text Analytics Using Spark ML
Spark Tuning
Time to Go to ClusterLand - Deploying Spark on a Cluster
Testing and Debugging Spark
PySpark and SparkR

Customer reviews

Rating distribution
2.8 out of 5 (12 Ratings)
5 star 41.7%
4 star 0%
3 star 0%
2 star 8.3%
1 star 50%

Top Reviews

Verified Amazon Customer Jul 27, 2017
Rating: 5 out of 5
Fantastic book! I purchased this book yesterday and already read some chapters and had a quick look to other chapters too. It contains everything needed for learning big data analytics with Spark. It elaborately covers the Scala programming, Sparks basic operations, many machine learning algorithms (classification, regression, clustering, recommendation system), NLP, graph analytics, structured streaming, and of course some other advanced topics of Spark such as tuning, debugging and cluster deployment.More interestingly, it also discussed PySpark, SparkR, Alixuio, and Zeppelin so I don't need to switch to something else. So in summary, it contains everything I was looking for few months.
Amazon Verified review
Md Ashiqur Rahman Nov 16, 2017
Rating: 5 out of 5
This is a fantastic book for getting a real understanding about how to develop big data processing applications using Scala and Spark. This book is very up to date containing wide coverage of all the APIs such as Spark SQL, structured streaming, graphX, Spark MLib and more.
Amazon Verified review
Amazon Customer Aug 01, 2017
Rating: 5 out of 5
I am a scala enthusiastic, also professional java programmer, now a days im into bigdata stack. I've read the entire book more or less. I would highly recommend this book if someone working in java area but want to write java code in functional (scala) way to solve bigdata problems - scala makes spark processing much easier. I would also recommend this book who is yet to start bigdata coding using scala for spark processing. This book includes a lot of nice, small and real life bigdata analytics problems and how to solve them easily using scala on top of hadoop ecosystem.Thanks to authors for such an useful book - its not just only concepts but also guide to implement concepts.
Amazon Verified review
Ariel Herrera Oct 03, 2018
Rating: 5 out of 5
What I love about this book is that the author explains each concept very well and provides numerous examples. Yes, it is a large read however you can jump around in the book based on your need and it is sufficient.
Amazon Verified review
Julio Bregeiro Nov 18, 2023
Rating: 5 out of 5
Aprovadi
Amazon Verified review
