Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Scala for Data Science

You're reading from   Scala for Data Science Leverage the power of Scala with different tools to build scalable, robust data science applications

Arrow left icon
Product type Paperback
Published in Jan 2016
Publisher
ISBN-13 9781785281372
Length 416 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Pascal Bugnion Pascal Bugnion
Author Profile Icon Pascal Bugnion
Pascal Bugnion
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Scala and Data Science FREE CHAPTER 2. Manipulating Data with Breeze 3. Plotting with breeze-viz 4. Parallel Collections and Futures 5. Scala and SQL through JDBC 6. Slick – A Functional Interface for SQL 7. Web APIs 8. Scala and MongoDB 9. Concurrency with Akka 10. Distributed Batch Processing with Spark 11. Spark SQL and DataFrames 12. Distributed Machine Learning with MLlib 13. Web APIs with Play 14. Visualization with D3 and the Play Framework A. Pattern Matching and Extractors Index

What you need for this book

The examples provided in this book require that you have a working Scala installation and SBT, the Simple Build Tool, a command line utility for compiling and running Scala code. We will walk you through how to install these in the next sections.

We do not require a specific IDE. The code examples can be written in your favorite text editor or IDE.

Installing the JDK

Scala code is compiled to Java byte code. To run the byte code, you must have the Java Virtual Machine (JVM) installed, which comes as part of a Java Development Kit (JDK). There are several JDK implementations and, for the purpose of this book, it does not matter which one you choose. You may already have a JDK installed on your computer. To check this, enter the following in a terminal:

$ java -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

If you do not have a JDK installed, you will get an error stating that the java command does not exist.

If you do have a JDK installed, you should still verify that you are running a sufficiently recent version. The number that matters is the minor version number: the 8 in 1.8.0_66. Versions 1.8.xx of Java are commonly referred to as Java 8. For the first twelve chapters of this book, Java 7 will be sufficient (your version number should be something like 1.7.xx or newer). However, you will need Java 8 for the last two chapters, since the Play framework requires it. We therefore recommend that you install Java 8.

On Mac, the easiest way to install a JDK is using Homebrew:

$ brew install java

This will install Java 8, specifically the Java Standard Edition Development Kit, from Oracle.

Homebrew is a package manager for Mac OS X. If you are not familiar with Homebrew, I highly recommend using it to install development tools. You can find installation instructions for Homebrew on: http://brew.sh.

To install a JDK on Windows, go to http://www.oracle.com/technetwork/java/javase/downloads/index.html (or, if this URL does not exist, to the Oracle website, then click on Downloads and download Java Platform, Standard Edition). Select Windows x86 for 32-bit Windows, or Windows x64 for 64 bit. This will download an installer, which you can run to install the JDK.

To install a JDK on Ubuntu, install OpenJDK with the package manager for your distribution:

$ sudo apt-get install openjdk-8-jdk

If you are running a sufficiently old version of Ubuntu (14.04 or earlier), this package will not be available. In this case, either fall back to openjdk-7-jdk, which will let you run examples in the first twelve chapters, or install the Java Standard Edition Development Kit from Oracle through a PPA (a non-standard package archive):

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

You then need to tell Ubuntu to prefer Java 8 with:

$ sudo update-java-alternatives -s java-8-oracle

Installing and using SBT

The Simple Build Tool (SBT) is a command line tool for managing dependencies and building and running Scala code. It is the de facto build tool for Scala. To install SBT, follow the instructions on the SBT website (http://www.scala-sbt.org/0.13/tutorial/Setup.html).

When you start a new SBT project, SBT downloads a specific version of Scala for you. You, therefore, do not need to install Scala directly on your computer. Managing the entire dependency suite from SBT, including Scala itself, is powerful: you do not have to worry about developers working on the same project having different versions of Scala or of the libraries used.

Since we will use SBT extensively in this book, let's create a simple test project. If you have used SBT previously, do skip this section.

Create a new directory called sbt-example and navigate to it. Inside this directory, create a file called build.sbt. This file encodes all the dependencies for the project. Write the following in build.sbt:

// build.sbt

scalaVersion := "2.11.7"

This specifies which version of Scala we want to use for the project. Open a terminal in the sbt-example directory and type:

$ sbt

This starts an interactive shell. Let's open a Scala console:

> console

This gives you access to a Scala console in the context of your project:

scala> println("Scala is running!")
Scala is running!

Besides running code in the console, we will also write Scala programs. Open an editor in the sbt-example directory and enter a basic "hello, world" program. Name the file HelloWorld.scala:

// HelloWorld.scala

object HelloWorld extends App {
  println("Hello, world!")
}

Return to SBT and type:

> run

This will compile the source files and run the executable, printing "Hello, world!".

Besides compiling and running your Scala code, SBT also manages Scala dependencies. Let's specify a dependency on Breeze, a library for numerical algorithms. Modify the build.sbt file as follows:

// build.sbt

scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.11.2",
  "org.scalanlp" %% "breeze-natives" % "0.11.2"
)

SBT requires that statements be separated by empty lines, so make sure that you leave an empty line between scalaVersion and libraryDependencies. In this example, we have specified a dependency on Breeze version "0.11.2". How did we know to use these coordinates for Breeze? Most Scala packages will quote the exact SBT string to get the latest version in their documentation.

If this is not the case, or you are specifying a dependency on a Java library, head to the Maven Central website (http://mvnrepository.com) and search for the package of interest, for example "Breeze". The website provides a list of packages, including several named breeze_2.xx packages. The number after the underscore indicates the version of Scala the package was compiled for. Click on "breeze_2.11" to get a list of the different Breeze versions available. Choose "0.11.2". You will be presented with a list of package managers to choose from (Maven, Ivy, Leiningen, and so on). Choose SBT. This will print a line like:

libraryDependencies += "org.scalanlp" % "breeze_2.11" % "0.11.2"

These are the coordinates that you will want to copy to the build.sbt file. Note that we just specified "breeze", rather than "breeze_2.11". By preceding the package name with two percentage signs, %%, SBT automatically resolves to the correct Scala version. Thus, specifying %% "breeze" is identical to % "breeze_2.11".

Now return to your SBT console and run:

> reload

This will fetch the Breeze jars from Maven Central. You can now import Breeze in either the console or your scripts (within the context of this Scala project). Let's test this in the console:

> console
scala> import breeze.linalg._
import breeze.linalg._

scala> import breeze.numerics._
import breeze.numerics._

scala> val vec = linspace(-2.0, 2.0, 100)
vec: breeze.linalg.DenseVector[Double] = DenseVector(-2.0, -1.9595959595959596, ...

scala> sigmoid(vec)
breeze.linalg.DenseVector[Double] = DenseVector(0.11920292202211755, 0.12351078065 ...

You should now be able to compile, run and specify dependencies for your Scala scripts.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image