Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Apache Spark Deep Learning Cookbook
Apache Spark Deep Learning Cookbook

Apache Spark Deep Learning Cookbook: Over 80 best practice recipes for the distributed training and deployment of neural networks using Keras and TensorFlow

eBook
€8.99 €32.99
Paperback
€41.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Apache Spark Deep Learning Cookbook

Setting Up Spark for Deep Learning Development

In this chapter, the following recipes will be covered:

  • Downloading an Ubuntu Desktop image
  • Installing and configuring Ubuntu with VMWare Fusion on macOS
  • Installing and configuring Ubuntu with Oracle VirtualBox on Windows
  • Installing and configuring Ubuntu Desktop for Google Cloud Platform
  • Installing and configuring Spark and prerequisites on Ubuntu Desktop
  • Integrating Jupyter notebooks with Spark
  • Starting and configuring a Spark cluster
  • Stopping a Spark cluster

Introduction

Deep learning is the focused study of machine learning algorithms that deploy neural networks as their main method of learning. Deep learning has exploded onto the scene just within the last couple of years. Microsoft, Google, Facebook, Amazon, Apple, Tesla and many other companies are all utilizing deep learning models in their apps, websites, and products. At the same exact time, Spark, an in-memory compute engine running on top of big data sources, has made it easy to process volumes of information at record speeds and ease. In fact, Spark has now become the leading big data development tool for data engineers, machine learning engineers, and data scientists.

Since deep learning models perform better with more data, the synergy between Spark and deep learning allowed for a perfect marriage. Almost as important as the code used to execute deep learning algorithms is the work environment that enables optimal development. Many talented minds are eager to develop neural networks to help answer important questions in their research. Unfortunately, one of the greatest barriers to the development of deep learning models is access to the necessary technical resources required to learn on big data. The purpose of this chapter is to create an ideal virtual development environment for deep learning on Spark.

Downloading an Ubuntu Desktop image

Spark can be set up for all types of operating systems, whether they reside on-premise or in the cloud. For our purposes, Spark will be installed on a Linux-based virtual machine with Ubuntu as the operating system. There are several advantages to using Ubuntu as the go-to virtual machine, not least of which is cost. Since they are based on open source software, Ubuntu operating systems are free to use and do not require licensing. Cost is always a consideration and one of the main goals of this publication is to minimize the financial footprint required to get started with deep learning on top of a Spark framework.

Getting ready

There are some minimum recommendations required for downloading the image file:

  • Minimum of 2 GHz dual-core processor
  • Minimum of 2 GB system memory
  • Minimum of 25 GB of free hard drive space

How to do it...

Follow the steps in the recipe to download an Ubuntu Desktop image:

  1. In order to create a virtual machine of Ubuntu Desktop, it is necessary to first download the file from the official website: https://www.ubuntu.com/download/desktop.
  2. As of this writing, Ubuntu Desktop 16.04.3 is the most recent available version for download.

  1. Access the following file in a .iso format once the download is complete:

    ubuntu-16.04.3-desktop-amd64.iso

How it works...

Virtual environments provide an optimal development workspace by isolating the relationship to the physical or host machine. Developers may be using all types of machines for their host environments such as a MacBook running macOS, a Microsoft Surface running Windows or even a virtual machine on the cloud with Microsoft Azure or AWS; however, to ensure consistency within the output of the code executed, a virtual environment within Ubuntu Desktop will be deployed that can be used and shared among a wide variety of host platforms.

There's more...

There are several options for desktop virtualization software, depending on whether the host environment is on a Windows or a macOS. There are two common software applications for virtualization when using macOS:

  • VMWare Fusion
  • Parallels

See also

Installing and configuring Ubuntu with VMWare Fusion on macOS

This section will focus on building a virtual machine using an Ubuntu operating system with VMWare Fusion.

Getting ready

How to do it...

Follow the steps in the recipe to configure Ubuntu with VMWare Fusion on macOS:

  1. Once VMWare Fusion is up and running, click on the + button on the upper-left-hand side to begin the configuration process and select New..., as seen in the following screenshot:
  1. Once the selection has been made, select the option to Install from Disk or Image, as seen in the following screenshot:
  1. Select the operating system's iso file that was downloaded from the Ubuntu Desktop website, as seen in the following screenshot:
  1. The next step will ask whether you want to choose Linux Easy Install. It is recommended to do so, as well as to incorporate a Display Name/Password combination for the Ubuntu environment, as seen in the following screenshot:
  1. The configuration process is almost complete. A Virtual Machine Summary is displayed with the option to Customize Settings to increase the Memory and Hard Disk, as seen in the following screenshot:
  1. Anywhere from 20 to 40 GB hard disk space is sufficient for the virtual machine; however, bumping up the memory to either 2 GB or even 4 GB will assist with the performance of the virtual machine when executing Spark code in later chapters. Update the memory by selecting Processors and Memory under the Settings of the virtual machine and increasing the Memory to the desired amount, as seen in the following screenshot:

How it works...

The setup allows for manual configuration of the settings necessary to get Ubuntu Desktop up and running successfully on VMWare Fusion. The memory and hard drive storage can be increased or decreased based on the needs and availability of the host machine.

There's more...

All that is remaining is to fire up the virtual machine for the first time, which initiates the installation process of the system onto the virtual machine. Once all the setup is complete and the user has logged in, the Ubuntu virtual machine should be available for development, as seen in the following screenshot:

See also

Aside from VMWare Fusion, there is also another product that offers similar functionality on a Mac. It is called Parallels Desktop for Mac. To learn more about VMWare and Parallels, and decide which program is a better fit for your development, visit the following websites:

Installing and configuring Ubuntu with Oracle VirtualBox on Windows

Unlike with macOS, there are several options to virtualize systems within Windows. This mainly has to do with the fact that virtualization on Windows is very common as most developers are using Windows as their host environment and need virtual environments for testing purposes without affecting any of the dependencies that rely on Windows.

Getting ready

VirtualBox from Oracle is a common virtualization product and is free to use. Oracle VirtualBox provides a straightforward process to get an Ubuntu Desktop virtual machine up and running on top of a Windows environment.

How to do it...

Follow the steps in this recipe to configure Ubuntu with VirtualBox on Windows:

  1. Initiate an Oracle VM VirtualBox Manager. Next, create a new virtual machine by selecting the New icon and specify the Name, Type, and Version of the machine, as seen in the following screenshot:
  1. Select Expert Mode as several of the configuration steps will get consolidated, as seen in the following screenshot:

Ideal memory size should be set to at least 2048 MB, or preferably 4096 MB, depending on the resources available on the host machine.

  1. Additionally, set an optimal hard disk size for an Ubuntu virtual machine performing deep learning algorithms to at least 20 GB, if not more, as seen in the following screenshot:
  1. Point the virtual machine manager to the start-up disk location where the Ubuntu iso file was downloaded to and then Start the creation process, as seen in the following screenshot:
  1. After allotting some time for the installation, select the Start icon to complete the virtual machine and get it ready for development as seen in the following screenshot:

How it works...

The setup allows for manual configuration of the settings necessary to get Ubuntu Desktop up and running successfully on Oracle VirtualBox. As was the case with VMWare Fusion, the memory and hard drive storage can be increased or decreased based on the needs and availability of the host machine.

There's more...

Please note that some machines that run Microsoft Windows are not set up by default for virtualization and users may receive an initial error indicating the VT-x is not enabled. This can be reversed and virtualization may be enabled in the BIOS during a reboot.

See also

To learn more about Oracle VirtualBox and decide whether or not it is a good fit, visit the following website and select Windows hosts to begin the download process: https://www.virtualbox.org/wiki/Downloads.

Installing and configuring Ubuntu Desktop for Google Cloud Platform

Previously, we saw how Ubuntu Desktop could be set up locally using VMWare Fusion. In this section, we will learn how to do the same on Google Cloud Platform.

Getting ready

The only requirement is a Google account username. Begin by logging in to your Google Cloud Platform using your Google account. Google provides a free 12-month subscription with $300 credited to your account. The setup will ask for your bank details; however, Google will not charge you for anything without explicitly letting you know first. Go ahead and verify your bank account and you are good to go.

How to do it...

Follow the steps in the recipe to configure Ubuntu Desktop for Google Cloud Platform:

  1. Once logged in to your Google Cloud Platform, access a dashboard that looks like the one in the following screenshot:
Google Cloud Platform Dashboard
  1. First, click on the product services button in the top-left-hand corner of your screen. In the drop-down menu, under Compute, click on VM instances, as shown in the following screenshot:
  1. Create a new instance and name it. We are naming it ubuntuvm1 in our case. Google Cloud automatically creates a project while launching an instance and the instance will be launched under a project ID. The project may be renamed if required.

  1. After clicking on Create Instance, select the zone/area you are located in.
  2. Select Ubuntu 16.04LTS under the boot disk as this is the operating system that will be installed in the cloud. Please note that LTS stands for version, and will have long-term support from Ubuntu’s developers.
  3. Next, under the boot disk options, select SSD persistent disk and increase the size to 50 GB for some added storage space for the instance, as shown in the following screenshot:
  1. Next, set Access scopes to Allow full access to all Cloud APIs.
  2. Under firewall, please check to allow HTTP traffic as well as allow HTTPS traffic, as shown in the following screenshot:
Selecting options  Allow HTTP traffic and HTTPS Traffic
  1. Once the instance is configured as shown in this section, go ahead and create the instance by clicking on the Create button.
After clicking on the Create button, you will notice that the instance gets created with a unique internal as well as external IP address. We will require this at a later stage. SSH refers to secure shell tunnel, which is basically an encrypted way of communicating in client-server architectures. Think of it as data going to and from your laptop, as well as going to and from Google's cloud servers, through an encrypted tunnel.
  1.  Click on the newly created instance. From the drop-down menu, click on open in browser window, as shown in the following screenshot:
  1. You will see that Google opens up a shell/terminal in a new window, as shown in the following screenshot:
  1. Once the shell is open, you should have a window that looks like the following screenshot:
  1. Type the following commands in the Google cloud shell:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install gnome-shell
$ sudo apt-get install ubuntu-gnome-desktop
$ sudo apt-get install autocutsel
$ sudo apt-get install gnome-core
$ sudo apt-get install gnome-panel
$ sudo apt-get install gnome-themes-standard
  1. When presented with a prompt to continue or not, type y and select ENTER, as shown in the following screenshot:
  1. Once done with the preceding steps, type the following commands to set up the vncserver and allow connections to the local shell:
$ sudo apt-get install tightvncserver
$ touch ~/.Xresources
  1. Next, launch the server by typing the following command:
$ tightvncserver
  1. This will prompt you to enter a password, which will later be used to log in to the Ubuntu Desktop virtual machine. This password is limited to eight characters and needs to be set and verified, as shown in the following screenshot:
  1. A startup script is automatically generated by the shell, as shown in the following screenshot. This startup script can be accessed and edited by copying and pasting its PATH in the following manner:
  1. In our case, the command to view and edit the script is:
:~$ vim /home/amrith2kmeanmachine/.vnc/xstartup

This PATH may be different in each case. Ensure you set the right PATH. The vim command opens up the script in the text editor on a Mac.

The local shell generated a startup script as well as a log file. The startup script needs to be opened and edited in a text editor, which will be discussed next.
  1. After typing the vim command, the screen with the startup script should look something like this screenshot:
  1. Type i to enter INSERT mode. Next, delete all the text in the startup script. It should then look like the following screenshot:
  1. Copy paste the following code into the startup script:
#!/bin/sh
autocutsel -fork
xrdb $HOME/.Xresources
xsetroot -solid grey
export XKL_XMODMAP_DISABLE=1
export XDG_CURRENT_DESKTOP="GNOME-Flashback:Unity"
export XDG_MENU_PREFIX="gnome-flashback-"
unset DBUS_SESSION_BUS_ADDRESS
gnome-session --session=gnome-flashback-metacity --disable-acceleration-check --debug &
  1. The script should appear in the editor, as seen in the following screenshot:
  1. Press Esc to exit out of INSERT mode and type :wq to write and quit the file.
  2. Once the startup script has been configured, type the following command in the Google shell to kill the server and save the changes:
$ vncserver -kill :1
  1. This command should produce a process ID that looks like the one in the following screenshot:
  1. Start the server again by typing the following command:
$ vncserver -geometry 1024x640

The next series of steps will focus on securing the shell tunnel into the Google Cloud instance from the local host. Before typing anything on the local shell/terminal, ensure that Google Cloud is installed. If not already installed, do so by following the instructions in this quick-start guide located at the following website:

https://cloud.google.com/sdk/docs/quickstart-mac-os-x

  1. Once Google Cloud is installed, open up the terminal on your machine and type the following commands to connect to the Google Cloud compute instance:
$ gcloud compute ssh \
YOUR INSTANCE NAME HERE \
--project YOUR PROJECT NAME HERE \
--zone YOUR TIMEZONE HERE \
--ssh-flag "-L 5901:localhost:5901"
  1. Ensure that the instance name, project ID, and zone are specified correctly in the preceding commands. On pressing ENTER, the output on the local shell changes to what is shown in the following screenshot:
  1. Once you see the name of your instance followed by ":~$", it means that a connection has successfully been established between the local host/laptop and the Google Cloud instance. After successfully SSHing into the instance, we require software called VNC Viewer to view and interact with the Ubuntu Desktop that has now been successfully set up on the Google Cloud Compute engine. The following few steps will discuss how this is achieved.
  1. VNC Viewer may be downloaded using the following link:

https://www.realvnc.com/en/connect/download/viewer/

  1. Once installed, click to open VNC Viewer and in the search bar, type in localhost::5901, as shown in the following screenshot:
  1. Next, click on continue when prompted with the following screen:
  1. This will prompt you to enter your password for the virtual machine. Enter the password that you set earlier while launching the tightvncserver command for the first time, as shown in the following screenshot:
  1. You will finally be taken into the desktop of your Ubuntu virtual machine on Google Cloud Compute. Your Ubuntu Desktop screen must now look something like the following screenshot when viewed on VNC Viewer:

How it works...

You have now successfully set up VNC Viewer for interactions with the Ubuntu virtual machine/desktop. Anytime the Google Cloud instance is not in use, it is recommended to suspend or shut down the instance so that additional costs are not being incurred. The cloud approach is optimal for developers who may not have access to physical resources with high memory and storage.

There's more...

While we discussed Google Cloud as a cloud option for Spark,  it is possible to leverage Spark on the following cloud platforms as well:

  • Microsoft Azure
  • Amazon Web Services

See also

In order to learn more about Google Cloud Platform and sign up for a free subscription, visit the following website:

https://cloud.google.com/

Installing and configuring Spark and prerequisites on Ubuntu Desktop

Before Spark can get up and running, there are some necessary prerequisites that need to be installed on a newly minted Ubuntu Desktop. This section will focus on installing and configuring the following on Ubuntu Desktop:

  • Java 8 or higher
  • Anaconda
  • Spark

Getting ready

The only requirement for this section is having administrative rights to install applications onto the Ubuntu Desktop.

How to do it...

This section walks through the steps in the recipe to install Python 3, Anaconda, and Spark on Ubuntu Desktop:

  1. Install Java on Ubuntu through the terminal application, which can be found by searching for the app and then locking it to the launcher on the left-hand side, as seen in the following screenshot:
  1. Perform an initial test for Java on the virtual machine by executing the following command at the terminal:
java -version
  1. Execute the following four commands at the terminal to install Java:
sudo apt-get install software-properties-common 
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
  1. After accepting the necessary license agreements for Oracle, perform a secondary test of Java on the virtual machine by executing java -version once again in the terminal. A successful installation for Java will display the following outcome in the terminal:
$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
  1. Next, install the most recent version of Anaconda. Current versions of Ubuntu Desktop come preinstalled with Python. While it is convenient that Python comes preinstalled with Ubuntu, the installed version is for Python 2.7, as seen in the following output:
$ python --version
Python 2.7.12
  1. The current version of Anaconda is v4.4 and the current version of Python 3 is v3.6. Once downloaded, view the Anaconda installation file by accessing the Downloads folder using the following command:
$ cd Downloads/
~/Downloads$ ls
Anaconda3-4.4.0-Linux-x86_64.sh
  1. Once in the Downloads folder, initiate the installation for Anaconda by executing the following command:
~/Downloads$ bash Anaconda3-4.4.0-Linux-x86_64.sh 
Welcome to Anaconda3 4.4.0 (by Continuum Analytics, Inc.)
In order to continue the installation process, please review the license agreement.
Please, press ENTER to continue
Please note that the version of Anaconda, as well as any other software installed, may differ as newer updates are released to the public. The version of Anaconda that we are using in this chapter and in this book can be downloaded from https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86.sh
  1. Once the Anaconda installation is complete, restart the Terminal application to confirm that Python 3 is now the default Python environment through Anaconda by executing python --version in the terminal:
$ python --version
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
  1. The Python 2 version is still available under Linux, but will require an explicit call when executing a script, as seen in the following command:
~$ python2 --version
Python 2.7.12
  1. Visit the following website to begin the Spark download and installation process:

https://spark.apache.org/downloads.html

  1. Select the download link. The following file will be downloaded to the Downloads folder in Ubuntu:

spark-2.2.0-bin-hadoop2.7.tgz

  1. View the file at the terminal level by executing the following commands:
$ cd Downloads/
~/Downloads$ ls
spark-2.2.0-bin-hadoop2.7.tgz
  1. Extract the tgz file by executing the following command:
~/Downloads$ tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
  1. Another look at the Downloads directory using ls shows both the tgz file and the extracted folder:
~/Downloads$ ls
spark-2.2.0-bin-hadoop2.7 spark-2.2.0-bin-hadoop2.7.tgz
  1. Move the extracted folder from the Downloads folder to the Home folder by executing the following command:
~/Downloads$ mv spark-2.2.0-bin-hadoop2.7 ~/
~/Downloads$ ls
spark-2.2.0-bin-hadoop2.7.tgz
~/Downloads$ cd
~$ ls
anaconda3 Downloads Pictures Templates
Desktop examples.desktop Public Videos
Documents Music spark-2.2.0-bin-hadoop2.7
  1. Now, the spark-2.2.0-bin-hadoop2.7 folder has been moved to the Home folder, which can be viewed when selecting the Files icon on the left-hand side toolbar, as seen in the following screenshot:
  1. Spark is now installed. Initiate Spark from the terminal by executing the following script at the terminal level:
~$ cd ~/spark-2.2.0-bin-hadoop2.7/
~/spark-2.2.0-bin-hadoop2.7$ ./bin/pyspark
  1. Perform a final test to ensure Spark is up and running at the terminal by executing the following command to ensure that the SparkContext is driving the cluster in the local environment:
>>> sc
<SparkContext master=local[*] appName=PySparkShell>

How it works...

This section explains the reasoning behind the installation process for Python, Anaconda, and Spark.

  1. Spark runs on the Java virtual machine (JVM), the Java Software Development Kit (SDK) is a prerequisite installation for Spark to run on an Ubuntu virtual machine.
In order for Spark to run on a local machine or in a cluster, a minimum version of Java 6 is required for installation.
  1. Ubuntu recommends the sudo apt install method for Java as it ensures that packages downloaded are up to date. 
  2. Please note that if Java is not currently installed, the output in the terminal will show the following message:
    The program 'java' can be found in the following packages:
    * default-jre
    * gcj-5-jre-headless
    * openjdk-8-jre-headless
    * gcj-4.8-jre-headless
    * gcj-4.9-jre-headless
    * openjdk-9-jre-headless
    Try: sudo apt install <selected package>
    1. While Python 2 is fine, it is considered legacy Python. Python 2 is facing an end of life date in 2020; therefore, it is recommended that all new Python development be performed with Python 3, as will be the case in this publication. Up until recently, Spark was only available with Python 2. That is no longer the case. Spark works with both Python 2 and 3. A convenient way to install Python 3, as well as many dependencies and libraries, is through Anaconda. Anaconda is a free and open source distribution of Python, as well as R. Anaconda manages the installation and maintenance of many of the most common packages used in Python for data science-related tasks.
    2. During the installation process for Anaconda, it is important to confirm the following conditions: 

      • Anaconda is installed in the /home/username/Anaconda3 location
      • The Anaconda installer prepends the Anaconda3 install location to a PATH in /home/username/.bashrc
    1. After Anaconda has been installed, download Spark. Unlike Python, Spark does not come preinstalled on Ubuntu and therefore, will need to be downloaded and installed.
    2. For the purposes of development with deep learning, the following preferences will be selected for Spark:

      • Spark release: 2.2.0 (Jul 11 2017)
      • Package type: Prebuilt for Apache Hadoop 2.7 and later
      • Download type: Direct download
    3. Once Spark has been successfully installed, the output from executing Spark at the command line should look something similar to that shown in the following screenshot:
    1. Two important features to note when initializing Spark are that it is under the Python 3.6.1 | Anaconda 4.4.0 (64-bit) | framework and that the Spark logo is version 2.2.0.
    2. Congratulations! Spark is successfully installed on the local Ubuntu virtual machine. But, not everything is complete. Spark development is best when Spark code can be executed within a Jupyter notebook, especially for deep learning. Thankfully, Jupyter has been installed with the Anaconda distribution performed earlier in this section.

    There's more...

    You may be asking why we did not just use pip install pyspark to use Spark in Python. Previous versions of Spark required going through the installation process that we did in this section. Future versions of Spark, starting with 2.2.0 will begin to allow installation directly through the pip approach. We used the full installation method in this section to ensure that you will be able to get Spark installed and fully-integrated, in case you are using an earlier version of Spark.

    See also

    Integrating Jupyter notebooks with Spark

    When learning Python for the first time, it is useful to use Jupyter notebooks as an interactive developing environment (IDE). This is one of the main reasons why Anaconda is so powerful. It fully integrates all of the dependencies between Python and Jupyter notebooks. The same can be done with PySpark and Jupyter notebooks. While Spark is written in Scala, PySpark allows for the translation of code to occur within Python instead.

    Getting ready

    Most of the work in this section will just require accessing the .bashrc script from the terminal.

    How to do it...

    PySpark is not configured to work within Jupyter notebooks by default, but a slight tweak of the .bashrc script can remedy this issue. We will walk through these steps in this section:

    1. Access the .bashrc script by executing the following command:
    $ nano .bashrc
    1. Scrolling all the way to the end of the script should reveal the last command modified, which should be the PATH set by Anaconda during the installation earlier in the previous section. The PATH should appear as seen in the following:
    # added by Anaconda3 4.4.0 installer
    export PATH="/home/asherif844/anaconda3/bin:$PATH"
    1. Underneath, the PATH added by the Anaconda installer can include a custom function that helps communicate the Spark installation with the Jupyter notebook installation from Anaconda3. For the purposes of this chapter and remaining chapters, we will name that function sparknotebook. The configuration should appear as the following for sparknotebook():
    function sparknotebook()
    {
    export SPARK_HOME=/home/asherif844/spark-2.2.0-bin-hadoop2.7
    export PYSPARK_PYTHON=python3
    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
    $SPARK_HOME/bin/pyspark
    }
    1. The updated .bashrc script should look like the following once saved:
    1. Save and exit from the .bashrc file. It is recommended to communicate that the .bashrc file has been updated by executing the following command and restarting the terminal application:
    $ source .bashrc

    How it works...

    Our goal in this section is to integrate Spark directly into a Jupyter notebook so that we are not doing our development at the terminal and instead utilizing the benefits of developing within a notebook. This section explains how the Spark integration within a Jupyter notebook takes place.

    1. We will create a command function, sparknotebook, that we can call from the terminal to open up a Spark session through Jupyter notebooks from the Anaconda installation. This requires two settings to be set in the .bashrc file:
      1. PySpark Python be set to python 3
      2. PySpark driver for python to be set to Jupyter
    2. The sparknotebook function can now be accessed directly from the terminal by executing the following command:
    $ sparknotebook
    1. The function should then initiate a brand new Jupyter notebook session through the default web browser. A new Python script within Jupyter notebooks with a .ipynb extension can be created by clicking on the New button on the right-hand side and by selecting Python 3 under Notebook: as seen in the following screenshot:
    1. Once again, just as was done at the terminal level for Spark, a simple script of sc will be executed within the notebook to confirm that Spark is up and running through Jupyter:
    1. Ideally, the Version, Master, and AppName should be identical to the earlier output when sc was executed at the terminal. If this is the case, then PySpark has been successfully installed and configured to work with Jupyter notebooks.

    There's more...

    It is important to note that if we were to call a Jupyter notebook through the terminal without specifying sparknotebook, our Spark session will never be initiated and we will receive an error when executing the SparkContext script.

    We can access a traditional Jupyter notebook by executing the following at the terminal:

    jupyter-notebook

    Once we start the notebook, we can try and execute the same script for sc.master as we did previously, but this time we will receive the following error:

    See also

    Starting and configuring a Spark cluster

    For most chapters, one of the first things that we will do is to initialize and configure our Spark cluster.

    Getting ready

    Import the following before initializing cluster.

    • from pyspark.sql import SparkSession

    How to do it...

    This section walks through the steps to initialize and configure a Spark cluster.

    1. Import SparkSession using the following script:
    from pyspark.sql import SparkSession
    1. Configure SparkSession with a variable named spark using the following script:
    spark = SparkSession.builder \
    .master("local[*]") \
    .appName("GenericAppName") \
    .config("spark.executor.memory", "6gb") \
    .getOrCreate()

    How it works...

    This section explains how the SparkSession works as an entry point to develop within Spark.

    1. Staring with Spark 2.0, it is no longer necessary to create a SparkConf and SparkContext to begin development in Spark. Those steps are no longer needed as importing SparkSession will handle initializing a cluster.  Additionally, it is important to note that SparkSession is part of the sql module from pyspark.
    2. We can assign properties to our SparkSession:
      1. master: assigns the Spark master URL to run on our local machine with the maximum available number of cores.  
      2. appName: assign a name for the application
      3.  config: assign 6gb to the spark.executor.memory
      4. getOrCreate: ensures that a SparkSession is created if one is not available and retrieves an existing one if it is available

    There's more...

    For development purposes, while we are building an application on smaller datasets, we can just use master("local").  If we were to deploy on a production environment, we would want to specify master("local[*]") to ensure we are using the maximum cores available and get optimal performance.

    See also

    Stopping a Spark cluster

    Once we are done developing on our cluster, it is ideal to shut it down and preserve resources.

    How to do it...

    This section walks through the steps to stop the SparkSession.

    1. Execute the following script:

    spark.stop()

    1. Confirm that the session has closed by executing the following script:

    sc.master

    How it works...

    This section explains how to confirm that a Spark cluster has been shut down.

    1. If the cluster has been shut down, you will receive the error message seen in the following screenshot when executing another Spark command in the notebook:

    There's more...

    Shutting down Spark clusters may not be as critical when working in a local environment; however, it will prove costly when Spark is deployed in a cloud environment where you are charged for compute power.

     

    Left arrow icon Right arrow icon
    Download code icon Download Code

    Key benefits

    • Train distributed complex neural networks on Apache Spark
    • Use TensorFlow and Keras to train and deploy deep learning models
    • Explore practical tips to enhance performance

    Description

    Organizations these days need to integrate popular big data tools such as Apache Spark with highly efficient deep learning libraries if they’re looking to gain faster and more powerful insights from their data. With this book, you’ll discover over 80 recipes to help you train fast, enterprise-grade, deep learning models on Apache Spark. Each recipe addresses a specific problem, and offers a proven, best-practice solution to difficulties encountered while implementing various deep learning algorithms in a distributed environment. The book follows a systematic approach, featuring a balance of theory and tips with best practice solutions to assist you with training different types of neural networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). You’ll also have access to code written in TensorFlow and Keras that you can run on Spark to solve a variety of deep learning problems in computer vision and natural language processing (NLP), or tweak to tackle other problems encountered in deep learning. By the end of this book, you'll have the skills you need to train and deploy state-of-the-art deep learning models on Apache Spark.

    Who is this book for?

    If you’re looking for a practical resource for implementing efficiently distributed deep learning models with Apache Spark, then this book is for you. Knowledge of core machine learning concepts and a basic understanding of the Apache Spark framework is required to get the most out of this book. Some knowledge of Python programming will also be useful.

    What you will learn

    • Set up a fully functional Spark environment
    • Understand practical machine learning and deep learning concepts
    • Employ built-in machine learning libraries within Spark
    • Discover libraries that are compatible with TensorFlow and Keras
    • Explore NLP models such as word2vec and TF-IDF on Spark
    • Organize DataFrames for deep learning evaluation
    • Apply testing and training modeling to ensure accuracy
    • Access readily available code that can be reused
    Estimated delivery fee Deliver to Malta

    Premium delivery 7 - 10 business days

    €32.95
    (Includes tracking information)

    Product Details

    Country selected
    Publication date, Length, Edition, Language, ISBN-13
    Publication date : Jul 13, 2018
    Length: 474 pages
    Edition : 1st
    Language : English
    ISBN-13 : 9781788474221
    Category :
    Languages :
    Concepts :

    What do you get with Print?

    Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
    Product feature icon Paperback book shipped to your preferred address
    Product feature icon Download this book in EPUB and PDF formats
    Product feature icon Access this title in our online reader with advanced features
    Product feature icon DRM FREE - Read whenever, wherever and however you want
    OR
    Modal Close icon
    Payment Processing...
    tick Completed

    Shipping Address

    Billing Address

    Shipping Methods
    Estimated delivery fee Deliver to Malta

    Premium delivery 7 - 10 business days

    €32.95
    (Includes tracking information)

    Product Details

    Publication date : Jul 13, 2018
    Length: 474 pages
    Edition : 1st
    Language : English
    ISBN-13 : 9781788474221
    Category :
    Languages :
    Concepts :

    Packt Subscriptions

    See our plans and pricing
    Modal Close icon
    €18.99 billed monthly
    Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
    Feature tick icon Constantly refreshed with 50+ new titles a month
    Feature tick icon Exclusive Early access to books as they're written
    Feature tick icon Solve problems while you work with advanced search and reference features
    Feature tick icon Offline reading on the mobile app
    Feature tick icon Simple pricing, no contract
    €189.99 billed annually
    Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
    Feature tick icon Constantly refreshed with 50+ new titles a month
    Feature tick icon Exclusive Early access to books as they're written
    Feature tick icon Solve problems while you work with advanced search and reference features
    Feature tick icon Offline reading on the mobile app
    Feature tick icon Choose a DRM-free eBook or Video every month to keep
    Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
    Feature tick icon Exclusive print discounts
    €264.99 billed in 18 months
    Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
    Feature tick icon Constantly refreshed with 50+ new titles a month
    Feature tick icon Exclusive Early access to books as they're written
    Feature tick icon Solve problems while you work with advanced search and reference features
    Feature tick icon Offline reading on the mobile app
    Feature tick icon Choose a DRM-free eBook or Video every month to keep
    Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
    Feature tick icon Exclusive print discounts

    Frequently bought together


    Stars icon
    Total 120.97
    Java Deep Learning Projects
    €41.99
    Apache Spark Deep Learning Cookbook
    €41.99
    Hands-On Deep Learning with Apache Spark
    €36.99
    Total 120.97 Stars icon
    Banner background image

    Table of Contents

    14 Chapters
    Setting Up Spark for Deep Learning Development Chevron down icon Chevron up icon
    Creating a Neural Network in Spark Chevron down icon Chevron up icon
    Pain Points of Convolutional Neural Networks Chevron down icon Chevron up icon
    Pain Points of Recurrent Neural Networks Chevron down icon Chevron up icon
    Predicting Fire Department Calls with Spark ML Chevron down icon Chevron up icon
    Using LSTMs in Generative Networks Chevron down icon Chevron up icon
    Natural Language Processing with TF-IDF Chevron down icon Chevron up icon
    Real Estate Value Prediction Using XGBoost Chevron down icon Chevron up icon
    Predicting Apple Stock Market Cost with LSTM Chevron down icon Chevron up icon
    Face Recognition Using Deep Convolutional Networks Chevron down icon Chevron up icon
    Creating and Visualizing Word Vectors Using Word2Vec Chevron down icon Chevron up icon
    Creating a Movie Recommendation Engine with Keras Chevron down icon Chevron up icon
    Image Classification with TensorFlow on Spark Chevron down icon Chevron up icon
    Other Books You May Enjoy Chevron down icon Chevron up icon

    Customer reviews

    Top Reviews
    Rating distribution
    Full star icon Half star icon Empty star icon Empty star icon Empty star icon 1.7
    (6 Ratings)
    5 star 16.7%
    4 star 0%
    3 star 0%
    2 star 0%
    1 star 83.3%
    Filter icon Filter
    Top Reviews

    Filter reviews by




    Adnan Masood, PhD Aug 16, 2018
    Full star icon Full star icon Full star icon Full star icon Full star icon 5
    The tremendous impact of Artificial Intelligence and Machine learning, and the uncanny effectiveness of deep neural networks are hard to escape in both academia and industry. Meanwhile implementation grade material outlining the deep learning using Spark are not always easy to find. The manuscript you are holding in your hands (or in your e-reader) is an problem-solution oriented approach which not only shows Spark’s capabilities but also the art of possible around various machine learning and deep learning problems.Full disclosure, I am the technical reviewer of the book and wrote the foreword. It was a pleasure reading and reviewing the Ahmed Sherif and Amrith Ravindra’s work which I hope you as a reader will also find very compelling.The authors begin with helping to set up Spark for Deep Learning development by providing clear and concisely written recipes. The initial setup is naturally followed by creating a neural network, elaborating on pain points of Convolutional Neural Networks, and Recurrent Neural Networks. Later on, authors provided practical (yet simplified) use cases of predicting fire department calls with SparkML, real estate value prediction using XGBoost, predicting the stock market cost of Apple with LSTM, and creating a movie recommendation Engine with Keras. The book covers pertinent and highly relevant technologies with operational use cases like LSTMs in Generative Networks, natural language processing with TF-IDF, face recognition using deep convolutional networks, creating and visualizing word vectors using Word2Vec and image classification with TensorFlow on Spark. Beside crisp and focused writing, this wide array of highly relevant machine learning and deep learning topics give the book its core strength.I hope this book will help you leverage Apache Spark to tackle business and technology problems. Highly recommended reading.
    Amazon Verified review Amazon
    Todd Sep 06, 2023
    Full star icon Empty star icon Empty star icon Empty star icon Empty star icon 1
    $55 bucks… not worth it. There’s this website called google
    Amazon Verified review Amazon
    Bethany Sep 06, 2023
    Full star icon Empty star icon Empty star icon Empty star icon Empty star icon 1
    An absolutely disgrace from cover to cover. I’d rather eat the book than use the recipes.
    Amazon Verified review Amazon
    Amanda Sep 02, 2023
    Full star icon Empty star icon Empty star icon Empty star icon Empty star icon 1
    Wouldn’t recommend
    Amazon Verified review Amazon
    Patrick Sullivan Sep 06, 2023
    Full star icon Empty star icon Empty star icon Empty star icon Empty star icon 1
    Total Garbage. Not worth it
    Amazon Verified review Amazon
    Get free access to Packt library with over 7500+ books and video courses for 7 days!
    Start Free Trial

    FAQs

    What is the delivery time and cost of print book? Chevron down icon Chevron up icon

    Shipping Details

    USA:

    '

    Economy: Delivery to most addresses in the US within 10-15 business days

    Premium: Trackable Delivery to most addresses in the US within 3-8 business days

    UK:

    Economy: Delivery to most addresses in the U.K. within 7-9 business days.
    Shipments are not trackable

    Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
    Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

    EU:

    Premium: Trackable delivery to most EU destinations within 4-9 business days.

    Australia:

    Economy: Can deliver to P. O. Boxes and private residences.
    Trackable service with delivery to addresses in Australia only.
    Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
    Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

    Premium: Delivery to addresses in Australia only
    Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

    India:

    Premium: Delivery to most Indian addresses within 5-6 business days

    Rest of the World:

    Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

    Asia:

    Premium: Delivery to most Asian addresses within 5-9 business days

    Disclaimer:
    All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


    Unfortunately, due to several restrictions, we are unable to ship to the following countries:

    1. Afghanistan
    2. American Samoa
    3. Belarus
    4. Brunei Darussalam
    5. Central African Republic
    6. The Democratic Republic of Congo
    7. Eritrea
    8. Guinea-bissau
    9. Iran
    10. Lebanon
    11. Libiya Arab Jamahriya
    12. Somalia
    13. Sudan
    14. Russian Federation
    15. Syrian Arab Republic
    16. Ukraine
    17. Venezuela
    What is custom duty/charge? Chevron down icon Chevron up icon

    Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

    Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

    The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

    List of EU27 countries: www.gov.uk/eu-eea:

    A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

    How do I know my custom duty charges? Chevron down icon Chevron up icon

    The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

    For example:

    • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
    • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
    How can I cancel my order? Chevron down icon Chevron up icon

    Cancellation Policy for Published Printed Books:

    You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

    Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

    What is your returns and refunds policy? Chevron down icon Chevron up icon

    Return Policy:

    We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

    1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
    2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
    3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
    4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
    5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
    6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

    On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

    What tax is charged? Chevron down icon Chevron up icon

    Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

    What payment methods can I use? Chevron down icon Chevron up icon

    You can pay with the following card types:

    1. Visa Debit
    2. Visa Credit
    3. MasterCard
    4. PayPal
    What is the delivery time and cost of print books? Chevron down icon Chevron up icon

    Shipping Details

    USA:

    '

    Economy: Delivery to most addresses in the US within 10-15 business days

    Premium: Trackable Delivery to most addresses in the US within 3-8 business days

    UK:

    Economy: Delivery to most addresses in the U.K. within 7-9 business days.
    Shipments are not trackable

    Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
    Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

    EU:

    Premium: Trackable delivery to most EU destinations within 4-9 business days.

    Australia:

    Economy: Can deliver to P. O. Boxes and private residences.
    Trackable service with delivery to addresses in Australia only.
    Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
    Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

    Premium: Delivery to addresses in Australia only
    Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

    India:

    Premium: Delivery to most Indian addresses within 5-6 business days

    Rest of the World:

    Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

    Asia:

    Premium: Delivery to most Asian addresses within 5-9 business days

    Disclaimer:
    All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


    Unfortunately, due to several restrictions, we are unable to ship to the following countries:

    1. Afghanistan
    2. American Samoa
    3. Belarus
    4. Brunei Darussalam
    5. Central African Republic
    6. The Democratic Republic of Congo
    7. Eritrea
    8. Guinea-bissau
    9. Iran
    10. Lebanon
    11. Libiya Arab Jamahriya
    12. Somalia
    13. Sudan
    14. Russian Federation
    15. Syrian Arab Republic
    16. Ukraine
    17. Venezuela