Mastering Spark for Data Science: Lightning fast and scalable data science solutions

Product type Paperback

Published in Mar 2017

Publisher Packt

ISBN-13 9781785882142

Length 560 pages

Edition 1st Edition

Tools

Apache Spark

Concepts

Data Science

Authors (5):

David George

Matthew Hallett

Antoine Amend

Andrew Morgan

Albert Bifet

Table of Contents (15) Chapters

Preface

1. The Big Data Science Ecosystem

2. Data Acquisition

3. Input Formats and Schema

4. Exploratory Data Analysis

5. Spark for Geographic Analysis

6. Scraping Link-Based External Data

7. Building Communities

8. Building a Recommendation System

9. News Dictionary and Real-Time Tagging System

10. Story De-duplication and Mutation

11. Anomaly Detection on Sentiment Analysis

12. TrendCalculus

13. Secure Data

14. Scalable Algorithms

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The next lines of code read the link and assign it to the BeautifulSoup function."

A block of code is set as follows:

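import org.apache.spark.sql.functions._
import spark.implicits._ // encoders needed by toDS(); assumes a SparkSession named spark

// Parse each raw record into a case class, then convert to a Dataset
val rdd = rawDS map GdeltParser.toCaseClass
val ds = rdd.toDS()

// DataFrame-style API: alias the aggregated column rather than the Dataset
ds.agg(avg("goldstein").as("goldstein")).show()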

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

spark.sql("SELECT V2GCAM FROM GKG LIMIT 5").show()
spark.sql("SELECT AVG(GOLDSTEIN) AS GOLDSTEIN FROM GKG WHERE GOLDSTEIN IS NOT NULL").show()

Any command-line input or output is written as follows:

$ cat 20150218230000.gkg.csv | gawk -F"\t" '{print $4}'

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter."