Mastering Spark for Data Science: Lightning fast and scalable data science solutions

Morgan, Bifet, Hallett, Amend, George +1 more

€18.99 per month

4 (2 ratings)

Paperback, Mar 2017, 560 pages, 1st Edition

What do you get with a Packt Subscription?

Free for the first 7 days, then $19.99 per month. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Key benefits

Develop and apply advanced analytical techniques with Spark
Learn how to tell a compelling story with data science using Spark’s ecosystem
Explore data at scale and work with cutting-edge data science methods

Description

Data science seeks to transform the world using data, and this is typically achieved by disrupting and changing real processes in real industries. To operate at this level, you need to build data science solutions of substance: solutions that solve real problems. Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs. This book takes a deep dive into using Spark to deliver production-grade data science solutions. This process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more. You will be introduced to advanced techniques and methods that will help you construct commercial-grade data products. Through a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly.

Who is this book for?

This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, especially those who are looking for a challenge and want to learn cutting-edge techniques. It assumes working knowledge of data science, common machine learning methods, and popular data science tools, and that you have previously run proof-of-concept studies and built prototypes.

What you will learn

Learn the design patterns that integrate Spark into industrialized data science pipelines
See how commercial data scientists design scalable, reusable code for data science services
Explore cutting-edge data science methods so that you can study trends and causality
Discover advanced programming techniques using the RDD, DataFrame, and Dataset APIs
Find out how Spark can be used as a universal ingestion engine and as a web scraper
Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining
Get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams
Study advanced Spark concepts, solution design patterns, and integration architectures
Demonstrate powerful data science pipelines

Frequently bought together

€36.99

Apache Spark 2.x Machine Learning Cookbook

€41.99

€45.99

Total €124.97

Sumit Pal May 25, 2017

This book is for readers with intermediate to expert-level knowledge of Spark, algorithms and data science in general. All of the authors are experts and highly accomplished craftsmen in their respective fields. The book's coverage, in terms of depth, variety of algorithms, and the pure fun and elegance of working with Spark and Scala code, leaves nothing more to be desired from a book of this calibre. The code is well written and tested, and the explanations of the reasoning behind it (why it is used and how it is appropriately applied for each algorithm) make the book highly readable. I have read numerous books on Spark for data processing, streaming and machine learning, and this one stands out in terms of its organization and its approach to solving problems in the data science space. I highly recommend the book. I have read it twice: once while doing the technical review (I was the technical reviewer of the book) and again after it was published. I am hooked and plan to read it again. This book will not teach you Spark basics, deployment or performance tuning.

Amazon Verified review

Amanda Jan 12, 2018

There is definitely a market for data science books that are aimed at intermediate/advanced users, and there is certainly a wealth of information contained within these pages. The examples were interesting enough to keep me engaged. There is the usual poor Packt editing, and there were a few spelling mistakes to annoy the pedants among us. A word of caution, though: don't buy this book thinking it will teach you how to use Kafka, Avro, NiFi or Accumulo. You will need to be well versed in using and linking these products, as well as the usual Hadoop, Spark and Scala, if you want to code the examples.
