Apache Spark for Data Science Cookbook

Chapter 2. Tricky Statistics with Spark

In this chapter, you will learn the following recipes:

Working with Pandas
Variable identification
Sampling data
Summary and descriptive statistics
Generating frequency tables
Installing Pandas on Linux
Installing Pandas from source
Using IPython with PySpark
Creating Pandas DataFrames over Spark
Splitting, slicing, sorting, filtering and grouping DataFrames over Spark
Implementing covariance and correlation using DataFrames over Spark
Concatenating and merging operations over DataFrames
Complex operations over DataFrames
Sparkling Pandas

pavan kumar jalla Sep 10, 2019

As a big data engineer with 3 years in the industry, I was looking for a solid hands-on book on data science. This book has great content and is well structured from beginning to end, taking you on a deep dive into data science concepts. I appreciate the author sharing her knowledge. I would recommend it to anyone looking for a practical approach to data science.

Amazon Verified review

Brandon Jan 23, 2017

This book is a useful resource for learning the Spark programming model and how to apply it to a range of tasks. The approach is very practical, with code provided in every chapter, which makes for fast learning. As a technical reviewer of this book, I would recommend it to people who want to understand how to perform data exploration, analysis and visualization tasks in Spark. With the many use cases it covers, it can serve as a source of inspiration for solutions to everyday work tasks.

Dimitri Shvorob Jun 01, 2017

I would dismiss a five-star review by the book's technical reviewer - conflict of interest, anyone? - and "Apache Spark for Data Science Cookbook" is not a five-star book. It is, however, a decent book which compensates for the Packt-standard weakness of explanations with a thoughtful collection of (Scala) code, paying attention to the less glamorous but essential job of data manipulation. And yet, I hesitate to recommend it, and feel that a combo of "Machine Learning with Spark" by Pentreath and "Spark for Data Science" by Duvvuri and Singhal would be a better choice. I would suggest getting all three and deciding which one(s) to leave.

Santanu Feb 25, 2017

This book does not improve your Spark knowledge. It is only a bunch of code with input and output, with no proper comments on the code.