You're reading from Big Data Analysis with Python: Combine Spark and Python to unlock the powers of parallel computing and machine learning

Product type Paperback

Published in Apr 2019

Publisher Packt

ISBN-13 9781789955286

Length 276 pages

Edition 1st Edition

Authors (3):

Ivan Marin

Ankit Shukla

Sarang VK

Table of Contents (11) Chapters

Big Data Analysis with Python

Preface

1. The Python Data Science Stack

2. Statistical Visualizations

3. Working with Big Data Frameworks

4. Diving Deeper with Spark

5. Handling Missing Values and Correlation Analysis

6. Exploratory Data Analysis

7. Reproducibility in Big Data Analysis

8. Creating a Full Analysis Report

Appendix

Pandas DataFrames and Grouped Data

As we learned in the previous chapter, when analyzing data with pandas, we can use the pandas plot functions or use Matplotlib directly. Because pandas uses Matplotlib under the hood, the two integrate well. Depending on the situation, we can either plot directly from pandas or create a figure and an axes with Matplotlib and pass the axes to pandas to plot on. For example, a GroupBy operation splits the data by a GroupBy key. But how can we plot the results of a GroupBy? We have a few approaches at our disposal. We can, for example, use pandas directly, if the DataFrame is already in the right format:

Note

The following code is a sample and will not get executed.

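import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
df = pd.read_csv('data/dow_jones_index.data')
# Keep three stocks, then draw one volume line per stock on the shared axes
df[df.stock.isin(['MSFT', 'GE', 'PG'])].groupby('stock')['volume'].plot(ax=ax)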

Or we can just plot each GroupBy key on the same plot:

fig, ax = plt.subplots()
df.groupby('stock').volume.plot(ax=ax)
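As a minimal, self-contained sketch of the same idea, the following example builds a small synthetic DataFrame (the values are made up and only stand in for the Dow Jones file, and the week column is an assumption added for the x axis) and iterates over the groups explicitly. Looping over the GroupBy object is one possible way to give each line its own legend label while still drawing everything on a single Matplotlib axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the stock data; the numbers are illustrative only.
df = pd.DataFrame({
    "stock": ["MSFT", "GE", "PG"] * 4,
    "week": sorted([1, 2, 3, 4] * 3),
    "volume": [10, 20, 30, 12, 18, 33, 15, 17, 31, 11, 21, 29],
})

# Create the figure and axes with Matplotlib, then hand the axes to pandas.
fig, ax = plt.subplots()

# Each iteration yields the GroupBy key and the rows that belong to it.
for name, group in df.groupby("stock"):
    group.set_index("week")["volume"].plot(ax=ax, label=name)

ax.set_xlabel("week")
ax.set_ylabel("volume")
ax.legend(title="stock")
```

Because every call receives the same ax, all groups end up on one plot; calling fig.savefig with a file name of your choice would write the result to disk.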

For the following...

Authors (3)

Ivan Marin

Ivan Marin is a systems architect and data scientist working at Daitan Group, a Campinas-based software company. He designs big data systems for large volumes of data and implements machine learning pipelines end to end using Python and Spark. He is also an active organizer of data science, machine learning, and Python in São Paulo, and has given Python for data science courses at university level.

Ankit Shukla

Ankit Shukla is a data scientist working with World Wide Technology, a leading US-based technology solution provider, where he develops and deploys machine learning and artificial intelligence solutions to solve business problems and create actual dollar value for clients. He is also part of the company's R&D initiative, which is responsible for producing intellectual property, building capabilities in new areas, and publishing cutting-edge research in corporate white papers. Besides tinkering with AI/ML models, he likes to read and is a big-time foodie.

Sarang VK

Sarang VK is a lead data scientist at StraitsBridge Advisors, where his responsibilities include requirement gathering, solutioning, development, and productization of scalable machine learning, artificial intelligence, and analytical solutions using open source technologies. Alongside this, he supports pre-sales and competency.
