
You're reading from Mastering pandas: A complete guide to pandas, from installation to advanced data analysis techniques

Product type: Paperback
Published: Oct 2019
ISBN-13: 9781789343236
Length: 674 pages
Edition: 2nd Edition
Languages: Python
Tools: pandas
Concepts: Data Analysis
Author: Ashish Kumar


Table of Contents (21 chapters)

Preface

1. Section 1: Overview of Data Analysis and pandas

2. Introduction to pandas and Data Analysis

3. Installation of pandas and Supporting Software

4. Section 2: Data Structures and I/O in pandas

5. Using NumPy and Data Structures with pandas

6. I/Os of Different Data Formats with pandas

7. Section 3: Mastering Different Data Operations in pandas

8. Indexing and Selecting in pandas

9. Grouping, Merging, and Reshaping Data in pandas

10. Special Data Operations in pandas

11. Time Series and Plotting Using Matplotlib

12. Section 4: Going a Step Beyond with pandas

13. Making Powerful Reports In Jupyter Using pandas

14. A Tour of Statistics with pandas and NumPy

15. A Brief Tour of Bayesian Statistics and Maximum Likelihood Estimates

16. Data Case Studies Using pandas

17. The pandas Library Architecture

18. pandas Compared with Other Tools

19. A Brief Tour of Machine Learning

20. Other Books You May Enjoy


Managing sparse data

Sparse data refers to data structures, such as arrays, Series, DataFrames, and panels, in which a very high proportion of the values are missing (NaN) or otherwise equal to a single fill value.
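To make the idea concrete, here is a minimal sketch using pandas' `SparseArray`, assuming a pandas version that exposes it as `pd.arrays.SparseArray` (0.24 or later). Only the values that differ from the array's `fill_value` are physically stored; every other position is implied.

```python
import numpy as np
import pandas as pd

# A short float array in which most entries are NaN, the default fill value for floats
arr = pd.arrays.SparseArray([np.nan, 1.0, np.nan, np.nan, 2.0])

print(arr.fill_value)  # nan: the value that is not stored
print(arr.sp_values)   # [1. 2.]: only the non-fill values are kept
print(arr.density)     # 0.4: fraction of positions actually stored (2 of 5)
```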

Let's create a sparse DataFrame:

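import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(100, 3))  # 100 rows x 3 float64 columns
df.iloc[:95] = np.nan                       # the first 95 rows become NaN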

This DataFrame has NaNs in 95% of the records. The memory usage of this data can be estimated with the following code:

df.memory_usage()

Take a look at the following output:

Memory usage of a DataFrame with 95% NaNs
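The per-element cost of the dense layout can be checked directly. This minimal sketch recreates the DataFrame, confirms the 95% NaN proportion with `isna().mean()`, and passes `index=False` to `memory_usage()` so that each column's figure is simply rows × 8 bytes for `float64` data.

```python
import numpy as np
import pandas as pd

# Recreate the example: 100 rows x 3 float64 columns, first 95 rows set to NaN
df = pd.DataFrame(np.random.randn(100, 3))
df.iloc[:95] = np.nan

# Fraction of missing values in each column: 0.95
print(df.isna().mean())

# Bytes per column, excluding the index: 100 rows * 8 bytes = 800,
# even though 95 of those 100 slots hold NaN
print(df.memory_usage(index=False))
```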

As we can see, each float64 element consumes 8 bytes, irrespective of whether it holds actual data or a NaN. pandas offers a memory-efficient representation for sparse data, as depicted in the following code:

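# to_sparse() was deprecated in pandas 0.25 and removed in 1.0;
# converting the columns to a sparse dtype with NaN as the fill value is the current equivalent
sparse_df = df.astype(pd.SparseDtype("float", np.nan))
sparse_df.memory_usage()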

Take a look at the following output:

Memory usage of sparse data

Now, the memory usage has come down, with memory not being allotted to...
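The saving can be quantified with a short sketch. It assumes pandas 1.0 or later, where sparse columns are created with `pd.SparseDtype` rather than `to_sparse()`, and compares the total bytes of the dense and sparse versions of the same 95%-NaN DataFrame.

```python
import numpy as np
import pandas as pd

# Setup from the text: 100 x 3 float64 values, first 95 rows set to NaN
df = pd.DataFrame(np.random.randn(100, 3))
df.iloc[:95] = np.nan

# Sparse version: NaN is the fill value, so only the 5 real rows per column are stored
sparse_df = df.astype(pd.SparseDtype("float", np.nan))

dense_bytes = df.memory_usage(index=False).sum()
sparse_bytes = sparse_df.memory_usage(index=False).sum()

print(dense_bytes)                # 2400 (3 columns * 100 rows * 8 bytes)
print(sparse_bytes)               # much smaller: stored values plus their positions
print(sparse_df[0].sparse.density)  # 0.05: 5 of 100 values physically stored
```

The sparse figure is not zero, because each column keeps the stored values together with an index recording where they sit; the saving grows as the proportion of fill values rises.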



Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. His core areas of expertise include natural language processing, IoT analytics, R Shiny product development, and ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.
