Hands-On Data Analysis with Pandas

Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization, Second Edition

What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want
  • AI Assistant (beta) to help accelerate your learning

Chapter 1: Introduction to Data Analysis

Before we can begin our hands-on introduction to data analysis with pandas, we need to learn about the fundamentals of data analysis. Those who have ever looked at the documentation for a software library know how overwhelming it can be if you have no clue what you are looking for. Therefore, it is essential that we master not only the coding aspect but also the thought process and workflow required to analyze data, which will prove the most useful in augmenting our skill set in the future.

Much like the scientific method, data science has some common workflows that we can follow when we want to conduct an analysis and present the results. The backbone of this process is statistics, which gives us ways to describe our data, make predictions, and also draw conclusions about it. Since prior knowledge of statistics is not a prerequisite, this chapter will give us exposure to the statistical concepts we will use throughout this book, as well as areas for further exploration.

After covering the fundamentals, we will get our Python environment set up for the remainder of this book. Python is a powerful language, and its uses go way beyond data science: building web applications, software, and web scraping, to name a few. In order to work effectively across projects, we need to learn how to make virtual environments, which will isolate each project's dependencies. Finally, we will learn how to work with Jupyter Notebooks in order to follow along with the text.
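As a rough illustration of what "isolating each project's dependencies" means, the sketch below creates a virtual environment with Python's standard-library venv module. The create_project_env helper and the .venv folder name are assumptions made for this example; they are not taken from the book, which may set up its environment differently.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated virtual environment inside project_dir.

    The '.venv' folder name is a common convention, assumed here.
    """
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this sketch fast and offline;
    # pass with_pip=True to also bootstrap pip into the environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


if __name__ == "__main__":
    # Use a throwaway directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_project_env(Path(tmp))
        # pyvenv.cfg marks the folder as a virtual environment.
        print((env_dir / "pyvenv.cfg").read_text())
```

Each project can get its own environment folder like this, so packages installed for one project do not affect another.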

The following topics will be covered in this chapter:

  • The fundamentals of data analysis
  • Statistical foundations
  • Setting up a virtual environment

Chapter materials

All the files for this book are on GitHub at https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition. While having a GitHub account isn't necessary to work through this book, it is a good idea to create one, as it will serve as a portfolio for any data/coding projects. In addition, working with Git will provide a version control system and make collaboration easy.

Tip

Check out this article to learn some Git basics: https://www.freecodecamp.org/news/learn-the-basics-of-git-in-under-10-minutes-da548267cc91/.

In order to get a local copy of the files, we have a few options (ordered from least useful to most useful):

  • Download the ZIP file and extract the files locally.
  • Clone the repository without forking it.
  • Fork the repository and then clone it.

This book includes exercises for every chapter; therefore, for those who want to keep a copy of their solutions along with the original content on GitHub, it is highly recommended to fork the repository and clone the forked version. When we fork a repository, GitHub will make a repository under our own profile with the latest version of the original. Then, whenever we make changes to our version, we can push the changes back up. Note that if we simply clone, we don't get this benefit, because we don't have write access to the original repository.

The relevant buttons for initiating this process are circled in the following screenshot:

Figure 1.1 – Getting a local copy of the code for following along

Important note

The cloning process will copy the files to the current working directory in a folder called Hands-On-Data-Analysis-with-Pandas-2nd-edition. To make a folder to put this repository in, we can use mkdir my_folder && cd my_folder. This will create a new folder (directory) called my_folder and then change the current directory to that folder, after which we can clone the repository. We can chain these two commands (and any number of commands) together by adding && in between them. This can be thought of as and then (provided the first command succeeds).
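The "and then, provided the first command succeeds" behavior of && can be mimicked in Python to make the rule concrete. This is only a sketch: run_chained, SUCCEED, and FAIL are hypothetical names introduced for the illustration, and the stand-in commands simply exit with a success or failure status rather than creating folders.

```python
import subprocess
import sys

# Stand-in commands: one exits with status 0 (success), one with status 1 (failure).
SUCCEED = [sys.executable, "-c", "print('step ran')"]
FAIL = [sys.executable, "-c", "import sys; sys.exit(1)"]


def run_chained(*commands):
    """Run each command in turn, stopping at the first non-zero exit code.

    This mirrors how the shell treats `first && second && third`.
    Returns the exit codes of the commands that actually ran.
    """
    exit_codes = []
    for command in commands:
        exit_codes.append(subprocess.run(command).returncode)
        if exit_codes[-1] != 0:
            break  # later commands are skipped, just as with &&
    return exit_codes


if __name__ == "__main__":
    print(run_chained(SUCCEED, SUCCEED))  # both commands run
    print(run_chained(FAIL, SUCCEED))     # the second command never runs
```

In the shell, the same rule means that cd my_folder only runs if mkdir my_folder succeeded.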

This repository has folders for each chapter. This chapter's materials can be found at https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition/tree/master/ch_01. While the bulk of this chapter doesn't involve any coding, feel free to follow along in the introduction_to_data_analysis.ipynb notebook on the GitHub website until we set up our environment toward the end of the chapter. After we do so, we will use the check_your_environment.ipynb notebook to get familiar with Jupyter Notebooks and to run some checks to make sure that everything is set up properly for the rest of this book.

Since the code that's used to generate the content in these notebooks is not the main focus of this chapter, the majority of it has been separated into the visual_aids package, which is used to create visuals for explaining concepts throughout the book, and the check_environment.py file. If you choose to inspect these files, don't be overwhelmed; everything that's relevant to data science will be covered in this book.

Every chapter includes exercises; however, for this chapter only, there is an exercises.ipynb notebook, with code to generate some initial data. Knowledge of basic Python will be necessary to complete these exercises. For those who would like to review the basics, make sure to run through the python_101.ipynb notebook, included in the materials for this chapter, for a crash course. The official Python tutorial is a good place to start for a more formal introduction: https://docs.python.org/3/tutorial/index.html.

Key benefits

  • Perform efficient data analysis and manipulation tasks using pandas 1.x
  • Apply pandas to different real-world domains with the help of step-by-step examples
  • Make the most of pandas as an effective data exploration tool

Description

Extracting valuable business insights is no longer a ‘nice-to-have’, but an essential skill for anyone who handles data in their enterprise. Hands-On Data Analysis with Pandas is here to help beginners and those who are migrating their skills into data science get up to speed in no time. This book will show you how to analyze your data, get started with machine learning, and work effectively with the Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. Using real-world datasets, you will learn how to use the pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data. This updated edition will equip you with the skills you need to use pandas 1.x to efficiently perform various data manipulation tasks, reliably reproduce analyses, and visualize your data for effective decision making – valuable knowledge that can be applied across multiple domains.

Who is this book for?

This book is for data science beginners, data analysts, and Python developers who want to explore each stage of data analysis and scientific computing using a wide range of datasets. Data scientists looking to implement pandas in their machine learning workflow will also find plenty of valuable know-how as they progress. You’ll find it easier to follow along with this book if you have a working knowledge of the Python programming language, but a Python crash-course tutorial is provided in the code bundle for anyone who needs a refresher.

What you will learn

  • Understand how data analysts and scientists gather and analyze data
  • Perform data analysis and data wrangling using Python
  • Combine, group, and aggregate data from multiple sources
  • Create data visualizations with pandas, matplotlib, and seaborn
  • Apply machine learning algorithms to identify patterns and make predictions
  • Use Python data science libraries to analyze real-world datasets
  • Solve common data representation and analysis problems using pandas
  • Build Python scripts, modules, and packages for reusable analysis code

Product Details

Publication date: Apr 29, 2021
Length: 788 pages
Edition: 2nd
Language: English
ISBN-13: 9781800565913
Vendor: Microsoft

Table of Contents

19 Chapters
Section 1: Getting Started with Pandas
Chapter 1: Introduction to Data Analysis
Chapter 2: Working with Pandas DataFrames
Section 2: Using Pandas for Data Analysis
Chapter 3: Data Wrangling with Pandas
Chapter 4: Aggregating Pandas DataFrames
Chapter 5: Visualizing Data with Pandas and Matplotlib
Chapter 6: Plotting with Seaborn and Customization Techniques
Section 3: Applications – Real-World Analyses Using Pandas
Chapter 7: Financial Analysis – Bitcoin and the Stock Market
Chapter 8: Rule-Based Anomaly Detection
Section 4: Introduction to Machine Learning with Scikit-Learn
Chapter 9: Getting Started with Machine Learning in Python
Chapter 10: Making Better Predictions – Optimizing Models
Chapter 11: Machine Learning Anomaly Detection
Section 5: Additional Resources
Chapter 12: The Road Ahead
Solutions
Other Books You May Enjoy

Customer reviews

Rating: 4.6 out of 5 (14 ratings)
5 star: 78.6%
4 star: 7.1%
3 star: 7.1%
2 star: 7.1%
1 star: 0%

Brenton Chang, Aug 04, 2021
Rating: 5 out of 5
As an analyst in a cyber security operation center role, I live and breathe data. The more, the better. Pandas is a natural fit for organizing, navigating and analyzing diverse data at scale. However, if you’ve ever tried leveraging Pandas to do this, you quickly realize how difficult it can be. The documentation is ambiguous and, due to the diversity of how others leverage Pandas, it’s difficult to find scenarios and code examples that line up with your needs. Enter “Hands-On Data Analysis with Pandas”. Molin does a great job at organizing and presenting all you need to get started leveraging both pandas and Jupyter notebooks. She also clearly and concisely explains the fundamentals of machine learning and statistical analysis. Her mastery is in both understanding the discipline and the libraries used to get the work done. I not only reference the book to help with organizing and analyzing my data, I also reference the book to support my visualization and plotting requirements. There aren’t many books out there that are both this comprehensive and good at teaching a very complex subject. If you are in cyber security you need this book.
Amazon Verified review

John Renne, May 24, 2023
Rating: 5 out of 5
This easy-to-follow book is exactly what I needed at this stage of my learning curve. I love how the author takes the reader through accessing real-world data that are messy and in some cases missing. Accessing real-world data with APIs is a tool I appreciate learning, and seeing the shortcomings of real-world data is what most other books are missing. This book will better allow me to translate the lessons to my own needs. Well done!
Amazon Verified review

SJ, Jul 14, 2021
Rating: 5 out of 5
For those new to data science and data analysis in general, the book does a good job of providing just enough background and explanation of the underlying statistical concepts to allow one to understand and gain the most benefit from following along with the examples presented. This is far more useful than just presenting code snippets and pretty plots while skimming over the "why". As a PhD researcher, I use Pandas almost every day and have done for years (previously as a systematic portfolio manager); I still came across several methods and functionalities covered in the book that I either hadn't had too much previous exposure to or that at least were of benefit to provide a solid "knowledge refresher" I wasn't even aware I needed. Overall it's a very solid offering from Stefanie Molin, for whom it was clearly a "labour of love".
Amazon Verified review

Pranshu Jaryal, May 05, 2021
Rating: 5 out of 5
This is a great book for whosoever wants to jump into the domain of Data Analysis. Pandas is the most important Python Library that one should master if he or she wants to excel in this domain. This book covers all the necessary topics under Pandas like Data Wrangling, Aggregation, and Visualization. Apart from the theory, one can also find an Application to work on to practice real-world problems. Therefore if you are a Data Enthusiast then this is a must-read book for you.
Amazon Verified review

BBCReview, Dec 02, 2021
Rating: 5 out of 5
This is a killer book for the Python and data wrangling professional, making all other books look like elementary school treatments. I've read 7 Pandas books and 32 Python books that have Pandas sections. Stefanie Molin's is by far the strongest, most detailed, easiest to follow, best-exampled book, and easiest to understand of any of the 39 books read. All other books pale in comparison to this must-read book from Molin. The datasets are intuitive. Not a boring texty book. Instead, lots of example code appears on every single page, illustrating the features. The story and example code flow together, not skipping around or showing disjointed points. The chapters follow your workflow, from data ingest and EDA to data cleaning, data wrangling, visualization, and finally to applications. Thorough treatments are given to data cleaning, data wrangling, and data enrichment as separate topics, going into deep details on how to reshape and reindex data frames, how to do proper joins on data frames, left, right, inner, and outer, and how to do many other data cleaning and wrangling steps. For example, you'll learn how to set a new index, and why you should do that. And when inserting rows from different dataframes, you can leave yourself a new indicator column that shows you which table added the row. Pandas has many features like this that professionals should know, and Stefanie Molin shows the "how to". Of course there's a GitHub link so you can download the example datasets. Honestly, I'm only up through data wrangling - have not even reached the financial analysis, machine learning, and advanced visualization code. I can hardly wait to work all the examples in person. (As you know, reading is good, but building the code is by far the most effective way to learn.) Thanks Stefanie for devoting the time to making this a wonderful detailed and usable guide on how to use Pandas to solve my customer's problems. What a joy to read and use.
This is the first and best book you should buy for Pandas.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download Adobe Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made by credit card, debit card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book, go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower in price than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.