Pandas Cookbook

You're reading from Pandas Cookbook: Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

Product type Paperback
Published in Oct 2017
Publisher Packt
ISBN-13 9781784393878
Length 532 pages
Edition 1st Edition
Author (1):
Theodore Petrou

Dissecting the anatomy of a DataFrame

Before diving deep into pandas, it is worth knowing the components of the DataFrame. Visually, the displayed output of a pandas DataFrame (in a Jupyter Notebook) appears to be nothing more than an ordinary table of data consisting of rows and columns. Beneath the surface are three components (the index, the columns, and the data, also known as values) that you must be aware of in order to get the most out of the DataFrame.

Getting ready

This recipe reads the movie dataset into a pandas DataFrame and provides a labeled diagram of all its major components.

How to do it...

  1. Use the read_csv function to read in the movie dataset, and display the first five rows with the head method:
>>> import pandas as pd
>>> movie = pd.read_csv('data/movie.csv')
>>> movie.head()
  2. Analyze the labeled anatomy of the DataFrame:

How it works...

Pandas first reads the data from disk into memory and builds a DataFrame from it using the excellent and versatile read_csv function. The output for both the columns and the index is in bold font, which makes them easy to identify. By convention, the terms index label and column name refer to the individual members of the index and columns, respectively. The term index refers to all the index labels as a whole, just as the term columns refers to all the column names as a whole.
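The three components can also be inspected directly. The following is a minimal sketch that uses a small, made-up DataFrame as a stand-in for the movie dataset (the rows and the names `index`, `columns`, and `data` are assumptions for illustration, not values from the book's data file):

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset; the values are illustrative only.
movie = pd.DataFrame(
    {
        "color": ["Color", "Color", None],
        "director_name": ["Director A", "Director B", "Director C"],
        "duration": [120.0, 95.0, 101.0],
    }
)

index = movie.index      # all index labels as a whole (a RangeIndex here)
columns = movie.columns  # all column names as a whole
data = movie.to_numpy()  # the data (values) as a NumPy array

print(type(index).__name__)
print(list(columns))
print(data.shape)
```

Each attribute returns one component on its own, which mirrors the separation between index, columns, and data described above.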

The columns and the index serve one purpose: to provide labels for the columns and rows of the DataFrame. These labels allow for direct and easy access to different subsets of data. When multiple Series or DataFrames are combined, the indexes align before any calculation occurs. Collectively, the columns and the index are known as the axes.
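Index alignment is easiest to see with two small Series whose labels only partly overlap. This sketch uses made-up labels and values (`s1`, `s2`, and `total` are illustrative names):

```python
import pandas as pd

# Two Series with partly overlapping index labels, in different orders.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "d"])

# Values are matched by label, not by position, before they are added.
total = s1 + s2

print(total)
```

Labels present in both Series ("b" and "c") are added together, while labels found in only one of them ("a" and "d") produce a missing value in the result.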

A DataFrame has two axes: a vertical axis (the index) and a horizontal axis (the columns). Pandas borrows this convention from NumPy and uses the integers 0 and 1 as another way of referring to the vertical and horizontal axes, respectively.
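As a rough illustration of the axis numbering, the sketch below sums a small, made-up DataFrame along each axis. The string aliases "index" and "columns" are accepted by pandas as equivalents of 0 and 1:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis=0 runs down the vertical axis (the index), giving one total per column.
col_sums = df.sum(axis=0)

# axis=1 runs across the horizontal axis (the columns), giving one total per row.
row_sums = df.sum(axis=1)

# The string aliases refer to the same two axes.
same_vertical = df.sum(axis="index").equals(col_sums)
same_horizontal = df.sum(axis="columns").equals(row_sums)

print(col_sums)
print(row_sums)
```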

DataFrame data (values) is always in regular font and is an entirely separate component from the columns or index. Pandas uses NaN (not a number) to represent missing values. Notice that even though the color column has only string values, it uses NaN to represent a missing value.
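The behavior of NaN in a column of strings can be reproduced with a short sketch. The Series below is a hypothetical stand-in for the color column, and `dtype=object` is set explicitly as an assumption, to mirror how string columns were stored when the book was written:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the color column, with one missing entry.
color = pd.Series(["Color", np.nan, "Black and White"], name="color", dtype=object)

missing_mask = color.isna()          # True where the value is missing
n_missing = int(missing_mask.sum())  # count of missing values

# The missing entry is a floating-point NaN, even among strings.
is_float = isinstance(color.iloc[1], float)

print(missing_mask.tolist())
print(n_missing, is_float)
```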

The three consecutive dots in the middle of the columns indicate that at least one column exists but is not displayed, because the number of columns exceeds the predefined display limits.
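The display limit for columns is controlled by the pandas option display.max_columns. The sketch below, which uses a made-up ten-column DataFrame, temporarily lowers and then lifts that limit with option_context so the global settings are left untouched (the width of 1000 is an arbitrary choice to avoid line wrapping):

```python
import pandas as pd

# A made-up wide DataFrame with ten columns.
wide = pd.DataFrame([[0] * 10], columns=[f"col{i}" for i in range(10)])

# With a limit of four columns, the middle columns are replaced by dots.
with pd.option_context("display.max_columns", 4):
    truncated = repr(wide)

# With no column limit and a generous width, every column is shown.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    full = repr(wide)

print(truncated)
print(full)
```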

The Python standard library contains the csv module, which can be used to parse and read in data. The pandas read_csv function offers a substantial increase in both performance and functionality over this module.
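One part of the functional difference can be sketched by parsing the same text with both tools. The CSV content here is made up for illustration, and the example says nothing about performance:

```python
import csv
import io

import pandas as pd

# Made-up CSV text; the second movie has no duration.
raw = "title,duration\nMovie A,120\nMovie B,\n"

# The csv module yields every field as a string, including the empty one.
rows = list(csv.DictReader(io.StringIO(raw)))

# read_csv infers a numeric type and marks the empty field as missing.
frame = pd.read_csv(io.StringIO(raw))

print(rows)
print(frame.dtypes)
```

With the csv module, type conversion and missing-value handling are left to the caller, whereas read_csv performs both while loading.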

There's more...

The head method accepts a single parameter, n, which controls the number of rows returned and defaults to 5. Similarly, the tail method returns the last n rows.
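A short sketch of both methods on a made-up ten-row DataFrame (the column name `n` and the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

first_five = df.head()      # n defaults to 5
first_two = df.head(2)      # first two rows
last_three = df.tail(n=3)   # last three rows

print(first_two)
print(last_three)
```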

See also
