The Art of Data-Driven Business

Transform your organization into a data-driven one with the power of Python machine learning

Product type Paperback
Published in Dec 2022
Publisher Packt
ISBN-13 9781804611036
Length 314 pages
Edition 1st Edition
Author: Alan Bernardo Palacio

Storing and manipulating data with pandas

pandas is an open-source library built on top of NumPy that offers Python programmers high-performance, user-friendly data structures and data analysis tools. It enables quick data analysis, preparation, and cleaning.

pandas is a package for data analysis. Because it includes many built-in helper functions, it is commonly used for financial time series data, economic data, and any other form of tabular data. For scientific computing, NumPy offers a fast way to manage large multidimensional arrays, and it can be used in conjunction with the SciPy and pandas packages.

We can construct a DataFrame from a dictionary by passing the dictionary to the DataFrame constructor:

import pandas as pd
# Each dictionary key becomes a column name; each list holds that column's values
d = {'col1': [1, 5, 8, 2], 'col2': [3, 3, 7, 4]}
df = pd.DataFrame(data=d)
df
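Because pandas is built on top of NumPy, a DataFrame created this way can be inspected and converted back to a NumPy array. The following is a minimal sketch that reuses the dictionary above; the variable names shape, columns, and values are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 5, 8, 2], 'col2': [3, 3, 7, 4]}
df = pd.DataFrame(data=d)

shape = df.shape            # (number of rows, number of columns)
columns = list(df.columns)  # column labels taken from the dictionary keys
values = df.to_numpy()      # the data as a two-dimensional NumPy array

print(shape)
print(columns)
print(type(values))
```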

The pandas groupby() method is powerful and versatile: it splits data into separate groups so that computations can be performed on each group. To illustrate, we first create a small DataFrame:

df = pd.DataFrame({'Animal': ['Dog', 'Dog',
                              'Rat', 'Rat', 'Rat'],
                   'Max Speed': [380., 370., 24., 26., 25.],
                   'Max Weight': [10., 8.1, .1, .12, .09]})
df

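The easiest way to remember what groupby does is through its three steps: “split,” “apply,” and “combine.” Split divides the data into distinct groups based on the values of a particular column, apply runs a computation on each group, and combine gathers the results into a single object. As an illustration, we can split our data by animal and compute the mean of each numeric column: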

df.groupby(['Animal']).mean()
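To make the three steps concrete, the sketch below performs split, apply, and combine by hand on the same animal data and compares the result with the one-line groupby call. It is only an illustration of the idea, not how pandas implements groupby internally; the names groups, means, manual, and direct are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Dog', 'Rat', 'Rat', 'Rat'],
                   'Max Speed': [380., 370., 24., 26., 25.],
                   'Max Weight': [10., 8.1, .1, .12, .09]})

# Split: one sub-DataFrame per distinct value of 'Animal'
groups = {name: group for name, group in df.groupby('Animal')}

# Apply: compute the mean of the numeric columns within each group
means = {name: group[['Max Speed', 'Max Weight']].mean()
         for name, group in groups.items()}

# Combine: collect the per-group results into a single DataFrame
manual = pd.DataFrame(means).T
manual.index.name = 'Animal'

# The one-line groupby call produces the same numbers
direct = df.groupby('Animal').mean()

print(manual)
print(direct)
```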

value_counts() is a useful companion to groupby. Combined with groupby, it counts how often each value of one column occurs within each group of another column, for example, the number of activities each person completed. Called directly on a DataFrame, it counts how many times each distinct row occurs:

df.value_counts()
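Counting the values of one column within the groups of another can be sketched as follows. The activity log, including the column names person and activity, is hypothetical data invented for this example.

```python
import pandas as pd

# Hypothetical activity log: one row per completed activity
log = pd.DataFrame({'person': ['Ana', 'Ana', 'Ana', 'Ben', 'Ben'],
                    'activity': ['run', 'run', 'swim', 'run', 'yoga']})

# Count how often each activity occurs for each person
per_person = log.groupby('person')['activity'].value_counts()

# Total number of activities completed by each person
totals = log.groupby('person').size()

print(per_person)
print(totals)
```

The first result is a Series indexed by (person, activity) pairs, while the second has one entry per person.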

We can also aggregate data using the aggregate() method, which applies a function, or a list of functions or function names, along one of the axes of the DataFrame. The default axis is 0, the index (row) axis, which means that each function is applied to every column. Note that agg() is an alias of aggregate():

df.agg("mean", axis="index", numeric_only=True)

We can also pass a dictionary that maps each selected column to the functions to apply to it:

df.agg({'Max Speed': ['sum', 'min'], 'Max Weight': ['mean', 'max']})
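The shape of the result is worth a closer look. In the sketch below, which reuses the animal data, the output has one row per function name and one column per selected column, with NaN wherever a function was not requested for that column. The variable names summary and same are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Dog', 'Rat', 'Rat', 'Rat'],
                   'Max Speed': [380., 370., 24., 26., 25.],
                   'Max Weight': [10., 8.1, .1, .12, .09]})

spec = {'Max Speed': ['sum', 'min'], 'Max Weight': ['mean', 'max']}

# Rows are function names, columns are the selected columns
summary = df.agg(spec)

# agg() and aggregate() are the same method under two names
same = summary.equals(df.aggregate(spec))

print(summary)
print(same)
```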

The quantile() method computes quantiles of the values along a given axis. The default is the index (row) axis, so a quantile is computed for each column. When the column axis is specified (axis='columns'), the quantile is instead computed across the columns, giving one value per row. The following line gives us the 10% quantile of each numeric column; numeric_only=True skips the text column Animal, which would otherwise raise an error in pandas 2.0 and later:

df.quantile(.1, numeric_only=True)

We can also pass a list of quantiles:

df.quantile([.1, .5], numeric_only=True)
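The effect of the axis argument is easiest to see on a purely numeric DataFrame. The following sketch uses hypothetical data (the scores DataFrame and its columns a and b are invented for this example) and relies on the default linear interpolation.

```python
import pandas as pd

# Hypothetical numeric data used only for illustration
scores = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                       'b': [10, 20, 30, 40, 50]})

# Default axis (index): one quantile per column
per_column = scores.quantile(0.5)

# axis='columns': one quantile per row, computed across that row's values
per_row = scores.quantile(0.5, axis='columns')

# A list of quantiles returns a DataFrame with one row per requested quantile
several = scores.quantile([0.1, 0.5])

print(per_column)
print(per_row)
print(several)
```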

The pivot() function reshapes a DataFrame, using the values of the supplied index and columns arguments as the row and column labels of the result. It is one of several functions that we can use to reshape data. pivot() does not support data aggregation, so each index/column pair must be unique; when there are multiple value columns, the result has a MultiIndex in the columns:

df = pd.DataFrame(
    {'type': ['one', 'one', 'one', 'two', 'two', 'two'],
     'cat': ['A', 'B', 'C', 'A', 'B', 'C'],
     'val': [1, 2, 3, 4, 5, 6],
     'letter': ['x', 'y', 'z', 'q', 'w', 't']})
df.pivot(index='type', columns='cat', values='val')
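The two limits of pivot() mentioned above can be sketched as follows. Omitting the values argument keeps both remaining columns and produces a MultiIndex in the columns, and a repeated index/column pair raises a ValueError because pivot() cannot aggregate. The dup DataFrame is hypothetical data invented to trigger the error.

```python
import pandas as pd

df = pd.DataFrame(
    {'type': ['one', 'one', 'one', 'two', 'two', 'two'],
     'cat': ['A', 'B', 'C', 'A', 'B', 'C'],
     'val': [1, 2, 3, 4, 5, 6],
     'letter': ['x', 'y', 'z', 'q', 'w', 't']})

# Without values, 'val' and 'letter' are both kept, so the columns
# of the result become a MultiIndex of (value column, cat) pairs
wide = df.pivot(index='type', columns='cat')

# Hypothetical data in which the pair ('one', 'A') occurs twice
dup = pd.DataFrame({'type': ['one', 'one'],
                    'cat': ['A', 'A'],
                    'val': [1, 2]})

try:
    dup.pivot(index='type', columns='cat', values='val')
    duplicate_error = None
except ValueError as exc:
    # pivot() has no way to combine the two values for ('one', 'A')
    duplicate_error = str(exc)

print(wide)
print(duplicate_error)
```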

Pivot tables are one of pandas’ most powerful features for drawing insights from data. pandas provides them through a similar function called pivot_table(), which, unlike pivot(), aggregates the values that share the same index/column pair (by default, with the mean). It is a simple function but can produce a powerful analysis very quickly.
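As a minimal sketch of pivot_table(), the example below summarizes a small set of hypothetical sales records in which one region/product pair appears twice. The sales DataFrame and its column names are invented for this example; the aggfunc argument selects how repeated entries are combined.

```python
import pandas as pd

# Hypothetical sales records; the pair ('North', 'A') appears twice
sales = pd.DataFrame({'region': ['North', 'North', 'North', 'South', 'South'],
                      'product': ['A', 'A', 'B', 'A', 'B'],
                      'amount': [10, 20, 5, 7, 3]})

# Sum the amounts for each region/product pair
total = pd.pivot_table(sales, index='region', columns='product',
                       values='amount', aggfunc='sum')

# Average the amounts instead
average = pd.pivot_table(sales, index='region', columns='product',
                         values='amount', aggfunc='mean')

print(total)
print(average)
```

The same call with pivot() would fail on this data, because the repeated pair would need to be aggregated.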

The next step for us will be to learn how to visualize the data to create proper storytelling and appropriate interpretations.
