The Art of Data-Driven Business: Transform your organization into a data-driven one with the power of Python machine learning

Author: Palacio

Analyzing and Visualizing Data with Python

Advanced analytics and data science now play a major role in the majority of businesses. They support organizations in tracking, managing, and gathering performance metrics to enhance organizational decision-making. Business managers can use analysis and machine learning to decide how best to engage customers, enhance business performance, and increase sales. Data science and analytics can also be used to create user-centric products and make sound choices, by comparing product aspects and studying consumer feedback and market trends to develop goods and services that attract clients and retain them for an extended period.

This book is intended for everyone who wants to have an introduction to the techniques and methods of data science, advanced analytics, and machine learning for studying business cases that have been impacted by the use of these methods. The cases shown are heavily based on real use cases, with a demonstrated positive impact in various companies of different sectors. So, anyone who might be considering the application of data science in business operations, regardless of whether they are a seasoned business analyst seeking to enhance their list of skills, or a manager looking for methods that can be applied to maximize certain operations, can benefit from the examples discussed in this book.

In this chapter, we will lay down the initial components that will be used throughout this book to manage the data, manipulate it, and visualize it. Specifically, we will discuss the following:

  • The use of data science in business and the main differences with roles such as business or data analysts
  • The use of statistical programming libraries such as NumPy to apply matrix algebra and statistical methods
  • Storing the data in pandas, a library for data analysis and manipulation that is widely used in the context of data science
  • Visualization with Seaborn and how the different types of charts can be used in different kinds of situations

Next, we will discuss the technical requirements that you will need to be able to follow the examples presented in this chapter.

Technical requirements

To be able to follow the steps in this chapter, you will need to meet the following requirements:

  • A Jupyter notebook instance running Python 3.7 or above. You can also run the steps in a Google Colab notebook if you have a Google Drive account.
  • A basic understanding of math and statistical concepts.

Using data science and advanced analytics in business

The question of what differentiates a data scientist from a business analyst arises often, as both roles focus on gaining insight from data. From a certain perspective, data science involves creating forecasts by analyzing the patterns behind the raw data. Business intelligence is backward-looking and discovers previous and current trends, while data science is forward-looking and forecasts future trends.

Business decision-making strongly relies on data science and advanced analytics because they help managers understand how decisions affect outcomes. As a result, data scientists are increasingly required to integrate common machine learning technologies with knowledge of the underlying causal linkages. These developments have given rise to positions like that of the decision scientist, a technologist who focuses on using technology to support business and decision-making. The term “decision scientist” becomes most meaningful, however, when it is contrasted with the job descriptions of the “data scientist” or “big data scientist.”

There is often confusion between the roles of business analysts, data scientists, and data analysts. Business analysts are more likely to address business problems and suggest solutions, whereas data analysts typically work more directly with the data itself. Both positions are in high demand and are often well paid, but data science is far more engaged in forecasting since it examines the patterns hidden in the raw data.

Using NumPy for statistics and algebra

NumPy is a Python library for working with arrays. It supports large, multi-dimensional arrays and matrices, along with a wide range of mathematical operations that can be performed on them, including functions for linear algebra and the Fourier transform. This makes it highly useful for scientific computing and machine learning. Its core is the n-dimensional array, a straightforward yet effective data structure. Learning NumPy is the first step on every Python data scientist’s path because it serves as the cornerstone on which nearly all of the toolkit’s capabilities are constructed.

The array, which is a grid of values all of the same type that’s indexed by a tuple of nonnegative integers, is the fundamental building block utilized by NumPy. Similar to how the dimensions of a matrix are defined in algebra, the array’s rank is determined by its number of dimensions. A tuple of numbers indicating the size of the array along each dimension makes up the shape of an array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
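The rank and shape described above can be inspected directly through array attributes. The following is a minimal sketch; the matrix variable is a hypothetical example that does not appear in the original text:

```python
import numpy as np

# A hypothetical 2 x 3 array used only for illustration
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # number of dimensions (the rank): 2
print(matrix.shape)  # size along each dimension: (2, 3)
print(matrix.dtype)  # the single element type shared by all values
```

The exact integer type reported by dtype depends on the platform, so it is not shown here.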

As mentioned previously, a NumPy array is a container that holds a fixed number of elements, all of which must be of the same type. Many data structures use arrays to implement their algorithms. Just as you can slice a list, you can slice a NumPy array, but in more than one dimension. Slicing a NumPy array returns a view of the original array rather than a copy.

Slicing in Python means taking elements from one given index to another given index. We can select certain elements of an array using [start:end], where start is the index of the first element to include and end is the index at which to stop (exclusive). We can also define the step using [start:end:step]:

print('select elements by index:',arr[0])
print('slice elements of the array:',arr[1:5])
print('from index 4 to the end:',arr[4:])
print('from the start up to index 4 (exclusive):',arr[:4])

There are three kinds of indexing: field access, basic slicing, and advanced indexing. Basic slicing is the n-dimensional extension of Python’s slicing notion. A Python slice object is created by passing start, stop, and step parameters to the built-in slice function. Slicing makes it possible to write clear and succinct code. “Indexing” refers to an element of an iterable by its position, while “slicing” gets a subset of elements from an iterable based on their indices.
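As a minimal sketch of these ideas, the following example uses a step, the built-in slice object, and the view behavior of basic slicing. The variable names are chosen here for illustration only:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# [start:end:step] takes every second element between index 0 and 4
every_other = arr[0:5:2]
print(every_other)  # [1 3 5]

# The built-in slice object is equivalent to the bracket notation
s = slice(1, 4)
print(arr[s])  # [2 3 4]

# Basic slicing returns a view: writing through it changes the original
view = arr[1:3]
view[0] = 20
print(arr)  # [ 1 20  3  4  5]

# Use copy() when an independent array is needed
independent = arr[1:3].copy()
independent[0] = -1
print(arr)  # still [ 1 20  3  4  5]
```

Because slices are views, calling copy() is the usual way to protect the original data from accidental modification.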

To combine (concatenate) two arrays, we can use the np.concatenate() function, which copies the elements of both arrays into a new array:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

Arrays can also be joined using the NumPy stack() method, which receives a sequence of arrays and the axis along which to join them. Unlike concatenation, stacking joins the arrays along a new axis. With axis=1, two 1D arrays become the columns of a 2D array:

arr = np.stack((arr1, arr2), axis=1)
print(arr)

The axis parameter controls the axis along which the arrays are stacked. With axis=0, the two arrays are placed on top of one another as rows:

arr = np.stack((arr1, arr2), axis=0)
print(arr)
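To make the difference between these operations concrete, the following sketch compares the shapes of the results. The names joined, rows, and cols are introduced here for illustration:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

joined = np.concatenate((arr1, arr2))  # joins along the existing axis
rows = np.stack((arr1, arr2), axis=0)  # each input becomes a row
cols = np.stack((arr1, arr2), axis=1)  # each input becomes a column

print(joined.shape)  # (6,)
print(rows.shape)    # (2, 3)
print(cols.shape)    # (3, 2)
```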

The NumPy mean() function is used to compute the arithmetic mean along the specified axis:

np.mean(arr,axis=1)

With axis=1, as in the preceding line, the average is computed by row. To compute the average by column, use axis=0:

np.mean(arr,axis=0)
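A small self-contained sketch shows how the axis argument changes the result. The array below matches the 2 x 3 array produced by stacking two 1D arrays as rows:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

col_means = np.mean(arr, axis=0)  # collapses the rows: one value per column
row_means = np.mean(arr, axis=1)  # collapses the columns: one value per row

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
```

A useful way to remember this is that the axis you pass is the one that disappears from the shape of the result.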

In the next section, we will introduce pandas, a library for data analysis and manipulation. pandas is one of the most extensively used Python libraries in data science, much like NumPy. It offers high-performance, simple-to-use data analysis tools. In contrast to the multi-dimensional array objects provided by the NumPy library, pandas offers an in-memory 2D table object called a DataFrame.

Storing and manipulating data with pandas

pandas is an open-source toolkit built on top of NumPy that offers Python programmers high-performance, user-friendly data structures and data analysis capabilities. It enables quick analysis, data preparation, and cleaning.

pandas is a package for data analysis, and because it includes many built-in auxiliary functions, it is typically used for financial time series data, economic data, and any form of tabular data. For scientific computing, NumPy is a quick way to manage huge multidimensional arrays, and it can be used in conjunction with the SciPy and pandas packages.

Constructing a DataFrame from a dictionary is possible by passing this dictionary to the DataFrame constructor:

import pandas as pd
d = {'col1': [1,5,8, 2], 'col2': [3,3,7, 4]}
df = pd.DataFrame(data=d)
df

The pandas groupby function is a powerful and versatile function that allows us to split data into separate groups to perform computations for better analysis. To illustrate it, we first create a DataFrame with a categorical column:

df = pd.DataFrame({'Animal': ['Dog', 'Dog',
                              'Rat', 'Rat','Rat'],
                   'Max Speed': [380., 370., 24., 26.,25.],
                   'Max Weight': [10., 8.1, .1, .12,.09]})
df

The easiest way to remember what groupby does is through its three steps: “split,” “apply,” and “combine.” Split refers to dividing your data into distinct groups based on a particular column. As an illustration, we can divide our data by animal and compute the mean of each numeric column per group:

df.groupby(['Animal']).mean()
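The three steps can also be written out by hand, which may help clarify what the one-line call does. This is only an illustrative sketch and not how pandas implements groupby internally; the names pieces, manual, and direct are introduced here:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Dog', 'Rat', 'Rat', 'Rat'],
                   'Max Speed': [380., 370., 24., 26., 25.]})

# Split: iterate over the groups; apply: compute the mean of each group
pieces = {}
for animal, group in df.groupby('Animal'):
    pieces[animal] = group['Max Speed'].mean()

# Combine: assemble the per-group results into a single Series
manual = pd.Series(pieces, name='Max Speed')

# The one-line groupby call produces the same values
direct = df.groupby('Animal')['Max Speed'].mean()
print(manual)
print(direct)
```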

pandas’ groupby technique is extremely powerful. Combined with value_counts(), it lets you group by one column and count the values of a different column within each group. Called directly on a DataFrame, as in the following line, value_counts() counts how many times each unique row appears:

df.value_counts()
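To count the values of one column within the groups of another, value_counts() can be chained after groupby. The following is a minimal sketch; the Size column and its values are hypothetical and were added only for this example:

```python
import pandas as pd

# 'Size' is a hypothetical categorical column used for illustration
animals = pd.DataFrame({
    'Animal': ['Dog', 'Dog', 'Rat', 'Rat', 'Rat'],
    'Size': ['large', 'large', 'small', 'small', 'medium'],
})

# Group by Animal, then count how often each Size occurs in each group
counts = animals.groupby('Animal')['Size'].value_counts()
print(counts)
```

The result is indexed by both the grouping column and the counted value, so individual counts can be read with a tuple such as counts.loc[('Rat', 'small')].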

We can also aggregate data over the rows using the aggregate() method, which allows you to apply a function or a list of function names to be executed along one of the axes of the DataFrame. The default is 0, which is the index (row) axis. It’s important to note that the agg() method is an alias of the aggregate() method:

df.agg("mean", axis="rows",numeric_only=True)

We can also pass several functions to be used in each of the selected columns:

df.agg({'Max Speed' : ['sum', 'min'], 'Max Weight' : ['mean', 'max']})

The quantile() method computes the quantile of the values along a given axis. The default is the index axis (axis=0), which returns one quantile per column; when the column axis is specified (axis='columns'), the quantile is computed across the columns and one value is returned for each row. Because our DataFrame contains a non-numeric column, we pass numeric_only=True, which pandas 2.0 and later require in this case. The following line will give us the 10% quantile of each numeric column:

df.quantile(.1, numeric_only=True)

We can also pass a list of quantiles:

df.quantile([.1, .5], numeric_only=True)
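The effect of the axis argument is easier to see on a small, fully numeric DataFrame. The following sketch uses hypothetical values chosen so that the results are easy to check by hand:

```python
import pandas as pd

# Hypothetical numeric data used only for illustration
numbers = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]})

per_column = numbers.quantile(0.5)               # one median per column
per_row = numbers.quantile(0.5, axis='columns')  # one median per row

print(per_column)
print(per_row)
```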

The pivot() function is used to reshape a given DataFrame structured by supplied index or column values and is one of the different types of functions that we can use to change the data. Data aggregation is not supported by this function; multiple values produce a MultiIndex in the columns:

df = pd.DataFrame(
    {'type': ['one', 'one', 'one', 'two', 'two', 'two'],
     'cat': ['A', 'B', 'C', 'A', 'B', 'C'],
     'val': [1, 2, 3, 4, 5, 6],
     'letter': ['x', 'y', 'z', 'q', 'w', 't']})
df.pivot(index='type', columns='cat', values='val')

Pivot tables are one of pandas’ most powerful features, as they allow us to draw insights from data. In addition to pivot(), pandas provides the pivot_table() function, which also reshapes the data but supports aggregation when several rows share the same index and column values. It is a simple function that can produce a very powerful analysis very quickly.
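As a minimal sketch of pivot_table(), the following example uses a hypothetical DataFrame in which one combination of type and cat appears twice, so the values have to be aggregated:

```python
import pandas as pd

# Hypothetical data: ('one', 'A') appears twice, so pivot() would raise an error
sales = pd.DataFrame({
    'type': ['one', 'one', 'one', 'two', 'two'],
    'cat':  ['A',   'A',   'B',   'A',   'B'],
    'val':  [1,     3,     2,     4,     5],
})

# pivot_table() aggregates the duplicated entries, here with the mean
table = sales.pivot_table(index='type', columns='cat',
                          values='val', aggfunc='mean')
print(table)
```

The aggfunc argument accepts other aggregations as well, such as 'sum' or 'max', depending on the question being asked of the data.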

The next step for us will be to learn how to visualize the data to create proper storytelling and appropriate interpretations.

Visualizing patterns with Seaborn

Seaborn is a Python data visualization library based on Matplotlib. It offers a high-level interface for creating attractive and informative statistical graphics.

The primary distinction between Seaborn and Matplotlib is how well Seaborn handles pandas DataFrames, producing polished graphics from simple sets of functions. Matplotlib also works well with DataFrames and arrays, but it is lower level: it treats figures and axes as objects and also offers a stateful plotting API.

Here, we will start our examples using the “tips” dataset, which contains a mixture of numeric and categorical variables:

import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
f, ax = plt.subplots(figsize=(12, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")

In the preceding code snippet, we have imported Seaborn and Matplotlib; the latter allows users to control certain aspects of the plots created, such as the figure size, which we defined as 12 x 6 inches. This creates the layout in which Seaborn will place the visualization.

We are using the scatterplot() function to create a visualization of points where the X-axis refers to the total_bill variable and the Y-axis refers to the tip variable. Here, we are using the hue parameter to color the different dots according to the time categorical variable, which allows us to plot numerical data with a categorical dimension:

Figure 1.1: Seaborn scatterplot with the color depending on the categorical variable

The generated figure shows the tips that were received, their relationship with the total bill amount, and, through the color coding, whether the meal was lunch or dinner.

The interpretation that we can make is that there might be a linear relationship between the total amount of the bill and the tip received. If we look closer, we can also see that the highest total bill amounts occur during dinner, which also leads to the highest tip values.

This information can be really useful in the context of business, but it first needs to be validated with proper hypothesis testing, such as a t-test, plus a linear regression analysis to conclude that there is a relationship between the total amount and the tip, accounting for differences in the time at which the meal occurred. We will look into these analyses in the next chapter.

We can now see how a simple exploration graph can help us construct the hypothesis over which we can base decisions to better improve business products or services.

We can also assign hue and style to different variables that will vary colors and markers independently. This allows us to introduce another categorical dimension in the same graph, which in the case of Seaborn can be used with the style parameter, which will assign different types of markers according to our referenced categorical variable:

f, ax = plt.subplots(figsize=(12, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day", style="time")

The preceding code snippet will create a layout that’s 12 x 6 inches, coloring the points by day and choosing the marker style according to the time categorical variable, as shown in the following graph:

Figure 1.2: Seaborn scatterplot with color and shape depending on the categorical variable

This kind of graph allows us to pack a lot of information into a single plot, which can be beneficial but also can lead to a cluttering of information that can be difficult to digest at once. It is important to always account for the understanding of the information that we want to show, making it easier for the stakeholders to be able to see the relationships at a glance.

Here, it is much more difficult to draw any interpretation about the days of the week at first glance, because a lot of information is already being shown. Differences that cannot be seen by simply looking at a graph can be uncovered through other kinds of analysis, such as statistical tests and the study of correlations and causations.

Another way to add more dimensions to the graphics created with Seaborn is to represent numerical variables as the size of the points in the scatterplot. Numerical variables can be assigned to size to apply a semantic mapping to the areas of the points.

We can control the range of marker areas with sizes, and set the legend parameter to full to force every unique value to appear in the legend:

f, ax = plt.subplots(figsize=(12, 6))
sns.scatterplot(
    data=tips, x="total_bill", y="tip", hue="size", 
    size="size", sizes=(20, 200), legend="full"
)

The preceding code snippet creates a scatterplot where the points have a size and color that depends on the size variable. This can be useful to pack another numerical dimension into these kinds of plots:

Figure 1.3: Seaborn scatterplot with size depending on a third variable

Another important way to represent data is by looking at time series information. We can use the Seaborn package to display time series data without the need to give the data any special treatment.

In the following example, we are creating a pandas DataFrame with dates, using Matplotlib to create a figure that’s 15 x 8 inches, and then using the Seaborn lineplot function to display the information:

df = pd.DataFrame({"Dates":
['01/01/2019','01/02/2019','01/03/2019','01/04/2019',
'01/05/2019','01/06/2019','01/07/2019','01/08/2019'],
"Count": [727,716,668,710,718,732,694,755]})
plt.figure(figsize = (15,8))
sns.lineplot(x = 'Dates', y = 'Count',data = df)

The preceding example creates a line plot with the dates on the x axis and the count variable on the y axis:

Figure 1.4: Seaborn line plot with a time-based axis
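In the example above, the dates are stored as plain strings. If a true time-based axis is needed, the column can be converted to datetime values first. The following is a minimal sketch using a shortened copy of the data; the month/day/year format is an assumption, since the original does not state how the dates should be read:

```python
import pandas as pd

df = pd.DataFrame({
    "Dates": ['01/01/2019', '01/02/2019', '01/03/2019', '01/04/2019'],
    "Count": [727, 716, 668, 710],
})

# The format string is an assumption: month/day/year
df["Dates"] = pd.to_datetime(df["Dates"], format="%m/%d/%Y")
print(df.dtypes)
```

The converted DataFrame can be passed to the lineplot function in the same way as before, and the x axis will then be spaced according to the actual dates.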

For the following example, we will load a pre-defined dataset from Seaborn known as the FMRI dataset, which contains time series data.

First, we will load an example dataset with long-form data and then plot the responses for different events and regions. To do this, we will create a 15 x 8 inches Matplotlib figure and use the lineplot function to show the information, using the hue parameter to display categorical information about the region, and the style parameter to show categorical information about the type of event:

fmri = sns.load_dataset("fmri")
f, ax = plt.subplots(figsize=(15, 8))
sns.lineplot(x="timepoint", y="signal", hue="region", style="event",data=fmri)

The preceding code snippet creates a display of the information that allows us to study how the variables move through time according to the different categorical aspects of the data:

Figure 1.5: Seaborn line plot with confidence intervals

One of the features of the Seaborn lineplot function is that, when there are multiple observations for each value on the x axis, it aggregates them and shows the mean as a solid line surrounded by a band representing the 95% confidence interval. This can be really useful when showing time series data that contains multiple data points for each point in time. The mean shows the trend, while the band gives us a sense of the degree of dispersion, which can be important when analyzing behavior patterns.
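To give an idea of what such a band represents, the following sketch computes the mean and an approximate 95% interval for each time point by hand. It uses a normal approximation, which is an assumption made here for simplicity; Seaborn estimates its intervals by bootstrapping by default, so its values will differ. The data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical long-form data: three repeated measurements per time point
data = pd.DataFrame({
    "timepoint": [0, 0, 0, 1, 1, 1],
    "signal":    [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

grouped = data.groupby("timepoint")["signal"]
mean = grouped.mean()
# Standard error of the mean for each time point
sem = grouped.std(ddof=1) / np.sqrt(grouped.count())

# Normal-approximation 95% interval (not Seaborn's bootstrap method)
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem
print(pd.DataFrame({"mean": mean, "lower": lower, "upper": upper}))
```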

One of the ways we can visualize data is through bar plots. Seaborn uses the barplot function to create bar plots:

f, ax = plt.subplots(figsize=(12, 6))
ax = sns.barplot(x="day", y="total_bill", data=tips, ci=90)

The preceding code uses Matplotlib to create a 12 x 6 inch figure in which the Seaborn bar plot is drawn. Here, we display the days on the x axis and the mean total bill on the y axis, with the confidence intervals shown as whiskers on top of the bars. Note that ci is expressed as a percentage, so ci=90 requests a 90% interval, whereas a value such as ci=.9 would request a 0.9% interval and produce whiskers that are barely visible. In Seaborn 0.12 and later, ci is deprecated in favor of errorbar=("ci", 90). The preceding code generates the following graph:

Figure 1.6: Seaborn bar plot

We can compare the groups in more detail by drawing a set of vertical bars grouped by two variables:

f, ax = plt.subplots(figsize=(12, 6))
ax = sns.barplot(x="day", y="total_bill", hue="sex", data=tips)

The preceding code snippet creates a bar plot on a 12 x 6-inch Matplotlib figure. The difference is that we use the hue parameter to show gender differences:

Figure 1.7: Seaborn bar plot with categorical data

One of the conclusions that can be drawn from this graph is that females have lower total bills than males on average, with Saturday being the only day when there’s a difference between the means, though there’s a much lower basepoint for the dispersion.

We can add another categorical dimension to the visualization using catplot to combine a barplot with a FacetGrid to create multiple plots. This allows us to group within additional categorical variables. Using catplot is safer than using FacetGrid to create multiple graphs as it ensures synchronization of variable order across different facets:

sns.catplot(x="sex", y="total_bill",hue="smoker", col="time",data=tips, kind="bar",height=6, aspect=.7)

The preceding code snippet generates a categorical plot that contains the different bar plots. Note that the size of the graph is controlled using the height and aspect variables instead of via a Matplotlib figure:

Figure 1.8: Seaborn bar plot with two categorical variables

Here, we can see an interesting trend during lunch, where the mean for male smokers is lower than that for non-smokers, while the mean for female smokers is higher than that for non-smokers. This tendency is inverted during dinner, when the mean for male smokers is higher than that for female smokers.

Histograms are a useful tool for analyzing the distribution of a variable. We can create them with the Seaborn histplot function. Here, we will use the penguins dataset and create a Matplotlib figure that’s 12 x 6 inches:

penguins = sns.load_dataset("penguins")
f, ax = plt.subplots(figsize=(12, 6))
sns.histplot(data=penguins, x="flipper_length_mm", bins=30)

The preceding code creates a histogram of the flipper length grouping data in 30 bins:

Figure 1.9: Seaborn histogram plot
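The binning behind a histogram can be reproduced with NumPy, which may help when choosing the number of bins. The following is a minimal sketch; the values are hypothetical and stand in for a numeric column such as the flipper length:

```python
import numpy as np

# Hypothetical measurements standing in for a numeric column
values = np.array([172, 180, 181, 185, 190, 195, 197, 210, 215, 230])

# Split the range of the data into three equal-width bins and count
counts, edges = np.histogram(values, bins=3)
print(counts)  # number of observations that fall into each bin
print(edges)   # bin boundaries: one more entry than the number of bins
```

Using fewer bins smooths out the shape of the distribution, while using more bins reveals more detail at the cost of more noise.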

Here, we can add a kernel density line estimate, which softens the histogram, providing more information about the shape of the data distribution.

The following code adds the kde parameter set to True to show this line:

f, ax = plt.subplots(figsize=(12, 6))
sns.histplot(data=penguins, x="flipper_length_mm", kde=True)

Figure 1.10: Seaborn histogram plot with KDE estimated density

Here, we can see that the data resembles several superimposed normal distributions, which can mean that we are looking at different kinds of data.

We can also add more dimensions to the graph by using the hue parameter on the categorical species variable:

f, ax = plt.subplots(figsize=(12, 6))
sns.histplot(data=penguins, x="flipper_length_mm", hue="species")

Figure 1.11: Seaborn histogram plot with categorical data

As suspected, we were looking at the superposition of different species of penguins, each of which has an approximately normal distribution, though some of them are more skewed than others.

The pairplot function can be used to plot the pairwise bivariate distributions in a dataset. It displays the relationship for each pair of variables in a DataFrame as a matrix of plots, with the univariate plots on the diagonal. pairplot can be used to spot the most distinct clusters or the combination of features that best explains the relationship between two variables. Seeing whether the data can be separated by simple lines also helps when building basic classification models:

sns.pairplot(penguins,height=3)

The preceding line of code creates a pairplot of the data where each box has a height of 3 inches:

Figure 1.12: Variable relationship and histogram of selected features

The variable names are shown on the matrix’s outer borders, making it easy to read. The distribution of each variable is shown in the boxes along the diagonal, and the scatterplot for each pair of variables is displayed in the off-diagonal boxes.

We can also use the hue parameter to add categorical dimensions to the visualization:

sns.pairplot(penguins, hue="species", diag_kind="hist", height=3)

Figure 1.13: Variable relationship and histogram with categorical labels

Although incredibly useful, this graph can be very computationally expensive, which can be solved by looking only at some of the variables instead of the whole dataset.

We can reduce the time required to render the visualization by reducing the number of graphs shown. We can do this by specifying the types of variables we want to show in each axis, as shown in the following block of code:

sns.pairplot(
    penguins,
    x_vars=["bill_length_mm", "bill_depth_mm", 
           "flipper_length_mm"],
    y_vars=["bill_length_mm", "bill_depth_mm"],
    height=3
)

Figure 1.14: Variable relationship and histogram of selected features

A box plot, sometimes referred to as a box-and-whisker plot in descriptive statistics, is a type of chart that is frequently used in exploratory data analysis. Box plots use the data’s quartiles (or percentiles) and median to visually depict the distribution and skewness of numerical data.

We can use them in Seaborn using the boxplot function, as shown here:

f, ax = plt.subplots(figsize=(12, 6))
ax = sns.boxplot(x="day", y="total_bill", data=tips)

Figure 1.15: Seaborn box plot

The Seaborn box plot has a very simple structure. Each box displays the dataset’s quartiles, and the whiskers extend to display the remainder of the distribution. Box plots are especially helpful when you want to compare data between groups.

Here, we can specify a type of categorical variable we might want to show using the hue parameter, as well as specify the palette of colors we want to use from Seaborn’s default options:

f, ax = plt.subplots(figsize=(12, 6))
ax = sns.boxplot(x="day", y="total_bill", hue="smoker", data=tips, palette="Set3")

Figure 1.16: Seaborn box plot with categorical data

There is always the question of when you would use a box plot. Box plots are used to display the distributions of numerical data values, particularly when comparing them across various groups. They are designed to give high-level information at a glance and provide details like the symmetry, skew, variance, and outliers of a set of data.
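The quantities that a box plot summarizes can be computed directly with NumPy. The following sketch uses hypothetical bill amounts and the common 1.5 x IQR rule for the whisker limits; that rule is an assumption about the convention being used and can be changed in most plotting libraries:

```python
import numpy as np

# Hypothetical bill amounts with one unusually large value
bills = np.array([10, 12, 14, 15, 16, 18, 20, 22, 50])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = bills[(bills < lower_limit) | (bills > upper_limit)]

print(q1, median, q3)
print(outliers)
```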

Summary

In this chapter, we introduced the initial concepts of how we can store and manipulate data with pandas and NumPy, and how to visualize data patterns using Seaborn. These elements are used not only to explore the data but to be able to create visual narratives that allow us to understand patterns in the data and to be able to communicate simply and practically.

In the next chapter, we will build upon this to understand how machine learning and descriptive statistics can be used to validate hypotheses, study correlations and causations, as well as to make predictive models.

Key benefits

  • Learn and practice various clustering techniques to gather market insights
  • Explore real-life use cases from the business world to contextualize your learning
  • Work your way through practical recipes that will reinforce what you have learned

Description

One of the most valuable contributions of data science is toward helping businesses make the right decisions. Understanding this complicated confluence of two disparate worlds, as well as a fiercely competitive market, calls for all the guidance you can get. The Art of Data-Driven Business is your invaluable guide to gaining a business-driven perspective, as well as leveraging the power of machine learning (ML) to guide decision-making in your business. This book provides a common ground of discussion for several profiles within a company. You’ll begin by looking at how to use Python and its many libraries for machine learning. Experienced data scientists may want to skip this short introduction, but you’ll soon get to the meat of the book and explore the many and varied ways ML with Python can be applied to the domain of business decisions through real-world business problems that you can tackle by yourself. As you advance, you’ll gain practical insights into the value that ML can provide to your business, as well as the technical ability to apply a wide variety of tried-and-tested ML methods. By the end of this Python book, you’ll have learned the value of basing your business decisions on data-driven methodologies and have developed the Python skills needed to apply what you’ve learned in the real world.

Who is this book for?

This book is for data scientists, machine learning engineers and developers, data engineers, and business decision makers who want to apply data science for business process optimization and develop the skills needed to implement data science projects in marketing, sales, pricing, customer success, ad tech, and more from a business perspective. Other professionals looking to explore how data science can be used to improve business operations, as well as individuals with technical skills who want to back their technical proposal with a strong business case will also find this book useful.

What you will learn

  • Create effective dashboards with the seaborn library
  • Predict whether a customer will cancel their subscription to a service
  • Analyze key pricing metrics with pandas
  • Recommend the right products to your customers
  • Determine the costs and benefits of promotions
  • Segment your customers using clustering algorithms

Product Details

Publication date : Dec 02, 2022
Length: 314 pages
Edition : 1st
Language : English
ISBN-13 : 9781804618738

Packt Subscriptions

$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just NZ$7 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just NZ$7 each
  • Exclusive print discounts

Frequently bought together


  • Modern Time Series Forecasting with Python: NZ$77.99
  • Data Literacy in Practice: NZ$45.99
  • The Art of Data-Driven Business: NZ$61.99

Total: NZ$185.97

Table of Contents

16 Chapters
Part 1: Data Analytics and Forecasting with Python
Chapter 1: Analyzing and Visualizing Data with Python
Chapter 2: Using Machine Learning in Business Operations
Part 2: Market and Customer Insights
Chapter 3: Finding Business Opportunities with Market Insights
Chapter 4: Understanding Customer Preferences with Conjoint Analysis
Chapter 5: Selecting the Optimal Price with Price Demand Elasticity
Chapter 6: Product Recommendation
Part 3: Operation and Pricing Optimization
Chapter 7: Predicting Customer Churn
Chapter 8: Grouping Users with Customer Segmentation
Chapter 9: Using Historical Markdown Data to Predict Sales
Chapter 10: Web Analytics Optimization
Chapter 11: Creating a Data-Driven Culture in Business
Index
Other Books You May Enjoy

Customer reviews

Rating distribution
4.8 out of 5 (5 Ratings)
5 star 80%
4 star 20%
3 star 0%
2 star 0%
1 star 0%
Ian Klosowicz Mar 08, 2023
5 out of 5 stars
This book is an excellent resource for anyone who wants to learn how to use Python and its many libraries for machine learning. One of the best things about this book is that it provides real-world business problems which are very valuable if you want to create a portfolio. Overall, it's a great book to add to your data toolbelt!
Amazon Verified review
Emmanuel Elom Oct 16, 2023
5 out of 5 stars
Leveraging data-driven insights is essential in the dynamic world of modern business. The full spectrum of data analytics and its applications is covered in this book. These topics enable businesses to make wise judgments, from machine learning for optimizing corporate operations to Python-based data analysis and visualization, comprehending market insights, client preferences, and price elasticity. The critical role data plays in encouraging success and progress in the corporate sector is further highlighted by investigating topics like customer segmentation, web analytics, and developing a data-driven culture. Finally, I would suggest the book to anyone attempting to circumvent the issues with real-world data analytics.
Amazon Verified review
Steven Fernandes Feb 13, 2023
5 out of 5 stars
This book presents a comprehensive guide to various data analysis and visualization techniques, utilizing the Python libraries Pandas and Seaborn, including machine learning with Sci-Kit Learn, customer and product segmentation, estimating customer satisfaction, predicting churn and conversion, performing pricing analytics, forecasting sales, and optimizing promotion and stock. The book also includes a focus on improving digital marketing strategy, creating effective dashboards with Seaborn, and interviewing business leaders. Additionally, it provides practical applications, such as predicting customer cancellations, recommending products to customers, and analyzing pricing metrics. Lastly, it covers segmentation using clustering algorithms and determining the costs and benefits of promotions.
Amazon Verified review
Nikhil Feb 13, 2023
5 out of 5 stars
This book explains how we can make the most of the data by optimising solutions in our day to day business problems. This hands-on approach provides the reader with a tangible understanding of the value that data-driven methodologies can bring to a business. The book is well-suited for business decision-makers who want to apply data solutions to their business processes. The book's clear explanations and practical examples make it an invaluable resource for anyone looking to gain a deeper understanding of the topic.
Amazon Verified review
Tynan Mar 09, 2023
4 out of 5 stars
Super informative, giving tutorials on useful analysis code from import to finish. Super clear and concise directions.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, clicking on the link will download and open the PDF file directly. If you don't, save the PDF file on your machine and download Adobe Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and the rights of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal)

Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.