
The Data Analysis Workshop: Solve business problems with state-of-the-art data analysis models, developing expert data analysis skills along the way

By Gururajan Govindan, Konstantin Palagachev, Shubhangi Hora
Rating: 4.4 (21 Ratings)
eBook, Jul 2020, 626 pages, 1st Edition
What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

The Data Analysis Workshop

2. Absenteeism at Work

Overview

In this chapter, you will apply standard data analysis techniques, such as estimating conditional probabilities, using Bayes' theorem, and running Kolmogorov-Smirnov tests to compare distributions. You will also implement data transformation techniques, such as the Box-Cox and Yeo-Johnson transformations, and apply these techniques to a given dataset.

Introduction

In the previous chapter, we looked at some of the main techniques that are used in data analysis. We saw how hypothesis testing can be used when analyzing data, we got a brief introduction to visualizations, and finally, we explored some concepts related to time series analysis. In this chapter, we will elaborate on some of the topics we've already looked at (such as plotting and hypothesis testing) while introducing new ones coming from probability theory and data transformations.

Nowadays, work relationships are becoming more and more trust-oriented, and conservative contracts (in which working time is strictly monitored) are being replaced with more agile ones in which employees themselves are responsible for accounting for their working time. This liberty may lead to unregulated absenteeism and may reflect poorly on an employee's candidature, even if the absent hours can be accounted for with genuine reasons. This can significantly undermine healthy working relationships...

Initial Data Analysis

As a rule of thumb, when starting the analysis of a new dataset, it is good practice to check the dimensionality of the data, type of columns, possible missing values, and some generic statistics on the numerical columns. We can also get the first 5 to 10 entries in order to acquire a feeling for the data itself. We'll perform these steps in the following code snippets:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# import data from the GitHub page of the book
data = pd.read_csv('https://raw.githubusercontent.com'\
                   '/PacktWorkshops/The-Data-Analysis-Workshop'\
                   '/master/Chapter02/data/'\
           ...
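
The snippet above is cut off in this preview, so here is a minimal, self-contained sketch of the checks described in the paragraph (dimensionality, column types, missing values, generic statistics, and the first entries). The small `sample` DataFrame and its values are hypothetical stand-ins, not the book's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the absenteeism data; the values are made up.
sample = pd.DataFrame({
    "Age": [33, 50, 38, 39, 33],
    "Social drinker": [1, 1, 1, 0, 1],
    "Absenteeism time in hours": [4, 0, 2, np.nan, 2],
})

n_rows, n_cols = sample.shape                # dimensionality of the data
column_types = sample.dtypes                 # type of each column
missing_per_column = sample.isnull().sum()   # missing values per column
summary = sample.describe().T                # generic statistics, one row per column
first_rows = sample.head(3)                  # first few entries

print(f"{n_rows} rows x {n_cols} columns")
print(column_types)
print(missing_per_column)
print(summary)
print(first_rows)
```

Transposing the output of `describe()` is only a readability choice: with many columns, one row per column is usually easier to scan.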

Initial Analysis of the Reason for Absence

Let's start with a simple analysis of the Reason for absence column. We will try to address questions such as, what is the most common reason for absence? Does being a drinker or smoker have some effect on the causes? Does the distance to work have some effect on the reasons? And so on. Starting with these types of questions is often important when performing data analysis, as this is a good way to gain confidence in, and an understanding of, the data.

The first thing we are interested in is the overall distribution of the absence reasons in the data—that is, how many entries we have for a specific reason for absence in our dataset. We can easily address this question by using the countplot() function from the seaborn package:

# get the number of entries for each reason for absence
plt.figure(figsize=(10, 5))
ax = sns.countplot(data=preprocessed_data, x="Reason for absence")
ax.set_ylabel("Number of entries per...
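
As a numeric counterpart to the count plot, the same question (how many entries there are per reason) can be answered with `value_counts()`. This is a sketch only; the reason codes below are hypothetical and do not come from the book's data:

```python
import pandas as pd

# Hypothetical reason codes, one per absence entry.
reasons = pd.Series([23, 28, 23, 27, 23, 28, 13], name="Reason for absence")

# Number of entries per reason, sorted by reason code like the x axis of the plot.
counts = reasons.value_counts().sort_index()
most_common = counts.idxmax()

print(counts)
print("Most common reason:", most_common)
```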

Analysis of Social Drinkers and Smokers

Let's begin with an analysis of the impact of being a drinker or smoker on employee absenteeism. As smoking and frequent drinking have a negative impact on health, we would expect certain diseases to be more frequent among smokers and drinkers than among other employees. Note that in the absenteeism dataset, 56% of the registered employees are drinkers, while only 7% are smokers. We can produce a figure similar to Figure 2.6 for the social drinkers and smokers with the following code:

# plot reasons for absence against being a social drinker/smoker
plt.figure(figsize=(8, 6))
sns.countplot(data=preprocessed_data, x="Reason for absence", \
              hue="Social drinker", hue_order=["Yes", "No"])
plt.savefig('figs/absence_reasons_drinkers.png', \
            format...
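
The chapter overview mentions conditional probabilities and Bayes' theorem. The following is a hedged sketch of how the probability of a reason given drinker status, and the reverse, could be estimated from counts. The DataFrame `df` and its values are hypothetical, and this is not necessarily how the book computes these quantities:

```python
import pandas as pd

# Hypothetical entries; the values are made up for illustration.
df = pd.DataFrame({
    "Reason for absence": [23, 23, 28, 28, 28, 13, 13, 23],
    "Social drinker": ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No"],
})

# P(reason | drinker status): each column of the table sums to 1.
p_reason_given_drinker = pd.crosstab(
    df["Reason for absence"], df["Social drinker"], normalize="columns"
)

# Marginal probabilities P(drinker) and P(reason).
p_drinker = (df["Social drinker"] == "Yes").mean()
p_reason = df["Reason for absence"].value_counts(normalize=True)

# Bayes' theorem: P(drinker | reason) = P(reason | drinker) * P(drinker) / P(reason)
p_drinker_given_reason = p_reason_given_drinker["Yes"] * p_drinker / p_reason

# Direct estimate from the data, used as a cross-check.
direct = df.groupby("Reason for absence")["Social drinker"].apply(
    lambda s: (s == "Yes").mean()
)

print(p_drinker_given_reason)
```

The result from Bayes' theorem should match the direct estimate exactly, which makes the cross-check a useful sanity test when both directions can be computed from the same table.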

Body Mass Index

The Body Mass Index (BMI) is defined as a person's weight in kilograms, divided by the square of their height in meters:

Figure 2.28: Expression for BMI
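
The figure with the expression is not reproduced in this preview. As a minimal sketch, the definition above (weight in kilograms divided by the square of height in meters) translates to the following; the function name `compute_bmi` is a hypothetical choice for illustration:

```python
def compute_bmi(weight_kg: float, height_m: float) -> float:
    """Return weight in kilograms divided by the square of height in meters."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Example: a person weighing 70 kg with a height of 1.75 m.
bmi = compute_bmi(70.0, 1.75)
print(round(bmi, 2))  # prints 22.86
```

If height is stored in centimeters, it needs to be divided by 100 before calling the function.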

BMI is a universal way to classify people as underweight, healthy weight, overweight, and obese, based on tissue mass (muscle, fat, and bone) and height. The following plot indicates the relationship between weight and height for the various categories:

Figure 2.29: Body Mass Index categories (source: https://en.wikipedia.org/wiki/Body_mass_index)

According to the preceding plot, we can build the four categories (underweight, healthy weight, overweight, and obese) based on the BMI values:

"""
define function for computing the BMI category, based on BMI value
"""
def get_bmi_category(bmi):
    if bmi < 18.5:
        category = "underweight"
   ...

Age and Education Factors

Age and education may also influence employees' absenteeism. For instance, older employees might need more frequent medical treatment, while employees with higher education degrees, covering positions of higher responsibility, might be less prone to being absent.

First, let's investigate the correlation between age and absence hours. We will create a regression plot, in which we'll plot the Age column on the x axis and Absenteeism time in hours on the y axis. We'll also include the Pearson's correlation coefficient and its p-value, where the null hypothesis is that the correlation coefficient between the two features is equal to zero:

from scipy.stats import pearsonr
# compute Pearson's correlation coefficient and p-value
pearson_test = pearsonr(preprocessed_data["Age"], \
               preprocessed_data["Absenteeism time in hours...
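
Because the snippet above is truncated, here is a self-contained sketch of the same test on synthetic data. The arrays `age` and `hours` are randomly generated assumptions, not the book's columns, so the resulting numbers say nothing about the real dataset:

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic data: hours are drawn independently of age by construction.
rng = np.random.default_rng(0)
age = rng.integers(25, 60, size=200)
hours = rng.exponential(scale=8.0, size=200)

# pearsonr returns the correlation coefficient and the two-sided p-value
# for the null hypothesis that the true correlation is zero.
corr, p_value = pearsonr(age, hours)
print(f"correlation: {corr:.3f}, p-value: {p_value:.3f}")

# Sanity check: a perfectly linear relationship gives a coefficient of 1.
corr_linear, _ = pearsonr(age, 2.0 * age + 1.0)
print(f"correlation for a linear relationship: {corr_linear:.3f}")
```

A small p-value would lead us to reject the null hypothesis of zero correlation; a large one means the data give no evidence of a linear relationship.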

Transportation Costs and Distance to Work Factors

Two possible indicators for absenteeism may also be the distance between home and work (the Distance from Residence to Work column) and transportation costs (the Transportation expense column). Employees who have to travel longer, or whose costs for commuting to work are high, might be more prone to absenteeism.

In this section, we will investigate the relationship between these variables and the absence time in hours. Since we do not expect these factors to be indicative of health problems, we will not consider a possible relationship with the Reason for absence column.

First, let's start our analysis by plotting the previously mentioned columns (Distance from Residence to Work and Transportation expense) against the Absenteeism time in hours column:

# plot transportation costs and distance to work against hours
plt.figure(figsize=(10, 6))
sns.jointplot(x="Distance from Residence to Work",...
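
The chapter overview names the Box-Cox and Yeo-Johnson transformations. As a hedged sketch of how such a transformation could be applied to a skewed column before studying its relationship with other variables, the following uses `scipy.stats.yeojohnson` on synthetic, right-skewed data. The `hours` array is an assumption for illustration, and the preview does not show where or how the book applies the transformation:

```python
import numpy as np
from scipy.stats import skew, yeojohnson

# Synthetic right-skewed, non-negative data standing in for absence hours.
rng = np.random.default_rng(42)
hours = rng.exponential(scale=8.0, size=500)

# With no lambda supplied, yeojohnson estimates it by maximum likelihood
# and returns the transformed data together with the fitted lambda.
transformed, fitted_lambda = yeojohnson(hours)

print(f"skewness before: {skew(hours):.3f}")
print(f"skewness after:  {skew(transformed):.3f}")
print(f"fitted lambda:   {fitted_lambda:.3f}")
```

Yeo-Johnson is used here rather than Box-Cox because Box-Cox requires strictly positive values, while Yeo-Johnson also accepts zeros and negative values.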

Temporal Factors

Factors such as day of the week and month may also be indicators for absenteeism. For instance, employees might prefer to have their medical examinations on Friday, when the workload is lower and the weekend is closer. In this section, we will analyze the impact of the Day of the week and Month of absence columns on employees' absenteeism.

Let's begin with an analysis of the number of entries for each day of the week and each month:

# count entries per day of the week and month
plt.figure(figsize=(12, 5))
ax = sns.countplot(data=preprocessed_data, \
                   x='Day of the week', \
                   order=["Monday", "Tuesday", \
                   ...
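
The counts shown by such a plot can also be tabulated directly. The sketch below counts entries and averages the absence hours per day of the week; the DataFrame `df` and its values are hypothetical, and `day_order` is an assumed ordering of working days:

```python
import pandas as pd

# Hypothetical entries; the values are made up for illustration.
df = pd.DataFrame({
    "Day of the week": ["Monday", "Monday", "Tuesday",
                        "Friday", "Friday", "Friday"],
    "Absenteeism time in hours": [8, 4, 2, 1, 3, 2],
})
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Number of entries and mean absence hours per day, in calendar order.
# Days without any entries show up as NaN after reindexing.
per_day = (
    df.groupby("Day of the week")["Absenteeism time in hours"]
      .agg(["count", "mean"])
      .reindex(day_order)
)
print(per_day)
```

Reindexing with an explicit day order keeps the table in calendar order instead of alphabetical order, and makes days with no entries visible.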

Summary

In this chapter, we analyzed a dataset containing employees' absences and their relationship to additional health and socially related factors. We introduced various data analysis techniques, such as distribution plots, conditional probabilities, Bayes' theorem, data transformation techniques (such as Box-Cox and Yeo-Johnson), and the Kolmogorov-Smirnov test, and applied these to the dataset.

In the next chapter, we will be analyzing the marketing campaign dataset of a Portuguese bank and the impact it had on acquiring new customers.


Key benefits

  • Get to grips with data analysis by studying use cases from different fields
  • Develop your critical thinking skills by following tried-and-true data analysis techniques
  • Learn how to use conclusions from data analyses to make better business decisions

Description

Businesses today operate online and generate data almost continuously. While not all data in its raw form may seem useful, if processed and analyzed correctly, it can provide you with valuable hidden insights. The Data Analysis Workshop will help you learn how to discover these hidden patterns in your data, to analyze them, and leverage the results to help transform your business. The book begins by taking you through the use case of a bike rental shop. You'll be shown how to correlate data, plot histograms, and analyze temporal features. As you progress, you’ll learn how to plot data for a hydraulic system using the Seaborn and Matplotlib libraries, and explore a variety of use cases that show you how to join and merge databases, prepare data for analysis, and handle imbalanced data. By the end of the book, you'll have learned different data analysis techniques, including hypothesis testing, correlation, and null-value imputation, and will have become a confident data analyst.

Who is this book for?

The Data Analysis Workshop is for programmers who already know how to code in Python and want to use it to perform data analysis. If you are looking to gain practical experience in data science with Python, this book is for you.

What you will learn

  • Get to grips with the fundamental concepts and conventions of data analysis
  • Understand how different algorithms help you to analyze the data effectively
  • Determine the variation between groups of data using hypothesis testing
  • Visualize your data correctly using appropriate plotting points
  • Use correlation techniques to uncover the relationship between variables
  • Find hidden patterns in data using advanced techniques and strategies

Product Details

Publication date: Jul 29, 2020
Length: 626 pages
Edition: 1st
Language: English
ISBN-13: 9781839218125



Table of Contents

10 Chapters
1. Bike Sharing Analysis
2. Absenteeism at Work
3. Analyzing Bank Marketing Campaign Data
4. Tackling Company Bankruptcy
5. Analyzing the Online Shopper's Purchasing Intention
6. Analysis of Credit Card Defaulters
7. Analyzing the Heart Disease Dataset
8. Analyzing Online Retail II Dataset
9. Analysis of the Energy Consumed by Appliances
10. Analyzing Air Quality

Customer reviews

Rating distribution
4.4 (21 Ratings)
5 star 57.1%
4 star 23.8%
3 star 19%
2 star 0%
1 star 0%

Top Reviews

Nithin Feb 15, 2021
Rating: 5/5
I found the book "The Data Analysis Workshop" really helpful. I like the approach author has taken to go step by step on the process. Every problem solving follows the data exploration and preprocessing to data visualization in Python. The book uses real world and variety datasets with great well formatted colored visualizations. Code snippets are clear and explains the problem statement with clarity across the entire book. The book covers a lot of important concepts sklearn, classification, regression, hypothesis testing, clustering, time series, and many more. It also features "Activities" for every section which helps with better understanding of problem statement. I would highly recommend this book.
Amazon Verified review
Gennaro Maida, MS, BSBME -- CTO/Co-founder Vital Intelligence, Inc. Oct 05, 2020
Rating: 5/5
This book does exactly what the author intends. There are many real world examples that are used to build knowledge and expertise in some of todays most powerful python tools. Jupyter notebooks, matplotlib, seaborn, scikit-learn, numpy, scipy, and pandas are employed diving into many of the most useful methods. The author takes the time to walk the reader through the steps of solving analytical problems from data exploration and preprocessing to data visualization. Additionally, the author lightly digs into the statistics and probability behind the analyses to build concepts rather then repetitive memorization. I would recommend this book, not only as a quick reference for intermediate to advanced data scientists, but as a book to introduce beginners (with python knowledge) to the world of data science. A definite thumbs up.
Amazon Verified review
Richard Dec 28, 2020
Rating: 5/5
Excellent book that clearly explains vital data analysis techniques using real-world examples. The author does a particularly great job highlighting the use of Python-based statistical analyses via Jupyter notebooks by importing pandas, matplotlib, seaborn, scikit-learn, numpy, and scipy. This book also covers some great data visualization techniques, which are must-haves for anyone who is interested in crafting meaningful data storytelling. Overall, this is a great read for anyone who is interested in expanding their data analysis skill set. A highly recommended read!
Amazon Verified review
Alan Dec 09, 2020
Rating: 5/5
This book is a good learning tool for beginners as well as a great reference book for the more experienced. You do need to be versed in Python before reading, although being a programmer in VBA, I was able to get the gist of what the programs were doing (programming fundamentals). It covers many different and increasingly complex datasets and shows how to turn them into meaningful data insights. I would recommend it to anyone that wants to gain serious knowledge about data analysis.
Amazon Verified review
Nhikki v May 04, 2021
Rating: 5/5
Great data analysis with python book with 10 different scenarios. Datasets provided too so you can do your own analysis and the exercises. Perfect to practice and gain experience.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using Credit Card, Debit Card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book, go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.