0

Explore Products

Best Sellers

New Releases

Books

Videos

Audiobooks

Free Learning

The Data Analysis Workshop

You're reading from The Data Analysis Workshop Solve business problems with state-of-the-art data analysis models, developing expert data analysis skills along the way

Product type Paperback

Published in Jul 2020

Publisher Packt

ISBN-13 9781839211386

Length 626 pages

Edition 1st Edition

Languages

Python

Tools

Jupyter

Concepts

Data Analysis

Authors (3):

Konstantin Palagachev

Gururajan Govindan

Shubhangi Hora

View More author details

Table of Contents (12) Chapters

Preface

1. Bike Sharing Analysis

2. Absenteeism at Work FREE CHAPTER

3. Analyzing Bank Marketing Campaign Data

4. Tackling Company Bankruptcy

5. Analyzing the Online Shopper's Purchasing Intention

6. Analysis of Credit Card Defaulters

7. Analyzing the Heart Disease Dataset

8. Analyzing Online Retail II Dataset

9. Analysis of the Energy Consumed by Appliances

10. Analyzing Air Quality

Appendix

Data Preprocessing

Before proceeding onto univariate analysis, let's look at the unique values in the columns. The motive behind looking at the unique values in a column is to identify the subcategory in each column. By knowing the subcategory in each column, we would be in a position to understand which subcategory has a higher count or vice versa. For example, let's take the EDUCATION column. We are interested in finding what the different subcategories in the EDUCATION column are and which subcategory has the higher count; that is, do our customers have their highest education as College or University?

This step acts as a precursor before we build a profile of our customers.

Let's now find unique values in the SEX column.

We'll print the unique values in the SEX column and sort them in ascending order:

print('SEX ' + str(sorted(df['SEX'].unique())))

The output will be as follows:

SEX [1, 2]

The following code prints...

The rest of the chapter is locked

Register for a free Packt account to unlock a world of extra content!

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $19.99/month. Cancel anytime

Authors (3)

Gururajan Govindan

Gururajan Govindan

Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision-making and machine learning with Python.

See other products by Gururajan Govindan

Shubhangi Hora

Shubhangi Hora

Shubhangi Hora is a data scientist, Python developer, and published writer. With a background in computer science and psychology, she is particularly passionate about healthcare-related AI, including mental health. Shubhangi is also a trained musician.

See other products by Shubhangi Hora

Konstantin Palagachev

Konstantin Palagachev

Konstantin Palagachev holds a Ph.D. in applied mathematics and optimization, with an interest in operations research and data analysis. He is recognized for his passion for delivering data-driven solutions and expertise in the area of urban mobility, autonomous driving, insurance, and finance. He is also a devoted coach and mentor, dedicated to sharing his knowledge and passion for data science.

See other products by Konstantin Palagachev

Other recommended products

Related to this chapter

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Hands-On Gradient Boosting with XGBoost and scikit-learn

Hands-On Gradient Boosting with XGBoost and scikit-learn

This practical XGBoost guide will put your Python and scikit-learn knowledge to work by showing you how to build powerful, fine-tuned XGBoost models with impressive speed and accuracy. This book will help you to apply XGBoost's alternative base learners, use unique transformers for model deployment, discover tips from Kaggle masters, and much more!

Oct 2020 10h 20m

Data Science for Marketing Analytics

Data Science for Marketing Analytics

Data Science for Marketing Analytics opens doors to looking at data with a different approach and new tools. Drawing on machine learning and data science concepts, this book broadens the range of tools that you can use to transform the market analysis process.

Mar 2019 14h 0m

Forecasting Time Series Data with Facebook Prophet

Forecasting Time Series Data with Facebook Prophet

This book will help you get to grips with the task of time series forecasting using the leading open source forecasting tool available to the public, Facebook Prophet. You will learn how to implement the advanced features of Prophet to build forecast models and understand why and how to modify each of the default parameters to improve results.

The Supervised Learning Workshop

The Supervised Learning Workshop

Taking an engaging and practical approach, The Supervised Learning Workshop teaches you how to predict the output of new data, based on the relationship and behavior of?existing datasets. You'll learn at your own pace and use Python libraries and Jupyter to build intelligent predictive models.?

Feb 2020 17h 44m

Big Data Analysis with Python

Big Data Analysis with Python

Processing big data in real time is challenging due to scalability, information inconsistency, and fault tolerance. Big Data Analysis with Python teaches you how to use tools that can control the data avalanche for you. With this book, you'll learn effective techniques to aggregate data into useful dimensions for posterior analysis, extract statistical measurements, and transform datasets into features for other systems.

Apr 2019 9h 12m

Hands-On Exploratory Data Analysis with Python

Hands-On Exploratory Data Analysis with Python

This book provides practical knowledge about the main pillars of EDA including data cleaning, data preparation, data exploration, and data visualization. You can leverage the power of Python to understand, summarize and investigate your data in the best way possible. The book presents a unique approach to exploring hidden features in your data.

Mar 2020 11h 44m

Data Science for Marketing Analytics

Data Science for Marketing Analytics

This book on marketing analytics with Python will quickly get you up and running using practical data science and machine learning to improve your approach to marketing. You'll learn how to analyze sales, understand customer data, predict outcomes, and present conclusions with clear visualizations.

Sep 2021 21h 12m

Python for Finance Cookbook

Python for Finance Cookbook

Python is becoming the number one language for data science and also quantitative finance. This book provides you with solutions to common tasks from the intersection of quantitative finance and data science, using modern Python libraries.

Jan 2020 14h 24m

The Data Science Workshop

The Data Science Workshop

The Data Science Workshop equips you with the basic skills you need to start working on a variety of data science projects. You'll work through the essential building blocks of a data science project gradually through the book, and then put all the pieces together to consolidate your knowledge and apply your learnings in the real world.

Aug 2020 27h 28m

Personalised recommendations for you

Based on your interests and search pattern

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Data Governance Handbook

Data Governance Handbook

This book provides a highly focused view of real business outcomes powered by data governance, that resonate with non-data executives such as CFOs and CEOs. You'll also find useful insights into how to implement data governance initiatives.

May 2024 13h 8m

Data Engineering with Databricks Cookbook

Data Engineering with Databricks Cookbook

This book shows you how to use Apache Spark, Delta Lake, and Databricks to build data pipelines, manage and transform data, optimize performance, and more. Additionally, you'll implement DataOps and DevOps practices, and orchestrate data workflows.

May 2024 14h 36m

Azure Data Engineer Associate Certification Guide

Azure Data Engineer Associate Certification Guide

Unlock the power of Azure data engineering with this certification guide, elevating your skills in data processing, storage, and security with the help of practical insights, hands-on exercises, and the latest advancements.

May 2024 18h 16m

Microsoft Power BI Cookbook

Microsoft Power BI Cookbook

Microsoft Power BI is the most sought-after platform for BI professionals' visualization needs. Explore the latest Power BI features, future AI enhancements, and integration with other Power Platform tools via new recipes in this updated edition.

Jul 2024 19h 56m

Python Data Cleaning Cookbook

Python Data Cleaning Cookbook

The book shows you how to clean, wrangle, and view data from multiple perspectives, including dataset and column attributes. You will cover common and not-so-common challenges that are faced while cleaning messy data for complex situations and learn to manipulate data to get it down to a form that can be useful for making the right decisions.

May 2024 16h 12m

Microsoft Azure AI Fundamentals AI-900 Exam Guide

Microsoft Azure AI Fundamentals AI-900 Exam Guide

This AI-900 study guide will help you prepare and practice for the certification exam. You'll delve into AI workloads, ML principles, computer vision, NLP, knowledge mining, and generative AI using Azure cloud services.

May 2024 9h 36m

Using Stable Diffusion with Python

Using Stable Diffusion with Python

This book shows you how to use Python to control Stable Diffusion and generate high-quality images. In addition to covering the basic usage of the diffusers package, the book provides solutions for extending the package for more advanced purposes.

Jun 2024 11h 44m

Getting Started with DuckDB

Getting Started with DuckDB

This hands-on book teaches you to analyze large datasets with blazing speed and ease. You will learn how to use DuckDB to quickly load, query, transform, analyze, and visualize data effectively through a series of practical examples.

Jun 2024 12h 44m

Databricks Certified Associate Developer for Apache Spark Using Python

Databricks Certified Associate Developer for Apache Spark Using Python

This guide gets you ready for certification with expert-backed content, key exam concepts, and topic reviews. Additionally, you'll be able to make the most of Apache Spark 3.0 to modernize workloads and more using specific tools and techniques.