Explore Products

Best Sellers

New Releases

Books

Videos

Audiobooks

Free Learning

The Data Science Workshop

You're reading from The Data Science Workshop Learn how you can build machine learning models and create your own real-world data science projects

Product type Paperback

Published in Aug 2020

Publisher Packt

ISBN-13 9781800566927

Length 824 pages

Edition 2nd Edition

Languages

Python

Tools

Scikit-learn

Concepts

Data Science

Authors (5):

Robert Thas John

Thomas Joseph

Anthony So

Dr. Samuel Asare

Andrew Worsley

+1 more

View More author details

Table of Contents (16) Chapters

Preface

1. Introduction to Data Science in Python

2. Regression FREE CHAPTER

3. Binary Classification

4. Multiclass Classification with RandomForest

5. Performing Your First Cluster Analysis

6. How to Assess Performance

7. The Generalization of Machine Learning Models

8. Hyperparameter Tuning

9. Interpreting a Machine Learning Model

10. Analyzing a Dataset

11. Data Preparation

12. Feature Engineering

Introduction

13. Imbalanced Datasets

14. Dimensionality Reduction

15. Ensemble Learning

Handling Row Duplication

Most of the time, the datasets you will receive or have access to will not have been 100% cleaned. They usually have some issues that need to be fixed. One of these issues could be duplicated rows. Row duplication means that several observations contain the exact same information in the dataset. With the pandas package, it is extremely easy to find these cases.

Let's use the example that we saw in Chapter 10, Analyzing a Dataset.

Start by importing the dataset into a DataFrame:

import pandas as pd
file_url = 'https://github.com/PacktWorkshops/'\
           'The-Data-Science-Workshop/blob/'\
           'master/Chapter10/dataset/'\
           'Online%20Retail.xlsx?raw=true'
df = pd.read_excel(file_url)

The duplicated() method from pandas checks...

The rest of the chapter is locked

Register for a free Packt account to unlock a world of extra content!

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $19.99/month. Cancel anytime

Authors (5)

Anthony So

Anthony So

Anthony So is a renowned leader in data science. He has extensive experience in solving complex business problems using advanced analytics and AI in different industries including financial services, media, and telecommunications. He is currently the chief data officer of one of the most innovative fintech start-ups. He is also the author of several best-selling books on data science, machine learning, and deep learning. He has won multiple prizes at several hackathon competitions, such as Unearthed, GovHack, and Pepper Money. Anthony holds two master's degrees, one in computer science and the other in data science and innovation.

See other products by Anthony So

Thomas Joseph

Thomas Joseph

Thomas V. Joseph is a data science practitioner, researcher, trainer, mentor, and writer with more than 19 years of experience. He has extensive experience in solving business problems using machine learning toolsets across multiple industry segments.

See other products by Thomas Joseph

Dr. Samuel Asare

Dr. Samuel Asare

Dr. Samuel Asare is a professional engineer with enthusiasm for Python programming, research, and writing. He is highly skilled in applying data science methods to the extraction of useful insights from large data sets. He possesses solid skills in project management processes. Samuel has previously held positions, in industry and academia, as a process engineer and a lecturer of materials science and engineering respectively. Presently, he is pursuing his passion for solving industry problems, using data science methods, and writing.

See other products by Dr. Samuel Asare

Andrew Worsley

Andrew Worsley

Andrew David Worsley is an independent consultant and educator with expertise in the areas of machine learning, statistics, cloud computing, and artificial intelligence. He has practiced data science in several countries across a multitude of industries including retail, financial services, marketing, resources, and healthcare.

See other products by Andrew Worsley

Robert Thas John

Robert Thas John

Robert Thas John is a data engineer with a career that spans two decades. He manages a team of data engineers, analysts, and machine learning engineers – roles that he has held in the past. He leads a number of efforts aimed at increasing the adoption of machine learning on embedded devices through various programs from Google Developers and ARM Ltd, which licenses the chips found in Arduinos and other microcontrollers. He started his career as a software engineer with work that has spanned various industries. His first experience with embedded systems was in programming payment terminals.

See other products by Robert Thas John

Other recommended products

Related to this chapter

The Data Science Workshop

The Data Science Workshop

Cut through the noise and get real results with a step-by-step approach to data science

Jan 2020 27h 16m

The Data Science Workshop

The Data Science Workshop

Cut through the noise and get real results with a step-by-step approach to data science

Jan 2020 27h 16m

The Data Science Workshop

The Data Science Workshop

Cut through the noise and get real results with a step-by-step approach to data science

Jan 2020 27h 16m

The Data Science Workshop

The Data Science Workshop

Cut through the noise and get real results with a step-by-step approach to data science

Jan 2020 27h 16m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Python Feature Engineering Cookbook

Python Feature Engineering Cookbook

Feature engineering is invaluable for developing and enriching your machine learning models. In this book, you will work with the best Python tools to streamline your feature engineering pipelines, feature engineering techniques and simplify and improve the quality of your code.

Jan 2020 12h 24m

Personalised recommendations for you

Based on your interests and search pattern

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m

Modern Computer Vision with PyTorch

Modern Computer Vision with PyTorch

This book provides a hands-on approach to solving over 30 prominent real-world computer vision problems using PyTorch 2.x on actual datasets. Here you'll learn to build a neural network from scratch and optimize hyperparameters, perform image classification, multi-object detection, segmentation, and more. You'll also explore facial expression manipulation and combining CV with NLP and RL techniques, build generative AI applications, and take your model to production on AWS. By the end of this book, you'll master modern NN architectures and confidently solve real-world CV problems.

Jun 2024 24h 52m