Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Data Science  with Python

You're reading from   Data Science with Python Combine Python with machine learning principles to discover hidden patterns in raw data

Arrow left icon
Product type Paperback
Published in Jul 2019
Publisher Packt
ISBN-13 9781838552862
Length 426 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Authors (3):
Arrow left icon
Rohan Chopra Rohan Chopra
Author Profile Icon Rohan Chopra
Rohan Chopra
Mohamed Noordeen Alaudeen Mohamed Noordeen Alaudeen
Author Profile Icon Mohamed Noordeen Alaudeen
Mohamed Noordeen Alaudeen
Aaron England Aaron England
Author Profile Icon Aaron England
Aaron England
Arrow right icon
View More author details
Toc

Table of Contents (10) Chapters Close

About the Book 1. Introduction to Data Science and Data Pre-Processing FREE CHAPTER 2. Data Visualization 3. Introduction to Machine Learning via Scikit-Learn 4. Dimensionality Reduction and Unsupervised Learning 5. Mastering Structured Data 6. Decoding Images 7. Processing Human Language 8. Tips and Tricks of the Trade 1. Appendix

Roadmap for Building Machine Learning Models

The roadmap for building machine learning models is straightforward and consists of five major steps, which are explained here:

  • Data Pre-processing

    This is the first step in building a machine learning model. Data pre-processing refers to the transformation of data before feeding it into the model. It deals with the techniques that are used to convert unusable raw data into clean reliable data.

    Since data collection is often not performed in a controlled manner, raw data often contains outliers (for example, age = 120), nonsensical data combinations (for example, model: bicycle, type: 4-wheeler), missing values, scale problems, and so on. Because of this, raw data cannot be fed into a machine learning model because it might compromise the quality of the results. As such, this is the most important step in the process of data science.

  • Model Learning

    After pre-processing the data and splitting it into train/test sets (more on this later), we move on to modeling. Models are nothing but sets of well-defined methods called algorithms that use pre-processed data to learn patterns, which can later be used to make predictions. There are different types of learning algorithms, including supervised, semi-supervised, unsupervised, and reinforcement learning. These will be discussed later.

  • Model Evaluation

    In this stage, the models are evaluated with the help of specific performance metrics. With these metrics, we can go on to tune the hyperparameters of a model in order to improve it. This process is called hyperparameter optimization. We will repeat this step until we are satisfied with the performance.

  • Prediction

    Once we are happy with the results from the evaluation step, we will then move on to predictions. Predictions are made by the trained model when it is exposed to a new dataset. In a business setting, these predictions can be shared with decision makers to make effective business choices.

  • Model Deployment

    The whole process of machine learning does not just stop with model building and prediction. It also involves making use of the model to build an application with the new data. Depending on the business requirements, the deployment may be a report, or it may be some repetitive data science steps that are to be executed. After deployment, a model needs proper management and maintenance at regular intervals to keep it up and running.

This chapter will mainly focus on pre-processing. We will cover the different tasks involved in data pre-processing, such as data representation, data cleaning, and others.

You have been reading a chapter from
Data Science with Python
Published in: Jul 2019
Publisher: Packt
ISBN-13: 9781838552862
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image