You're reading from Learning Data Mining with Python Harness the power of Python to analyze data and create insightful predictive models

Product type Paperback

Published in Jul 2015

Publisher Packt

ISBN-13 9781784396053

Length 344 pages

Edition 1st Edition

Languages

Python

Tools

IPython

Concepts

Data Mining

Author (1):

Robert Layton

Table of Contents (15) Chapters

Preface

1. Getting Started with Data Mining

2. Classifying with scikit-learn Estimators

3. Predicting Sports Winners with Decision Trees

4. Recommending Movies Using Affinity Analysis

5. Extracting Features with Transformers

6. Social Media Insight Using Naive Bayes

7. Discovering Accounts to Follow Using Graph Mining

8. Beating CAPTCHAs with Neural Networks

9. Authorship Attribution

10. Clustering News Articles

11. Classifying Objects in Images Using Deep Learning

12. Working with Big Data

A. Next Steps…

Index

Preprocessing using pipelines

When taking measurements of real-world objects, we often get features in very different ranges. For instance, if we are measuring the qualities of an animal, we might have several features, as follows:

Number of legs: This is between 0 and 8 for most animals, although some have many more!
Weight: This ranges from only a few micrograms all the way up to a blue whale, with a weight of 190,000 kilograms!
Number of hearts: This can be anywhere from zero to five, as in the case of the earthworm.

For a mathematics-based algorithm, the differences in scale, range, and units make these features difficult to compare. If we used the above features in many algorithms, the weight would probably be the most influential feature simply because its numbers are larger, not because it is actually a more effective feature.
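To make the effect concrete, here is a minimal sketch that assumes a distance-based algorithm using Euclidean distance. The animals and their values are illustrative assumptions, not data from the book, and an octopus's eight arms are counted as legs for the sake of the example.

```python
import math

# Hypothetical feature vectors: (number of legs, weight in kg, number of hearts).
# Values are illustrative only.
dog = (4, 30.0, 1)
cat = (4, 4.0, 1)
octopus = (8, 30.0, 3)


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


dog_to_cat = euclidean(dog, cat)
dog_to_octopus = euclidean(dog, octopus)

print("dog to cat:", dog_to_cat)
print("dog to octopus:", dog_to_octopus)
```

On these raw values the dog ends up closer to the octopus than to the cat, because the 26 kg weight gap outweighs the differences in legs and hearts combined.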

One way to overcome this is a process called preprocessing, which normalizes the features so that they...
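The excerpt is cut off before it names a specific normalization, so the following is only a sketch of one common choice: rescaling every feature to the range 0 to 1 (min-max scaling). The function name `min_max_scale` and the sample animals are hypothetical and chosen for illustration.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to the range [0, 1].

    A constant column (max == min) is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical animals: (number of legs, weight in kg, number of hearts).
animals = [
    (4, 30.0, 1),  # dog
    (4, 4.0, 1),   # cat
    (8, 30.0, 3),  # octopus
]

scaled = min_max_scale(animals)
for row in scaled:
    print(row)
```

After scaling, each feature spans the same range, so no feature dominates purely because of its units. In practice a library class such as scikit-learn's `MinMaxScaler` performs the same kind of rescaling; the hand-written version is used here only to keep the sketch dependency-free.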
