Learning Data Mining with Python: Harness the power of Python to analyze data and create insightful predictive models

Robert Layton
zł197.99
3.7 (7 Ratings)
Paperback Jul 2015 344 pages 1st Edition
eBook
zł59.99 zł158.99
Paperback
zł197.99
Subscription
Free Trial

What do you get with Print?

  • Instant access to your digital copy whilst your Print order is shipped
  • Paperback book shipped to your preferred address
  • Redeem a companion digital copy on all Print orders
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Learning Data Mining with Python

Chapter 2. Classifying with scikit-learn Estimators

The scikit-learn library is a collection of data mining algorithms, written in Python and using a common programming interface. This allows users to easily try different algorithms as well as utilize standard tools for doing effective testing and parameter searching. There are a large number of algorithms and utilities in scikit-learn.

In this chapter, we focus on setting up a good framework for running data mining procedures. Later chapters build on this framework, each focusing on a particular application and the techniques suited to it.

The key concepts introduced in this chapter are as follows:

  • Estimators: These perform classification, clustering, and regression
  • Transformers: These perform preprocessing and data alterations
  • Pipelines: These put together your workflow into a replicable format

scikit-learn estimators

Estimators are scikit-learn's abstraction, allowing for the standardized implementation of a large number of classification algorithms. They have the following two main functions:

  • fit(): This trains the algorithm and sets its internal parameters. It takes two inputs: the training sample dataset and the corresponding classes for those samples.
  • predict(): This predicts the classes of the testing samples given as input. It returns an array with the prediction for each input testing sample.

Most scikit-learn estimators use NumPy arrays or a related format for input and output.

There are a large number of estimators in scikit-learn. These include support vector machines (SVM), random forests, and neural networks. Many of these algorithms will be used in later chapters. In this chapter, we will use a different estimator from scikit-learn: nearest neighbor.
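As a minimal sketch of the fit() and predict() interface described above, the following trains a nearest neighbor estimator on a tiny dataset. The data values and variable names are made up for illustration and are not taken from the book.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (an assumption for illustration): two features per
# sample and two classes, 0 and 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

# fit() trains the estimator on the samples and their classes.
estimator = KNeighborsClassifier(n_neighbors=1)
estimator.fit(X_train, y_train)

# predict() returns one predicted class per testing sample.
X_test = np.array([[0.05, 0.1], [0.95, 1.0]])
y_pred = estimator.predict(X_test)
print(y_pred)
```

Each testing sample sits next to training samples of a single class, so the first is predicted as class 0 and the second as class 1.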

Note

For this chapter, you will...

Preprocessing using pipelines

When taking measurements of real-world objects, we can often get features in very different ranges. For instance, if we are measuring the qualities of an animal, we might have several features, as follows:

  • Number of legs: This ranges from 0 to 8 for most animals, while some have many more!
  • Weight: This ranges from only a few micrograms all the way to a blue whale, with a weight of 190,000 kilograms!
  • Number of hearts: This can range from zero to five, in the case of the earthworm.

For a mathematics-based algorithm, comparing these features is difficult because they differ in scale, range, and units. If we used the above features in many algorithms, the weight would probably be the most influential feature simply because its values are larger, not because it is actually a more effective feature.

One of the methods to overcome this is to use a process called preprocessing to normalize the features so that they...
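The following is a minimal sketch of feature-wise scaling with the MinMaxScaler class, which the chapter summary names as the transformer used. The animal measurements are invented for illustration, and X_transformed is simply the variable name the chapter text mentions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up measurements (assumption): number of legs, weight in kg,
# number of hearts. The weight column dwarfs the other two.
X = np.array([
    [4.0, 5.0, 1.0],
    [8.0, 0.001, 1.0],
    [0.0, 190000.0, 1.0],
    [0.0, 0.01, 5.0],
])

# fit_transform() learns each column's minimum and maximum, then rescales
# every column to the range 0 to 1 (the scaler's default feature_range).
scaler = MinMaxScaler()
X_transformed = scaler.fit_transform(X)

print(X_transformed.min(axis=0))
print(X_transformed.max(axis=0))
```

After scaling, every feature spans the same range, so no single feature dominates a distance calculation merely because of its units.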

Pipelines

As experiments grow, so does the complexity of the operations. We may split up our dataset, binarize features, perform feature-based scaling, perform sample-based scaling, and many more operations.

Keeping track of all of these operations can get quite confusing and can leave us unable to replicate our results. Problems include forgetting a step, incorrectly applying a transformation, or adding a transformation that wasn't needed.

Another issue is the order of the code. In the previous section, we created our X_transformed dataset and then created a new estimator for the cross validation. If we had multiple steps, we would need to track all of these changes to the dataset in the code.

Pipelines are a construct that addresses these problems (and others, which we will see in the next chapter). Pipelines store the steps in your data mining workflow. They can take your raw data in, perform all the necessary transformations, and then create a prediction. This allows us to use...
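As a rough sketch of the idea, the pipeline below chains a scaling step and a nearest neighbor estimator, then evaluates the whole workflow with cross validation. It assumes a recent scikit-learn release, where cross_val_score lives in sklearn.model_selection. The synthetic data and the step names "scale" and "predict" are illustrative choices, not the book's own listing.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (assumption): one informative feature with a small range
# and two noise features, one of which has a much larger range.
rng = np.random.RandomState(0)
y = np.array([0, 1] * 30)
X = np.column_stack([
    4.0 * y + 4.0 * rng.rand(60),  # informative feature, range 0 to 8
    1000.0 * rng.rand(60),         # noise feature, large range
    5.0 * rng.rand(60),            # noise feature, small range
])

# Each step is a (name, object) pair. Every step except the last must be
# a transformer, and the last step is the estimator.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("predict", KNeighborsClassifier(n_neighbors=3)),
])

# The scaler is refit on the training folds of each split, so we never
# manage a separate transformed copy of the dataset ourselves.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print("Mean accuracy: {:.3f}".format(scores.mean()))
```

Because the transformation lives inside the pipeline, the same object can be passed to any tool that accepts an estimator, and the steps cannot be applied out of order.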

Summary

In this chapter, we used several of scikit-learn's methods for building a standard workflow to run and evaluate data mining models. We introduced the Nearest Neighbors algorithm, which is already implemented in scikit-learn as an estimator. Using this class is quite easy; first, we call the fit function on our training data, and second, we use the predict function to predict the class of testing samples.

We then looked at preprocessing to fix poor feature scaling. This was done using a Transformer object, the MinMaxScaler class. Transformers also have a fit method, followed by a transform method, which takes a dataset as input and returns a transformed dataset as output.

In the next chapter, we will use these concepts in a larger example, predicting the outcome of sports matches using real-world data.


Description

If you are a programmer who wants to get started with data mining, then this book is for you.

Who is this book for?

If you are a programmer who wants to get started with data mining, then this book is for you.

What you will learn

  • Apply data mining concepts to real-world problems
  • Predict the outcome of sports matches based on past results
  • Determine the author of a document based on their writing style
  • Use APIs to download datasets from social media and other online services
  • Find and extract good features from difficult datasets
  • Create models that solve real-world problems
  • Design and develop data mining applications using a variety of datasets
  • Set up reproducible experiments and generate robust results
  • Recommend movies, online celebrities, and news articles based on personal preferences
  • Compute on big data, including real-time data from the Internet
Estimated delivery fee Deliver to Poland

Premium delivery 7 - 10 business days

zł110.95
(Includes tracking information)

Product Details

Publication date: Jul 29, 2015
Length: 344 pages
Edition: 1st
Language: English
ISBN-13: 9781784396053


Packt Subscriptions

See our plans and pricing

$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just zł20 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just zł20 each
  • Exclusive print discounts

Frequently bought together


Total: zł179.97 (was zł476.97; zł297.00 saved)
Python Machine Learning
zł197.99
Learning Data Mining with Python
zł197.99
Python Data Visualization Cookbook (Second Edition)
zł197.99

Table of Contents

14 Chapters
1. Getting Started with Data Mining
2. Classifying with scikit-learn Estimators
3. Predicting Sports Winners with Decision Trees
4. Recommending Movies Using Affinity Analysis
5. Extracting Features with Transformers
6. Social Media Insight Using Naive Bayes
7. Discovering Accounts to Follow Using Graph Mining
8. Beating CAPTCHAs with Neural Networks
9. Authorship Attribution
10. Clustering News Articles
11. Classifying Objects in Images Using Deep Learning
12. Working with Big Data
A. Next Steps…
Index

Customer reviews

Rating distribution
3.7 (7 Ratings)
5 star 28.6%
4 star 28.6%
3 star 28.6%
2 star 14.3%
1 star 0%

Top Reviews




Anon Oct 24, 2015
Rating: 5 out of 5
Pretty good book on the subject matter, I especially enjoyed the variety in examples for applications of machine learning. Other books similar to the subject like Mastering Machine Learning with Scikit-Learn are alright, but this is definitely a cool addition to such a library or collection of similar topic books. The author uses scikit-learn, python libraries in general. Pretty easy to understand, and definitely nice as a reference in case you are facing a similar problem at work or school and want to consult with a tutorial in a book. Definitely worth looking into.
Amazon Verified review Amazon
Amazon Reader Aug 23, 2015
Rating: 5 out of 5
This is the most excellent book on Data Mining and Python I have come across. The books comes with plenty of code examples explained in simple and easy to understand language. I would highly recommend this book to novice users and enthusiasts.
Amazon Verified review Amazon
Mouha Apr 13, 2016
Rating: 4 out of 5
The Robert's book is one of those I've finished used and reuse. I have many books on AI, Machine learning /data mining . Very few give me access to the minimum knowledge so I'd be able to use AI by myself. Jeff Heaton book was one of them, now I can add this book because it allows you to understand the main algorithms in this area, in a way that even you are not strong in maths through Python code you can really apply each algorithms in a minute. Really easy and understandable. Some could argue that the author doesn't dive deeply in the explanation: I think this is on purpose, and btw there are so much book about the theory. I didn't put 5 start because of some (small) cons : In chapter 8 "Beating Captcha...." The author would have use a recent framework like FANN instead of Pybrain which seems to be abandoned since years. This is not a showstopper anyway. I was so happy to use NN which for me is a kind of magic sometimes.
Amazon Verified review Amazon
Dimitri Shvorob Aug 20, 2016
Rating: 4 out of 5
Wishing to learn Python's machine-learning toolkit - I am an emigrant from R Country - I rounded up several relevant books, and set out to narrow the field to one or two suitable for further study. My haul included (in no particular order):

  • "Machine Learning in Python" by Bowles, published in 2015 by Wiley, 360 pages, $25 for the cheapest hardcopy now available from Amazon (including shipping)
  • "Designing Machine Learning Systems with Python" by Julian, 2016, Packt, 232 pages, $42
  • "Mastering Python for Data Science" by Madhavan, 2015, Packt, 294 pages, $39
  • "Learning Data Mining with Python" by Layton, 2015, 369 pages, $43
  • "Python Data Science Cookbook" by Subramanian, 2015, 347 pages, $48
  • "Data Science From Scratch" by Grus, 2015, 330 pages, $24
  • "Learning scikit-learn" by Moncecchi and Garreta, 2013, 118 pages, $28
  • "Building Machine Learning Systems with Python" by Coelho and Richert, 2015, 305 pages, $49
  • "Python Machine Learning" by Raschka, 2015, 454 pages, $34

The whittling-down turned out to be harder than expected: Python titles are better than R counterparts, and Madhavan's book alone was easy to dismiss. Subramanian, Moncecchi-Garreta and Julian did not make the cut based on comparison with alternatives, but were not of themselves bad. Grus is the beginner's best bet - beginners can stop reading here - while Bowles is a book which I like a lot, but which may be a bit too specialist. As a reviewer, thinking about what other "intermediate" readers might find useful, I end up pointing to the trio of Raschka, Layton and Coelho-Richert as the books worth choosing from.

I distinguish Raschka, in appreciation of his more pedagogical style - or maybe I am just giving the top spot to the thickest book! - but the other two titles are definitely worth checking out.
Compared to Coelho-Richert (CR), Layton's book surveys a wider range of algorithms - a good third of CR's page count is devoted to text analysis, which means less space for everything else - but strangely neglects regression, my own primary interest. (This is why I dock one star). The writing is more "cohesive" and methodical - but while Coelho and Richert know to "liven up" the early chapters with visualizations, Layton does not use "matplotlib" till page 98. (And after that, you see charts in the chapter on graph mining - notably, a topic you don't find in the other two books). Get both, and see which one you prefer.
Amazon Verified review Amazon
Amazon Customer Aug 04, 2016
Rating: 3 out of 5
Fine for introducing the learner to data mining with Python...but not much else. Many typos in the code and text, key concepts and vocabulary poorly assumed to be understood by the reader. Not good continuity either. Definitely written by a committee.
Amazon Verified review Amazon

FAQs

What is the digital copy I get with my Print order?

When you buy any Print edition of our Books, you can redeem (for free) the eBook edition of the Print Book you’ve purchased. This gives you instant access to your book when you make an order via PDF, EPUB or our online Reader experience.

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time will start printing from the next business day, so the estimated delivery times start from the next day as well. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, will begin printing on the second business day after the order. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. These are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to countries outside the EU27, customs duty or localized taxes may apply and would be charged by the recipient country. These must be paid by the customer and are not included in the shipping charges on the order.

How do I know my custom duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico and the declared value of your ordered items is over $50, you will have to pay an additional import tax of 19%, which will be $9.50, to the courier service to receive your package.
  • Whereas if you live in Turkey and the declared value of your ordered items is over €22, you will have to pay an additional import tax of 18%, which will be €3.96, to the courier service to receive your package.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction ID. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on its way to you, you can contact us at customercare@packt.com once you receive it and use the returns and refund process.

Please understand that Packt Publishing cannot provide refunds, cancel orders, or accept returns except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrived damaged or has a material defect).

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact the Customer Relations Team at customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered an item (eBook, Video or Print Book) incorrectly or accidentally, please contact the Customer Relations Team at customercare@packt.com within one hour of placing the order and we will replace the item or refund its cost.
  2. If your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (i.e. during download), you should contact the Customer Relations Team at customercare@packt.com within 14 days of purchase, and they will be able to resolve the issue for you.
  3. You will have a choice of replacement or refund for the problem items (damaged, defective or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following payment methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal