Python Machine Learning by Example: Build intelligent systems using Python, TensorFlow 2, PyTorch, and scikit-learn, Third Edition

Building a Movie Recommendation Engine with Naïve Bayes

As promised, in this chapter, we will kick off our supervised learning journey with machine learning classification, and specifically, binary classification. The goal of the chapter is to build a movie recommendation system. It is a good starting point to learn classification from a real-life example—movie streaming service providers are already doing this, and we can do the same. You will learn the fundamental concepts of classification, including what it does and its various types and applications, with a focus on solving a binary classification problem using a simple, yet powerful, algorithm, Naïve Bayes. Finally, the chapter will demonstrate how to fine-tune a model, which is an important skill that every data science or machine learning practitioner should learn.

We will go into detail on the following topics:

  • What is machine learning classification?
  • Types of classification
  • Applications...

Getting started with classification

Movie recommendation can be framed as a machine learning classification problem. If it is predicted that you like a movie, for example, then it will be on your recommended list; otherwise, it won't. Let's get started by learning the important concepts of machine learning classification.

Classification is one of the main instances of supervised learning. Given a training set of data containing observations and their associated categorical outputs, the goal of classification is to learn a general rule that correctly maps the observations (also called features or predictive variables) to the target categories (also called labels or classes). Putting it another way, a trained classification model will be generated after the model learns from the features and targets of training samples, as shown in the first half of Figure 2.1. When new or unseen data comes in, the trained model will be able to determine their...

Exploring Naïve Bayes

The Naïve Bayes classifier belongs to the family of probabilistic classifiers. It computes the probabilities of each predictive feature (also referred to as an attribute or signal) of the data belonging to each class in order to make a prediction of probability distribution over all classes. Of course, from the resulting probability distribution, we can conclude the most likely class that the data sample is associated with. What Naïve Bayes does specifically, as its name indicates, is as follows:

  • Bayes: As in, it maps the probability of observed input features given a possible class to the probability of the class given observed pieces of evidence based on Bayes' theorem.
  • Naïve: As in, it simplifies probability computation by assuming that predictive features are mutually independent.
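
In symbols, the two ideas above can be sketched as follows. The notation is an assumption made here for illustration (x_1, ..., x_n for the features and y for a class), and the independence is stated in its usual precise form, that is, conditional on the class:

```latex
% Bayes: turn P(features | class) into P(class | features)
P(y \mid x_1, \dots, x_n)
  = \frac{P(x_1, \dots, x_n \mid y)\, P(y)}{P(x_1, \dots, x_n)}

% Naive: features are treated as independent given the class
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
```

Because the denominator is the same for every class, it is enough to compare the numerators across classes and normalize them at the end.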

I will explain Bayes' theorem with examples in the next section.

Learning Bayes' theorem...

Implementing Naïve Bayes

After working through the movie preference prediction example by hand, as promised, we are going to code Naïve Bayes from scratch. After that, we will implement it using the scikit-learn package.

Implementing Naïve Bayes from scratch

Before we develop the model, let's define the toy dataset we just worked with:

>>> import numpy as np
>>> X_train = np.array([
...     [0, 1, 1],
...     [0, 0, 1],
...     [0, 0, 0],
...     [1, 1, 0]])
>>> Y_train = ['Y', 'N', 'Y', 'Y']
>>> X_test = np.array([[1, 1, 0]])

For the model, starting with the prior, we first group the data by label and record their indices by classes:

>>> def get_label_indices(labels):
...     """
...     Group samples based on their labels and return indices
...     @param labels: list of labels
...     @return: dict, {class1: [indices], class2: [indices]}...
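
The listing above is cut short in this excerpt. As a separate, minimal sketch (not the book's own code), the following pure-Python version shows one way the whole from-scratch pipeline could look on the toy data: group sample indices by label, estimate the priors and Laplace-smoothed likelihoods, and then compute the posterior for the test sample. The helper names (group_indices_by_label, fit_bernoulli_nb, predict_proba) and the smoothing value of 1 are assumptions made for illustration, and plain lists stand in for the NumPy arrays.

```python
from collections import defaultdict

# Same toy data as above, written as plain lists
X_train = [[0, 1, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]]
Y_train = ['Y', 'N', 'Y', 'Y']
X_test = [[1, 1, 0]]


def group_indices_by_label(labels):
    """Map each class label to the list of sample indices carrying it."""
    indices = defaultdict(list)
    for i, label in enumerate(labels):
        indices[label].append(i)
    return dict(indices)


def fit_bernoulli_nb(X, y, smoothing=1):
    """Estimate P(class) and P(feature = 1 | class) with Laplace smoothing."""
    groups = group_indices_by_label(y)
    n_features = len(X[0])
    prior = {c: len(idx) / len(y) for c, idx in groups.items()}
    likelihood = {
        c: [(sum(X[i][j] for i in idx) + smoothing) / (len(idx) + 2 * smoothing)
            for j in range(n_features)]
        for c, idx in groups.items()
    }
    return prior, likelihood


def predict_proba(x, prior, likelihood):
    """Return the normalized posterior P(class | x) for one binary sample."""
    scores = {}
    for c in prior:
        score = prior[c]
        for value, p in zip(x, likelihood[c]):
            # Use p when the feature is 1 and (1 - p) when it is 0
            score *= p if value == 1 else 1 - p
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


prior, likelihood = fit_bernoulli_nb(X_train, Y_train)
posterior = predict_proba(X_test[0], prior, likelihood)
print(prior)
print(posterior)
```

Under these assumptions, the sketch assigns roughly 92% posterior probability to 'Y' for the test sample.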

Building a movie recommender with Naïve Bayes

After the toy example, it is now time to build a movie recommender (or, more specifically, a movie preference classifier) using a real dataset. We use the MovieLens movie rating dataset (https://grouplens.org/datasets/movielens/), which was collected by the GroupLens Research group from the MovieLens website (http://movielens.org).

For demonstration purposes, we will use the ml-1m dataset (available at http://files.grouplens.org/datasets/movielens/ml-1m.zip) as an example. It has around 1 million ratings, ranging from 1 to 5, given by 6,040 users on 3,706 movies.

Unzip the ml-1m.zip file and you will see the following four files:

  • movies.dat: It contains the movie information in the format of MovieID::Title::Genres.
  • ratings.dat: It contains user movie ratings in the format of UserID...
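
As a small illustration of the MovieID::Title::Genres layout described above, here is a hedged sketch of parsing one such line. The function name parse_movie_line and the sample line are chosen for illustration, and splitting the genres on "|" is an assumption about the file that is not stated in the text above.

```python
def parse_movie_line(line):
    """Split one 'MovieID::Title::Genres' line into typed fields."""
    movie_id, title, genres = line.rstrip("\n").split("::", 2)
    # Assumption: multiple genres are separated by '|'
    return int(movie_id), title, genres.split("|")


# Illustrative sample line in the documented format
sample = "1::Toy Story (1995)::Animation|Children's|Comedy"
movie = parse_movie_line(sample)
print(movie)
```

The same pattern (split on "::", then convert the numeric fields) should carry over to the other files once their field order is known.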

Evaluating classification performance

Beyond accuracy, there are several metrics we can use to gain more insight and to avoid class imbalance effects. These are as follows:

  • Confusion matrix
  • Precision
  • Recall
  • F1 score
  • Area under the curve

A confusion matrix summarizes testing instances by their predicted values and true values, presented as a contingency table:

Table 2.3: Contingency table for a confusion matrix

To illustrate this, we can compute the confusion matrix of our Naïve Bayes classifier. We use the confusion_matrix function from scikit-learn to compute it, but it is very easy to code it ourselves:

>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(Y_test, prediction, labels=[0, 1]))
[[ 60  47]
 [148 431]]

As you can see from the resulting confusion matrix, there are 47 false positive cases (where the model misinterprets a dislike as a like...
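
To back up the remark that the confusion matrix is easy to code ourselves, here is a minimal pure-Python sketch. The function names (confusion_counts, precision_recall_f1) and the tiny label lists are hypothetical; the layout follows the scikit-learn convention used above, with rows for true labels and columns for predicted labels, ordered [0, 1].

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true negatives, false positives, false negatives, true positives."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(tn, fp, fn, tp):
    """Derive precision, recall, and F1 from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels, only to exercise the counting function
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp], [fn, tp]]
print(matrix)

# Counts read off the confusion matrix printed above
precision, recall, f1 = precision_recall_f1(tn=60, fp=47, fn=148, tp=431)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

Plugging in the counts from the matrix printed above (60, 47, 148, 431) gives a precision of about 0.90, a recall of about 0.74, and an F1 score of about 0.82.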

Tuning models with cross-validation

Rather than relying on the classification results from one fixed testing set, as we did in the previous experiments, we usually apply the k-fold cross-validation technique to assess how a model will generally perform in practice.

In the k-fold cross-validation setting, the original data is first randomly divided into k equal-sized subsets, in which class proportion is often preserved. Each of these k subsets is then successively retained as the testing set for evaluating the model. During each trial, the remaining k - 1 subsets (excluding the one-fold holdout) form the training set for training the model. Finally, the average performance across all k trials is calculated to generate an overall result:

Figure 2.9: Diagram of 3-fold cross-validation
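
The procedure can be sketched in a few lines of pure Python. This is a simplified illustration rather than the chapter's implementation: the names k_fold_indices, cross_validate, and majority_accuracy are hypothetical, the split is shuffled but not stratified (class proportions are not preserved), and a trivial majority-class "model" stands in for a real classifier.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=42):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # The training set is every fold except the held-out one
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx


def cross_validate(score_fn, n_samples, k, seed=42):
    """Average score_fn(train_idx, test_idx) over all k folds."""
    scores = [score_fn(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# Hypothetical toy labels; the "model" predicts the majority training label
labels = [1, 1, 0, 1, 0, 1, 1, 0, 1]


def majority_accuracy(train_idx, test_idx):
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


splits = list(k_fold_indices(len(labels), k=3))
mean_accuracy = cross_validate(majority_accuracy, len(labels), k=3)
print(mean_accuracy)
```

Each sample lands in the testing set exactly once, so every observation contributes to the averaged score.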

Statistically, the averaged performance of k-fold cross-validation is a better estimate of how a model performs in general. Given...

Summary

In this chapter, you learned the fundamental and important concepts of machine learning classification, including types of classification, classification performance evaluation, cross-validation, and model tuning. You also learned about the simple, yet powerful, classifier Naïve Bayes. We went in depth through the mechanics and implementations of Naïve Bayes with a couple of examples, the most important one being the movie recommendation project.

Binary classification was the main talking point of this chapter, and multiclass classification will be the subject of the next chapter. Specifically, we will talk about SVMs for image classification.

Exercise

  1. As mentioned earlier, we extracted user-movie relationships only from the movie rating data where most ratings are unknown. Can you also utilize data from the files movies.dat and users.dat?
  2. Practice makes perfect—another great project to deepen your understanding could be heart disease classification. The dataset can be downloaded directly at https://www.kaggle.com/ronitf/heart-disease-uci, or from the original page at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
  3. Don't forget to fine-tune the model you obtained from Exercise 2 using the techniques you learned in this chapter. What is the best AUC it achieves?

References

To acknowledge the use of the MovieLens dataset in this chapter, I would like to cite the following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

Key benefits

  • Dive into machine learning algorithms to solve the complex challenges faced by data scientists today
  • Explore cutting edge content reflecting deep learning and reinforcement learning developments
  • Use updated Python libraries such as TensorFlow, PyTorch, and scikit-learn to track machine learning projects end-to-end

Description

Python Machine Learning By Example, Third Edition serves as a comprehensive gateway into the world of machine learning (ML). With six new chapters, on topics including movie recommendation engine development with Naïve Bayes, recognizing faces with support vector machines, predicting stock prices with artificial neural networks, categorizing images of clothing with convolutional neural networks, predicting with sequences using recurrent neural networks, and leveraging reinforcement learning for making decisions, the book has been considerably updated for the latest enterprise requirements. At the same time, this book provides actionable insights on the key fundamentals of ML with Python programming. Hayden applies his expertise to demonstrate implementations of algorithms in Python, both from scratch and with libraries. Each chapter walks through an industry-adopted application. With the help of realistic examples, you will gain an understanding of the mechanics of ML techniques in areas such as exploratory data analysis, feature engineering, classification, regression, clustering, and NLP. By the end of this ML Python book, you will have gained a broad picture of the ML ecosystem and will be well-versed in the best practices of applying ML techniques to solve problems.

Who is this book for?

If you’re a machine learning enthusiast, data analyst, or data engineer highly passionate about machine learning and want to begin working on machine learning assignments, this book is for you. Prior knowledge of Python coding is assumed and basic familiarity with statistical concepts will be beneficial, although this is not necessary.

What you will learn

  • Understand the important concepts in ML and data science
  • Use Python to explore the world of data mining and analytics
  • Scale up model training using varied data complexities with Apache Spark
  • Delve deep into text analysis and NLP using Python libraries such as NLTK and Gensim
  • Select and build an ML model and evaluate and optimize its performance
  • Implement ML algorithms from scratch in Python, TensorFlow 2, PyTorch, and scikit-learn

Product Details

Publication date: Oct 30, 2020
Length: 526 pages
Edition: 3rd
Language: English
ISBN-13: 9781800203860

Table of Contents

16 Chapters
Getting Started with Machine Learning and Python
Building a Movie Recommendation Engine with Naïve Bayes
Recognizing Faces with Support Vector Machine
Predicting Online Ad Click-Through with Tree-Based Algorithms
Predicting Online Ad Click-Through with Logistic Regression
Scaling Up Prediction to Terabyte Click Logs
Predicting Stock Prices with Regression Algorithms
Predicting Stock Prices with Artificial Neural Networks
Mining the 20 Newsgroups Dataset with Text Analysis Techniques
Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
Machine Learning Best Practices
Categorizing Images of Clothing with Convolutional Neural Networks
Making Predictions with Sequences Using Recurrent Neural Networks
Making Decisions in Complex Environments with Reinforcement Learning
Other Books You May Enjoy
Index

Customer reviews

Rating: 4 out of 5 (20 Ratings)
5 star 55%
4 star 25%
3 star 0%
2 star 5%
1 star 15%

Top Reviews

siebo Dec 15, 2020
Rating: 5 out of 5
Excellent book, I really enjoy Yuxi's writing style. He is equally adept at explaining algos, classifiers, and code architecture as he is navigating the business cases, design workflows and the stories behind the data. Essentially, everything you would expect in a top-flight Google engineer. If you work in AdTech or Analytics-driven Marketing, the book is an obvious buy for chapters 4-6 alone on predicting and optimizing ad click-through. Yuxi manages to fit a huge amount of content into this book, while delivering each topic in concise, approachable writing. At 500+ pages, he is not cutting any corners. My favorite chapters were the ones on Facial Recognition and predicting financial market pattern, as those are the most relevant to my own work. The Best Practices section in Ch. 11 will save you from a lot of issues. I think this book would be approachable to beginners, as the author starts from a foundation level and goes up from there. For advanced who want to dive deeper, Python Machine Learning (from the same publisher) is a good follow-on, and covers some of the topics from this book in greater detail.
Amazon Verified review
Annie Jun 19, 2021
Rating: 5 out of 5
The concepts are explained clearly and step by step. Machine learning is not an easy topic, and I find that this book is really helpful.
Amazon Verified review
Brandon Dec 01, 2020
Rating: 5 out of 5
The book is a great practical resource for those interested in applying ML techniques quickly. As other reviewers have mentioned, it is a bit light on theory and the more technical aspects of ML until you get deeper into the book. Having said that, the examples and code provided are very practical and to the point. I would recommend it if you are either already familiar with basic ML theory and the math behind it or if you have a software engineering background and are simply looking to quickly implement an ML solution. The author walks you through code segments and explains each step and by the end, you will have covered most of the more popular ML models and know which ones to use given your data and your goals.

The chapters on SVMs and building and predicting ad-clicks are very practical. I like how the author walks through each model type and compares them so you have a baseline and can see the differences in implementation and result. That was a helpful exercise spread across a few chapters. Also, the Best Practices chapter near the end was a good, "I need an answer quickly" kind of reference.

Overall, I liked the book and think it would be helpful from a more practical perspective. I think this book paired with another more theoretical resource would really round out those seeking to learn ML.
Amazon Verified review
crystalattice Nov 23, 2020
Rating: 5 out of 5
Python ML By Example (BE) is a good complement to Python ML Third Edition (3E). The 3E book focuses on the theory and general application of ML programming, while the BE book focuses on specific application examples.

While they both tackle ML programming, their approach is different. The BE book assumes you have a reasonable, foundational background in ML and uses that basis to create specific ML-based applications.

For example, whereas 3E has a simple note about Naïve Bayes classification, the BE book has a whole chapter dedicated to the algorithm, discussing the different types of classification methods, how Naïve Bayes works, and then actually implementing a Naïve Bayes application. On the flip side, the 3E book has a whole chapter dedicated just to the different classifiers and different implementations of them using scikit-learn.

It's almost like the 3E book is a textbook and the BE book is its complementary workbook for practice. While you may be able to be successful with either one, combining them really maximizes your ML learning.

To speak about the BE book in more detail, the topics covered include:

  • Introduction to Python ML, including software installation
  • Using Naïve Bayes algorithm to create movie recommendation application
  • Using SVM for facial recognition
  • Using tree-based algorithms to predict ad click-through
  • Using Apache Spark to work with large data sets
  • Using regression algorithms and neural networks to predict the stock market
  • Using text analysis and NLP to data mine newsgroups
  • Using unsupervised learning models to identify newsgroups topics
  • Using different types of neural networks for different types of analysis approaches
  • Using reinforcement learning for decision making
  • ML best practices

It is a long book (nearly 500 pages), but the material is invaluable for anyone in the ML field, especially if you don't have a lot of experience with the different algorithms. And in conjunction with 3E, you almost have a complete ML curriculum.
Amazon Verified review
Matt M Dec 08, 2020
Rating: 5 out of 5
It is a fantastic, well-written book in machine learning with python. It provides enough technical background about each technique's theory, followed by beautiful graphs/tables and its Python code.

It covers various topics and practical examples, such as the powerful and popular classification models in supervised learning, regression algorithms, reinforcement learning, and much more.

This book prepares you for real-world machine learning problems. I highly recommended it!
Amazon Verified review