Python Machine Learning by Example

Python Machine Learning by Example: Build intelligent systems using Python, TensorFlow 2, PyTorch, and scikit-learn, Third Edition

Yuxi (Hayden) Liu
€18.99 per month
4 out of 5 stars (20 Ratings)
Paperback Oct 2020 526 pages 3rd Edition
eBook
€15.99 €23.99
Paperback
€29.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
  • Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
  • 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
  • Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
  • Thousands of reference materials covering every tech concept you need to stay up to date.

Python Machine Learning by Example

Building a Movie Recommendation Engine with Naïve Bayes

As promised, in this chapter, we will kick off our supervised learning journey with machine learning classification, and specifically, binary classification. The goal of the chapter is to build a movie recommendation system. It is a good starting point to learn classification from a real-life example—movie streaming service providers are already doing this, and we can do the same. You will learn the fundamental concepts of classification, including what it does and its various types and applications, with a focus on solving a binary classification problem using a simple, yet powerful, algorithm, Naïve Bayes. Finally, the chapter will demonstrate how to fine-tune a model, which is an important skill that every data science or machine learning practitioner should learn.

We will go into detail on the following topics:

  • What is machine learning classification?
  • Types of classification
  • Applications...

Getting started with classification

Movie recommendation can be framed as a machine learning classification problem. If it is predicted that you like a movie, for example, then it will be on your recommended list; otherwise, it won't. Let's get started by learning the important concepts of machine learning classification.

Classification is one of the main instances of supervised learning. Given a training set of data containing observations and their associated categorical outputs, the goal of classification is to learn a general rule that correctly maps the observations (also called features or predictive variables) to the target categories (also called labels or classes). Putting it another way, a trained classification model will be generated after the model learns from the features and targets of training samples, as shown in the first half of Figure 2.1. When new or unseen data comes in, the trained model will be able to determine their...

Exploring Naïve Bayes

The Naïve Bayes classifier belongs to the family of probabilistic classifiers. It computes the probabilities of each predictive feature (also referred to as an attribute or signal) of the data belonging to each class in order to predict a probability distribution over all classes. From the resulting probability distribution, we can then determine the most likely class that the data sample is associated with. What Naïve Bayes does specifically, as its name indicates, is as follows:

  • Bayes: As in, it maps the probability of observed input features given a possible class to the probability of the class given observed pieces of evidence based on Bayes' theorem.
  • Naïve: As in, it simplifies probability computation by assuming that predictive features are mutually independent.

I will explain Bayes' theorem with examples in the next section.
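The two ideas in the list above can be sketched in a few lines of Python. This is only an illustration: the function names `posterior` and `naive_likelihood` and all the probabilities are made up for the example and do not come from the book.

```python
def naive_likelihood(feature_probs):
    """Naive assumption: P(x1, ..., xn | class) is the product of the P(xi | class)."""
    result = 1.0
    for p in feature_probs:
        result *= p
    return result


def posterior(priors, likelihoods):
    """Bayes' theorem: P(class | evidence) = P(evidence | class) * P(class) / P(evidence)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(evidence), summed over all classes
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers: two classes, two observed features per sample
priors = {'like': 0.5, 'dislike': 0.5}
likelihoods = {
    'like': naive_likelihood([0.8, 0.5]),
    'dislike': naive_likelihood([0.2, 0.5]),
}
result = posterior(priors, likelihoods)
print(result)
```

With these made-up inputs, the class with the larger product of prior and likelihood ends up with the larger posterior, and the posteriors sum to 1.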

Learning Bayes' theorem...

Implementing Naïve Bayes

After calculating the movie preference prediction example by hand, as promised, we are going to code Naïve Bayes from scratch. After that, we will implement it using the scikit-learn package.

Implementing Naïve Bayes from scratch

Before we develop the model, let's define the toy dataset we just worked with:

>>> import numpy as np
>>> X_train = np.array([
...     [0, 1, 1],
...     [0, 0, 1],
...     [0, 0, 0],
...     [1, 1, 0]])
>>> Y_train = ['Y', 'N', 'Y', 'Y']
>>> X_test = np.array([[1, 1, 0]])

For the model, starting with the prior, we first group the data by label and record their indices by classes:

>>> def get_label_indices(labels):
...     """
...     Group samples based on their labels and return indices
...     @param labels: list of labels
...     @return: dict, {class1: [indices], class2: [indices]}...
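Since the listing above is cut off in this preview, here is a minimal, self-contained sketch of one way to go from label grouping to a posterior on the toy dataset. It is not the book's code: the function names are hypothetical, and it assumes binary features with additive (Laplace) smoothing of 1.

```python
from collections import defaultdict

import numpy as np

X_train = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]])
Y_train = ['Y', 'N', 'Y', 'Y']
X_test = np.array([[1, 1, 0]])


def group_indices_by_label(labels):
    """Return {label: [sample indices]} (hypothetical helper name)."""
    indices = defaultdict(list)
    for i, label in enumerate(labels):
        indices[label].append(i)
    return dict(indices)


def compute_prior(label_indices):
    """Prior P(class) = share of training samples in each class."""
    total = sum(len(idx) for idx in label_indices.values())
    return {label: len(idx) / total for label, idx in label_indices.items()}


def compute_likelihood(features, label_indices, smoothing=1):
    """P(feature = 1 | class) with additive smoothing; each feature has 2 possible values."""
    likelihood = {}
    for label, idx in label_indices.items():
        counts = features[idx, :].sum(axis=0) + smoothing
        likelihood[label] = counts / (len(idx) + 2 * smoothing)
    return likelihood


def compute_posterior(X, prior, likelihood):
    """Normalized P(class | x) for each row of X under the naive independence assumption."""
    posteriors = []
    for x in X:
        scores = {}
        for label, p in prior.items():
            # Use P(feature = 1 | class) where x is 1, and its complement where x is 0
            per_feature = np.where(x == 1, likelihood[label], 1 - likelihood[label])
            scores[label] = p * per_feature.prod()
        total = sum(scores.values())
        posteriors.append({label: s / total for label, s in scores.items()})
    return posteriors


label_indices = group_indices_by_label(Y_train)
prior = compute_prior(label_indices)
likelihood = compute_likelihood(X_train, label_indices)
posterior = compute_posterior(X_test, prior, likelihood)
print(label_indices, prior, posterior)
```

For larger feature counts, multiplying many small probabilities can underflow; summing logarithms instead is a common alternative that this sketch leaves out for brevity.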

Building a movie recommender with Naïve Bayes

After the toy example, it is now time to build a movie recommender (or, more specifically, movie preference classifier) using a real dataset. We herein use a movie rating dataset (https://grouplens.org/datasets/movielens/). The movie rating data was collected by the GroupLens Research group from the MovieLens website (http://movielens.org).

For demonstration purposes, we will use the MovieLens 1M dataset, ml-1m (ml-1m.zip, downloadable from https://files.grouplens.org/datasets/movielens/ml-1m.zip), as an example. It has around 1 million ratings, ranging from 1 to 5, given by 6,040 users on 3,706 movies.

Unzip the ml-1m.zip file and you will see the following four files:

  • movies.dat: It contains the movie information in the format of MovieID::Title::Genres.
  • ratings.dat: It contains user movie ratings in the format of UserID...
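As a rough sketch of how such "::"-separated rating records could be turned into a user-by-movie matrix, consider the following. It assumes each line has four fields in the order UserID::MovieID::Rating::Timestamp; the sample lines and the `load_ratings` helper are hypothetical and only stand in for the real file.

```python
import io

import numpy as np

# Hypothetical sample lines in the assumed UserID::MovieID::Rating::Timestamp layout
sample = io.StringIO(
    "1::10::5::978300760\n"
    "1::20::3::978302109\n"
    "2::10::4::978301968\n"
)


def load_ratings(lines):
    """Parse '::'-separated rating lines into a dense user-by-movie matrix (0 = no rating)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, rating, _timestamp = line.split("::")
        records.append((int(user_id), int(movie_id), int(rating)))
    # Map raw IDs to consecutive row and column positions
    user_index = {u: i for i, u in enumerate(sorted({r[0] for r in records}))}
    movie_index = {m: j for j, m in enumerate(sorted({r[1] for r in records}))}
    data = np.zeros((len(user_index), len(movie_index)), dtype=np.int32)
    for u, m, r in records:
        data[user_index[u], movie_index[m]] = r
    return data, user_index, movie_index


data, user_index, movie_index = load_ratings(sample)
print(data)
```

The same function would accept an open file handle for ratings.dat in place of the in-memory sample, since it only iterates over lines.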

Evaluating classification performance

Beyond accuracy, there are several metrics we can use to gain more insight and to avoid class imbalance effects. These are as follows:

  • Confusion matrix
  • Precision
  • Recall
  • F1 score
  • Area under the curve

A confusion matrix summarizes testing instances by their predicted values and true values, presented as a contingency table:

Table 2.3: Contingency table for a confusion matrix

To illustrate this, we can compute the confusion matrix of our Naïve Bayes classifier. We use the confusion_matrix function from scikit-learn to compute it, but it is very easy to code it ourselves:

>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(Y_test, prediction, labels=[0, 1]))
[[ 60  47]
 [148 431]]

As you can see from the resulting confusion matrix, there are 47 false positive cases (where the model misinterprets a dislike as a like...
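To illustrate how a confusion matrix can be coded by hand, and how precision, recall, and the F1 score follow from its four cells, here is a small sketch. The helper name `confusion_matrix_by_hand` and the toy labels are hypothetical; the 2x2 matrix is the one printed above, read with rows as true labels and columns as predicted labels.

```python
import numpy as np


def confusion_matrix_by_hand(y_true, y_pred, labels=(0, 1)):
    """Count samples per (true label, predicted label) pair; rows = true, columns = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix


# Hypothetical toy labels, only to exercise the helper
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
toy = confusion_matrix_by_hand(y_true, y_pred)
print(toy)

# Metrics derived from the matrix reported in the text
reported = np.array([[60, 47], [148, 431]])
tn, fp, fn, tp = reported.ravel()
precision = tp / (tp + fp)  # share of predicted likes that are real likes
recall = tp / (tp + fn)     # share of real likes that the model found
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.4f}, recall={recall:.4f}, f1={f1:.4f}")
```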

Tuning models with cross-validation

So far, we have judged the model by its classification results on one fixed testing set. Instead of relying on a single split, we usually apply the k-fold cross-validation technique to assess how a model will generally perform in practice.

In the k-fold cross-validation setting, the original data is first randomly divided into k equal-sized subsets, in which class proportion is often preserved. Each of these k subsets is then successively retained as the testing set for evaluating the model. During each trial, the remaining k - 1 subsets (excluding the one-fold holdout) form the training set for training the model. Finally, the average performance across all k trials is calculated to generate an overall result:

Figure 2.9: Diagram of 3-fold cross-validation
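The splitting procedure described above can be sketched with NumPy alone. This is an illustrative version, not the book's code: the function names, the toy labels, and the majority-class baseline used as the "model" are all hypothetical.

```python
import numpy as np


def stratified_k_fold_indices(labels, k, seed=42):
    """Yield (train_idx, test_idx) pairs; each class is spread as evenly as possible over k folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)  # random assignment within each class
        for fold_id, chunk in enumerate(np.array_split(cls_idx, k)):
            folds[fold_id].extend(chunk.tolist())
    all_idx = np.arange(len(labels))
    for fold in folds:
        test_idx = np.array(sorted(fold), dtype=int)
        train_idx = np.setdiff1d(all_idx, test_idx)  # the remaining k - 1 folds
        yield train_idx, test_idx


def cross_validate(labels, k, score_fn):
    """Average a score over all k train/test trials."""
    scores = [score_fn(train, test) for train, test in stratified_k_fold_indices(labels, k)]
    return float(np.mean(scores))


# Hypothetical toy labels: six samples of class 0 and three of class 1
labels = np.array([0] * 6 + [1] * 3)


def majority_baseline_accuracy(train_idx, test_idx):
    """Stand-in for a real model: always predict the most common training class."""
    majority = np.bincount(labels[train_idx]).argmax()
    return float(np.mean(labels[test_idx] == majority))


folds = list(stratified_k_fold_indices(labels, 3))
mean_accuracy = cross_validate(labels, 3, majority_baseline_accuracy)
print(mean_accuracy)
```

In a real experiment, `score_fn` would fit a classifier on the training indices and return a metric such as AUC on the testing indices.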

Statistically, the averaged performance of k-fold cross-validation is a better estimate of how a model performs in general. Given...

Summary

In this chapter, you learned the fundamental and important concepts of machine learning classification, including types of classification, classification performance evaluation, cross-validation, and model tuning. You also learned about the simple, yet powerful, classifier Naïve Bayes. We went in depth through the mechanics and implementations of Naïve Bayes with a couple of examples, the most important one being the movie recommendation project.

Binary classification was the main talking point of this chapter, and multiclass classification will be the subject of the next chapter. Specifically, we will talk about SVMs for image classification.

Exercise

  1. As mentioned earlier, we extracted user-movie relationships only from the movie rating data where most ratings are unknown. Can you also utilize data from the files movies.dat and users.dat?
  2. Practice makes perfect—another great project to deepen your understanding could be heart disease classification. The dataset can be downloaded directly at https://www.kaggle.com/ronitf/heart-disease-uci, or from the original page at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
  3. Don't forget to fine-tune the model you obtained from Exercise 2 using the techniques you learned in this chapter. What is the best AUC it achieves?

References

To acknowledge the use of the MovieLens dataset in this chapter, I would like to cite the following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872


Key benefits

  • Dive into machine learning algorithms to solve the complex challenges faced by data scientists today
  • Explore cutting edge content reflecting deep learning and reinforcement learning developments
  • Use updated Python libraries such as TensorFlow, PyTorch, and scikit-learn to track machine learning projects end-to-end

Description

Python Machine Learning By Example, Third Edition serves as a comprehensive gateway into the world of machine learning (ML). With six new chapters, on topics including movie recommendation engine development with Naïve Bayes, recognizing faces with support vector machine, predicting stock prices with artificial neural networks, categorizing images of clothing with convolutional neural networks, predicting with sequences using recurrent neural networks, and leveraging reinforcement learning for making decisions, the book has been considerably updated for the latest enterprise requirements. At the same time, this book provides actionable insights on the key fundamentals of ML with Python programming. Hayden applies his expertise to demonstrate implementations of algorithms in Python, both from scratch and with libraries. Each chapter walks through an industry-adopted application. With the help of realistic examples, you will gain an understanding of the mechanics of ML techniques in areas such as exploratory data analysis, feature engineering, classification, regression, clustering, and NLP. By the end of this ML Python book, you will have gained a broad picture of the ML ecosystem and will be well-versed in the best practices of applying ML techniques to solve problems.

Who is this book for?

If you’re a machine learning enthusiast, data analyst, or data engineer highly passionate about machine learning and want to begin working on machine learning assignments, this book is for you. Prior knowledge of Python coding is assumed and basic familiarity with statistical concepts will be beneficial, although this is not necessary.

What you will learn

  • Understand the important concepts in ML and data science
  • Use Python to explore the world of data mining and analytics
  • Scale up model training using varied data complexities with Apache Spark
  • Delve deep into text analysis and NLP using Python libraries such as NLTK and Gensim
  • Select and build an ML model and evaluate and optimize its performance
  • Implement ML algorithms from scratch in Python, TensorFlow 2, PyTorch, and scikit-learn

Product Details

Publication date: Oct 30, 2020
Length: 526 pages
Edition: 3rd
Language: English
ISBN-13: 9781800209718


Packt Subscriptions

See our plans and pricing

€18.99 billed monthly

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together


  • Python Machine Learning by Example: €29.99
  • Python Machine Learning: €41.99
  • Machine Learning for Algorithmic Trading: €43.99

Total: €115.97

Table of Contents

16 Chapters

  • Getting Started with Machine Learning and Python
  • Building a Movie Recommendation Engine with Naïve Bayes
  • Recognizing Faces with Support Vector Machine
  • Predicting Online Ad Click-Through with Tree-Based Algorithms
  • Predicting Online Ad Click-Through with Logistic Regression
  • Scaling Up Prediction to Terabyte Click Logs
  • Predicting Stock Prices with Regression Algorithms
  • Predicting Stock Prices with Artificial Neural Networks
  • Mining the 20 Newsgroups Dataset with Text Analysis Techniques
  • Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
  • Machine Learning Best Practices
  • Categorizing Images of Clothing with Convolutional Neural Networks
  • Making Predictions with Sequences Using Recurrent Neural Networks
  • Making Decisions in Complex Environments with Reinforcement Learning
  • Other Books You May Enjoy
  • Index

Customer reviews

Rating distribution

4 out of 5 stars (20 Ratings)

  • 5 star: 55%
  • 4 star: 25%
  • 3 star: 0%
  • 2 star: 5%
  • 1 star: 15%

Top Reviews




siebo Dec 15, 2020
5 out of 5 stars
Excellent book, I really enjoy Yuxi's writing style. He is equally adept at explaining algos, classifiers, and code architecture as he is navigating the business cases, design workflows and the stories behind the data. Essentially, everything you would expect in a top-flight Google engineer. If you work in AdTech or Analytics-driven Marketing, the book is an obvious buy for chapters 4-6 alone on predicting and optimizing ad click-through. Yuxi manages to fit a huge amount of content into this book, while delivering each topic in concise, approachable writing. At 500+ pages, he is not cutting any corners. My favorite chapters were the ones on Facial Recognition and predicting financial market pattern, as those are the most relevant to my own work. The Best Practices section in Ch. 11 will save you from a lot of issues. I think this book would be approachable to beginners, as the author starts from a foundation level and goes up from there. For advanced who want to dive deeper, Python Machine Learning (from the same publisher) is a good follow-on, and covers some of the topics from this book in greater detail.
Amazon Verified review
Annie Jun 19, 2021
5 out of 5 stars
The concepts are explained clearly and step by step. Machine learning is not an easy topic, I find that this book is really helpful
Amazon Verified review
Brandon Dec 01, 2020
5 out of 5 stars
The book is a great practical resource for those interested in applying ML techniques quickly. As other reviewers have mentioned, it is a bit light on theory and the more technical aspects of ML until you get deeper into the book. Having said that, the examples and code provided are very practical and to the point. I would recommend it if you are either already familiar with basic ML theory and the math behind it or if you have a software engineering background and are simply looking to quickly implement an ML solution. The author walks you through code segments and explains each step and by the end, you will have covered most of the more popular ML models and know which ones to use given your data and your goals. The chapters on SVMs and building and predicting ad-clicks are very practical. I like how the author walks through each model type and compares them so you have a baseline and can see the differences in implementation and result. That was a helpful exercise spread across a few chapters. Also, the Best Practices chapter near the end was a good, "I need an answer quickly" kind of reference. Overall, I liked the book and think it would be helpful from a more practical perspective. I think this book paired with another more theoretical resource would really round out those seeking to learn ML.
Amazon Verified review
crystalattice Nov 23, 2020
5 out of 5 stars
Python ML By Example (BE) is a good complement to Python ML Third Edition (3E). The 3E book focuses on the theory and general application of ML programming, while the BE book focuses on specific application examples. While they both tackle ML programming, their approach is different. The BE book assumes you have a reasonable, foundational background in ML and uses that basis to create specific ML-based applications. For example, whereas 3E has a simple note about Naïve Bayes classification, the BE book has a whole chapter dedicated to the algorithm, discussing the different types of classification methods, how Naïve Bayes works, and then actually implementing a Naïve Bayes application. On the flip side, the 3E book has a whole chapter dedicated just to the different classifiers and different implementations of them using scikit-learn. It's almost like the 3E book is a textbook and the BE book is its complementary workbook for practice. While you may be able to be successful with either one, combining them really maximizes your ML learning. To speak about the BE book in more detail, the topics covered include:

  • Introduction to Python ML, including software installation
  • Using Naïve Bayes algorithm to create movie recommendation application
  • Using SVM for facial recognition
  • Using tree-based algorithms to predict ad click-through
  • Using Apache Spark to work with large data sets
  • Using regression algorithms and neural networks to predict the stock market
  • Using text analysis and NLP to data mine newsgroups
  • Using unsupervised learning models to identify newsgroups topics
  • Using different types of neural networks for different types of analysis approaches
  • Using reinforcement learning for decision making
  • ML best practices

It is a long book (nearly 500 pages), but the material is invaluable for anyone in the ML field, especially if you don't have a lot of experience with the different algorithms. And in conjunction with 3E, you almost have a complete ML curriculum.
Amazon Verified review
Matt M Dec 08, 2020
5 out of 5 stars
It is a fantastic, well-written book in machine learning with python. It provides enough technical background about each technique's theory, followed by beautiful graphs/tables and its Python code. It covers various topics and practical examples, such as the powerful and popular classification models in supervised learning, regression algorithms, reinforcement learning, and much more. This book prepares you for real-world machine learning problems. I highly recommend it!
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online; this includes exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, simply go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. There you will see the ‘cancel subscription’ button in the grey box that contains your subscription information.

What are credits?

Credits can be earned from reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need to have a paid-for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.