
Python Machine Learning By Example: Unlock machine learning best practices with real-world use cases, Fourth Edition

By Yuxi (Hayden) Liu
Rating: 4.9 (8 Ratings)
Paperback, Jul 2024, 518 pages, 4th Edition
eBook
$24.99 $36.99
Paperback
$45.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
  • Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
  • 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
  • Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
  • Thousands of reference materials covering every tech concept you need to stay up to date.


Building a Movie Recommendation Engine with Naïve Bayes

As promised, in this chapter, we will kick off our supervised learning journey with machine learning classification, and specifically, binary classification. The goal of the chapter is to build a movie recommendation system, which is a good starting point for learning classification from a real-life example—movie streaming service providers are already doing this, and we can do the same.

In this chapter, you will learn the fundamental concepts of classification, including what it does and its various types and applications, with a focus on solving a binary classification problem using a simple, yet powerful, algorithm, Naïve Bayes. Finally, the chapter will demonstrate how to fine-tune a model, which is an important skill that every data science or machine learning practitioner should learn.

We will go into detail on the following topics:

  • Getting started with classification
  • Exploring...

Getting started with classification

Movie recommendation can be framed as a machine learning classification problem. If it is predicted that you’ll like a movie because you’ve liked or watched similar movies, for example, then it will be on your recommended list; otherwise, it won’t. Let’s get started by learning the important concepts of machine learning classification.

Classification is one of the main instances of supervised learning. Given a training set of data containing observations and their associated categorical outputs, the goal of classification is to learn a general rule that correctly maps the observations (also called features or predictive variables) to the target categories (also called labels or classes). Putting it another way, a trained classification model will be generated after the model learns from the features and targets of training samples, as shown in the first half of Figure 2.1. When new or unseen data comes in, the trained...

Exploring Naïve Bayes

The Naïve Bayes classifier belongs to the family of probabilistic classifiers. It computes the probabilities of each predictive feature (also referred to as an attribute or signal) of the data belonging to each class in order to make a prediction of the probability distribution over all classes. Of course, from the resulting probability distribution, we can conclude the most likely class that the data sample is associated with. What Naïve Bayes does specifically, as its name indicates, is as follows:

  • Bayes: As in, it maps the probability of observed input features given a possible class to the probability of the class given observed pieces of evidence based on Bayes’ theorem.
  • Naïve: As in, it simplifies probability computation by assuming that predictive features are mutually independent.
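The two ideas in the list above can be sketched in a few lines of Python. This is a minimal illustration for binary features, not the book's implementation: the function name `naive_bayes_posterior`, the class names, and every probability below are hypothetical values chosen only to show the arithmetic.

```python
def naive_bayes_posterior(prior, likelihoods, x):
    """Return P(class | x) for binary features x.

    prior:       {class: P(class)}
    likelihoods: {class: [P(feature_i = 1 | class), ...]}
    """
    unnormalized = {}
    for label, p_class in prior.items():
        p = p_class
        # "Naive" step: multiply per-feature probabilities as if independent.
        for p_feature, value in zip(likelihoods[label], x):
            p *= p_feature if value == 1 else 1 - p_feature
        unnormalized[label] = p
    # "Bayes" step: divide by the evidence so the posteriors sum to 1.
    evidence = sum(unnormalized.values())
    return {label: p / evidence for label, p in unnormalized.items()}


# Hypothetical numbers, for illustration only.
prior = {'like': 0.6, 'dislike': 0.4}
likelihoods = {'like': [0.8, 0.3], 'dislike': [0.2, 0.5]}
posterior = naive_bayes_posterior(prior, likelihoods, x=[1, 0])
print(posterior)
```

The class with the largest posterior is the prediction; here that is 'like', because its prior and its first-feature likelihood are both higher.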

I will explain Bayes’ theorem with examples in the next section.

Bayes’ theorem by example

It is important...

Implementing Naïve Bayes

After calculating the movie preference example by hand, as promised, we are going to implement Naïve Bayes from scratch. After that, we will implement it using the scikit-learn package.
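As a rough preview of the scikit-learn route, the sketch below fits `BernoulliNB` on the same four-sample toy dataset used in this section. The choice of `BernoulliNB` with `alpha=1.0` (Laplace smoothing) and `fit_prior=True` is an assumption made for this illustration of binary features; it is not quoted from the chapter.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy dataset from this section: three binary features, labels 'Y'/'N'.
X_train = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0]])
Y_train = ['Y', 'N', 'Y', 'Y']
X_test = np.array([[1, 1, 0]])

# alpha=1.0 applies Laplace smoothing; fit_prior=True learns priors from data.
clf = BernoulliNB(alpha=1.0, fit_prior=True)
clf.fit(X_train, Y_train)

print(clf.classes_)                # column order of predict_proba
print(clf.predict_proba(X_test))   # class probabilities for the test sample
print(clf.predict(X_test))         # most likely class
```

With these settings, the smoothed hand calculation gives roughly 0.92 for class 'Y' on the test sample, so the library result can be checked against a manual computation.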

Implementing Naïve Bayes from scratch

Before we develop the model, let’s define the toy dataset we just worked with:

>>> import numpy as np
>>> X_train = np.array([
...     [0, 1, 1],
...     [0, 0, 1],
...     [0, 0, 0],
...     [1, 1, 0]])
>>> Y_train = ['Y', 'N', 'Y', 'Y']
>>> X_test = np.array([[1, 1, 0]])

For the model, starting with the prior, we first group the data by label and record their indices by classes:

>>> def get_label_indices(labels):
...     """
...     Group samples based on their labels and return indices
...     @param labels: list of labels
...     @return: dict, {class1: [indices], class2: [indices]}
...     ...
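
A minimal, self-contained sketch of the grouping step described above, followed by the prior it feeds into. The function bodies are assumptions written to match the docstring's stated return shape, and `get_prior` is a hypothetical helper name, so treat this as one possible implementation rather than the book's code.

```python
from collections import defaultdict

Y_train = ['Y', 'N', 'Y', 'Y']


def get_label_indices(labels):
    """Group sample indices by label: {class1: [indices], class2: [indices]}."""
    label_indices = defaultdict(list)
    for index, label in enumerate(labels):
        label_indices[label].append(index)
    return dict(label_indices)


def get_prior(label_indices):
    """Prior P(class): the share of training samples carrying each label."""
    total = sum(len(indices) for indices in label_indices.values())
    return {label: len(indices) / total
            for label, indices in label_indices.items()}


label_indices = get_label_indices(Y_train)
prior = get_prior(label_indices)
print(label_indices)
print(prior)
```

For the toy labels, three of the four samples are 'Y', so the prior is 0.75 for 'Y' and 0.25 for 'N'.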

Building a movie recommender with Naïve Bayes

After the toy example, it is now time to build a movie recommender (or, more specifically, a movie preference classifier) using a real dataset. Here we use a movie rating dataset (https://grouplens.org/datasets/movielens/). The data was collected by the GroupLens Research group from the MovieLens website (http://movielens.org).

For demonstration purposes, we will use the stable small dataset, MovieLens 1M (the ml-1m.zip file, about 6 MB, which can be downloaded from https://files.grouplens.org/datasets/movielens/ml-1m.zip or https://grouplens.org/datasets/movielens/1m/). It has around 1 million ratings, ranging from 1 to 5 in whole-star increments, given by 6,040 users on 3,706 movies (released February 2003).

Unzip the ml-1m.zip file and you will see the following four files:

  • movies.dat: It contains the movie information in the format of MovieID::Title::Genres.
  • ratings.dat: It...
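
The `::`-delimited layout described for movies.dat can be parsed with plain string splitting. This is a minimal sketch: the two sample rows and the `parse_movie_line` helper are illustrative assumptions rather than lines copied from the dataset, and genres are assumed to be separated by `|`.

```python
def parse_movie_line(line):
    """Split one 'MovieID::Title::Genres' record into typed fields."""
    movie_id, title, genres = line.rstrip('\n').split('::')
    return {
        'movie_id': int(movie_id),
        'title': title,
        'genres': genres.split('|'),  # assumed pipe-separated genre list
    }


# Hypothetical rows written in the documented format.
sample_lines = [
    '1::Example Movie (1995)::Comedy|Drama\n',
    '2::Another Example (1999)::Action\n',
]
movies = [parse_movie_line(line) for line in sample_lines]
print(movies[0])
```

When reading the real file, the same function can be applied line by line; the file encoding is worth checking first, since titles may contain non-ASCII characters.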

Evaluating classification performance

Beyond accuracy, there are several metrics we can use to gain more insight and avoid class imbalance effects. These are as follows:

  • Confusion matrix
  • Precision
  • Recall
  • F1 score
  • The area under the curve

A confusion matrix summarizes testing instances by their predicted values and true values, presented as a contingency table:

Figure 2.8: Contingency table for a confusion matrix

To illustrate this, we can compute the confusion matrix of our Naïve Bayes classifier. We use the confusion_matrix function from scikit-learn to compute it, but it is very easy to code it ourselves:

>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(Y_test, prediction, labels=[0, 1]))
[[ 60  47]
 [148 431]]

As you can see from the resulting confusion matrix, there are 47 false positive cases (where the model misinterprets a dislike as a like for a movie), and 148...
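
To make the remark about coding it ourselves concrete, here is a minimal sketch of a hand-written binary confusion matrix, followed by precision, recall, and F1 computed from the counts printed above. The helper name `binary_confusion_matrix` and the tiny `toy_true`/`toy_pred` lists are hypothetical; only the four counts come from the output shown in this section.

```python
import numpy as np


def binary_confusion_matrix(y_true, y_pred):
    """Rows are true labels (0, 1); columns are predicted labels (0, 1)."""
    matrix = np.zeros((2, 2), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix


# Tiny hypothetical labels, only to exercise the function.
toy_true = [0, 0, 1, 1, 1]
toy_pred = [0, 1, 1, 1, 0]
toy_matrix = binary_confusion_matrix(toy_true, toy_pred)
print(toy_matrix)

# Counts read from the matrix printed above: [[60, 47], [148, 431]].
tn, fp, fn, tp = 60, 47, 148, 431
precision = tp / (tp + fp)   # share of predicted likes that are real likes
recall = tp / (tp + fn)      # share of real likes that were found
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With the positive class taken as label 1, precision comes out at about 0.90 and recall at about 0.74, which shows the classifier misses more likes than it falsely reports.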

Tuning models with cross-validation

Limiting the evaluation to a single fixed set may be misleading since it’s highly dependent on the specific data points chosen for that set. Rather than adopting the classification results from one fixed testing set, as we did in the previous experiments, we usually apply the k-fold cross-validation technique to assess how a model will generally perform in practice.

In the k-fold cross-validation setting, the original data is first randomly divided into k equal-sized subsets, in which class proportion is often preserved. Each of these k subsets is then successively retained as the testing set for evaluating the model. During each trial, the remaining k - 1 subsets (excluding the one-fold holdout) form the training set for fitting the model. Finally, the average performance across all k trials is calculated to generate an overall result:

Figure 2.10: Diagram of 3-fold cross-validation
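
The procedure described above can be sketched with scikit-learn's `StratifiedKFold`, which preserves class proportions in each fold. Everything specific here is an assumption made for illustration: the synthetic count data, the choice of `MultinomialNB` with `alpha=1.0`, k = 5, and AUC as the per-fold metric.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in data: 200 samples, 10 non-negative integer features.
rng = np.random.default_rng(42)
X = rng.integers(0, 6, size=(200, 10))
y = (X[:, 0] + X[:, 1] > 5).astype(int)

k = 5
splitter = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)

auc_per_fold = []
for train_idx, test_idx in splitter.split(X, y):
    # Train on k - 1 folds, evaluate on the held-out fold.
    clf = MultinomialNB(alpha=1.0)
    clf.fit(X[train_idx], y[train_idx])
    pos_prob = clf.predict_proba(X[test_idx])[:, 1]
    auc_per_fold.append(roc_auc_score(y[test_idx], pos_prob))

# The overall result is the average across the k trials.
mean_auc = float(np.mean(auc_per_fold))
print(auc_per_fold, mean_auc)
```

To tune a model, the same loop can be repeated for each candidate hyperparameter value, keeping the setting with the best average score.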

Statistically, the...

Summary

In this chapter, you learned about the fundamental concepts of machine learning classification, including types of classification, classification performance evaluation, cross-validation, and model tuning. You also learned about the simple, yet powerful, classifier, Naïve Bayes. We went in depth through the mechanics and implementations of Naïve Bayes with a couple of examples, the most important one being the movie recommendation project.

Binary classification using Naïve Bayes was the main talking point of this chapter. In the next chapter, we will solve ad click-through prediction using another binary classification algorithm: a decision tree.

Exercises

  1. As mentioned earlier, we extracted user-movie relationships only from the movie rating data where most ratings are unknown. Can you also utilize data from the movies.dat and users.dat files?
  2. Practice makes perfect—another great project to deepen your understanding could be heart disease classification. The dataset can be downloaded directly from https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
  3. Don’t forget to fine-tune the model you obtained from Exercise 2 using the techniques you learned in this chapter. What is the best AUC it achieves?

References

To acknowledge the use of the MovieLens dataset in this chapter, I would like to cite the following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI: http://dx.doi.org/10.1145/2827872.

Join our book’s Discord space

Join our community’s Discord space for discussions with the authors and other readers:

https://packt.link/yuxi


Key benefits

  • Discover new and updated content on NLP transformers, PyTorch, and computer vision modeling
  • Includes a dedicated chapter on best practices and additional best practice tips throughout the book to improve your ML solutions
  • Implement ML models, such as neural networks and linear and logistic regression, from scratch
  • Purchase of the print or Kindle book includes a free PDF copy

Description

The fourth edition of Python Machine Learning By Example is a comprehensive guide for beginners and experienced machine learning practitioners who want to learn more advanced techniques, such as multimodal modeling. Written by experienced machine learning author and ex-Google machine learning engineer Yuxi (Hayden) Liu, this edition emphasizes best practices, providing invaluable insights for machine learning engineers, data scientists, and analysts. Explore advanced techniques, including two new chapters on natural language processing transformers with BERT and GPT, and multimodal computer vision models with PyTorch and Hugging Face. You’ll learn key modeling techniques using practical examples, such as predicting stock prices and creating an image search engine. This hands-on machine learning book navigates through complex challenges, bridging the gap between theoretical understanding and practical application. Elevate your machine learning and deep learning expertise, tackle intricate problems, and unlock the potential of advanced techniques in machine learning with this authoritative guide.

Who is this book for?

This expanded fourth edition is ideal for data scientists, ML engineers, analysts, and students with Python programming knowledge. The real-world examples, best practices, and code prepare anyone undertaking their first serious ML project.

What you will learn

  • Follow machine learning best practices throughout data preparation and model development
  • Build and improve image classifiers using convolutional neural networks (CNNs) and transfer learning
  • Develop and fine-tune neural networks using TensorFlow and PyTorch
  • Analyze sequence data and make predictions using recurrent neural networks (RNNs), transformers, and CLIP
  • Build classifiers using support vector machines (SVMs) and boost performance with PCA
  • Avoid overfitting using regularization, feature selection, and more

Product Details

Publication date: Jul 31, 2024
Length: 518 pages
Edition: 4th
Language: English
ISBN-13: 9781835085622
Vendor: Google

Packt Subscriptions

See our plans and pricing

$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early Access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early Access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early Access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together


  • Python Machine Learning By Example: $45.99
  • Mastering NLP from Foundations to LLMs: $52.99
  • Mastering PyTorch: $51.99

Total: $150.97

Table of Contents

17 Chapters
  • Getting Started with Machine Learning and Python
  • Building a Movie Recommendation Engine with Naïve Bayes
  • Predicting Online Ad Click-Through with Tree-Based Algorithms
  • Predicting Online Ad Click-Through with Logistic Regression
  • Predicting Stock Prices with Regression Algorithms
  • Predicting Stock Prices with Artificial Neural Networks
  • Mining the 20 Newsgroups Dataset with Text Analysis Techniques
  • Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
  • Recognizing Faces with Support Vector Machine
  • Machine Learning Best Practices
  • Categorizing Images of Clothing with Convolutional Neural Networks
  • Making Predictions with Sequences Using Recurrent Neural Networks
  • Advancing Language Understanding and Generation with the Transformer Models
  • Building an Image Search Engine Using CLIP: a Multimodal Approach
  • Making Decisions in Complex Environments with Reinforcement Learning
  • Other Books You May Enjoy
  • Index

Customer reviews

Rating distribution
4.9 (8 Ratings)
5 star 87.5%
4 star 12.5%
3 star 0%
2 star 0%
1 star 0%

Top Reviews




Jacob Smith Sep 21, 2024
Rating: 5/5
This book is an absolute gem for anyone looking to dive deep into the world of machine learning using Python! From the moment I opened it, I was impressed by the clear, concise explanations and the practical examples that make even the most complex topics easy to understand. The author does a fantastic job of breaking down key machine learning algorithms, explaining not just the "how" but the "why" behind each method. The inclusion of real-world datasets and hands-on exercises makes it easy to follow along and apply what you've learned immediately.
Amazon Verified review
Ayon Roy Sep 05, 2024
Rating: 5/5
Starting my journey in machine learning was both exciting and overwhelming. I struggled to bridge the gap between theory and practical application in real-world projects. That’s why Yuxi Hayden Liu’s "Python Machine Learning by Example" has been a game-changer for me. This book offers a structured approach, making it easier to transition from learning to execution. Liu covers essential topics like overfitting, underfitting, and cross-validation right from the start, ensuring that you grasp the fundamentals. What truly sets this book apart is the hands-on projects that accompany each concept. From building a movie recommendation engine using Naive Bayes to predicting stock prices and exploring deep learning through artificial neural networks, Liu walks you through each step, from data preparation to model evaluation. The book is rich with best practices, such as feature engineering, algorithm selection, and monitoring model performance. By the end, you'll not only have a solid understanding of basic and advanced topics, including CNNs, transformer models, and reinforcement learning, but you’ll also feel confident applying them in real-world scenarios. Yuxi Hayden Liu’s industry experience shines through, making this book an invaluable guide for anyone feeling lost in their machine learning journey. Highly recommended for both students and professionals looking to elevate their skills. Happy reading!
Amazon Verified review
C. C Chin Oct 14, 2024
Rating: 5/5
Need hands on ML newbie!! Also Python newbie too but got computer science degree!! Ready all 5* reviews, book perfect for Machine learning newbie and Python newbie and AWS MLS-C01 exam and entry level machine learning specialty exam and Sagemaker studio!! All new for me!!! Need examples to make practice exams answers to help for AWS mls-c01 machine learning specialty exam AWS Sagemaker studio too, since all new to me!!! Got book October 13, 2024!! And pdf too!! Reading now to do ML example!! Got Oliver beginner book, udemy class. Book 3 months old pretty new, October 14, 2024!!! Explain Oliver beginner book got 3 of those!!
Amazon Verified review
saandeep sreerambatla Jul 31, 2024
Rating: 5/5
"Python Machine Learning by Example, Fourth Edition" by Yuxi (Hayden) Liu is a fantastic resource for anyone interested in machine learning, whether you're just starting out or already have some experience. This book strikes a great balance between explaining the theory behind machine learning and showing you how to apply it in real-world scenarios, making it an essential addition to any data scientist’s collection. The book is well-organized, kicking off with the basics of machine learning and Python programming. Liu does an excellent job of explaining why machine learning is so important today and then helps you set up your Python environment. This ensures that even those with minimal programming experience can keep up. What really stands out about this book is its hands-on approach. Each chapter is packed with real-world examples that help bring complex machine learning concepts to life. For instance, the chapters on building a movie recommendation engine with Naïve Bayes and predicting stock prices with regression algorithms are particularly insightful, showing you exactly how these models work and how to apply them to real problems. The book also covers advanced topics like deep learning, natural language processing (NLP), and reinforcement learning. The sections on convolutional neural networks (CNNs) for image classification and recurrent neural networks (RNNs) for sequence prediction are especially useful. They provide a deep dive into these advanced models, complete with code examples using TensorFlow and PyTorch, which are incredibly helpful for anyone looking to implement these techniques in their own projects. Another great feature of this book is the focus on best practices. Liu includes 21 best practices that cover the entire machine learning workflow, from data preparation to model deployment and monitoring. This is invaluable for anyone looking to build robust and scalable machine learning solutions. It's worth noting that the book assumes you have a basic understanding of Python and some familiarity with statistical concepts. This might be a bit challenging for complete beginners, but it doesn't take away from the overall value of the book. Instead, it sets a realistic expectation for the level of expertise needed to fully benefit from the content. In conclusion, "Python Machine Learning by Example, Fourth Edition" is an excellent resource that bridges the gap between theory and practice. Yuxi (Hayden) Liu's clear explanations, practical examples, and focus on best practices make this book a must-read for anyone serious about mastering machine learning with Python. Whether you're a data analyst, a machine learning engineer, or a data scientist, this book will provide you with the tools and knowledge you need to succeed.
Amazon Verified review
Thomas M. Aug 21, 2024
Rating: 5/5
I highly recommend Liu's Python ML by Example! As a long-term practitioner of all things analytics and data science, it was refreshing to come back to the foundations with this book. I wish I had this resource available when I was originally getting started in the field, as Liu has a knack for covering a broad range of salient topics in ML, while still offering plenty of depth for those looking to go into the weeds of how algorithms work. Super practical, this book focuses on real-life examples, spanning marketing & ads, content recommendations, text sentiment, image classification and beyond. The book also navigates tabular ML and deep learning concepts flawlessly. Liu doesn't stop at the fundamentals; the book also covers advanced topics like deep learning, natural language processing (NLP), and reinforcement learning. The sections on convolutional neural networks (CNNs) for image classification and recurrent neural networks (RNNs) for sequence prediction offer valuable insights into these cutting-edge techniques. These topics are all presented in ways that even new-to-ML readers would be able to grasp. These days, no ML book is complete without including GenAI as a topic, which the author integrates seamlessly. All around a super well-rounded and practical read!
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, simply go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there, you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned from reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘My Library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access, you'll receive each chapter as it's written and get regular updates throughout the product's development, as well as the final course as soon as it's ready. We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.