Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

You're reading from Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits: A practical guide to implementing supervised and unsupervised machine learning algorithms in Python

Product type Paperback
Published in Jul 2020
Publisher Packt
ISBN-13 9781838826048
Length 384 pages
Edition 1st Edition
Author: Tarek Amr
Table of Contents (18 chapters)

Preface
1. Section 1: Supervised Learning
2. Introduction to Machine Learning
3. Making Decisions with Trees
4. Making Decisions with Linear Equations
5. Preparing Your Data
6. Image Processing with Nearest Neighbors
7. Classifying Text Using Naive Bayes
8. Section 2: Advanced Supervised Learning
9. Neural Networks – Here Comes Deep Learning
10. Ensembles – When One Model Is Not Enough
11. The Y is as Important as the X
12. Imbalanced Learning – Not Even 1% Win the Lottery
13. Section 3: Unsupervised Learning and More
14. Clustering – Making Sense of Unlabeled Data
15. Anomaly Detection – Finding Outliers in Data
16. Recommender System – Getting to Know Their Taste
17. Other Books You May Enjoy

Introduction to scikit-learn

Since you have already picked up this book, you probably don't need me to convince you why machine learning is important. However, you may still have doubts about why to use scikit-learn in particular. You may encounter names such as TensorFlow, PyTorch, and Spark more often during your daily news consumption than scikit-learn. So, let me convince you of my preference for the latter.

It plays well with the Python data ecosystem

scikit-learn is a Python toolkit built on top of NumPy, SciPy, and Matplotlib. These choices mean that it fits well into your daily data pipeline. As a data scientist, you most likely use Python as your language of choice, since it is good for both offline analysis and real-time implementations. You will also be using tools such as pandas to load data from your database and to perform a vast number of transformations on it. Since both pandas and scikit-learn are built on top of NumPy, they play very well with each other. Matplotlib is the de facto data visualization tool for Python, which means you can use its sophisticated data visualization capabilities to explore your data and unravel your model's ins and outs.
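To make the pandas and scikit-learn interplay concrete, here is a minimal sketch. The column names and numbers are made up for illustration (they are not from the book), and in practice the DataFrame would more likely come from a database query or a file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, invented for this sketch.
df = pd.DataFrame({
    "rooms": [1, 2, 3, 4, 5],
    "area_m2": [30.0, 45.0, 70.0, 90.0, 120.0],
})

# A pandas transformation: derive the target column from existing columns.
df["price"] = 1000 * df["area_m2"] + 5000 * df["rooms"]

features = df[["rooms", "area_m2"]]  # a pandas DataFrame
target = df["price"]                 # a pandas Series

# scikit-learn accepts the pandas objects directly, since both sit on NumPy.
model = LinearRegression()
model.fit(features, target)

# The predictions come back as a plain NumPy array.
predictions = model.predict(features)
```

Because the toy target is an exact linear function of the two features, the fitted model should reproduce it almost perfectly; real data would of course be noisier.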

Since scikit-learn is an open source tool that is heavily used in the community, it is very common to see other data tools use an interface almost identical to scikit-learn's. Many of these tools are built on top of the same scientific Python libraries, and they are collectively known as SciKits (short for SciPy Toolkits); hence the scikit prefix in scikit-learn. For example, scikit-image is a library for image processing, while categorical-encoding and imbalanced-learn are separate libraries for data preprocessing that are built as add-ons to scikit-learn.

We are going to use some of these tools in this book, and you will notice how easy it is to integrate these different tools into your workflow when using scikit-learn.

Being a key player in the Python data ecosystem is what makes scikit-learn the de facto toolset for machine learning. This is the tool that you will most likely use for your job application assignments and Kaggle competitions, as well as for most of your professional day-to-day machine learning problems.

Practical level of abstraction

scikit-learn implements a vast number of machine learning, data processing, and model selection algorithms. These implementations are abstract enough that you only need to apply minor changes when switching from one algorithm to another. This is a key feature, since you will need to quickly iterate between different algorithms when developing a model to pick the best one for your problem. That said, this abstraction doesn't shield you from the algorithms' configurations. In other words, you are still in full control of your hyperparameters and settings.
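As a rough illustration of that abstraction, the sketch below trains three different classifiers through the same fit and score calls. The choice of dataset, algorithms, and hyperparameter values is an assumption made for this example, not a recommendation from the book.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Each algorithm keeps its own hyperparameters, set in its constructor.
candidates = {
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),
}

# Switching algorithms changes only the object; the calls stay the same.
scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on the test split
```

Swapping in another classifier is then a one-line change to the dictionary, while settings such as max_depth or n_neighbors remain fully under your control.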

When not to use scikit-learn

Most likely, the reasons not to use scikit-learn will be deep learning, scale, or a combination of the two. scikit-learn's implementation of neural networks is limited. Unlike scikit-learn, TensorFlow and PyTorch allow you to use a custom architecture, and they support GPUs for training at a massive scale. All of scikit-learn's implementations run in memory on a single machine. I'd say that way more than 90% of businesses are at a scale where these constraints are fine. Data scientists can still fit their data in memory on large enough machines thanks to the cloud options available. They can cleverly engineer workarounds to deal with scaling issues, but if these limitations become something that they can no longer deal with, then they will need other tools to do the trick for them.

There are solutions being developed that allow scikit-learn to scale to multiple machines, such as Dask. Many scikit-learn algorithms allow parallel execution using joblib, which natively provides thread-based and process-based parallelism. Dask can scale these joblib-backed algorithms out to a cluster of machines by providing an alternative joblib backend.
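The joblib-based parallelism described above can be sketched as follows. The dataset, model, and worker count are arbitrary choices for illustration, and the example uses the thread-based backend on a single machine rather than Dask.

```python
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data, generated only for this sketch.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Run the joblib-backed work inside this block on two threads.
# Assumption: with dask.distributed installed and a Client connected to a
# cluster, using "dask" as the backend name here instead of "threading"
# would hand the same work to the cluster.
with parallel_backend("threading", n_jobs=2):
    scores = cross_val_score(forest, X, y, cv=3)
```

The model code itself does not change between backends; only the context manager decides where the work runs.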