Machine Learning for OpenCV: Intelligent image processing with Python
Michael Beyeler
Packt, 1st Edition, July 2017 (paperback, 382 pages, ISBN-13 9781783980284)

Representing text features

As with categorical features, scikit-learn offers an easy way to encode another common feature type: text features. When working with text, it is often convenient to encode individual words or phrases as numerical values.
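To make "encoding words as numerical values" concrete, here is a minimal plain-Python sketch (not scikit-learn) that gives every distinct word in a few phrases its own integer index. The helper name `build_vocabulary` is hypothetical, and lowercasing plus splitting on whitespace is a simplifying assumption about how words are separated.

```python
def build_vocabulary(phrases):
    """Map each distinct word to an integer index (alphabetical order)."""
    # Collect every lowercase, whitespace-separated token across all phrases
    words = {word for phrase in phrases for word in phrase.lower().split()}
    # Sort so the mapping is deterministic, then number the words 0, 1, 2, ...
    return {word: index for index, word in enumerate(sorted(words))}


phrases = ['feature engineering', 'feature selection']
vocab = build_vocabulary(phrases)
print(vocab)  # {'engineering': 0, 'feature': 1, 'selection': 2}

# Each phrase can now be written as a list of integers instead of strings
encoded = [[vocab[word] for word in phrase.lower().split()] for phrase in phrases]
print(encoded)  # [[1, 0], [1, 2]]
```

Sorting the words is only there to make the mapping reproducible; any consistent assignment of indices would work.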

Let's consider a dataset that contains a small corpus of text phrases:

In [1]: sample = [
... 'feature engineering',
... 'feature selection',
... 'feature extraction'
... ]

One of the simplest ways to encode such data is by word count: for each phrase, we simply count how often each word occurs in it. In scikit-learn, this is easily done using CountVectorizer, which works much like DictVectorizer:

In [2]: from sklearn.feature_extraction.text import CountVectorizer
... vec = CountVectorizer()
... X = vec.fit_transform(sample)
... X
Out[2]: <3x4...
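The sparse matrix returned by `fit_transform` is hard to read directly. The following minimal pure-Python sketch of the same word-count idea, applied to the three phrases in `sample`, shows the dense matrix one would expect: one row per phrase and one column per distinct word, in alphabetical order. The helper `count_matrix` is hypothetical, and whitespace tokenization is an assumption; CountVectorizer itself tokenizes with a regular expression (its `token_pattern` parameter), which by default also ignores single-character tokens.

```python
from collections import Counter


def count_matrix(phrases):
    """Return (vocabulary, rows) where rows[i][j] counts word j in phrase i."""
    # Tokenize by lowercasing and splitting on whitespace (simplifying assumption)
    tokenized = [phrase.lower().split() for phrase in phrases]
    # Alphabetical vocabulary: one column per distinct word
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this phrase
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


sample = ['feature engineering', 'feature selection', 'feature extraction']
vocabulary, rows = count_matrix(sample)
print(vocabulary)  # ['engineering', 'extraction', 'feature', 'selection']
for row in rows:
    print(row)
# [1, 0, 1, 0]
# [0, 0, 1, 1]
# [0, 1, 1, 0]
```

Three phrases and four distinct words give the 3x4 shape reported above. With scikit-learn itself, `X.toarray()` converts the sparse result to a dense array, and `vec.get_feature_names_out()` (scikit-learn 1.0 and later; older releases used `get_feature_names()`) lists the column labels, which should match this sketch.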