You're reading from Python Feature Engineering Cookbook Over 70 recipes for creating, engineering, and transforming features to build machine learning models

Product type Paperback

Published in Jan 2020

Publisher Packt

ISBN-13 9781789806311

Length 372 pages

Edition 1st Edition

Languages

Python

Tools

NumPy

Concepts

Machine Learning

Author (1):

Soledad Galli

Table of Contents (13) Chapters

Preface

1. Foreseeing Variable Problems When Building ML Models

2. Imputing Missing Data

3. Encoding Categorical Variables

4. Transforming Numerical Variables

5. Performing Variable Discretization

6. Working with Outliers

7. Deriving Features from Dates and Time Variables

8. Performing Feature Scaling

9. Applying Mathematical Computations to Features

10. Creating Features with Transactional and Time Series Data

11. Extracting Features from Text Variables

12. Other Books You May Enjoy

Counting characters, words, and vocabulary

One of the salient characteristics of text is its complexity. Long descriptions are likely to contain more information than short ones, and texts that use many different words are likely to be richer in detail than texts that repeat the same words over and over. Similarly, when we speak, we use many short words, such as articles and prepositions, to build the sentence structure, yet the main concept is often conveyed by the nouns and adjectives, which tend to be longer words. So, even without reading the text, we can start inferring how much information it provides by determining the number of words, the number of unique words, the lexical diversity, and the length of those words. In this recipe, we will learn how to extract these features from a text variable using pandas.
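The features named above can be sketched with pandas string methods. This is a minimal illustration under stated assumptions, not the recipe's own code: the toy DataFrame, the column names, and the choice to define lexical diversity as unique words divided by total words are all assumptions made here (other definitions, such as the inverse ratio, are also in use).

```python
import pandas as pd

# Hypothetical toy data; the column name "text" is an assumption.
df = pd.DataFrame({
    "text": [
        "The cat sat on the mat",
        "Feature engineering transforms raw variables into informative predictors",
    ]
})

# Number of characters in each text, whitespace included.
df["num_char"] = df["text"].str.len()

# Lowercase and split on whitespace once, then reuse the word lists.
tokens = df["text"].str.lower().str.split()

# Number of words and number of unique words (vocabulary size).
df["num_words"] = tokens.str.len()
df["num_vocab"] = tokens.apply(lambda words: len(set(words)))

# Lexical diversity, assumed here to be unique words / total words.
df["lexical_div"] = df["num_vocab"] / df["num_words"]

# Average word length, in characters per word.
df["ave_word_length"] = tokens.apply(
    lambda words: sum(len(w) for w in words) / len(words)
)

print(df.drop(columns="text"))
```

Splitting on whitespace leaves punctuation attached to words, and an empty string would cause a division by zero in the last two features, so real data would likely need cleaning first.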

...

Authors (1)

Soledad Galli is a bestselling data science instructor, author, and open-source Python developer. As the lead instructor at Train in Data, she teaches intermediate and advanced courses in machine learning that have enrolled over 64,000 students worldwide and continue to receive positive reviews. Sole is also the developer and maintainer of the open-source Python library Feature-engine, which provides an extensive array of methods for feature engineering and selection. With extensive experience as a data scientist in the finance and insurance sectors, Sole has developed and deployed machine learning models for assessing insurance claims, evaluating credit risk, and preventing fraud. She is a frequent speaker on podcasts and at meetups and webinars, sharing her expertise with the broader data science community.
