Extracting features from text
In Chapter 11, Extracting Features from Text Variables, we will discuss various features that we can extract from pieces of text using pandas and scikit-learn. We can also extract multiple features from text automatically by using featuretools.
The featuretools library supports the creation of several basic features from text as part of its default functionality, such as the number of characters, the number of words, the mean character count per word, and the median word length in a piece of text, among others.
Note
For a full list of the default text primitives, visit https://featuretools.alteryx.com/en/stable/api_reference.html#naturallanguage-transform-primitives.
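To make these basic features concrete, the following is a minimal sketch in plain Python of what the number of characters, the number of words, the mean character count per word, and the median word length measure. It is not the featuretools implementation: the function name basic_text_features is hypothetical, and the sketch assumes that words are simply whitespace-separated tokens, which may differ from how the library tokenizes text.

```python
from statistics import mean, median


def basic_text_features(text: str) -> dict:
    """Hand-rolled approximations of four basic text features.

    Hypothetical helper for illustration only; not part of featuretools.
    """
    # Assumption: a "word" is any whitespace-separated token.
    words = text.split()
    word_lengths = [len(word) for word in words]

    return {
        # Total characters, including spaces and punctuation.
        "num_characters": len(text),
        "num_words": len(words),
        # Guard against empty text, where there are no words to average.
        "mean_characters_per_word": mean(word_lengths) if words else 0.0,
        "median_word_length": median(word_lengths) if words else 0.0,
    }


features = basic_text_features("Feature engineering saves time")
print(features)
```

In practice, featuretools computes features of this kind for every row of a text column at once, so a hand-written function like this one is mainly useful for checking what each feature represents.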
In addition, there is an accompanying Python library, nlp_primitives, which contains additional primitives to create more advanced features based on NLP. Among these functions, we find primitives for determining the diversity score, the polarity score, or the count of stop...