Feature Extraction from Texts
Let's understand feature extraction with a real-life example. Features represent the characteristics of a person or a thing, and these characteristics may or may not identify that person or thing uniquely. For instance, general characteristics that nearly everyone shares, such as the number of ears, hands, and legs, are not enough to identify a particular person. Characteristics such as fingerprints and DNA sequences, however, can distinguish one person from everyone else. Similarly, in feature extraction, we try to extract attributes from texts that represent those texts uniquely. These attributes are called features.

Most machine learning algorithms accept only numeric features as input, so representing texts as numeric features is essential. When dealing with texts, we extract both general and specific features. Some features are not directly affected by the individual words that make up a text, such as the language of the text and the...
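To make the idea of turning a text into numeric features concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not this chapter's method. The function name `extract_features`, the whitespace-based tokenization, and the choice of features are all hypothetical. Character count and word count stand in for general features that describe the text as a whole. Per-word counts over a fixed vocabulary stand in for more specific features.

```python
from collections import Counter


def extract_features(text, vocabulary):
    """Map a text to a list of numbers: general features first, then word counts.

    Hypothetical helper for illustration; real pipelines use proper tokenizers.
    """
    # Crude tokenization (an assumption): lowercase and split on whitespace.
    words = text.lower().split()

    # General features: describe the text as a whole, not any particular word.
    general = [len(text), len(words)]

    # Specific features: how often each vocabulary word occurs in this text.
    # Counter returns 0 for words that never appear.
    counts = Counter(words)
    specific = [counts[word] for word in vocabulary]

    return general + specific


vocabulary = ["cat", "dog", "sat"]
print(extract_features("The cat sat on the mat", vocabulary))  # [22, 6, 1, 0, 1]
```

The general features alone rarely tell two texts apart, much like counting a person's hands. The word counts carry more of the text's identity, though richer representations are usually needed in practice.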