Preparing text data
Text data usually needs some processing before it can be analyzed effectively. It is often messy: it may contain irrelevant information, and it is sometimes not in a structure that lends itself to analysis. Some common steps for preparing text data include the following:
- Expanding contractions: A contraction is a shortened form of a word or group of words, formed by dropping some letters and replacing them with an apostrophe. Examples include don’t for do not and would’ve for would have. When preparing text data, contractions are usually expanded to their full form so that both spellings are treated as the same words.
- Removing punctuation: Punctuation marks are useful for separating sentences, clauses, and phrases. However, they are usually not needed for text analysis because they carry little meaning on their own.
- Converting to lowercase: Text data is typically a combination of capital and lowercase letters. However, this needs to...
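
The three steps named above (expanding contractions, removing punctuation, and converting to lowercase) can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a definitive implementation: the CONTRACTIONS mapping and the function names are hypothetical, and the mapping is deliberately tiny, so a real project would need a far fuller list or a dedicated library.

```python
import re
import string

# Hypothetical, deliberately small mapping used only for illustration.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "would've": "would have",
}

# One case-insensitive pattern that matches any key of the mapping as a whole word.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions with their expanded forms."""
    # Normalize the curly apostrophe so that "don’t" matches "don't".
    text = text.replace("\u2019", "'")
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def remove_punctuation(text: str) -> str:
    """Delete every ASCII punctuation character listed in string.punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation))


def prepare(text: str) -> str:
    """Expand contractions, strip punctuation, then lowercase the result."""
    text = expand_contractions(text)
    text = remove_punctuation(text)
    return text.lower()


print(prepare("Don\u2019t panic, it would\u2019ve worked!"))
# prints: do not panic it would have worked
```

The order of the steps matters in this sketch: contractions are expanded first because removing punctuation also removes the apostrophe, which would turn don’t into dont and leave nothing for the mapping to match. Note also that string.punctuation covers ASCII punctuation only, so other marks, such as curly quotation marks, would need separate handling.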