Congratulations, you have walked through the foundations of Natural Language Processing (NLP) and the key features available for working with unstructured data. We explored the Natural Language Toolkit (NLTK) Python library, which offers many options for working with free text, including downloadable corpora for analyzing large bodies of text. We learned how to split raw text into meaningful units called tokens so that it can be interpreted and refined. We covered regular expressions (regex) and how pattern matching on words applies to NLP. We also explored how to count the frequency of words in a collection of text using probability and statistical modules. Next, we learned how to normalize words using stemming and lemmatization functions, which showed how variations of a word can affect your data analysis. Finally, we explained the concept of n-grams and how to use stopwords to remove the noise that is common in large bodies of free text.
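As a compact recap, the sketch below chains several of these steps together: regex tokenization, stopword removal, frequency counting, and n-gram generation. It deliberately uses only the Python standard library rather than NLTK so that it runs without downloading any corpora. The sample text, the tiny stopword set, and the helper names (tokenize, remove_stopwords, ngrams) are assumptions made for illustration and are not part of NLTK.

```python
import re
from collections import Counter

# Hypothetical sample text and stopword set, chosen only for illustration.
TEXT = "The cats are chasing the cat. The dog chased the cats."
STOPWORDS = {"the", "are", "a", "an", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop common words that carry little meaning on their own."""
    return [token for token in tokens if token not in stopwords]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize(TEXT)             # raw word tokens
content = remove_stopwords(tokens)  # tokens with the noise removed
freq = Counter(content)             # word frequency counts
bigrams = ngrams(content, 2)        # pairs of adjacent words

print(content)
print(freq.most_common(3))
print(bigrams)
```

Note that this sketch counts "cats" and "cat" as two different words. That gap is exactly what the stemming and lemmatization functions covered in this chapter are meant to close.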
In the...