Summary
In this chapter, we learned how computers process human language. We first learned what RegEx is and how it helps data scientists analyze and clean text data. Next, we learned about stop words, what they are, and why they are removed from the data to reduce dimensionality. We then learned about sentence tokenization and its importance, followed by word embeddings. Embedding is a topic we covered in Chapter 5, Mastering Structured Data; here, we learned how to create word embeddings to boost our NLP model's performance. To create better models, we looked at RNNs, a special type of neural network that retains a memory of past inputs. Finally, we learned about LSTM cells and how they improve on normal RNN cells.
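The text-preprocessing steps recapped above can be sketched in a few lines. This is a minimal illustration, not the chapter's exact code: the regex pattern and the tiny stop-word list are assumptions chosen for brevity (in practice you would use a full list, such as the one shipped with NLTK).

```python
import re

# Tiny illustrative stop-word list (a real project would use a full list)
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "on"}

def preprocess(text):
    # RegEx cleaning: lowercase, then strip everything except letters and spaces
    cleaned = re.sub(r"[^a-z\s]", "", text.lower())
    # Word tokenization: split on whitespace
    tokens = cleaned.split()
    # Stop-word removal reduces the dimensionality of the feature space
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The cat is on the mat!"))  # ['cat', 'mat']
```

The resulting token list is what would then be mapped to word embeddings before being fed to an RNN or LSTM model.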
Now that you have completed this chapter, you are able to handle textual data and create machine learning models for NLP. In the next chapter, you will learn how to build models faster using transfer learning and a few tricks of the trade.