The previous chapter covered several topics concerning distributed DL training in a Spark cluster. The concepts presented there apply to any network model. Starting with this chapter, specific use cases will be examined, first for RNNs and LSTMs and then for CNNs. This chapter begins by introducing the following core concepts of Natural Language Processing (NLP):
- Tokenizers
- Sentence segmentation
- Part-of-speech tagging
- Named entity extraction
- Chunking
- Parsing
The theory behind each of these concepts will be detailed first, followed by two complete Scala examples of NLP: one using Apache Spark and the Stanford CoreNLP library, and the other using Spark Core and the Spark NLP library (which is built on top of Apache Spark MLlib). The goal of the chapter is to familiarize readers with NLP, before moving...