Dividing text into sentences
Text can be processed at several scales: the whole document (for example, a newspaper article), the paragraph, the sentence, or the word. Sentences are the main unit of processing in many NLP tasks. For example, when we send data to Large Language Models (LLMs), we frequently want to add some context to the prompt. In some cases, we would like that context to include sentences from a text so that the model can extract important information from it. In this section, we will show you how to divide a text into sentences.
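To make the idea of sentence splitting concrete before we turn to the recipe itself, here is a minimal sketch that uses only Python's standard library. It is a naive regular-expression heuristic, not the approach this recipe relies on, and the helper name split_sentences is hypothetical. The sample text is the opening of The Adventures of Sherlock Holmes.

```python
import re

# Naive rule (an assumption for illustration): a sentence ends at '.', '!' or '?'
# followed by whitespace and then an uppercase letter or an opening quote.
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'])')


def split_sentences(text):
    """Split text into sentences with a simple regex heuristic (hypothetical helper)."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


sample = (
    "To Sherlock Holmes she is always the woman. "
    "I have seldom heard him mention her under any other name. "
    "In his eyes she eclipses and predominates the whole of her sex."
)

sentences = split_sentences(sample)
for sentence in sentences:
    print(sentence)
```

A heuristic like this breaks on abbreviations: a string such as "Mr. Holmes arrived." would be cut after "Mr." because the period is followed by a capital letter. Dedicated NLP libraries such as NLTK and spaCy handle these cases far more reliably, which is why a simple regex is rarely enough for real text.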
Getting ready
For this part, we will be using the text of the book The Adventures of Sherlock Holmes. You can find the whole text in this book’s GitHub repository (https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/data/sherlock_holmes.txt). For this recipe, we will need just the beginning of the book, which can be found in the file at https...