Breaking text into sentences is difficult for a number of reasons:
- Punctuation is frequently ambiguous
- Abbreviations often contain periods
- Sentences may be embedded within each other by the use of quotes
- With more specialized text, such as tweets and chat sessions, we may need to consider the use of new lines or the completion of clauses
Punctuation ambiguity is best illustrated by the period. It is most often used to mark the end of a sentence, but it also appears in a number of other contexts, including abbreviations, numbers, email addresses, and ellipses. Other punctuation characters, such as question marks and exclamation marks, can likewise appear inside embedded quotes and in specialized text, such as code included in a document.
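To make the ambiguity concrete, the following is a minimal sketch of a deliberately naive splitter that treats every period, question mark, or exclamation mark as the end of a sentence. The function name `naive_split` and the sample text are illustrative assumptions, not part of any library. The sample contains only two sentences, yet the naive rule breaks it at the abbreviation, the decimal number, and the email address.

```python
def naive_split(text):
    """Split after every '.', '?' or '!' (hypothetical helper, deliberately naive)."""
    pieces, current = [], ""
    for ch in text:
        current += ch
        if ch in ".?!":
            # Treat every terminator character as a sentence boundary.
            pieces.append(current.strip())
            current = ""
    if current.strip():
        # Keep any trailing text that has no terminator.
        pieces.append(current.strip())
    return pieces


sample = "Dr. Smith paid 3.50 dollars. Write to bob@example.com today."
pieces = naive_split(sample)
for piece in pieces:
    print(repr(piece))
```

The sample is cut into five fragments instead of two sentences, which suggests why a practical sentence detector needs more context than the punctuation character alone.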
Periods are used in a number of situations:
- To terminate a sentence
- To end an abbreviation
- To end an abbreviation and...