Most programmers are familiar with the simplest way of processing natural language: regular expressions. Many regular expression implementations exist across programming languages, and they differ in small details. Because of these details, the same regular expression can produce different results on different platforms, or fail to work at all. The two most popular flavors are POSIX and Perl. The Foundation framework, however, contains its own implementation, NSRegularExpression, which is built on the regular expression engine of the ICU C/C++ library. ICU describes its pattern syntax as based on Perl's rather than on POSIX, and the engine is designed to work with Unicode strings.
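As a rough illustration of the Foundation API, the sketch below pulls phone-like tokens out of a sentence with NSRegularExpression. The helper name `matches(of:in:)`, the sample sentence, and the pattern are assumptions made up for this example; only `NSRegularExpression`, `NSRange`, and `Range` come from Foundation and the standard library.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): returns every substring
// of `text` that matches `pattern`, or an empty array if the pattern
// does not compile.
func matches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []
    }
    // NSRegularExpression works with NSRange (UTF-16 offsets),
    // so convert the Swift string range first.
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        // Convert each match back to a Swift range before slicing.
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// Illustrative input: find tokens shaped like "ddd-dddd".
let sample = "Call 555-0134 or 555-0199 after 5 pm."
let numbers = matches(of: "\\b\\d{3}-\\d{4}\\b", in: sample)
print(numbers)  // ["555-0134", "555-0199"]
```

Note the doubled backslashes: the pattern is written inside an ordinary Swift string literal, so each regex escape has to be escaped once more. The conversion between `NSRange` and `Range<String.Index>` is needed because the Foundation API counts UTF-16 code units, while Swift strings are indexed by characters.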
Why are we even talking about regular expressions here? Regular expressions are a great example of what NLP specialists call heuristics: manually written rules, ad hoc solutions, and attempts to describe a complex structure in such a way that all exceptions and variations...