Exploring regular expressions
Regular expressions are a widely used rule-based technique that is often used for recognizing fixed expressions. By fixed expressions, we mean words and phrases that are formed according to their own internal rules, which are largely different from the normal patterns of the language.
One type of fixed expression is monetary amounts. There are only a few variations in formats for monetary amounts – the number of decimal places, the symbol for the type of currency, and whether the numbers are separated by commas or periods. The application might only have a requirement to recognize specific currencies, which would simplify the rules further. Other common fixed expressions include dates, times, telephone numbers, addresses, email addresses, measurements, and numbers. Regular expressions in NLP are most frequently used in preprocessing text that will be further analyzed with other techniques.
Different programming languages have slightly different...