Common pitfalls and ways to avoid them while writing regular expressions
Let's discuss some common mistakes people make while building regular expressions to solve various problems.
Do not forget to escape regex metacharacters outside a character class
You learned that all the special metacharacters, such as *, +, ?, ., |, (, ), [, {, ^, $, and so on, need to be escaped if the intent is to match them literally. I often see cases where programmers leave them unescaped, which gives the regular expression a totally different meaning. An unescaped metacharacter that still forms a valid pattern compiles without complaint and simply matches something other than what was intended. The Java regex API that we discussed in Chapter 5, Introduction to Java Regular Expressions APIs - Pattern and Matcher Classes, throws an unchecked exception (PatternSyntaxException) if a regex pattern is malformed and cannot be compiled.
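The following is a minimal sketch of both failure modes, not code from the book: the class name EscapeDemo and the helper methods matches and compiles are hypothetical names chosen for illustration. It shows an unescaped dot silently matching the wrong input, the escaped and Pattern.quote alternatives, and an unescaped opening parenthesis that fails to compile.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names here are illustrative assumptions.
final class EscapeDemo {

    // Returns true if the entire input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Returns true if the regex compiles, false if Pattern.compile
    // rejects it with the unchecked PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unescaped dot means "any character", so "1x5" is accepted.
        System.out.println(matches("1.5", "1x5"));

        // Escaped dot matches only a literal dot.
        System.out.println(matches("1\\.5", "1x5"));
        System.out.println(matches("1\\.5", "1.5"));

        // Pattern.quote treats the whole string as a literal.
        System.out.println(matches(Pattern.quote("1.5"), "1.5"));

        // An unescaped "(" opens a group that is never closed,
        // so compilation fails; the escaped form compiles.
        System.out.println(compiles("a(b"));
        System.out.println(compiles("a\\(b"));
    }
}
```

Pattern.quote is a convenient choice when the text to match comes from user input or configuration, since it avoids escaping each metacharacter by hand.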
Avoid escaping every non-word character
Some programmers overdo escaping, thinking that they need to escape every non-word character such as colon, hyphen, semicolon, forward slash, and whitespace, which is not correct. They end up writing a regular...