Parse strings with Regular Expressions
Regular Expressions (commonly abbreviated as regex) are widely used for lexical analysis and pattern matching on streams of text. They are common in Unix text-processing utilities, such as grep, awk, and sed, and are an integral part of the Perl language. There are a few common variations in the syntax. A POSIX standard was approved in 1992, while other common variations include Perl and ECMAScript (JavaScript) dialects. The C++ regex library defaults to the ECMAScript dialect.
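The effect of the dialect choice can be seen in a small sketch. The helper name contains_match is hypothetical and not part of the library; it only wraps std::regex and std::regex_search, and takes the grammar as an optional flag that falls back to ECMAScript.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a library function): returns true if `text`
// contains a match for `pattern` when compiled with the given grammar.
// With no grammar argument, the ECMAScript dialect is used.
bool contains_match(const std::string& text, const std::string& pattern,
                    std::regex::flag_type grammar = std::regex::ECMAScript) {
    const std::regex re(pattern, grammar);
    return std::regex_search(text, re);
}
```

For example, the pattern a+ matches the text "aaa" under the default ECMAScript grammar, where + means "one or more". Under std::regex::basic (POSIX basic syntax), + is an ordinary character, so the same pattern matches only a literal "a+".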
The regex library was first introduced to the STL with C++11. It can be very useful for finding patterns in text files.
To learn more about Regular Expression syntax and usage, I recommend the book Mastering Regular Expressions by Jeffrey Friedl.
How to do it…
For this recipe, we will extract hyperlinks from an HTML file. A hyperlink is coded in HTML like this:
<a href="http://example.com/file.html">Text goes here</a>
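As a rough sketch of the idea, and not necessarily the code this recipe goes on to build, a link of that shape could be matched with two capture groups, one for the URL and one for the link text. The function name extract_links and the pattern itself are assumptions made for illustration. The pattern expects href to be the first attribute and to use double quotes, so it will miss many links found in real-world HTML.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (URL, link text) pairs from an HTML string.
std::vector<std::pair<std::string, std::string>>
extract_links(const std::string& html) {
    // Group 1 captures the href value, group 2 the text between the tags.
    // The custom raw-string delimiter `re` is needed because the pattern
    // itself contains the sequence )" which would end a plain R"(...)".
    static const std::regex link_re(
        R"re(<a\s+href="([^"]*)"[^>]*>([^<]*)</a>)re",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::pair<std::string, std::string>> links;
    // A default-constructed sregex_iterator serves as the end sentinel.
    for (std::sregex_iterator it(html.begin(), html.end(), link_re), end;
         it != end; ++it) {
        links.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return links;
}
```

Called on the sample line above, extract_links would yield a single pair holding the URL http://example.com/file.html and the text "Text goes here". The std::sregex_iterator walks every non-overlapping match in the string, so a document with several links produces one pair per link.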
We...