Interpreting HTML stream
In this recipe, we will see how HTML code can be read and interpreted using regular expressions. We will create a program that reads an HTML stream held in a string and displays the tag names along with the content of the tags. The FIND and REPLACE statements are used together with a DO loop. (This recipe focuses on reading tags that begin with <tag> and end with </tag>.)
How to do it...
To create a program for interpreting HTML code, follow these steps:
1. Declare three strings named htmlstream, tagcontents, and tagname. Then assign suitable HTML code to the htmlstream variable.
2. Within a DO loop, add a FIND REGEX statement that finds tag names and their contents. The regex used in this case for matching an HTML tag is '<(\u\w*)[^>]*>(.*)</\1>'.
3. Once a tag is processed, use a REPLACE ALL OCCURRENCES statement to replace the tag with '$$$'.
4. Print the tag name and tag contents.
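The flow of the steps above (find a tag, record its name and contents, replace it with '$$$', repeat until nothing matches) can be sketched outside ABAP as well. The following is a minimal Python analogue, not the recipe's ABAP listing. The function name interpret_tags, the placeholder parameter, and the sample HTML are hypothetical. Python's re module has no \u class, so [A-Za-z] is assumed here as a stand-in for the first character of the tag name.

```python
import re

# Analogue of the recipe's pattern '<(\u\w*)[^>]*>(.*)</\1>'.
# Assumption: [A-Za-z] replaces \u, which Python's re module does not support.
# Group 1 captures the tag name, group 2 the tag contents, and \1 requires
# the closing tag to carry the same name as the opening tag.
TAG_PATTERN = re.compile(r"<([A-Za-z]\w*)[^>]*>(.*)</\1>")


def interpret_tags(htmlstream, placeholder="$$$"):
    """Return (tagname, tagcontents) pairs found in htmlstream.

    Mirrors the loop in the steps: search, record, replace, repeat.
    """
    found = []
    while True:
        match = TAG_PATTERN.search(htmlstream)
        if match is None:
            # Plays the role of leaving the loop when the search fails.
            break
        tagname, tagcontents = match.group(1), match.group(2)
        found.append((tagname, tagcontents))
        # Replace every occurrence of the matched text with the placeholder
        # (a literal replace) so the next search moves on to another tag.
        htmlstream = htmlstream.replace(match.group(0), placeholder)
    return found


if __name__ == "__main__":
    sample = "<title>My Title</title><h1>Heading 1</h1><p>My Paragraph</p>"
    for name, contents in interpret_tags(sample):
        print(name, contents)
```

Because the whole matched element is replaced, tags nested inside an already processed element are reported only as part of that element's contents, not as separate entries. Whether the original program behaves the same way depends on exactly what its REPLACE statement targets, which the steps do not spell out.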
Once...