Web pages are simply text with HTML tags, JavaScript, and CSS. The HTML tags mark up the content of the web page, and we can parse them to extract specific content. Bash scripts can do this parsing. An HTML file can be viewed in a web browser, which renders it properly formatted, or processed with the tools described in the previous chapter.
Parsing a plain-text document is simpler than parsing HTML because there are no HTML tags to strip off. Lynx is a command-line web browser that can download a web page as plain text.
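To make the idea of stripping tags concrete, here is a minimal sketch, not a robust HTML parser. The helper name `strip_tags` and the inline sample page are assumptions made for illustration. The `sed` expression only removes tags that open and close on the same line, so comments, multi-line tags, and the contents of script blocks are not handled.

```shell
#!/bin/bash
# strip_tags: hypothetical helper (not from the text) that deletes anything
# that looks like an HTML tag from standard input, one line at a time.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# Inline sample page (an assumption, so the sketch runs without network access).
sample='<h1>Title</h1>
<p>Hello <b>world</b></p>'

# Print the sample with its tags removed.
printf '%s\n' "$sample" | strip_tags

# With Lynx installed, the browser can do the conversion itself, for example:
#   lynx -dump "$url" > page.txt
# where $url is a placeholder for the page you want to fetch.
```

Letting Lynx produce the plain text is usually the more reliable route, since it renders the page as a browser would instead of matching tags with a pattern.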