XPath is a query language for selecting nodes from an XML document, and it is well worth learning for anyone doing web scraping. It offers several benefits over other tools for selecting content:
- It can easily navigate through the DOM tree
- It is more expressive than alternatives such as CSS selectors and regular expressions
- It has a rich set of built-in functions (200+ in recent versions of the standard; XPath 1.0, which many scraping libraries implement, has a smaller core set) and is extensible with custom functions
- It is widely supported by parsing libraries and scraping platforms
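As a rough illustration of navigating a tree with path expressions, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample document and its element names (`catalog`, `book`, `title`, `price`) are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
SAMPLE = """
<catalog>
  <book id="b1"><title>Dune</title><price>9.99</price></book>
  <book id="b2"><title>Emma</title><price>4.50</price></book>
</catalog>
"""

root = ET.fromstring(SAMPLE)

# Descendant search: every <title> anywhere below the root.
titles = [t.text for t in root.findall(".//title")]

# Attribute predicate: the <book> whose id attribute equals "b2".
emma = root.find(".//book[@id='b2']")
emma_price = emma.find("price").text

# Positional predicate: the first <book> child of the root.
first_id = root.find("book[1]").get("id")

print(titles, emma_price, first_id)
```

A full XPath engine (for example the one in the third-party `lxml` package) accepts richer expressions, including function calls, which the standard-library subset does not.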
The XPath data model defines seven kinds of nodes (we have seen some of them previously):
- root node (top-level parent node)
- element nodes (<a>..</a>)
- attribute nodes (href="example.html")
- text nodes ("this is a text")
- comment nodes (<!-- a comment -->)
- namespace nodes
- processing instruction nodes
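To make several of these node kinds concrete, the sketch below parses a small, made-up document and labels what it finds. It assumes Python 3.8+ and uses the standard-library `xml.etree.ElementTree`, which does not support XPath node tests such as `text()` or `comment()`, so it inspects the parsed tree directly rather than running XPath queries. The root node and namespace nodes are not shown, since this library does not model them as separate objects.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
SAMPLE = (
    '<page>'
    '<?render fast?>'
    '<!-- a comment -->'
    '<a href="example.html">this is a text</a>'
    '</page>'
)

# Keep comments and processing instructions in the tree (Python 3.8+).
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(SAMPLE, parser=ET.XMLParser(target=builder))

# Walk the tree in document order and label each node.
kinds = []
for node in root.iter():
    if node.tag is ET.Comment:
        kinds.append("comment")
    elif node.tag is ET.ProcessingInstruction:
        kinds.append("processing-instruction")
    else:
        kinds.append("element:" + node.tag)

link = root.find("a")
href = link.get("href")  # value of the attribute node
text = link.text         # content of the text node

print(kinds, href, text)
```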
XPath expressions can...