So far, we have learned about web-development technologies, data-finding techniques, and accessing web content using the Python programming language.
Web content is made up of parts, or elements, arranged according to a predefined markup language such as HTML or XML. Analyzing these elements for patterns is a major step in making scraping convenient. Elements can be located and identified with XPath and CSS selectors, which the scraping logic then uses to extract the required content. We will use lxml to process elements inside markup documents, and browser-based developer tools to read content and identify elements.
In this chapter, we will learn the following:
- Introduction to XPath and CSS selectors
- Using browser developer tools
- Learning and scraping using the Python lxml library
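
As a first taste of what locating elements looks like, here is a minimal sketch that parses a small HTML fragment with lxml and selects elements with an XPath expression. The markup, the `books` id, and the `title` and `price` class names are invented for illustration and do not come from any real site.

```python
from lxml import html

# Hypothetical markup, made up for this example only.
SAMPLE = """
<html><body>
  <div id="books">
    <p class="title">Book A</p>
    <p class="title">Book B</p>
    <p class="price">10</p>
  </div>
</body></html>
"""

# Parse the string into an element tree.
tree = html.fromstring(SAMPLE)

# XPath: the text of every <p class="title"> directly inside <div id="books">.
titles = tree.xpath('//div[@id="books"]/p[@class="title"]/text()')

print(titles)
```

The same elements could be described with the CSS selector `div#books > p.title`. In lxml, CSS selection goes through the `cssselect()` method, which relies on the separately installed `cssselect` package, so it is left out of this sketch.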