Consider a situation where you want to glean all the hyperlinks from a web page. You could do this manually by viewing the source of the page, but that would take a long time. In this section, we will do it programmatically instead.
So let's get acquainted with an elegant and fast parser called lxml.
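Before walking through the full script, here is a minimal, self-contained sketch of how lxml.html.fromstring parses markup and lets you pull out anchor hrefs; the HTML snippet and variable names are illustrative only and are not taken from the script that follows:
from lxml.html import fromstring

# Illustrative HTML snippet (not from the script below)
sample_html = '<html><body><a href="http://example.com/a">A</a><a href="http://example.com/b">B</a></body></html>'

# Parse the markup into an element tree
tree = fromstring(sample_html)

# List the href attribute of every <a> tag
for link in tree.xpath('//a/@href'):
    print(link)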
Let's see the code:
- The following modules will be used:
from lxml.html import fromstring
import requests
- When you enter the desired domain, the requests module downloads that domain's WHOIS page from domaintools.com:
# Read the target domain from the user (input() replaces Python 2's raw_input())
domain = input("Enter the domain : ")
url = 'http://whois.domaintools.com/' + domain
# Send a custom User-Agent header with the request
user_agent = 'wswp'
headers = {'User-Agent': user_agent}
resp = requests.get(url, headers=headers)
# The HTML of the page is now available as a string
html = resp.text
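With the page source stored in html, one possible way to list every hyperlink it contains is sketched below; the make_links_absolute() call and the XPath expression are choices made for illustration, not code quoted from this walkthrough:
# Parse the downloaded HTML into an lxml element tree
tree = fromstring(html)

# Turn relative links into absolute URLs using the page's own URL
tree.make_links_absolute(url)

# Print the href attribute of every <a> tag on the page
for href in tree.xpath('//a/@href'):
    print(href)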
- The following piece...