Gathering information about a website from SmartWhois using the BeautifulSoup parser
Consider a situation where you want to collect all the hyperlinks from a web page. You could do this manually by viewing the page source, but that takes time. In this section, we will do it programmatically instead.
So let's get acquainted with a very beautiful parser called BeautifulSoup. It is a third-party library and is very easy to work with. In our code, we will use version 4 of BeautifulSoup.
The requirement is to extract the title of the HTML page and its hyperlinks.
The code is as follows:
import urllib
from bs4 import BeautifulSoup
url = raw_input("Enter the URL ")
ht = urllib.urlopen(url)
html_page = ht.read()
b_object = BeautifulSoup(html_page)
print b_object.title
print b_object.title.text
for link in b_object.find_all('a'):
    print(link.get('href'))
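The listing above is written for Python 2 (raw_input, urllib.urlopen, and the print statement). For readers on Python 3, a rough equivalent might look like the following sketch. It assumes BeautifulSoup 4 is installed and names the built-in "html.parser" explicitly. The helper names fetch_html and title_and_links are hypothetical and introduced only for illustration.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download the raw bytes of the page at the given URL."""
    with urlopen(url) as response:
        return response.read()


def title_and_links(html_page):
    """Return the title text and the href of every <a> tag in the page."""
    # Naming the parser explicitly keeps results consistent across systems.
    b_object = BeautifulSoup(html_page, "html.parser")
    # A page may have no <title> tag, so guard against None.
    title = b_object.title.get_text() if b_object.title else None
    # link.get('href') yields None for anchors without an href attribute.
    links = [link.get("href") for link in b_object.find_all("a")]
    return title, links
```

Calling title_and_links(fetch_html(url)) with a URL of your choice returns the title and the list of hyperlinks, which can then be printed as in the original listing. Separating the download from the parsing also makes it possible to try the parsing step on a saved HTML string without network access.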
The from bs4 import BeautifulSoup statement is used to import the BeautifulSoup library. The url variable stores the...