In this section, we will learn how to find the email addresses on a web page. To find them, we will use a regular expression. The approach is simple: first, fetch all the data from a given web page, then apply an email regular expression to that data to extract the addresses.
Let's see the code:
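# Python 2 code (raw_input, urllib.urlopen, and the print statement)
import urllib
import re
from bs4 import BeautifulSoup
url = raw_input("Enter the URL ")
# Fetch the raw contents of the page
ht = urllib.urlopen(url)
html_page = ht.read()
# Pattern for a simple address of the form name@domain.tld
email_pattern = re.compile(r'\b[\w.-]+?@\w+?\.\w+?\b')
# Print every substring of the page that matches the pattern
for match in re.findall(email_pattern, html_page):
    print match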
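The preceding code is straightforward. The html_page variable holds the full contents of the web page. The regular expression r'\b[\w.-]+?@\w+?\.\w+?\b' describes an email address: [\w.-]+? matches the local part (letters, digits, underscores, dots, and hyphens), @ matches the separator, and \w+?\.\w+? matches a domain name followed by a dot and a top-level domain. The \b anchors require the match to start and end at a word boundary. Because the pattern is this simple, it misses some valid addresses: a multi-level domain such as mail.example.co.uk is matched only up to its second label, and a domain containing a hyphen is not matched at all.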
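The matching step can also be tried without network access. The following is a minimal sketch, assuming Python 3 rather than the Python 2 used in the listing. The names extract_emails, fetch_html, and sample_html are hypothetical and introduced only for illustration, the sample addresses are made up, and fetch_html assumes the page is UTF-8 encoded.

```python
import re
from urllib.request import urlopen

# Same pattern as in the listing above.
EMAIL_PATTERN = re.compile(r'\b[\w.-]+?@\w+?\.\w+?\b')


def fetch_html(url):
    """Download a page and return its text (assumption: UTF-8 content)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_emails(text):
    """Return the unique addresses found in text, in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical page content, used here instead of a live URL.
sample_html = (
    '<p>Contact <a href="mailto:admin@example.com">admin@example.com</a> '
    'or sales.team@example.org</p>'
)

for address in extract_emails(sample_html):
    print(address)
```

Run on the sample string, this prints admin@example.com and sales.team@example.org, each once, because duplicates are dropped. To process a live page, extract_emails(fetch_html(url)) could be called instead. The listing above, by contrast, reads its input from a URL typed in by the user.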
Now let's see the output:
The preceding result is absolutely correct. The given URL...