Parsing RSS and Atom feeds
Really Simple Syndication (RSS) and Atom feeds (refer to http://en.wikipedia.org/wiki/RSS) are often used for blogs and news. These types of feeds follow the publish/subscribe model. For instance, Packt Publishing has an RSS feed with article and book announcements. We can subscribe to the feed to get timely updates. The Python feedparser module allows us to parse RSS and Atom feeds easily without dealing with a lot of technical details. The feedparser module can be installed with pip as follows:
$ sudo pip install feedparser
$ pip freeze|grep feedparser
feedparser==5.1.3
After parsing an RSS file, we can access the underlying data using a dotted notation. Parse the Packt Publishing RSS feed and print the number of entries:
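import feedparser as fp

# Download and parse the feed (requires network access)
rss = fp.parse("http://www.packtpub.com/rss.xml")
# A single formatted string prints the same under Python 2 and Python 3
print("# Entries %d" % len(rss.entries))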
The number of entries is printed (the number may vary for each program run):
# Entries 50
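In feedparser, the parsed result can also be read with dictionary-style keys, so rss.entries and rss["entries"] refer to the same list. The following minimal sketch illustrates that idea without network access or feedparser itself. DottedDict is a hypothetical stand-in written for this example, not feedparser's actual class, and the feed data is invented:

```python
class DottedDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so regular dict methods such as keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


# Hand-made data shaped like a parsed feed; titles and summaries are invented.
parsed = DottedDict(
    feed=DottedDict(title="Example feed"),
    entries=[
        DottedDict(title="First post", summary="Hello"),
        DottedDict(title="Second post", summary="World"),
    ],
)

# Dotted access and key access reach the same objects.
assert parsed.entries is parsed["entries"]
print("# Entries %d" % len(parsed.entries))
print(parsed.entries[0].title)
```

With a real feed, the same pattern applies to the object returned by fp.parse(), although which fields are present depends on what the feed publishes.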
Print entry titles and summaries if the entry...