Parsing RSS and Atom feeds
Really Simple Syndication (RSS) and Atom feeds (refer to http://en.wikipedia.org/wiki/RSS) are often used for blogs and news. These types of feeds follow the publish/subscribe model. For instance, Packt Publishing has an RSS feed with article and book announcements. We can subscribe to the feed to get timely updates. The Python feedparser module allows us to parse RSS and Atom feeds easily without dealing with a lot of technical details. The feedparser module can be installed with pip as follows:
$ pip3 install feedparser
After parsing an RSS feed, we can access the underlying data using dotted (attribute) notation, for example rss.entries. Parse the Packt Publishing RSS feed and print the number of entries:
import feedparser as fp
rss = fp.parse("http://www.packtpub.com/rss.xml")
print("# Entries", len(rss.entries))
The number of entries is printed (the number may vary for each program run):
# Entries 10
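To get a feel for the technical details that feedparser takes care of, the following sketch counts and lists entries using only the standard library's xml.etree.ElementTree instead of feedparser. The feed text in SAMPLE_RSS and the helper parse_rss_items are made up for this illustration, and the sketch assumes a plain, well-formed RSS 2.0 document. It does not handle Atom, namespaces, or malformed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item RSS 2.0 document, made up for this illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example announcements</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Summary of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, link, and summary."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


entries = parse_rss_items(SAMPLE_RSS)
print("# Entries", len(entries))
for entry in entries:
    print(entry["title"], "->", entry["link"])
```

Even in this small case we have to know the element names and guard against missing children ourselves, and an Atom feed would need different element names again. With feedparser, the same kind of information is available from a single parse call for either format.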
Print entry titles and summaries if the entry contains the...