Getting Started with Beautiful Soup

Learn how to extract information from websites using Beautiful Soup and the Python urllib2 module. This practical, hands-on guide covers everything you need to know to get a head start in website scraping.

Product type: Paperback
Published in: Jan 2014
Publisher: Packt
ISBN-13: 9781783289554
Length: 130 pages
Edition: 1st Edition
Author (1):
Vineeth G Nair

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The prettify() method can be called either on a Beautiful Soup object or any of the Tag objects."

A block of code is set as follows:

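from bs4 import BeautifulSoup

# Each line pairs a literal character with its HTML entity and its name.
html_markup = """<html>
  <body>&  &amp;  ampersand
    ¢  &cent;  cent
    ©  &copy;  copyright
    ÷  &divide;  divide
    >  &gt;  greater than
  </body>
</html>
"""
# The "lxml" parser requires the lxml package to be installed.
soup = BeautifulSoup(html_markup, "lxml")
print(soup.prettify())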
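
The sentence quoted earlier says that prettify() works on a Beautiful Soup object as well as on any Tag object. A minimal sketch of that difference follows; the sample markup is hypothetical, and the standard-library "html.parser" is assumed instead of lxml so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the book.
html_markup = "<html><body><p>Hello, <b>soup</b></p></body></html>"

# "html.parser" ships with Python; it is an assumption made for this sketch.
soup = BeautifulSoup(html_markup, "html.parser")

whole_document = soup.prettify()    # called on the BeautifulSoup object
paragraph_only = soup.p.prettify()  # called on a single Tag object

print(whole_document)
print(paragraph_only)
```

The first call should print the indented document from the html tag down, while the second should print only the p element and its children.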

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

UserWarning: "http://www.packtpub.com/books" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client to get the document behind the URL, and feed that document to Beautiful Soup
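
The warning above appears when a URL string is passed where markup is expected. One way to follow its advice, sketched here under the assumption of Python 3 (where urllib2 became urllib.request), is to download the document first and then hand the result to Beautiful Soup. The helper name soup_from_url and the choice of "html.parser" are illustrative and do not come from the book.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def soup_from_url(url, parser="html.parser"):
    """Fetch the document behind `url`, then parse the returned markup.

    Hypothetical helper for illustration only; there is no error
    handling or timeout here.
    """
    with urlopen(url) as response:
        markup = response.read()  # raw bytes; Beautiful Soup detects the encoding
    return BeautifulSoup(markup, parser)
```

Calling soup_from_url("http://www.packtpub.com/books") would then parse the page itself rather than the URL string.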

Any command-line input or output is written as follows:

sudo easy_install beautifulsoup4

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "The output methods in Beautiful Soup escape only the HTML entities of >, <, and & as &gt;, &lt;, and &amp;."
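
The quoted sentence about output escaping can be checked with a short sketch. The sample markup is hypothetical, "html.parser" is assumed as the parser, and the formatter="html" call is included only for contrast with the default output.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing four different entities.
markup = "<p>1 &lt; 2 &amp; 3 &gt; 2, &copy; 2014</p>"
soup = BeautifulSoup(markup, "html.parser")

# On parsing, every entity becomes the character it stands for.
text = soup.p.get_text()

# The default ("minimal") output re-escapes only <, >, and &.
default_output = soup.p.decode()

# The "html" formatter also converts characters such as the
# copyright sign back to named entities.
html_output = soup.p.decode(formatter="html")

print(text)
print(default_output)
print(html_output)
```

In the default output the copyright sign should remain a literal character, while the three characters named in the sentence come back as entities.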

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.
