You're reading from Hands-On Web Scraping with Python: Perform advanced scraping operations using various Python libraries and tools such as Selenium, Regex, and others

Product type Paperback

Published in Jul 2019

Publisher Packt

ISBN-13 9781789533392

Length 350 pages

Edition 1st Edition

Languages

Python

Tools

Selenium

Concepts

Data Analysis

Author (1):

Anish Chapagain

Table of Contents (16) Chapters

Preface

1. Section 1: Introduction to Web Scraping

2. Web Scraping Fundamentals

3. Section 2: Beginning Web Scraping

4. Python and the Web – Using urllib and Requests

5. Using LXML, XPath, and CSS Selectors

6. Scraping Using pyquery – a Python Library

7. Web Scraping Using Scrapy and Beautiful Soup

8. Section 3: Advanced Concepts

9. Working with Secure Web

10. Data Extraction Using Web-Based APIs

11. Using Selenium to Scrape the Web

12. Using Regex to Extract Data

13. Section 4: Conclusion

14. Next Steps

15. Other Books You May Enjoy

Leave a review - let other readers know what you think

Web Scraping Using Scrapy and Beautiful Soup

So far, we have learned about web-development technologies, data-finding techniques, and accessing various Python libraries to scrape data from the web.

In this chapter, we will explore two Python libraries that are popular for document parsing and scraping: Scrapy and Beautiful Soup.

Beautiful Soup deals with document parsing: a document is parsed so that its elements can be traversed and their content extracted. Scrapy is a web crawling framework written in Python that provides a project-oriented approach to web scraping. It ships with plenty of built-in resources for email, selectors, items, and so on, and can be used for tasks ranging from simple scraping to API-based content extraction.
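To make the idea of parsing, traversing, and extracting more concrete, here is a minimal sketch using Beautiful Soup. The HTML snippet, the element names, and the `books` structure are all invented for illustration and are not taken from the book. The example assumes the `beautifulsoup4` package is installed and uses Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML document, made up for this illustration only.
html = """
<html><body>
  <h1>Book list</h1>
  <ul id="books">
    <li class="book"><a href="/b/1">Web Scraping</a> <span class="price">10.50</span></li>
    <li class="book"><a href="/b/2">Data Analysis</a> <span class="price">12.00</span></li>
  </ul>
</body></html>
"""

# Parse the document into a tree that can be traversed.
soup = BeautifulSoup(html, "html.parser")

# Find a single element and extract its text content.
heading = soup.find("h1").get_text(strip=True)

# Traverse the list items with a CSS selector and collect their content.
books = []
for li in soup.select("ul#books li.book"):
    link = li.find("a")
    books.append({
        "title": link.get_text(strip=True),
        "url": link["href"],
        "price": float(li.find("span", class_="price").get_text(strip=True)),
    })

print(heading)
for book in books:
    print(book)
```

Here `find()` returns the first matching element, while `select()` returns every element matching a CSS selector; either could be swapped for the other depending on how the target page is structured.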

In this chapter, we will learn about the following:

Web scraping using Beautiful Soup
Web scraping using Scrapy
Deploying a web crawler (learning...

Authors (1)

Anish Chapagain is a software engineer with a passion for data science, artificial intelligence and its processes, and Python programming, which began around 2007. He has been working on web scraping, data analysis, visualization, and reporting-related tasks and projects for more than 10 years, and also works as a freelancer. Anish previously worked as a trainer, web/software developer, team leader, and banker, where he was exposed to data and gained further insights into topics like data mining, data analysis, reporting, information processing, and knowledge discovery. He has an MSc in computer systems from Bangor University (United Kingdom), and an Executive MBA from Himalayan Whitehouse International College, Kathmandu, Nepal.
