You're reading from Hands-On Web Scraping with Python: Perform advanced scraping operations using various Python libraries and tools such as Selenium, Regex, and others

Product type Paperback

Published in Jul 2019

Publisher Packt

ISBN-13 9781789533392

Length 350 pages

Edition 1st Edition

Languages

Python

Tools

Selenium

Concepts

Data Analysis

Author (1):

Anish Chapagain

Table of Contents (16) Chapters

Preface

1. Section 1: Introduction to Web Scraping

2. Web Scraping Fundamentals

3. Section 2: Beginning Web Scraping

4. Python and the Web – Using urllib and Requests

5. Using LXML, XPath, and CSS Selectors

6. Scraping Using pyquery – a Python Library

7. Web Scraping Using Scrapy and Beautiful Soup

8. Section 3: Advanced Concepts

9. Working with Secure Web

10. Data Extraction Using Web-Based APIs

11. Using Selenium to Scrape the Web

12. Using Regex to Extract Data

13. Section 4: Conclusion

14. Next Steps

15. Other Books You May Enjoy

URL handling and operations with urllib and requests

Because our primary goal is to extract data from web pages, we need to work with URLs. The examples so far have used fairly simple URLs to reach a source and its contents from Python. In practice, web scraping often involves URLs from many different domains, and these rarely share a common format or pattern.

Developers will also face many cases that call for URL manipulation (altering or cleaning a URL) in order to reach a resource quickly and conveniently. URL handling and operations are used to set up or alter query parameters and to strip out unnecessary ones. They also cover passing the required request headers with appropriate values and identifying the proper HTTP method for a request. There will be many cases where you will find...
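As a rough illustration of the operations described above, the following sketch cleans unwanted query parameters from a URL, adds a new one, and prepares a request with explicit headers and an HTTP method. It uses only the standard library (`urllib.parse` and `urllib.request`); the URL, the `UNWANTED_PARAMS` set, the `clean_url` and `build_request` helpers, and the header values are all assumptions made for this example and do not come from the book.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request

# Hypothetical parameters to strip; adjust for the target site.
UNWANTED_PARAMS = {"utm_source", "utm_medium", "sessionid"}


def clean_url(url, extra_params=None):
    """Drop unwanted query parameters and optionally append new ones."""
    parts = urlsplit(url)
    # parse_qsl returns (key, value) pairs in their original order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in UNWANTED_PARAMS
    ]
    if extra_params:
        query.extend(extra_params.items())
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


def build_request(url, method="GET"):
    """Prepare (but do not send) a request with example headers."""
    headers = {
        "User-Agent": "example-scraper/0.1",  # illustrative value only
        "Accept": "text/html",
    }
    return Request(url, headers=headers, method=method)


cleaned = clean_url(
    "https://example.com/search?q=python&utm_source=mail&page=1",
    {"page_size": 20},
)
request = build_request(cleaned)

print(cleaned)
print(request.get_method())
```

Nothing is sent over the network here; the request object would only be dispatched by passing it to `urllib.request.urlopen`. With the third-party `requests` library, the same headers dictionary could instead be supplied through the `headers` argument of `requests.get`.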


Authors (1)

Anish Chapagain is a software engineer with a passion for data science, artificial intelligence and its processes, and Python programming, which began around 2007. He has worked on web scraping, data analysis, visualization, and reporting-related tasks and projects for more than 10 years, and also works as a freelancer. Anish previously worked as a trainer, web/software developer, team leader, and banker, where he was exposed to data and gained further insight into topics such as data mining, data analysis, reporting, information processing, and knowledge discovery. He has an MSc in computer systems from Bangor University (United Kingdom) and an Executive MBA from Himalayan Whitehouse International College, Kathmandu, Nepal.
