In this section, we will explore some tools and learn more about handling and managing the data that we have scraped or extracted from certain websites.
Data that's collected from websites using scraping scripts is known as raw data. This data might require some additional tasks to be performed on it before it can be processed further so that we can gain insights from it. Therefore, raw data should be verified and processed (if required), which can be done as follows:
- Cleaning: As the name suggests, this step is used to remove unwanted pieces of information, such as extra spaces, whitespace characters, and unwanted portions of text. The following code shows some relevant steps that were used in examples in previous chapters, such as Chapter 9, Using Regex to Extract Data, and Chapter 3, Using LXML, XPath, and CSS Selectors. Functions...
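
As a minimal sketch of such a cleaning step (the `clean_text` helper and the sample strings are hypothetical, used only to illustrate stripping and normalizing scraped text):

```python
import re

# Hypothetical raw strings as they might come out of a scraping script
raw_rows = ['  Price:  $1,299.00 \n', '\tAvailability:   In   stock  ']

def clean_text(value):
    """Strip and normalize a single scraped string."""
    value = value.strip()               # remove leading/trailing whitespace
    value = re.sub(r'\s+', ' ', value)  # collapse runs of spaces, tabs, newlines
    value = value.replace(',', '')      # drop thousands separators (assumption)
    return value

cleaned = [clean_text(row) for row in raw_rows]
print(cleaned)  # ['Price: $1299.00', 'Availability: In stock']
```

The same idea scales to whole columns of scraped records: apply the function to each field before the data is stored or analyzed, so downstream steps receive consistent, whitespace-free values.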