What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Data Acquisition and Extraction

In this chapter, we will cover:

How to parse websites and navigate the DOM using BeautifulSoup
Searching the DOM with Beautiful Soup's find methods
Querying the DOM with XPath and lxml
Querying data with XPath and CSS Selectors
Using Scrapy selectors
Loading data in Unicode / UTF-8 format

Key benefits

Hands-on recipes for advancing your web scraping skills to expert level

One-stop solution guide to address complex and challenging web scraping tasks using Python

Understand web page structures and collect data from a website with ease

Description

Python Web Scraping Cookbook is a solution-focused book that teaches techniques for developing high-performance scrapers and for dealing with crawlers, sitemaps, form automation, Ajax-based sites, caches, and more. You'll explore a number of real-world scenarios in which every part of the development and product life cycle is covered. You will not only develop the skills needed to design and build reliable, performant data flows, but also deploy your codebase to AWS. If you are involved in software engineering, product development, or data mining (or are interested in building data-driven products), you will find this book useful, as each recipe has a clear purpose and objective. From extracting data from websites to writing a sophisticated web crawler, the book's independent recipes will be a godsend. This book covers Python libraries such as requests and BeautifulSoup. You will learn about crawling, web spidering, working with Ajax websites, paginated items, and more. You will also learn to tackle problems such as 403 errors, working with proxies, scraping images, and using lxml. By the end of this book, you will be able to scrape websites more efficiently and to deploy and operate your scraper in the cloud.

What you will learn

Use a variety of tools to scrape any website and data, including BeautifulSoup, Scrapy, Selenium and many more

Master expression languages, such as XPath and CSS, and regular expressions to extract web data

Deal with scraping traps such as hidden form fields, throttling, pagination, and different status codes

Build robust scraping pipelines with SQS and RabbitMQ

Scrape assets such as images and learn what to do when a scraper fails to run

Explore ETL techniques for building a customized crawler and parser, and for converting structured and unstructured data from websites

Deploy and run your scraper as a service in AWS Elastic Container Service

Tonya Oliver Mar 26, 2018

It's probably worth the read. I don't like the fact that Amazon is forcing me to write this review with no less than 18 words. It's too bad because the book isn't being reviewed here it's Amazon. How's that for 18 words Amazon?

Amazon Verified review

John Ewers Apr 17, 2018

I bought this book trying to learn how I could download a few tables from the web into Python. I had 2 issues that I needed help with: 1) sites that I need data from require passwords 2) sites have JavaScript that needs to run before I can grab data. After 4 hours, I got absolutely nothing out of this book and went to YouTube / Stack Overflow (w/ these tools, I figured out my problem in less time than I spent w/ this book).

The book starts off by going over a few details on many different scraping libraries. There isn't enough detail to do anything useful w/ web scraping; you just become aware of the existence of these libraries. The 2nd 2/3rds of the book focus exclusively on 'Scrapy'. This appears to be a good resource for crawling (finding new websites to go onto); however, not so good for scraping known sites (certainly not for beginner / intermediate Python users). If you want to go crawling, this may be a good book for you. I was stunned that reading the HTML behind the sites you want to scrape was barely mentioned. This is a key element of any "how to" you can find on YouTube and, w/o a lot of HTML experience, one of the more challenging parts of scraping.

One of my biggest issues was w/ passwords. The book only offered 1/2 a page on this w/ an extremely simple example. The solution did not work on any of the 3 sites I tried it on. Also, I could not find 1 mention of what to do w/ JavaScript.

Overall, useless book for me.

Patrick Klein Jan 08, 2022

This "book" feels like a collection of Stack Overflow answers to very basic topics with the added disadvantages that it's harder to navigate and you can't just copy-paste. I'm sending this one back.

What you are looking for	Example
All tags	`*`
A specific tag (that is, `tr`)	`tr`
A class name (that is, `"planet"`)	`.planet` (or `tr.planet` to match only `tr` tags with that class)
A tag with an ID (that is, `tr` with ID `"planet3"`)	`tr#planet3`
A child `tr` of a table	`table > tr`
A descendant `tr` of a table	`table tr`
A tag with an attribute (that is, `tr` with `id="planet4"`)	`tr[id=planet4]`
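
The selector patterns in the table can be tried out without installing anything. The following is a minimal sketch, not code from the book: it uses the limited XPath subset in Python's standard-library `xml.etree.ElementTree` as a stand-in for a CSS selector engine, with the corresponding CSS selector noted beside each query. The sample markup, the `names` helper, and the variable names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; not taken from the book.
DOC = """
<table id="planets">
  <tr id="planet1" class="planet"><td>Mercury</td></tr>
  <tr id="planet2" class="planet"><td>Venus</td></tr>
  <tr id="planet3" class="planet"><td>Earth</td></tr>
  <tr id="planet4" class="planet"><td>Mars</td></tr>
  <tbody>
    <tr id="nested"><td>Nested row</td></tr>
  </tbody>
</table>
""".strip()

table = ET.fromstring(DOC)


def names(rows):
    """Return the text of the first <td> in each row (hypothetical helper)."""
    return [row.find("td").text for row in rows]


# Each query is an ElementTree XPath expression; the CSS counterpart is in the comment.
all_tags = table.findall(".//*")                   # CSS: *  (here: every descendant of the root)
all_rows = table.findall(".//tr")                  # CSS: tr
by_class = table.findall(".//*[@class='planet']")  # CSS: .planet (exact attribute match here)
by_id = table.findall(".//tr[@id='planet3']")      # CSS: tr#planet3
child_rows = table.findall("./tr")                 # CSS: table > tr  (direct children only)
descendant_rows = table.findall(".//tr")           # CSS: table tr    (any depth)
by_attr = table.findall(".//tr[@id='planet4']")    # CSS: tr[id=planet4]

print(names(by_id))                            # ['Earth']
print(names(by_attr))                          # ['Mars']
print(len(child_rows), len(descendant_rows))   # 4 5
```

The child and descendant queries differ only because one row is nested inside `tbody`: the child query skips it, while the descendant query includes it. Note that `[@class='planet']` compares the whole attribute value, whereas the CSS selector `.planet` also matches elements that carry additional classes.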

Python Web Scraping Cookbook: Over 90 proven recipes to get you scraping with Python, microservices, Docker, and AWS
