Python Web Scraping: Hands-on data scraping and crawling using PyQT, Selenium, HTML and Python

Product type Paperback

Published in May 2017

ISBN-13 9781786462589

Length 220 pages

Edition 2nd Edition

Languages

HTML

Tools

PyQt

Concepts

Data Mining

Author (1):

Katharine Jarmul

Table of Contents (10) Chapters

Preface

1. Introduction to Web Scraping

2. Scraping the Data

3. Caching Downloads

4. Concurrent Downloading

5. Dynamic Content

6. Interacting with Forms

7. Solving CAPTCHA

8. Scrapy

9. Putting It All Together

Reverse engineering a dynamic web page

So far, we have tried to scrape data from the web page the same way we did in Chapter 2, Scraping the Data. This method did not work because the data is loaded dynamically using JavaScript. To scrape this data, we need to understand how the web page loads it, a process that can be described as reverse engineering. Continuing the example from the preceding section, if we open the Network tab in our browser tools and then perform a search, we will see all of the requests made for the page. There are a lot! If we scroll up through the requests, we see mainly photos (from loading country flags), and then we notice one with an interesting name: search.json, with a path of /ajax.

If we click on that URL in Chrome, we can see more details (all major browsers offer similar functionality, so your view may vary; however, the main features should function...
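Once the request has been spotted in the Network tab, it can usually be reproduced outside the browser. The following is a minimal sketch of that idea, not the chapter's own code. Only the search.json name and the /ajax path come from the text above. The base URL, the query parameter names (search_term, page, page_size), and the "records" key in the response are assumptions made for illustration; copy the real values from the request details shown in your browser.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Hypothetical base URL; replace it with the site you are inspecting.
BASE_URL = "http://example.com"


def build_search_url(base_url, search_term, page=0, page_size=10):
    """Build the URL for the search.json request under /ajax.

    The parameter names are assumptions; use the ones your browser's
    Network tab shows for the real request.
    """
    query = urlencode({
        "search_term": search_term,
        "page": page,
        "page_size": page_size,
    })
    return urljoin(base_url, "/ajax/search.json") + "?" + query


def parse_search_response(body):
    """Decode a JSON response body and return its list of results.

    The "records" key is an assumed response layout, not a documented one.
    """
    data = json.loads(body)
    return data.get("records", [])


def fetch_search_results(base_url, search_term, page=0, page_size=10):
    """Request the AJAX endpoint directly and return the parsed results."""
    url = build_search_url(base_url, search_term, page, page_size)
    with urlopen(url, timeout=10) as response:
        return parse_search_response(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the URL that would be requested; no network access is needed.
    print(build_search_url(BASE_URL, "a"))
```

Keeping the URL building and JSON parsing separate from the network call makes it easy to check both against what the browser shows before sending any requests.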

Authors (1)

Katharine Jarmul is a data scientist and Pythonista based in Berlin, Germany. She runs a data science consulting company, Kjamistan, that provides services such as data extraction, acquisition, and modelling for small and large companies. She has been writing Python since 2008 and scraping the web with Python since 2010, and has worked at both small and large start-ups that use web scraping for data analysis and machine learning. When she's not scraping the web, you can follow her thoughts and activities via Twitter (@kjam).
