In the previous chapter, we wrote a piece of code that communicates with the Nominatim web service in order to collect information. Frequently, however, there is no API in place, and the data may be scattered across hundreds of web pages or, even worse, locked inside files with a complex structure, such as PDFs. In this chapter, we'll explore another data collection path—scraping raw HTML pages. To do so, we will use another library, Beautiful Soup 4, which can parse raw HTML files into objects and help us sift through them, extracting bits of information. Using this tool, we will collect a relatively large dataset of historical battles of World War II, which we will process, clean, and analyze in the chapters to come.
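To give a sense of how this works before we dive in, here is a minimal sketch of parsing HTML with Beautiful Soup 4. The HTML snippet and tag names are invented for illustration; the real pages we scrape later will have their own structure:

```python
from bs4 import BeautifulSoup

# A made-up HTML fragment, standing in for a page we might download
html = """
<html>
  <body>
    <h1>Battle of Midway</h1>
    <p class="date">4-7 June 1942</p>
  </body>
</html>
"""

# Parse the raw markup into a navigable tree of objects
soup = BeautifulSoup(html, "html.parser")

# Query the tree by tag name and attribute to extract bits of information
title = soup.find("h1").text
date = soup.find("p", class_="date").text

print(title, date)  # Battle of Midway 4-7 June 1942
```

The same pattern—parse, locate elements, pull out their text or attributes—scales from this toy fragment to the real pages we will work with in this chapter.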
In this chapter, we will cover the following topics:
- When there is no API
- Scraping WWII battles
- Beyond...