You're reading from Learning pandas: High performance data manipulation and analysis using Python

Product type Paperback

Published in Jun 2017

Publisher

ISBN-13 9781787123137

Length 446 pages

Edition 2nd Edition

Languages

Python

Tools

Pandas

Concepts

Data Analysis

Author (1):

Michael Heydt

Table of Contents (16) Chapters

Preface

1. pandas and Data Analysis

2. Up and Running with pandas

3. Representing Univariate Data with the Series

4. Representing Tabular and Multivariate Data with the DataFrame

5. Manipulating DataFrame Structure

6. Indexing Data

7. Categorical Data

8. Numerical and Statistical Methods

9. Accessing Data

10. Tidying Up Your Data

11. Combining, Relating, and Reshaping Data

12. Data Aggregation

13. Time-Series Modelling

14. Visualization

15. Historical Stock Price Analysis

Reading HTML data from the web

pandas supports reading data from HTML files (or HTML retrieved from URLs). Under the covers, pandas makes use of the lxml, html5lib, and BeautifulSoup4 packages. These packages provide some impressive capabilities for reading and writing HTML tables.

Your default installation of Anaconda may not include these packages. If you get errors using this function, install the appropriate library based on the error, using Anaconda Navigator.

Alternatively, you can install them with pip; the packages are published on PyPI as lxml, html5lib, and beautifulsoup4.
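
As a quick way to see which of these libraries are missing before installing anything, the following minimal sketch checks whether each one can be imported. The helper name `missing_parsers` and the `PARSER_MODULES` mapping are illustrative assumptions, not part of pandas.

```python
from importlib.util import find_spec

# Maps the pip package name to the module name used on import.
# Note that BeautifulSoup4 is imported as "bs4".
PARSER_MODULES = {
    "lxml": "lxml",
    "html5lib": "html5lib",
    "beautifulsoup4": "bs4",
}


def missing_parsers() -> list:
    """Return the pip package names of parser libraries that cannot be imported."""
    return [
        package
        for package, module in PARSER_MODULES.items()
        if find_spec(module) is None
    ]


missing = missing_parsers()
if missing:
    # Suggest a pip command covering only the packages that are absent.
    print("Missing parsers; try: pip install " + " ".join(missing))
else:
    print("All HTML parser libraries are available.")
```

If the check reports missing packages, installing them with the suggested command (or through Anaconda Navigator) should resolve the import errors raised by pd.read_html().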

The pd.read_html() function will read HTML from a file (or URL) and parse all HTML tables found in the content into one or more pandas DataFrame objects. The function always returns a list of DataFrame objects, one per table found, even when the HTML contains only a single table. If no tables are found, it raises a ValueError rather than returning an empty list.
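
To illustrate the list return value without relying on a live web page, here is a minimal sketch that parses a small, made-up HTML string. The sample HTML, the bank names, and the helper name `read_tables` are assumptions for illustration only; they are not the FDIC data used in the book. It also assumes at least one of the parser libraries mentioned above is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables; illustrative data, not the FDIC list.
SAMPLE_HTML = """
<html><body>
<table>
  <thead><tr><th>Bank</th><th>City</th></tr></thead>
  <tbody>
    <tr><td>First Example Bank</td><td>Springfield</td></tr>
    <tr><td>Second Example Bank</td><td>Riverton</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Year</th><th>Failures</th></tr></thead>
  <tbody><tr><td>2016</td><td>5</td></tr></tbody>
</table>
</body></html>
"""


def read_tables(html: str) -> list:
    """Parse every <table> in an HTML string into a list of DataFrames."""
    # Wrapping the text in StringIO passes it as a file-like object,
    # which pd.read_html() accepts alongside file paths and URLs.
    return pd.read_html(StringIO(html))
```

Calling read_tables(SAMPLE_HTML) should return a list with one DataFrame per table, so an individual table is reached by indexing into the list, for example tables[0].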

To demonstrate, we will read table data from the FDIC failed bank list, located at https://www.fdic...

Authors (1)

Michael Heydt

Michael Heydt is an independent consultant, programmer, educator, and trainer. He has a passion for learning and sharing his knowledge of new technologies. Michael has worked in multiple industry verticals, including media, finance, energy, and healthcare. Over the last decade, he worked extensively with web, cloud, and mobile technologies and managed user experiences, interface design, and data visualization for major consulting firms and their clients. Michael's current company, Seamless Thingies, focuses on IoT development and connecting everything with everything. Michael is the author of numerous articles, papers, and books, such as D3.js By Example, Instant Lucene.NET, Learning Pandas, and Mastering Pandas for Finance, all by Packt. Michael is also a frequent speaker at .NET user groups and various mobile, cloud, and IoT conferences and delivers webinars on advanced technologies.