Reading data from URLs
Files can be stored locally on your machine or on a remote server or cloud location. In the previous two recipes, Reading from CSVs and other delimited files and Reading data from an Excel file, both files were stored locally.
Many of the pandas reader functions can read data from remote locations by passing a URL path. For example, read_csv() and read_excel() can take a URL to read a file accessible via the internet. In this recipe, you will read a CSV file using pandas.read_csv() and Excel files using pandas.read_excel() from remote locations, such as GitHub and AWS S3 (private and public buckets). You will also read data directly from an HTML page into a pandas DataFrame.
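As a minimal sketch of the idea that a reader function accepts a URL in place of a local path, the example below wraps pandas.read_csv() in a small helper. The helper name read_csv_from_url and the sample data are hypothetical and used only for illustration. To keep the example runnable without network access, it writes a temporary CSV file and addresses it with a file:// URL; an https:// URL to a raw file on GitHub would be passed to the same function in the same way.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_from_url(url: str) -> pd.DataFrame:
    """Read a CSV file addressed by a URL (hypothetical helper)."""
    # read_csv() accepts a URL string wherever it accepts a file path
    return pd.read_csv(url)


# Offline demonstration: create a small CSV file, then read it back
# through a file:// URL instead of a plain path.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "sample.csv"
    csv_path.write_text("date,value\n2021-01-01,10\n2021-01-02,12\n")
    df = read_csv_from_url(csv_path.as_uri())

print(df)
```

The same pattern applies to read_excel(), which additionally needs an Excel engine such as openpyxl installed.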
Getting ready
You will need to install the AWS SDK for Python (Boto3) to read files from S3 buckets. Additionally, you will learn how to use the storage_options parameter, available in many of the pandas reader functions, to read from S3 without the Boto3 library...
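The following is a rough sketch of how storage_options might be used, not the recipe's own code. It assumes the s3fs package is installed, since pandas hands s3:// URLs to it, and that the dictionary keys anon, key, and secret are passed through to s3fs. The helper names and the bucket URL in the comment are hypothetical.

```python
import pandas as pd


def build_s3_storage_options(key=None, secret=None):
    """Build a storage_options dict for S3 (hypothetical helper)."""
    # Without credentials, request anonymous access (public buckets)
    if key is None or secret is None:
        return {"anon": True}
    # With credentials, pass them through for private buckets
    return {"key": key, "secret": secret}


def read_csv_from_s3(s3_url, key=None, secret=None):
    """Read a CSV file from an s3:// URL without calling Boto3 directly."""
    options = build_s3_storage_options(key, secret)
    return pd.read_csv(s3_url, storage_options=options)


# Example call (requires network access and s3fs; the bucket is made up):
# df = read_csv_from_s3("s3://example-bucket/data.csv")
```

Keeping the options in a separate function makes it easy to switch between public and private buckets without changing the call to read_csv().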