Microsoft Excel – finding tables in non-default locations
In the previous recipe, Microsoft Excel – basic reading/writing, we used the Microsoft Excel I/O functions without thinking about where within the worksheet our data was. By default, pandas will read from / write to the first cell on the first sheet of data, but it is not uncommon to receive Microsoft Excel files where the data you want to read is located elsewhere within the document.
For this example, we have a Microsoft Excel workbook where the very first tab, Sheet1, is used as a cover sheet:
Figure 4.1: Workbook where Sheet1 contains no useful data
The second sheet is where we have useful information:
Figure 4.2: Workbook where another sheet has relevant data
How to do it
To still be able to read this data, you can use a combination of the sheet_name=
, skiprows=
, and usecols=
arguments to pd.read_excel
:
pd.read_excel(
"data/beatles.xlsx",
dtype_backend...