Extending Excel with Python and R

Unlock the potential of analytics languages for advanced data manipulation and visualization

Product type: Paperback
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781804610695
Length: 344 pages
Edition: 1st Edition
Authors (2):
Steven Sanderson
David Kun


Opening an Excel sheet from Python and reading the data

When working with Excel files in Python, you often need to open a specific sheet and read its data into Python for further analysis. You can do this with popular libraries such as pandas and openpyxl, as discussed in the previous section.

You can most likely use other Python and package versions, but the code in this section has not been tested with anything other than what we’ve stated here.

Using pandas

pandas is a powerful data manipulation library that simplifies the process of working with structured data, including Excel spreadsheets. To read an Excel sheet using pandas, you can use the read_excel function. Let’s consider an example of using the iris_data.xlsx file with a sheet named setosa:

import pandas as pd
# Read the Excel file
df = pd.read_excel('iris_data.xlsx', sheet_name='setosa')
# Display the first few rows of the DataFrame
print(df.head())

Run this code with the Python working directory set to the folder that contains the Excel file, or pass the full path to the file to read_excel(). The output looks like this:

Figure 1.6 – Using the pandas package to read the Excel file


In the preceding code snippet, we imported the pandas library and used the read_excel function to read the setosa sheet from the iris_data.xlsx file. The resulting data is stored in a pandas DataFrame, which provides a tabular representation of the data. By calling head() on the DataFrame, we displayed its first five rows (the default), giving us a quick preview.
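The note about the working directory can be illustrated with a short sketch that passes a full path to read_excel. So that it runs anywhere, it first writes a small stand-in workbook to a temporary folder; the folder, the column names, and the values are assumptions made for illustration and are not the book's iris_data.xlsx. It also shows sheet_name=None, which reads every sheet at once.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a small stand-in workbook so the sketch runs anywhere.
# The folder, column names, and values below are hypothetical.
demo_dir = Path(tempfile.mkdtemp())
excel_path = demo_dir / "iris_data.xlsx"
with pd.ExcelWriter(excel_path) as writer:
    pd.DataFrame(
        {"Sepal.Length": [5.1, 4.9], "Species": ["setosa", "setosa"]}
    ).to_excel(writer, sheet_name="setosa", index=False)
    pd.DataFrame(
        {"Sepal.Length": [7.0, 6.4], "Species": ["versicolor", "versicolor"]}
    ).to_excel(writer, sheet_name="versicolor", index=False)

# Passing a full path makes the call independent of the working directory.
df = pd.read_excel(excel_path, sheet_name="setosa")
print(df.head())

# sheet_name=None reads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(excel_path, sheet_name=None)
print(list(all_sheets))
```

Building the path with pathlib keeps the code portable across operating systems; with your own file, only excel_path would need to change.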

Using openpyxl

openpyxl is a powerful library for working with Excel files, offering more granular control over individual cells and sheets. To open an Excel sheet and access its data using openpyxl, we can utilize the load_workbook function. Please note that openpyxl cannot handle .xls files, only the more modern .xlsx and .xlsm versions.
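Because of this limitation, it can be useful to check a file's extension before handing it to openpyxl. The following is a minimal sketch of such a guard; the helper name can_open_with_openpyxl and the legacy_report.xls filename are hypothetical, and the set of extensions covers only the two formats named above.

```python
from pathlib import Path

# Extensions named in the text as readable by openpyxl.
OPENPYXL_SUFFIXES = {".xlsx", ".xlsm"}


def can_open_with_openpyxl(filename: str) -> bool:
    """Hypothetical helper: True if the extension is one of the formats above."""
    return Path(filename).suffix.lower() in OPENPYXL_SUFFIXES


print(can_open_with_openpyxl("iris_data.xlsx"))
print(can_open_with_openpyxl("legacy_report.xls"))
```

The check looks only at the file name, not at the file contents, so it is a convenience rather than a guarantee that the file will load.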

Let’s consider an example of using the iris_data.xlsx file with a sheet named versicolor:

import openpyxl
import pandas as pd
# Load the workbook
wb = openpyxl.load_workbook('iris_data.xlsx')
# Select the sheet
sheet = wb['versicolor']
# Extract the values (including header)
sheet_data_raw = sheet.values
# Take the first row as the header; this advances the generator past it
header = next(sheet_data_raw)
# Create a DataFrame from the remaining rows, using the header as column names
sheet_data = pd.DataFrame(sheet_data_raw, columns=header)
print(sheet_data.head())

The preceding code results in the following output:

Figure 1.7 – Using the openpyxl package to read the Excel file


In this code snippet, we import the openpyxl library and load the workbook by passing the iris_data.xlsx filename to its load_workbook function. Next, we select the desired sheet by name, in this case versicolor. We then read the raw data using the values property of the sheet object. This property returns a generator, which can be consumed with a for loop or converted into a list or a DataFrame, for example. Here, we take the first row as the header and convert the remaining rows into a pandas DataFrame, because that format is the most convenient to work with later.
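The other two ways of consuming the values generator mentioned above, a for loop and a list, might look as follows. This is a sketch on an in-memory workbook rather than the book's file; the sheet contents are made up for illustration.

```python
from openpyxl import Workbook

# In-memory stand-in for the versicolor sheet; the rows are made up.
wb = Workbook()
sheet = wb.active
sheet.title = "versicolor"
sheet.append(["Sepal.Length", "Species"])
sheet.append([7.0, "versicolor"])
sheet.append([6.4, "versicolor"])

# Option 1: walk the generator with a for loop; each row is a tuple.
for row in sheet.values:
    print(row)

# Option 2: materialize it as a list. Each access to sheet.values returns
# a fresh generator, so this starts again from the header row.
rows = list(sheet.values)
header, records = rows[0], rows[1:]
print(header)
print(len(records))
```

A list is convenient for small sheets because it can be indexed and reused, whereas a single generator object can only be iterated once.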

Both pandas and openpyxl offer valuable features for working with Excel files in Python. While pandas simplifies data manipulation with its DataFrame structure, openpyxl provides more fine-grained control over individual cells and sheets. Depending on your specific requirements, you can choose the library that best suits your needs.
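As a brief illustration of what cell-level control means in practice, the following sketch reads individual cells and the sheet dimensions with openpyxl. It again uses an in-memory workbook with made-up values, not the book's data.

```python
from openpyxl import Workbook

# In-memory stand-in sheet; the values are made up for illustration.
wb = Workbook()
sheet = wb.active
sheet.title = "versicolor"
sheet.append(["Sepal.Length", "Species"])
sheet.append([7.0, "versicolor"])

# Address a cell by its Excel-style coordinate...
print(sheet["A1"].value)

# ...or by 1-based row and column numbers.
first_length = sheet.cell(row=2, column=1).value
print(first_length)

# The sheet also reports how many rows and columns it currently spans.
print(sheet.max_row, sheet.max_column)
```

With pandas, the same values would be reached through DataFrame indexing after the whole sheet has been loaded, which is usually simpler for analysis but gives no access to cell-level properties.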

By mastering the techniques of opening Excel sheets and reading data into Python, you will be able to extract valuable insights from your Excel data, perform various data transformations, and prepare it for further analysis or visualization. These skills are essential for anyone seeking to leverage the power of Python and Excel in their data-driven workflows.
