Polars Cookbook

You're reading from Polars Cookbook: Over 60 practical recipes to transform, manipulate, and analyze your data using Python Polars 1.x

Product type: Paperback
Published: Aug 2024
Publisher: Packt
ISBN-13: 9781805121152
Length: 394 pages
Edition: 1st Edition

Author: Yuki Kakegawa
Table of Contents (15 chapters)

Preface
Chapter 1: Getting Started with Python Polars
Chapter 2: Reading and Writing Files
Chapter 3: An Introduction to Data Analysis in Python Polars
Chapter 4: Data Transformation Techniques
Chapter 5: Handling Missing Data
Chapter 6: Performing String Manipulations
Chapter 7: Working with Nested Data Structures
Chapter 8: Reshaping and Tidying Data
Chapter 9: Time Series Analysis
Chapter 10: Interoperability with Other Python Libraries
Chapter 11: Working with Common Cloud Data Sources
Chapter 12: Testing and Debugging in Polars
Index
Other Books You May Enjoy

Using SQL for data transformations

SQL is an essential tool for data analysis and transformation. Many of the data pipelines you'll see in the workplace are written in SQL, and most data professionals are accustomed to using it for analytics work.

The good news is that you can use SQL in Python Polars as well. This opens the door for those who might not be as familiar with a DataFrame library. In this recipe, we’ll cover how to configure Polars to use SQL and how you can implement simple SQL queries such as aggregations.

Getting ready

We’ll use the Contoso dataset for this recipe as well. Run the following code to read the dataset:

import polars as pl

df = pl.read_csv('../data/contoso_sales.csv', try_parse_dates=True)
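
As a quick sanity check, you can confirm that try_parse_dates picked up the date columns by inspecting the DataFrame's schema:

    print(df.schema)  # date-like columns should show Date or Datetime dtypes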

How to do it…

Here’s how to use SQL in Polars:

  1. Define the SQL context and register your DataFrame:
    ctx = pl.SQLContext(eager=True)
    ctx.register('df', df)
  2. Create a simple query and execute it:
    ctx.execute(
        """
          select
            `Customer Name`,
            Brand,
            Category
          from df limit 5
        """
    )

    The preceding code will return the following output:

Figure 4.38 – The result of a SQL query

  3. Apply a group by aggregation, showing the top five average quantities by brand (an equivalent version using Polars' native expression API appears after these steps):
    ctx.execute(
        """
          select
            Brand,
            avg(Quantity) as `Avg Quantity`
          from df
          group by
            Brand
          order by
            `Avg Quantity` desc
          limit 5
        """
    )

    The preceding code will return the following output:

Figure 4.39 – Top five average quantities by brand

  4. Run a SQL query in one go, showing the first five rows for a few selected columns using a LazyFrame:
    pl.SQLContext(lf=df.lazy()).execute(
        """
            select
                Brand,
                Category
            from lf
            limit 5
        """
    ).collect()

    The preceding code will return the following output:

Figure 4.40 – The first five rows with the Brand and Category columns selected
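
For comparison, here is the same top-five aggregation from step 3 written with Polars' native expression API. This is a minimal sketch that reuses the df from the Getting ready section:

    # Native-expression equivalent of the group by/order by/limit query in step 3
    (
        df.group_by('Brand')
        .agg(pl.col('Quantity').mean().alias('Avg Quantity'))
        .sort('Avg Quantity', descending=True)
        .head(5)
    )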

How it works…

By default, your SQL query returns a LazyFrame. To get a DataFrame instead, pass the eager=True parameter either when you define your SQL context or when you execute your query.
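
For example, both of the following return a DataFrame rather than a LazyFrame; a minimal sketch reusing the df from this recipe:

    # Option 1: request eager execution for a single query
    ctx = pl.SQLContext(df=df)
    out = ctx.execute('select * from df limit 5', eager=True)

    # Option 2: make every execute() call on this context return a DataFrame
    ctx = pl.SQLContext(df=df, eager=True)
    out = ctx.execute('select * from df limit 5')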

Also, note that when a column name contains a space, you need to wrap it in backticks (`).
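
For example, reusing the ctx and df registered in step 1:

    # `Customer Name` contains a space, so it must be backtick-quoted
    ctx.execute('select `Customer Name` from df limit 3')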

One thing to keep in mind when using SQL in Polars is that, as of this writing, not all SQL operations are available, such as rank and row number. I suggest reading Polars' documentation to learn what Polars SQL can and cannot do: https://pola-rs.github.io/polars/user-guide/sql/intro/.

Tip

You can use a library such as DuckDB to run SQL queries on top of Polars DataFrames, which works around the limits of Polars' SQL support. We'll cover how to use other Python libraries in combination with Polars in Chapter 10, Interoperability with Other Python Libraries.
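
As a minimal sketch (assuming DuckDB is installed via pip install duckdb), DuckDB can query a Polars DataFrame that's in scope by name and return the result as a Polars DataFrame:

    import duckdb

    # DuckDB discovers the in-scope Polars DataFrame named df automatically.
    # row_number() is a window function that Polars SQL lacked as of this writing.
    duckdb.sql(
        """
        select
            Brand,
            row_number() over (order by Brand) as rn
        from df
        limit 5
        """
    ).pl()  # .pl() converts the DuckDB relation to a Polars DataFrame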

See also

Please refer to these resources to learn more about using SQL in Polars:
