Setting Up Polars for Data Analysis

23 Feb 2024

This article is an excerpt from the book, Data Analysis with Polars, by Luca Zanna. Leverage Polars, the lightning-fast dataframe library, to take your Python data analysis skills to the next level.

Introduction

Before you can analyze data with Polars, a fast dataframe library for Python, you need a working environment. This article walks through that setup: why virtual environments matter, how to install Polars and run it in a Jupyter notebook inside Visual Studio Code, and how to run the same example in the cloud with Google Colab. Whether you're a seasoned data analyst or just starting out, by the end you will have created your first Polars dataframe.

Installation and virtual environments 

We will not go through the installation of Python as that is outside the scope of the book. A visit to python.org will give all the information necessary to install Python. 

Now on to virtual environments. 

Understanding Virtual Environments and Their Benefits 

Imagine you have built a fantastic data analysis project using Polars. Your project uses: 

  • Python 3.8
  • Polars 0.15.1
  • NumPy 1.23.0

Now, you start a new project, and you want to use a newer version of Polars (0.16.14), along with NumPy and PyArrow. So, the new project requires:

  • Python 3.10
  • Polars 0.16.14
  • NumPy 1.24.0
  • PyArrow 11.0.0

Upgrading the Polars and NumPy libraries globally isn't a good idea. If Polars functions have changed between versions, your first project might stop working or give incorrect results with the new version.

This is where virtual environments come in. Virtual environments create separate 'spaces' for each project: one for your first data analysis project and another for your new data pipeline project. 

You can set up a virtual environment manually or have your IDE set one up for you. If you decide to set it up manually, you can check out the guide at https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment
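
If you would rather stay in Python than use the shell, the standard library's venv module can also create an environment. The following is a minimal sketch rather than the book's own method; the helper name create_env and the directory name env are assumptions chosen for illustration.

```python
import venv
from pathlib import Path


def create_env(env_dir: str = "env", with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its path.

    The name create_env and the default directory "env" are
    illustrative choices, not something Polars requires.
    """
    path = Path(env_dir)
    # venv.create builds the directory layout and writes pyvenv.cfg;
    # with_pip=True also bootstraps pip inside the new environment.
    venv.create(path, with_pip=with_pip)
    return path
```

Calling create_env("env") from your project folder creates the environment there. You still need to activate it in your shell before running pip install; the guide linked above lists the activation command for each operating system.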

Installing and using Polars on a machine 

To install Polars, first make sure you are in a virtual environment. Then, type: 

pip install polars 

If you already have Polars installed and want to upgrade it, type: 

pip install polars --upgrade 

In the book we will use other libraries, including NumPy, pandas, and Matplotlib. You can install them with the syntax above, and you can also install multiple libraries at the same time:

pip install numpy pandas matplotlib 
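
To confirm what actually ended up in the active environment, you can query the installed versions from Python. This is a small sketch using the standard library's importlib.metadata; the helper name installed_versions is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


for name, ver in installed_versions(["polars", "numpy", "pandas", "matplotlib"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

A package reported as not installed usually means that pip ran in a different environment from the one this code is running in.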

Let’s now get our development environment set up. We will use Visual Studio Code, but you are free to use any other IDE that you like.

1. Type code . in the command line to open Visual Studio Code. 

2. Right-click on the left, choose New File, and create first_dataframe.ipynb

Figure – Creating a new file in Visual Studio Code

Files with extension .ipynb are Jupyter Notebook files, which are great for data analysis. 

To work with these files, you need to install the Jupyter extension in VS Code. You can do that by clicking on ‘Extensions’ on the left bar, searching for Jupyter, installing it, and enabling it.

Figure – Install Jupyter extension in Visual Studio Code

3. Now back to our file. 

First, make sure the notebook uses the Python interpreter from our virtual environment. Click Select Kernel at the top right, then choose the Python whose path starts with env/: that is the interpreter of our virtual environment. Avoid paths starting with /usr or /bin, as those point to the system Python rather than the virtual environment.

Figure – Select the Python interpreter in Visual Studio Code
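
If you are unsure whether the right kernel was selected, you can check from a notebook cell. The sketch below relies on a standard Python behavior: inside a virtual environment, sys.prefix differs from sys.base_prefix. The function name in_virtual_environment is an assumption made for this example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

If the second line prints False, the notebook is running on the system Python, and you should pick the kernel again.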

Now, we're ready for Polars. 

4. Type import polars as pl in the first cell and press Shift + Enter to run it. 

5. Create a dataframe in the next cell by typing: 

df = pl.DataFrame({ 
    'a': ['Hello', 'World!'] 
}) 

6. Press Shift + Enter to run the cell. 

This creates a dataframe called df with one column named 'a' and two rows: 'Hello' and 'World!'.

To see the dataframe, type df in the next cell and run it. 

Figure – Visual Studio Code with first Polars dataframe

We created our first Polars dataframe. 

Using Polars on the cloud with Google Colab 

Instead of installing Polars on your computer, you can also use it in the cloud. One popular cloud service for running code is Google Colab. This way, you don't need to install anything on your machine. 

To access Google Colab, visit https://colab.research.google.com/ in your web browser. Click on "New Notebook," and you'll see a page that looks similar to VS Code. 

Now, let's create the same Polars dataframe example in Google Colab: 

1. In the first cell, type the following command to ensure we have the latest version of Polars: 

%pip install polars --upgrade 

2. Next, enter this code to import Polars and create a dataframe: 

import polars as pl 
df = pl.DataFrame({ 
    'a': ['Hello', 'World!']
}) 

3. Finally, display the dataframe by typing:

df 

And that's it! You now have your first Polars dataframe in Google Colab.

Figure – Google Colab with first Polars dataframe

Conclusion

You now have a working Polars setup, either in a virtual environment on your own machine with Visual Studio Code and Jupyter, or in the cloud with Google Colab, and you have created your first dataframe in both. Keeping each project in its own virtual environment lets you upgrade Polars for new work without breaking older projects. From here, you are ready to move on to loading, manipulating, and analyzing data with Polars.

Author Bio

Luca Zanna is a Data Engineer and Data Analyst with over 15 years of experience. He started his career as a financial data analyst after a Master's in Management and passing the Certified Public Accountant (CPA) exam. Luca spent a decade working on financial analysis systems at L’Oréal: developing the systems and training financial analysts across Europe and Asia.

Currently, Luca helps companies with building data infrastructure to better leverage their data. Luca is also a corporate teacher for topics such as data analysis, SQL, Python, and cloud data engineering.