This article is an excerpt from the book Data Analysis with Polars by Luca Zanna. Leverage Polars, the lightning-fast dataframe library, to take your Python data analysis skills to the next level.
In the ever-evolving landscape of data analysis, harnessing the right tools and methodologies can make all the difference. Welcome to a world where Polars, a powerful data manipulation library, takes center stage. This article is your gateway to unlocking the potential of Polars, and it begins by unraveling the essential components of the data analysis journey. From setting up virtual environments to simplifying data analysis in the cloud with Google Colab, we explore how Polars streamlines your path to insights. Whether you're a seasoned data analyst or just starting your journey, this guide will equip you with the knowledge and tools needed to make your data analysis endeavors efficient and rewarding. Join us as we delve into the fascinating realm of Polars and embrace a new era of data exploration.
We will not go through the installation of Python as that is outside the scope of the book. A visit to python.org will give all the information necessary to install Python.
Now on to virtual environments.
Imagine you have built a fantastic data analysis project using Polars, pinned to an older version of the library. Now you start a new project, and you want to use a newer Polars (0.16.14), along with NumPy and Arrow.
Upgrading the Polars and NumPy libraries globally isn't a good idea. If Polars functions have changed between versions, your first project might stop working or give incorrect results with the new version.
This is where virtual environments come in. Virtual environments create separate 'spaces' for each project: one for your first data analysis project and another for your new data pipeline project.
You can set up a virtual environment manually or have your IDE set up a virtual environment for you. If you decide to set it up manually, you can check out the guide at https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment.
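For reference, the manual setup typically takes just two commands on macOS or Linux (the environment name env is only a convention; any name works):

python -m venv env
source env/bin/activate

On Windows, the activation command is env\Scripts\activate instead.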
To install Polars, first make sure you are in a virtual environment. Then, type:
pip install polars
If you already have Polars installed and want to upgrade it, type:
pip install polars --upgrade
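Either way, you can confirm which version you ended up with from Python, since Polars exposes it as pl.__version__:

import polars as pl
print(pl.__version__)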
In the book we will use other libraries, including NumPy, pandas, and Matplotlib. You can install them with the syntax above, and you can also install multiple libraries at the same time:
pip install numpy pandas matplotlib
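For projects with more than a handful of dependencies, a common pip convention is to list them in a requirements.txt file, one library per line, optionally with pinned versions such as polars==0.16.14, and install everything in one go:

pip install -r requirements.txt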
Let’s now get our development environment set up. We will use Visual Studio Code, but you are free to use any other IDE that you like.
1. Type code . in the command line to open Visual Studio Code in the current folder.
2. Right-click in the file explorer on the left, choose New File, and create first_dataframe.ipynb.
Figure – Creating a new file in Visual Studio Code
Files with extension .ipynb are Jupyter Notebook files, which are great for data analysis.
To work with these files, you need to install the Jupyter extension in VS Code. You can do that by clicking on ‘Extensions’ in the left bar, searching for Jupyter, installing it, and activating it.
Figure – Install Jupyter extension in Visual Studio Code
3. Now back to our file.
The first thing to ensure is that we are using the Python from our virtual environment. Click on Select Kernel at the top right, then click on the Python whose path starts with env/: that is the Python of our virtual environment.
Avoid the paths starting with /usr and /bin, as those point to the system Python rather than to our virtual environment.
Figure – Select the Python interpreter in Visual Studio Code
Now, we're ready for Polars.
4. Type import polars as pl in the first cell and press Shift + Enter to run it.
5. Create a dataframe in the next cell by typing:
df = pl.DataFrame({
    'a': ['Hello', 'World!']
})
6. Press Shift + Enter to run the cell.
This creates a dataframe called df with one column named 'a' and two rows: 'Hello' and 'World!'.
To see the dataframe, type df in the next cell and run it.
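If everything worked, the output should look roughly like this (the exact layout varies slightly between Polars versions):

shape: (2, 1)
┌────────┐
│ a      │
│ ---    │
│ str    │
╞════════╡
│ Hello  │
│ World! │
└────────┘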
Figure – Visual Studio Code with first Polars dataframe
We created our first Polars dataframe.
Instead of installing Polars on your computer, you can also use it in the cloud. One popular cloud service for running code is Google Colab. This way, you don't need to install anything on your machine.
To access Google Colab, visit https://colab.research.google.com/ in your web browser. Click on "New Notebook," and you'll see a page that looks similar to VS Code.
Now, let's create the same Polars dataframe example in Google Colab:
1. In the first cell, type the following command to ensure we have the latest version of Polars:
%pip install polars --upgrade
2. Next, enter this code to import Polars and create a dataframe:
import polars as pl
df = pl.DataFrame({
    'a': ['Hello', 'World!']
})
3. Finally, display the dataframe by typing:
df
And that's it! You now have your first Polars dataframe in Google Colab.
Figure – Google Colab with first Polars dataframe
In closing, Polars offers a bridge to the future of data analysis. With the knowledge and hands-on experience gained from this article, you're well-prepared to conquer the intricacies of data manipulation and visualization. The ability to effortlessly create, manipulate, and analyze data using Polars is a powerful tool in your arsenal. Whether you're a data enthusiast or a seasoned analyst, embracing Polars sets you on a path toward efficiency, precision, and data-driven success. As the data landscape continues to evolve, you're now equipped to stay ahead, make informed decisions, and revolutionize your approach to data exploration.
Luca Zanna is a Data Engineer and Data Analyst with over 15 years of experience. He started his career as a financial data analyst after earning a Master's in Management and passing the Certified Public Accountant (CPA) exam. Luca spent a decade working on financial analysis systems at L’Oréal, developing the systems and training financial analysts across Europe and Asia.
Currently, Luca helps companies with building data infrastructure to better leverage their data. Luca is also a corporate teacher for topics such as data analysis, SQL, Python, and cloud data engineering.