
The Future of Data Analysis with PandasAI

  • 6 min read
  • 06 Oct 2023



Introduction

Data analysis often involves complex, tedious coding tasks that make it seem reserved only for experts. But imagine a future where anyone could gain insights through natural conversations - where your data speaks plainly instead of through cryptic tables.

PandasAI makes this future a reality. In this comprehensive guide, we'll walk through all aspects of adding conversational capabilities to data analysis workflows using this powerful new library. You'll learn:

● Installing and configuring PandasAI

● Querying data and generating visualizations in plain English

● Connecting to databases, cloud storage, APIs, and more

● Customizing PandasAI config

● Integrating PandasAI into production workflows

● Use cases across industries like finance, marketing, science, and more

Follow along to master conversational data analysis with PandasAI!

Installation and Configuration

Install PandasAI

Let's start by installing PandasAI using pip or poetry.

To install with pip:

pip install pandasai

Make sure you are using an up-to-date version of pip to avoid any installation issues.

For managing dependencies, we recommend using poetry:

# Install poetry
pip install --user poetry

# Install pandasai
poetry add pandasai

This will install PandasAI and all its dependencies for you.

For advanced usage, install all optional extras:

poetry add pandasai --all-extras

This includes dependencies for additional capabilities you may need later like connecting to databases, using different NLP models, advanced visualization, etc.
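
Before moving on, it can help to confirm that the package actually landed in the environment you are running. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is our own and is not part of PandasAI.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pandasai")
if version is None:
    print("pandasai is not installed in this environment")
else:
    print(f"pandasai {version} is installed")
```

If the first message appears even though the install succeeded, the interpreter running the check is probably not the one pip or poetry installed into.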

With PandasAI installed, we are ready to start importing it and exploring its conversational interface!

Import and Initialize PandasAI

Let's initialize a PandasAI DataFrame from a CSV file:

from pandasai import SmartDataframe

df = SmartDataframe("sales.csv")

This creates a SmartDataframe, which wraps the underlying pandas DataFrame and adds conversational capabilities.

We can customize initialization through configuration options:

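from pandasai import SmartDataframe
from pandasai.llm import OpenAI

# Pass the language model to PandasAI through the config dictionary
llm = OpenAI(api_token="<your api key>")
config = {
  "llm": llm
}

df = SmartDataframe("sales.csv", config=config)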

This initializes the SmartDataframe with an OpenAI model as its language model.
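
Hard-coding an API key in a script makes it easy to leak. A minimal sketch of one alternative is to read the key from an environment variable. The variable name OPENAI_API_KEY and both helper names below are our own assumptions, and the PandasAI imports are deferred so the snippet loads even where the library is not installed.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def build_smart_dataframe(csv_path: str):
    """Create a SmartDataframe whose LLM key comes from the environment."""
    # Imported lazily so that merely defining this function does not
    # require pandasai to be installed.
    from pandasai import SmartDataframe
    from pandasai.llm import OpenAI

    llm = OpenAI(api_token=load_api_key())
    return SmartDataframe(csv_path, config={"llm": llm})
```

With the variable exported in your shell, build_smart_dataframe("sales.csv") gives the same kind of object as the snippet above without the key ever appearing in source control.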

For easy multi-table analysis, use SmartDatalake:

from pandasai import SmartDatalake
dl = SmartDatalake(["sales.csv", "inventory.csv"])

SmartDatalake lets you ask questions that span multiple related data sources.

We can also connect to live data sources like databases during initialization:

from pandasai.connectors import MySQLConnector

mysql_conn = MySQLConnector(config={
  "host": "localhost",
  "port": 3306,
  "database": "mydb",
  "username": "root",
  "password": "root",
  "table": "loans",
})

df = SmartDataframe(mysql_conn)

This connects to a MySQL database so we can analyze the live data interactively.
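
The root/root credentials above are fine for a local demo, but for anything shared you will likely want them out of the code. Here is a sketch that assembles the same config dictionary from environment variables. The variable names (MYSQL_HOST, MYSQL_PORT, and so on) and the helper itself are assumptions made for illustration, not PandasAI conventions.

```python
import os
from typing import Dict, Mapping, Union

ConfigValue = Union[str, int]


def mysql_config_from_env(
    table: str, env: Mapping[str, str] = os.environ
) -> Dict[str, ConfigValue]:
    """Build a connector config dict with the keys used in the example above."""
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        # These three have no sensible default, so a missing one raises KeyError.
        "database": env["MYSQL_DATABASE"],
        "username": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
        "table": table,
    }
```

The result can then be passed along as MySQLConnector(config=mysql_config_from_env("loans")).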

Conversational Data Exploration

Ask Questions in Plain English

The most exciting part of PandasAI is exploring data through natural language. Let's go through some examples!

Calculate totals:

df.chat("What is the total revenue for 2022?")

# Prints revenue total

Filter data:

df.chat("Show revenue for electronics category")

# Filters and prints electronics revenue

Aggregate by groups:

df.chat("Break down revenue by product category and segment")

# Prints table with revenue aggregated by category and segment

Visualize data:

df.chat("Plot monthly revenue over time")

# Plots a line chart

Ask for insights:

df.chat("Which segment has fastest revenue growth?")

# Prints segments sorted by revenue growth

PandasAI understands the user's questions in plain English and automatically generates relevant answers, tables and charts.

We can ask endless questions and immediately get data-driven insights without writing any SQL queries or analysis code!
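
To get a feel for what these questions replace, here is hand-written pandas that answers the first three of them. This is not the code PandasAI generates (that varies by model and by question); the dataset and its column names are made up for illustration.

```python
import pandas as pd

# Small made-up dataset; the column names are assumptions for illustration.
sales = pd.DataFrame(
    {
        "year": [2022, 2022, 2022, 2021],
        "category": ["electronics", "electronics", "clothing", "electronics"],
        "segment": ["consumer", "business", "consumer", "consumer"],
        "revenue": [1200.0, 800.0, 500.0, 900.0],
    }
)

# "What is the total revenue for 2022?"
total_2022 = sales.loc[sales["year"] == 2022, "revenue"].sum()

# "Show revenue for electronics category"
electronics = sales.loc[
    sales["category"] == "electronics", ["year", "segment", "revenue"]
]

# "Break down revenue by product category and segment"
breakdown = sales.groupby(["category", "segment"], as_index=False)["revenue"].sum()

print(total_2022)
print(electronics)
print(breakdown)
```

Knowing the manual equivalent is also a handy way to spot-check a conversational answer against a result you trust.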

Connect to Data Sources

A key strength of PandasAI is its broad range of built-in data connectors. This enables conversational analytics on diverse data sources.

Databases

from pandasai.connectors import PostgreSQLConnector

pg_conn = PostgreSQLConnector(config={
  "host": "localhost",
  "port": 5432,
  "database": "mydb",
  "username": "root",
  "password": "root",
  "table": "payments",
})

df = SmartDataframe(pg_conn)
df.chat("Which products had the most orders last month?")

Finance Data

from pandasai.connectors import YahooFinanceConnector

yf_conn = YahooFinanceConnector("AAPL")

df = SmartDataframe(yf_conn)
df.chat("How did Apple stock perform last quarter?")

The connectors provide out-of-the-box access to data across domains for easy conversational analytics.

Advanced Usage

Customize Configuration

While PandasAI is designed for simplicity, its architecture is customizable and extensible.

We can configure aspects like:

Language Model

Use different NLP models:

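from pandasai import SmartDataframe
from pandasai.llm import GoogleVertexAI

# project_id and location are placeholders; use your own Google Cloud values
llm = GoogleVertexAI(project_id="<your project id>", location="us-central1")
df = SmartDataframe("sales.csv", config={"llm": llm})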

Custom Instructions

Add data preparation logic:

config["custom_instructions"] = """
Prepare data:
  - Filter outliers
  - Impute missing values
"""

These options provide advanced control for tailored workflows.
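
If the instructions change from project to project, it may be easier to build the config from a list of steps than to edit a multi-line string by hand. The sketch below does that. The build_config helper is hypothetical, and llm=None is only a placeholder so the snippet runs without an API key; in practice you would pass a real LLM object.

```python
from typing import Any, Dict, List


def build_config(llm: Any, steps: List[str]) -> Dict[str, Any]:
    """Assemble a config dict with one custom instruction line per step."""
    instructions = "Prepare data:\n" + "\n".join(f"  - {step}" for step in steps)
    return {"llm": llm, "custom_instructions": instructions}


# llm=None is a stand-in; pass your configured LLM object here.
config = build_config(llm=None, steps=["Filter outliers", "Impute missing values"])
print(config["custom_instructions"])
```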

Integration into Pipelines

Since PandasAI is built on top of Pandas, it integrates smoothly into data pipelines:

import pandas as pd
from pandasai import SmartDataframe

# Load raw data
data = pd.read_csv("sales.csv")

# Clean data (clean_data is your own cleaning function)
cleaned = clean_data(data)

# PandasAI for analysis
df = SmartDataframe(cleaned)
df.chat("Which products have trending sales?")

# Further processing (process_data is your own downstream step)
final_data = process_data(df)

PandasAI's conversational interface can power the interactive analysis stage in ETL pipelines.
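
The pipeline above leaves clean_data and process_data undefined, since they depend on your data. As one possible shape for them, here is a sketch in plain pandas. The product and revenue column names are assumptions, and the conversational step would sit between the two calls, wrapping the cleaned frame in a SmartDataframe.

```python
import pandas as pd


def clean_data(data: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and rows with no revenue value."""
    return data.drop_duplicates().dropna(subset=["revenue"]).reset_index(drop=True)


def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Total revenue per product, highest first."""
    totals = data.groupby("product", as_index=False)["revenue"].sum()
    return totals.sort_values("revenue", ascending=False).reset_index(drop=True)


# Made-up input with one duplicate row and one missing revenue value.
raw = pd.DataFrame(
    {
        "product": ["A", "A", "B", "B", "C"],
        "revenue": [100.0, 100.0, 250.0, None, 75.0],
    }
)

cleaned = clean_data(raw)
final_data = process_data(cleaned)
print(final_data)
```

Keeping the cleaning and post-processing steps as ordinary functions means they can be unit tested on their own, independent of any language model.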

Use Cases Across Industries

Thanks to its versatile conversational interface, PandasAI can adapt to workflows across multiple industries. Here are a few examples:

Sales Analytics - Analyze sales numbers, find growth opportunities, and predict future performance.

df.chat("How do sales for women's footwear compare to last summer?")

Financial Analysis - Conduct investment research, portfolio optimization, and risk analysis.

df.chat("Which stocks have the highest expected returns given acceptable risk?")

Scientific Research - Explore and analyze the results of experiments and simulations.

df.chat("Compare the effects of the three drug doses on tumor size.")

Marketing Analytics - Measure campaign effectiveness, analyze customer journeys, and optimize spending.

df.chat("Which marketing channels give the highest ROI for millennial customers?")

And many more! PandasAI fits into any field that leverages data analysis, unlocking the power of conversational analytics for all.

Conclusion

This guide gave an overview of PandasAI's capabilities for conversational data analysis. We walked through:

● Installation and configuration

● Asking questions in plain English

● Connecting to databases, cloud storage, APIs

● Customizing NLP and visualization

● Integration into production pipelines

PandasAI makes data analysis intuitive and accessible to all. By providing a natural language interface, it opens up insights from data to a broad range of users.

Start adding a conversational layer to your workflows with PandasAI today! Democratize data science and transform how your business extracts value from data through the power of AI.

Author Bio

Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces - contributing his technical skills to various startups across Europe over the past decade.
Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.
By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.