Data analysis often involves complex, tedious coding tasks that make it seem reserved only for experts. But imagine a future where anyone could gain insights through natural conversations - where your data speaks plainly instead of through cryptic tables.
PandasAI makes this future a reality. In this comprehensive guide, we'll walk through all aspects of adding conversational capabilities to data analysis workflows using this powerful new library. You'll learn:
● Installing and configuring PandasAI
● Querying data and generating visualizations in plain English
● Connecting to databases, cloud storage, APIs, and more
● Customizing PandasAI config
● Integrating PandasAI into production workflows
● Use cases across industries like finance, marketing, science, and more
Follow along to master conversational data analysis with PandasAI!
Let's start by installing PandasAI using pip or poetry.
To install with pip:
pip install pandasai
Make sure you are using an up-to-date version of pip to avoid any installation issues.
For managing dependencies, we recommend using poetry:
# Install poetry
pip install --user poetry
# Install pandasai
poetry add pandasai
This will install PandasAI and all its dependencies for you.
For advanced usage, install all optional extras:
poetry add pandasai --all-extras
This includes dependencies for additional capabilities you may need later like connecting to databases, using different NLP models, advanced visualization, etc.
With PandasAI installed, we are ready to start importing it and exploring its conversational interface!
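Before moving on, it can help to confirm that the package actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is a hypothetical one chosen for illustration and is not part of PandasAI.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pandasai")
    if version is None:
        print("pandasai is not installed in this environment")
    else:
        print(f"pandasai {version} is installed")
```

If the check reports that the package is missing, the usual cause is that pip or poetry installed into a different virtual environment than the one running the script.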
Let's initialize a PandasAI DataFrame from a CSV file:
from pandasai import SmartDataframe
df = SmartDataframe("sales.csv")
This creates a SmartDataframe that wraps the underlying pandas DataFrame and adds conversational capabilities.
We can customize initialization through configuration options:
from pandasai.llm import OpenAI
llm = OpenAI(api_token="<your api key>")
config = {
    "llm": llm,
}
df = SmartDataframe("sales.csv", config=config)
This initializes the SmartDataframe with an OpenAI model as its language model.
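Hardcoding an API key in a script makes it easy to leak through version control. One possible approach, sketched below, reads the key from an environment variable instead. The variable name `OPENAI_API_KEY` and the helper names `read_api_key` and `build_config` are assumptions made for this example; only `OpenAI(api_token=...)` comes from the library.

```python
import os


def read_api_key(env_var="OPENAI_API_KEY"):
    """Fetch an API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def build_config(env_var="OPENAI_API_KEY"):
    """Build a PandasAI config dict whose LLM uses the key from the environment."""
    # Imported lazily so the key check can be exercised without pandasai installed.
    from pandasai.llm import OpenAI

    return {"llm": OpenAI(api_token=read_api_key(env_var))}
```

With this in place, the initialization becomes `df = SmartDataframe("sales.csv", config=build_config())`, and a missing key fails with a clear message before any request is made.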
For easy multi-table analysis, use SmartDatalake:
from pandasai import SmartDatalake
dl = SmartDatalake(["sales.csv", "inventory.csv"])
SmartDatalake lets you ask questions that span multiple related data sources.
We can also connect to live data sources like databases during initialization:
from pandasai.connectors import MySQLConnector
mysql_conn = MySQLConnector(config={
"host": "localhost",
"port": 3306,
"database": "mydb",
"username": "root",
"password": "root",
"table": "loans",
})
df = SmartDataframe(mysql_conn)
This connects to a MySQL database so we can analyze the live data interactively.
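The connector example above embeds the database password directly in the code, which is fine for a local demo but risky elsewhere. As a hedged sketch, the config dictionary could be assembled from environment variables. The helper `connector_config_from_env` and the variable names (`MYSQL_HOST`, `MYSQL_PORT`, and so on) are illustrative assumptions; the dictionary keys match the ones shown above.

```python
import os


def connector_config_from_env(table, prefix="MYSQL_"):
    """Build a connector config dict from environment variables.

    The variable names (MYSQL_HOST, MYSQL_PORT, ...) are assumptions for
    this sketch; adapt them to your own deployment conventions.
    """
    return {
        "host": os.environ.get(prefix + "HOST", "localhost"),
        "port": int(os.environ.get(prefix + "PORT", "3306")),
        # Required settings raise KeyError if they are not set.
        "database": os.environ[prefix + "DATABASE"],
        "username": os.environ[prefix + "USER"],
        "password": os.environ[prefix + "PASSWORD"],
        "table": table,
    }
```

The result can be passed straight to the connector, for example `MySQLConnector(config=connector_config_from_env("loans"))`.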
The most exciting part of PandasAI is exploring data through natural language. Let's go through some examples!
Calculate totals:
df.chat("What is the total revenue for 2022?")
# Prints revenue total
Filter data:
df.chat("Show revenue for electronics category")
# Filters and prints electronics revenue
Aggregate by groups:
df.chat("Break down revenue by product category and segment")
# Prints table with revenue aggregated by category and segment
Visualize data:
df.chat("Plot monthly revenue over time")
# Plots a line chart
Ask for insights:
df.chat("Which segment has fastest revenue growth?")
# Prints segments sorted by revenue growth
PandasAI understands the user's questions in plain English and automatically generates relevant answers, tables and charts.
We can ask endless questions and immediately get data-driven insights without writing any SQL queries or analysis code!
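It helps to know roughly what happens behind a question. PandasAI typically answers by asking the language model to write pandas code and then running it. The sketch below shows the kind of pandas code that a request like "Break down revenue by product category and segment" corresponds to. The sample data and the column names `category`, `segment`, and `revenue` are made up for illustration, and the code an LLM actually generates may differ.

```python
import pandas as pd

# Hypothetical sample data standing in for sales.csv.
sales = pd.DataFrame(
    {
        "category": ["electronics", "electronics", "furniture", "furniture"],
        "segment": ["consumer", "corporate", "consumer", "consumer"],
        "revenue": [1200.0, 800.0, 300.0, 150.0],
    }
)


def revenue_by_category_and_segment(frame):
    """Sum revenue for each (category, segment) pair."""
    return frame.groupby(["category", "segment"], as_index=False)["revenue"].sum()


breakdown = revenue_by_category_and_segment(sales)
print(breakdown)
```

Writing the equivalent pandas by hand for a few questions is also a practical way to sanity-check the answers the conversational interface returns.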
A key strength of PandasAI is its broad range of built-in data connectors. This enables conversational analytics on diverse data sources.
For example, to query a PostgreSQL table:
from pandasai.connectors import PostgreSQLConnector
pg_conn = PostgreSQLConnector(config={
"host": "localhost",
"port": 5432,
"database": "mydb",
"username": "root",
"password": "root",
"table": "payments",
})
df = SmartDataframe(pg_conn)
df.chat("Which products had the most orders last month?")
Or to pull market data from Yahoo Finance:
from pandasai.connectors import YahooFinanceConnector
yf_conn = YahooFinanceConnector("AAPL")
df = SmartDataframe(yf_conn)
df.chat("How did Apple stock perform last quarter?")
The connectors provide out-of-the-box access to data across domains for easy conversational analytics.
While PandasAI is designed for simplicity, its architecture is customizable and extensible.
We can configure aspects like:
Language Model
Use different NLP models:
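from pandasai.llm import GoogleVertexAI
# data is an existing pandas DataFrame; fill in your own Google Cloud settings
llm = GoogleVertexAI(project_id="<your project id>", location="<your region>", model="<model name>")
df = SmartDataframe(data, config={"llm": llm})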
Custom Instructions
Add data preparation logic:
config["custom_instructions"] = """
Prepare data:
- Filter outliers
- Impute missing values
"""
These options provide advanced control for tailored workflows.
Since PandasAI is built on top of Pandas, it integrates smoothly into data pipelines:
import pandas as pd
from pandasai import SmartDataframe
# Load raw data
data = pd.read_csv("sales.csv")
# Clean data (clean_data is your own helper function)
cleaned = clean_data(data)
# PandasAI for analysis
df = SmartDataframe(cleaned)
df.chat("Which products have trending sales?")
# Further processing
final_data = process_data(df)
PandasAI's conversational interface can power the interactive analysis stage in ETL pipelines.
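The cleaning helper called in the pipeline above is not defined in this guide. As one possible sketch, a cleaning step could drop duplicate rows, fill missing values with the column median, and filter outliers with the common 1.5 × IQR rule. The function name `clean_sales_data`, the `revenue` column, and the sample data are all assumptions for illustration; real pipelines will need rules suited to their own data.

```python
import pandas as pd


def clean_sales_data(frame, value_column="revenue"):
    """Drop duplicate rows, impute missing values, and filter IQR outliers."""
    cleaned = frame.drop_duplicates().copy()

    # Impute missing values with the column median.
    median = cleaned[value_column].median()
    cleaned[value_column] = cleaned[value_column].fillna(median)

    # Keep only rows within 1.5 * IQR of the first and third quartiles.
    q1 = cleaned[value_column].quantile(0.25)
    q3 = cleaned[value_column].quantile(0.75)
    spread = 1.5 * (q3 - q1)
    in_range = cleaned[value_column].between(q1 - spread, q3 + spread)
    return cleaned[in_range].reset_index(drop=True)


# Hypothetical raw data with a duplicate row, a missing value, and an outlier.
raw = pd.DataFrame(
    {
        "product": ["a", "b", "c", "d", "e", "e"],
        "revenue": [100.0, 110.0, None, 105.0, 10_000.0, 10_000.0],
    }
)
cleaned = clean_sales_data(raw)
print(cleaned)
```

Doing this kind of preparation in plain pandas before wrapping the data keeps the cleaning rules explicit and repeatable, and leaves the conversational layer for exploration.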
Thanks to its versatile conversational interface, PandasAI can adapt to workflows across multiple industries. Here are a few examples:
Sales Analytics - Analyze sales numbers, find growth opportunities, and predict future performance.
df.chat("How do sales for women's footwear compare to last summer?")
Financial Analysis - Conduct investment research, portfolio optimization, and risk analysis.
df.chat("Which stocks have the highest expected returns given acceptable risk?")
Scientific Research - Explore and analyze the results of experiments and simulations.
df.chat("Compare the effects of the three drug doses on tumor size.")
Marketing Analytics - Measure campaign effectiveness, analyze customer journeys, and optimize spending.
df.chat("Which marketing channels give the highest ROI for millennial customers?")
And many more! PandasAI fits into any field that leverages data analysis, unlocking the power of conversational analytics for all.
This guide gave an overview of PandasAI's capabilities for conversational data analysis. We walked through:
● Installation and configuration
● Asking questions in plain English
● Connecting to databases, cloud storage, APIs
● Customizing the language model and instructions
● Integration into production pipelines
PandasAI makes data analysis intuitive and accessible to all. By providing a natural language interface, it opens up insights from data to a broad range of users.
Start adding a conversational layer to your workflows with PandasAI today! Democratize data science and transform how your business extracts value from data through the power of AI.
Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces - contributing his technical skills to various startups across Europe over the past decade.
Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.
By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.