What is PandasAI?

Key takeaways:

  1. PandasAI enhances traditional data analysis workflows by integrating AI capabilities directly with pandas DataFrames, making it easier to manipulate and analyze data.

  2. Users can interact with their data using natural language prompts, simplifying data exploration and analysis without needing to write complex code.

  3. PandasAI provides automated insights and interpretations of data, helping users uncover patterns and trends quickly.

  4. The platform allows for easy integration of machine learning models, enabling predictive analytics and decision-making directly within the data environment.

  5. PandasAI simplifies the process of generating visualizations by allowing users to request charts and graphs through natural language prompts.

  6. Built on open-source technologies, PandasAI can be customized and adapted to suit various data analysis needs.

PandasAI is a Python library that pairs the popular pandas library with large language models, so you can query, transform, and visualize DataFrames using natural language. Whether you’re a data scientist, an analyst, or a beginner, it can shorten routine data exploration and visualization work. In this Answer, we’ll learn how PandasAI fits into a data analysis workflow and walk through a short tutorial.

Generative AI and large language models (LLMs) have ushered in a new era of artificial intelligence and machine learning, enabling the development of advanced applications like PandasAI. As a powerful fusion of Python’s renowned pandas library and OpenAI's GPT, PandasAI revolutionizes data analysis and visualization tasks, offering a remarkably efficient and user-friendly approach.

PandasAI

PandasAI is a cutting-edge tool that seamlessly blends Python’s pandas library with the power of generative AI LLMs. This unique combination empowers users to perform data analysis and visualization tasks with remarkable ease and efficiency. Unlike traditional data analysis methods that involve manual manipulation and coding, PandasAI allows users to interact with data through natural language prompts.

Fun Fact: The name "Pandas" comes from "panel data," a term used for multi-dimensional structured data, especially in finance and statistics. Although it might make you think of the adorable animal, the name actually highlights the library's ability to handle complex datasets with ease. But hey, the panda reference does make it more memorable!

How to install PandasAI

First of all, we need to install pandasai on our local system. To install it, we’ll use the following command:

pip install pandasai

PandasAI tutorial

To use PandasAI, we begin by creating a DataFrame, which is essential for its implementation. To achieve this, we first load the Iris dataset from scikit-learn and then proceed to create the DataFrame using the loaded data.

import pandas as pd
from sklearn.datasets import load_iris
import numpy as np
# Load the Iris dataset
iris = load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
print(data.head())

Code explanation

  • Line 5: We load the Iris dataset using the load_iris function from scikit-learn.

  • Lines 6–7: We use np.c_ to concatenate the features and the target variable into a single array. Then, a DataFrame is created with this combined array, and the column names are set using iris['feature_names'] + ['target'].

  • Line 8: Finally, we print the first few rows of the DataFrame data using the head() method. This gives a glimpse of the data in the Iris dataset.
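To see what np.c_ does in isolation, here is a minimal sketch. The small arrays are made-up stand-ins for iris['data'] and iris['target'], not real Iris rows, so the snippet runs without scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for iris['data'] (2-D features) and iris['target'] (1-D labels)
features = np.array([[5.1, 3.5], [4.9, 3.0], [6.3, 3.3]])
target = np.array([0, 0, 2])

# np.c_ treats the 1-D target as a column and appends it to the right of the features
combined = np.c_[features, target]

frame = pd.DataFrame(
    combined,
    columns=["sepal length (cm)", "sepal width (cm)", "target"],
)

print(combined.shape)  # rows x (feature columns + 1 target column)
print(frame.dtypes)    # the integer labels are upcast to float in the combined array
```

One side effect worth knowing: because the combined array has a single dtype, the integer labels become floats. If integer labels matter, frame["target"].astype(int) converts them back.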

from pandasai import PandasAI
# Use your API key to instantiate an LLM
from pandasai.llm.openai import OpenAI
llm = OpenAI(api_token="USE_YOUR_API_KEY_HERE")
pandas_ai = PandasAI(llm)
prompt = 'Show the info of data in tabular form'
pandas_ai(data, prompt=prompt)

Code explanation

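  • Line 1: We import the PandasAI class from the pandasai library. This class comes from early releases of the library. Later releases changed the interface, so if the import fails, check the documentation for the version you installed.

  • Lines 3–4: We import the OpenAI class and use it to instantiate an LLM. The api_token parameter is used to specify your OpenAI API key. You can get your API key from the OpenAI website.

  • Line 5: We create an instance of the PandasAI class. This instance is passed the LLM that we just instantiated.

  • Lines 6–7: Finally, we define a prompt and pass it to PandasAI along with the DataFrame. The LLM uses the prompt to generate the code that analyzes the data.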
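Rather than pasting the key into the script, one common pattern is to read it from an environment variable. The sketch below is an illustration only: the helper name load_api_key and the variable name OPENAI_API_KEY are assumptions, not part of PandasAI.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The helper and the default variable name are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the LLM."
        )
    return key
```

With this helper in place, the LLM could be created with OpenAI(api_token=load_api_key()), which keeps the key out of source control.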

To cross-check the output generated by PandasAI, we can call the pandas info() method directly.

data.info()

Now let’s move forward to some manipulation that we can perform using PandasAI. Let’s create a correlation matrix using it.

from pandasai import PandasAI
# Use your API key to instantiate an LLM
from pandasai.llm.openai import OpenAI
llm = OpenAI(api_token="USE_YOUR_API_KEY_HERE")
pandas_ai = PandasAI(llm)
prompt = "Show the correlation matrix of the data in the tabular form"
a = pandas_ai(data, prompt=prompt)
print(a)

Note: The output of the above correlation matrix shows that petal length and petal width are strongly correlated with the target variable.
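Because the LLM generates the analysis code, it can help to compute the same matrix with plain pandas and compare. The sketch below uses a handful of illustrative rows, not the full Iris dataset. On the real DataFrame, the equivalent call would be data.corr().

```python
import pandas as pd

# Illustrative rows only, not the full Iris dataset
sample = pd.DataFrame({
    "petal length (cm)": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal width (cm)": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "target": [0, 0, 1, 1, 2, 2],
})

# DataFrame.corr() computes pairwise Pearson correlations by default
corr = sample.corr()
print(corr.round(2))

# Rank the features by how strongly they correlate with the target
print(corr["target"].drop("target").sort_values(ascending=False))
```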

PandasAI vs. pandas

Both PandasAI and pandas are powerful tools for data analysis in Python, but they serve different purposes and offer unique advantages. Here’s a comparison to highlight their distinctions:

  1. Core functionality: Pandas focuses on data manipulation and analysis, while PandasAI integrates AI to automate and enhance these tasks.

  2. Automation: Pandas requires manual coding for operations, whereas PandasAI automates processes like data cleaning and predictions using AI.

  3. User communication: Pandas relies on explicit coding, while PandasAI enables interaction through natural language and high-level prompts.

  4. AI integration: Pandas doesn’t include AI, but PandasAI embeds AI models directly into workflows for smarter analysis.

PandasAI use cases

Let’s explore the top features of PandasAI, demonstrated through practical use cases.

AI-powered data analysis

Imagine you have a dataset of customer reviews and want to analyze their sentiment. With PandasAI, you can ask the LLM to categorize each review as positive, negative, or neutral through a prompt.

pandas_ai.run(df, prompt="Classify each review in the 'reviews' column as positive, negative, or neutral")

PandasAI turns the prompt into an analysis without the need for manual model building. Because the LLM generates the result, spot-check the labels against a sample of reviews.

Natural language queries

Suppose you have sales data and want a quick summary of the total sales. Instead of writing code, you can ask PandasAI directly in plain English:

pandas_ai.run(df, prompt="What are the total sales in this dataset?")

PandasAI processes the query and provides the result, making data interaction intuitive.
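For comparison, the same question can be answered in plain pandas, which is a quick way to verify the figure the LLM reports. This is a minimal sketch on made-up data. The column names region and sales are assumptions for illustration.

```python
import pandas as pd

# Made-up sales records for illustration
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales": [1200.0, 850.5, 430.0, 990.25],
})

# Total across the whole dataset
total_sales = df["sales"].sum()

# Breakdown per region, useful for follow-up questions
sales_by_region = df.groupby("region")["sales"].sum()

print(total_sales)
print(sales_by_region)
```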

Automated decision-making

You might have a dataset with missing values, and instead of manually imputing them, you can let PandasAI handle it. The LLM picks a fill strategy (e.g., the mean, median, or mode) based on the prompt and the data.

pandas_ai.run(df, prompt="Fill the missing values in this dataset")

Because the strategy is chosen for you, review the filled values, or name the method you want in the prompt.
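To make the fill strategies concrete, here is a small pandas sketch of one reasonable rule: the median for numeric columns and the mode for everything else. It is an illustration on made-up data, not the logic PandasAI runs internally.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Rome", None, "Rome"],
})

filled = df.copy()
for column in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[column]):
        # Numeric column: the median is robust to outliers
        filled[column] = filled[column].fillna(filled[column].median())
    else:
        # Non-numeric column: fall back to the most frequent value
        filled[column] = filled[column].fillna(filled[column].mode().iloc[0])

print(filled)
```

Writing the rule out like this also gives you a baseline to compare against whatever the LLM-generated code produces.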

Machine learning integration

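If you’re predicting house prices based on features like location, size, and amenities, you can combine a trained machine learning model with PandasAI. The prompt-based interface shown in this Answer takes a DataFrame and a prompt, so a typical pattern is to add the model’s predictions as a column and then explore them with prompts. Here, my_ml_model is a hypothetical, already trained model with a scikit-learn-style predict method.

df['predicted_price'] = my_ml_model.predict(df[['location', 'size', 'amenities']])
pandas_ai.run(df, prompt="Which locations have the highest average predicted price?")

This keeps the machine learning predictions and the natural language analysis in the same DataFrame.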
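The step of attaching predictions to a DataFrame can be sketched without any PandasAI calls. In the snippet below, a straight line fitted with NumPy stands in for a trained model. The data and the single size feature are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up housing data; price is exactly 2 * size to keep the example easy to follow
df = pd.DataFrame({
    "size": [50.0, 80.0, 120.0, 200.0],
    "price": [100.0, 160.0, 240.0, 400.0],
})

# Stand-in "model": a least-squares line, price ~ slope * size + intercept
slope, intercept = np.polyfit(df["size"].to_numpy(), df["price"].to_numpy(), deg=1)

# Store the predictions next to the original columns
df["predicted_price"] = slope * df["size"] + intercept

print(df)
```

Once the predictions live in a column, they can be queried like any other data in the DataFrame.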

Seamless pandas compatibility

Suppose you’re already using pandas for basic data manipulation, such as filtering a dataset of sales by region. You can easily add AI-driven enhancements on top without changing your existing workflow.

df_filtered = df[df['region'] == 'North']
pandas_ai.run(df_filtered, prompt="Analyze this region's sales performance")

You get the best of both pandas and AI without needing to learn a new tool from scratch.

Custom AI model support

Let’s say you’ve built a custom image classification model, and you want to run it on image data stored in a pandas DataFrame. You can apply the model to the relevant column with pandas and then use PandasAI to query the results. Here, my_custom_model is a hypothetical callable that returns a label for a single image.

df['image_label'] = df['image_data'].apply(my_custom_model)
pandas_ai.run(df, prompt="How many images are there for each label?")

On the LLM side, PandasAI supports a variety of models, making it flexible for specific tasks.

Data visualization simplification

You have sales data over several months and want to visualize it. With PandasAI, you can simply ask it to plot the data without manually writing code for Matplotlib or seaborn.

pandas_ai.run(df, prompt="Plot the monthly sales trends")

PandasAI automates the creation of the chart, providing instant visualization with minimal effort.
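A monthly trend chart rests on a monthly aggregation, and that step can be sketched in plain pandas. The date and sales column names and the values are assumptions for illustration.

```python
import pandas as pd

# Made-up daily sales records spanning three months
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-03-15", "2024-03-28"]
    ),
    "sales": [100, 150, 200, 50, 75],
})

# Group the rows by calendar month and total the sales in each month
monthly = df.groupby(df["date"].dt.to_period("M"))["sales"].sum()

print(monthly)

# With Matplotlib installed, monthly.plot(kind="line") would draw the trend
```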

Improved data interpretation

If you're analyzing a stock price dataset, you may want to identify patterns, such as periods of high volatility. PandasAI can detect these patterns and provide an interpretation based on the data trends.

pandas_ai.run(df, prompt="Identify periods of high volatility in stock prices")

It helps interpret complex data patterns, providing valuable insights with AI assistance.
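"High volatility" needs a definition before any tool can find it. One common choice is the rolling standard deviation of returns. The sketch below uses made-up prices, a 3-period window, and a 0.06 threshold, all chosen for illustration rather than taken from PandasAI.

```python
import pandas as pd

# Made-up closing prices with a turbulent stretch in the middle
prices = pd.Series(
    [100.0, 101.0, 100.5, 101.5, 110.0, 98.0, 112.0, 111.0, 111.5, 111.2],
    name="close",
)

# Period-over-period percentage change
returns = prices.pct_change()

# Rolling standard deviation of returns over 3 periods (window size is an assumption)
rolling_vol = returns.rolling(window=3).std()

# Flag periods where rolling volatility exceeds an illustrative threshold
threshold = 0.06
high_vol = rolling_vol[rolling_vol > threshold]

print(high_vol)
```

Stating the window and threshold in the prompt (for example, "using a 3-day rolling standard deviation of returns") makes the LLM's answer easier to reproduce and check.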

Why choose PandasAI over direct prompting?

PandasAI serves as a bridge between traditional pandas functionality and LLM-based enhancements, offering several key advantages:

  • Seamless integration with DataFrames, so you can continue to manipulate and analyze your data while leveraging LLM capabilities without switching contexts.

  • Simplified querying, data visualization, and machine learning without needing to manually prompt the LLM repeatedly.

  • Time-saving automation that makes it easier to conduct data analysis without needing to write detailed code or structure prompts carefully.

In essence, PandasAI is designed to help users automate complex data tasks. It combines AI’s power with pandas' flexibility in an easy-to-use, integrated tool, making it particularly useful for data analysts and professionals who want to streamline their workflows.

Frequently asked questions

How does PandasAI simplify data manipulation in Python?

PandasAI simplifies data manipulation by allowing users to interact with data using natural language queries, reducing the need for complex coding. It integrates seamlessly with pandas DataFrames, automating tasks like data cleaning and visualization for a more efficient workflow.


How can I integrate PandasAI with Python and other tools?

To integrate PandasAI, install the library using pip and import it into your Python script. For data visualization, you can use it alongside other tools like Matplotlib or seaborn and run it in Jupyter Notebooks for an interactive experience.


Is PandasAI open source?

Yes, PandasAI is open source and available on platforms like GitHub, enabling users to explore, modify, and contribute to the library.


Does PandasAI use OpenAI?

PandasAI can use OpenAI models, but it is not limited to them. Users can also integrate various language models from sources like Hugging Face, allowing flexibility in choosing the best LLM for their needs.


Copyright ©2025 Educative, Inc. All rights reserved