PandasAI is a Python library that extends pandas with a natural language interface. It uses a large language model (LLM) to generate Python code that answers questions about data, performs data analysis, and produces visualizations. In this answer, we will learn how to use PandasAI to analyze a dataframe.
SmartDataframe
SmartDataframe is a class in PandasAI that provides a high-level interface to the library. It lets us query a dataframe in natural language to answer questions about our data, perform data analysis, and generate visualizations.
Let’s see how to use SmartDataframe in Python to query a dataframe in natural language.
import pandas as pd
from pandasai import SmartDataframe

# Sample DataFrame
df = pd.DataFrame({
    "Movie Title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction", "The Dark Knight", "Forrest Gump", "Inception", "Schindler's List", "The Matrix", "Fight Club", "The Lord of the Rings: The Fellowship of the Ring"],
    "Year": [1994, 1972, 1994, 2008, 1994, 2010, 1993, 1999, 1999, 2001],
    "IMDb Rating": [9.3, 9.2, 8.9, 9.0, 8.8, 8.8, 8.9, 8.7, 8.8, 8.8],
    "Runtime (minutes)": [142, 175, 154, 152, 142, 148, 195, 136, 139, 178],
    "Genre": ["Drama", "Crime", "Crime", "Action", "Drama", "Action", "Biography", "Action", "Drama", "Adventure"]
})
from pandasai.llm import OpenAI
llm = OpenAI(api_token="OpenAI_API_key")

df = SmartDataframe(df, config={"llm": llm})
answer = df.chat('What are the five best movies?')
print(answer)
Note: Make sure to replace OpenAI_API_key with your actual OpenAI API key.
Line 2: We import SmartDataframe from pandasai, which lets us query our data in natural language.
Lines 5–11: We create a sample dataframe of movies with each title's Year of release, IMDb Rating, Runtime (minutes), and Genre.
Lines 12–13: We import the OpenAI language model from the pandasai.llm module and initialize it as llm.
Lines 15–17: We wrap the dataframe in a SmartDataframe object, ask it a question in natural language with chat(), and print the answer.
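Hard-coding the API key in a script makes it easy to leak. A minimal sketch of one alternative is shown here, assuming the key is stored in an environment variable named OPENAI_API_KEY. Both the variable name and the load_openai_key helper are illustrative choices and are not part of PandasAI.

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration; the variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage with the listing above (left as a comment because it needs pandasai
# and a real key):
# llm = OpenAI(api_token=load_openai_key())
```

The returned string can then be passed as api_token when initializing the OpenAI model, so the key never appears in the source file.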
Now, let's try a new prompt and observe the response generated by the LLM of PandasAI.
import pandas as pd
from pandasai import SmartDataframe

# Sample DataFrame
df = pd.DataFrame({
    "Movie Title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction", "The Dark Knight", "Forrest Gump", "Inception", "Schindler's List", "The Matrix", "Fight Club", "The Lord of the Rings: The Fellowship of the Ring"],
    "Year": [1994, 1972, 1994, 2008, 1994, 2010, 1993, 1999, 1999, 2001],
    "IMDb Rating": [9.3, 9.2, 8.9, 9.0, 8.8, 8.8, 8.9, 8.7, 8.8, 8.8],
    "Runtime (minutes)": [142, 175, 154, 152, 142, 148, 195, 136, 139, 178],
    "Genre": ["Drama", "Crime", "Crime", "Action", "Drama", "Action", "Biography", "Action", "Drama", "Adventure"]
})
from pandasai.llm import OpenAI
llm = OpenAI(api_token="OpenAI_API_key")

df = SmartDataframe(df, config={"llm": llm})
answer = df.chat('Which is the second best movie of Crime genre?')
print(answer)
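Because PandasAI answers by generating Python code, one way to sanity-check a response is to compute the same result directly in pandas. The sketch below assumes that "best" means the highest IMDb Rating, an interpretation the prompts leave open. The variable names are illustrative, and the code does not use PandasAI at all.

```python
import pandas as pd

# Same sample data as in the listings above (only the columns needed here).
movies = pd.DataFrame({
    "Movie Title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction",
                    "The Dark Knight", "Forrest Gump", "Inception",
                    "Schindler's List", "The Matrix", "Fight Club",
                    "The Lord of the Rings: The Fellowship of the Ring"],
    "IMDb Rating": [9.3, 9.2, 8.9, 9.0, 8.8, 8.8, 8.9, 8.7, 8.8, 8.8],
    "Genre": ["Drama", "Crime", "Crime", "Action", "Drama", "Action",
              "Biography", "Action", "Drama", "Adventure"],
})

# Assumption: "best" is read as "highest IMDb Rating".
top_five = movies.nlargest(5, "IMDb Rating")
print(top_five[["Movie Title", "IMDb Rating"]])

# Rank the Crime movies by rating and take the second row.
crime = movies[movies["Genre"] == "Crime"].sort_values("IMDb Rating", ascending=False)
second_best_crime = crime.iloc[1]["Movie Title"]
print(second_best_crime)
```

If the answer returned by chat() differs from these results, the LLM may have interpreted "best" differently, for example by using another column.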