🐛 Bug Report: Inconsistent Output Type When Using LLM With PandasAI
Introduction
PandasAI is a powerful library that combines the capabilities of Pandas and Large Language Models (LLMs) to provide a seamless experience for data analysis and manipulation. However, a bug has been reported that affects the type of the value returned by the LLM when used with PandasAI. In this article, we will walk through the details of this bug, the steps to reproduce it, and suggested fixes to ensure consistent return types.
Steps to Reproduce
To reproduce this bug, follow these steps:
Step 1: Install Required Libraries
First, ensure that you have the necessary libraries installed. You can install them using pip:
```shell
pip install pandasai pandas
```
Step 2: Import Libraries and Create a Sample DataFrame
Next, import the required libraries and create a sample DataFrame:
```python
from pandasai import SmartDataframe
from pandasai.llm import LiteLLM

# Sample DataFrame
platform_users_df = SmartDataframe({
    "Name": ["Alice", "Bob", "Charlie"],
    "Country": ["United States", "United States", "Spain"]
})
```
Step 3: Set Up LLM
Now, set up the LLM using the LiteLLM class:

```python
# Set up LLM
model = "gemini/gemini-2.0-flash"
llm = LiteLLM(model=model)
platform_users_df.config.set({"llm": llm})
```
Step 4: Ask a Question
Finally, ask a question using the chat method:

```python
platform_users_df.chat("Which country has the most users?")
```
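As a sanity check on what the LLM should return, the same question can be answered deterministically with plain pandas (no LLM involved); the DataFrame below mirrors the sample data above:

```python
import pandas as pd

# Same sample data as above, as a plain pandas DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Country": ["United States", "United States", "Spain"],
})

# Count users per country and pick the winner
counts = df["Country"].value_counts()
answer = f"{counts.idxmax()} has the most users ({counts.max()})."
print(answer)  # United States has the most users (2).
```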
Expected Behavior
When you run this code, you would expect the LLM to return a plain string answer, for example:

"United States has the most users (2)."
Actual Behavior
However, the actual behavior is inconsistent. In some executions, the LLM returns a Pandas Series or structured object, while in others, it returns a plain string. This inconsistency makes it difficult to use the result programmatically.
Example Output
Here's an example of the inconsistent output across two runs of the same query.

Execution #1 (a Pandas Series):

```
0    United States
1    United States
2    Spain
Name: Country, dtype: object
```

Execution #2 (a plain string):

```
"United States has the most users (2)."
```
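Until the return type is stabilized, callers can coerce either shape to a predictable type themselves. The helper below is a hypothetical workaround written for this report, not part of the PandasAI API:

```python
import pandas as pd

def coerce_to_string(result):
    """Normalize a chat() result to a plain string, whatever shape it arrives in."""
    if isinstance(result, (pd.Series, pd.DataFrame)):
        # Render structured results as their plain-text representation
        return result.to_string()
    return str(result)

# Both observed shapes now yield a string
series_result = pd.Series(["United States", "United States", "Spain"], name="Country")
assert isinstance(coerce_to_string(series_result), str)
assert coerce_to_string("United States has the most users (2).").startswith("United States")
```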
Suggested Fix
To fix this issue, we suggest ensuring consistent return types. Here are a few possible solutions:
Solution 1: Always Return a Pandas Object
One possible solution is to always return a Pandas object, even if the answer is a plain string. This would ensure that the output is consistent and can be easily processed programmatically.
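A sketch of what this could look like, assuming a hypothetical normalization step applied before chat() returns (the function name is illustrative, not the actual PandasAI internals):

```python
import pandas as pd

def as_pandas_object(answer):
    """Hypothetical post-processing step: always hand the caller a pandas object."""
    if isinstance(answer, (pd.Series, pd.DataFrame)):
        return answer
    # Wrap scalar answers (e.g. a plain string) in a one-element Series
    return pd.Series([answer], name="answer")

wrapped = as_pandas_object("United States has the most users (2).")
print(type(wrapped).__name__)  # Series
```

With this approach the caller always receives a pandas object and can rely on a single access pattern, at the cost of an extra unwrapping step for scalar answers.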
Solution 2: Provide an Option or Parameter to Return Structured Data
Another possible solution is to provide an option or parameter to return structured data instead of a plain string. This would give users more control over the output and allow them to choose the format that best suits their needs.
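Some PandasAI releases have exposed an output_type argument on chat() along these lines; the dispatcher below is an illustrative sketch of how such a parameter could coerce the raw answer, not the library's actual implementation:

```python
import pandas as pd

def format_response(raw_answer, output_type="string"):
    """Illustrative dispatcher: coerce a raw answer to the caller's requested type."""
    if output_type == "string":
        if isinstance(raw_answer, (pd.Series, pd.DataFrame)):
            return raw_answer.to_string()
        return str(raw_answer)
    if output_type == "dataframe":
        if isinstance(raw_answer, pd.DataFrame):
            return raw_answer
        if isinstance(raw_answer, pd.Series):
            return raw_answer.to_frame()
        # Wrap scalars in a one-row DataFrame
        return pd.DataFrame({"answer": [raw_answer]})
    raise ValueError(f"unsupported output_type: {output_type!r}")

print(type(format_response("United States", output_type="dataframe")).__name__)  # DataFrame
```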
Conclusion
In conclusion, the inconsistent output type of the LLM when used with PandasAI is a significant issue that affects the usability of the library. By reproducing the bug with the steps above and applying one of the suggested fixes, the output can be made consistent and easy to process programmatically. We hope that this article has been helpful in highlighting this issue and proposing possible solutions.
Future Work
In the future, we plan to implement the suggested fix so that the output is consistent across all executions, and to add options and parameters that give users control over the returned format.
Acknowledgments
We would like to thank the PandasAI community for reporting this issue and providing valuable feedback. We are committed to providing a high-quality library that meets the needs of our users, and we appreciate your continued support.
🐛 Bug Report: Inconsistent Output Type When Using LLM with PandasAI - Q&A
Introduction
In the first part of this report, we discussed the inconsistent output type of the LLM when used with PandasAI. In this section, we answer some frequently asked questions (FAQs) related to this issue.
Q: What is the cause of this inconsistency?
A: The cause of this inconsistency is not yet fully understood. However, it is believed to be related to the way the LLM is integrated with PandasAI. Further investigation is needed to determine the root cause of this issue.
Q: How can I reproduce this bug?
A: Follow the steps from the first part of this report: install pandasai and pandas, create the sample SmartDataframe, configure a LiteLLM-backed model (the report used "gemini/gemini-2.0-flash"), and run platform_users_df.chat("Which country has the most users?"). Repeating the same query across executions surfaces the inconsistency.
Q: What are the expected and actual behaviors?
A: The expected behavior is that the LLM returns a plain string answer. However, the actual behavior is inconsistent, with the LLM returning either a Pandas Series (or another structured object) or a plain string, depending on the execution.
Q: How can I fix this issue?
A: The suggested fixes from the first part of this report apply: either always return a Pandas object (wrapping plain-string answers), or provide an option or parameter that lets the caller choose the output format. As a workaround, callers can also normalize the result themselves by checking its type before use.
Q: What is the impact of this inconsistency?
A: The inconsistency in the output type of LLM when used with PandasAI can have a significant impact on the usability of the library. It can make it difficult to use the result programmatically and may require additional processing to convert the output to a consistent format.
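Concretely, the burden falls on the caller, who must branch on the result type before using it. A minimal illustration with plain pandas (no LLM involved; both shapes mirror the example outputs above):

```python
import pandas as pd

def summarize(result):
    # Callers currently have to handle both shapes chat() may return
    if isinstance(result, pd.Series):
        counts = result.value_counts()
        return f"{counts.idxmax()} has the most users ({counts.max()})."
    return str(result)

# The same call site must cope with either shape
series_shape = pd.Series(["United States", "United States", "Spain"], name="Country")
string_shape = "United States has the most users (2)."
assert summarize(series_shape) == summarize(string_shape)
```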
Q: Is this issue specific to PandasAI?
A: No, this is not specific to PandasAI. Similar inconsistencies have been reported in other libraries that integrate LLMs with data analysis tools.
Conclusion
The inconsistent output type is a significant usability issue, but with the reproduction steps, suggested fixes, and workarounds above it can be managed until a fix lands in the library. We hope this Q&A has answered your questions and clarified the issue.