`conversation_item_added` Prints Raw `tool_outputs` Dict Instead of Conversational Reply

May 9, 2025 by ADMIN

**Conversation Item Added Prints Raw Tool Outputs Instead of Conversational Reply**

**Introduction**
------------

In a LiveKit Agents session, the `conversation_item_added` event fires whenever a new item is committed to the conversation history, for both user and agent messages. Handlers commonly use it to capture the agent's output, which should be a natural-language reply. In some cases, however, the agent emits the raw tool outputs dictionary instead of speaking a natural-language reply, and the handler prints that dictionary verbatim. This article explores the issue and a workaround.

**Understanding the Issue**
-----------------------

The issue shows up in an event handler like the one below. The handler prints each assistant reply and forwards it with the application's `post_conversation_message` helper:

```python
from livekit.agents import ConversationItemAddedEvent

@session.on("conversation_item_added")
def _on_agent_output(ev: ConversationItemAddedEvent):
    # Only assistant items are forwarded; user items are ignored
    if ev.item.role == "assistant":
        print(f"BOT ▶ {ev.item.text_content}", flush=True)
        post_conversation_message(
            conversation_id,
            sender="agent",
            text=ev.item.text_content,
            building_id=building_id,
            lock_id=lock_id,
        )
```

The handler checks only that the item's role is `"assistant"`. It then prints `ev.item.text_content` and forwards it unchanged with `post_conversation_message`, which is an application-specific helper and not part of LiveKit. Nothing in the handler distinguishes a conversational reply from tool output. So when the agent emits the raw tool outputs dictionary as its reply, that dictionary is exactly what gets printed and posted.
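
To see why the dictionary passes straight through, the sketch below replays the handler's logic offline. `FakeItem` is a hypothetical stand-in for LiveKit's chat item; it carries only the `role` and `text_content` fields the handler reads. `post_conversation_message_stub` is a hypothetical list-appending replacement for the application's helper. The tool-output payload is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeItem:
    """Hypothetical stand-in for a chat item; only the fields the handler reads."""
    role: str
    text_content: Optional[str]


posted: List[Optional[str]] = []  # records what the stub "posts"


def post_conversation_message_stub(text: Optional[str]) -> None:
    # Hypothetical replacement for the application's post_conversation_message
    posted.append(text)


def on_agent_output(item: FakeItem) -> None:
    # Same logic as the original handler: assistant text is forwarded unchanged
    if item.role == "assistant":
        print(f"BOT ▶ {item.text_content}", flush=True)
        post_conversation_message_stub(item.text_content)


# Made-up items: a user turn, then an assistant turn whose text is a tool output
on_agent_output(FakeItem(role="user", text_content="Unlock the front door"))
on_agent_output(FakeItem(role="assistant", text_content="{'message': 'Door unlocked'}"))
print(posted)
```

Only the assistant item is posted, and it is posted as the raw dictionary text, because the role check is the only filter.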

**LiveKit Version and Session Definition**
--------------------------------------

The issue was observed with LiveKit version 1.0.20 and the following session definition:

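```python
from livekit.agents import AgentSession
from livekit.plugins import google
from livekit.plugins.turn_detector.english import EnglishModel

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(model="gemini-2.0-flash-exp", voice="Charon"),
    turn_detection=EnglishModel(),
)
```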

This session uses a Gemini `RealtimeModel` from the LiveKit Google plugin's `beta.realtime` module for the LLM, with model `gemini-2.0-flash-exp` and voice `Charon`. It uses `EnglishModel` for turn detection.

**Solution**
--------

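To work around the issue, use `isinstance` to check whether the item's text content is a dictionary before printing it. If it is, print and post its natural-language part instead of the raw dictionary. Keep in mind that in LiveKit Agents 1.x, `ChatMessage.text_content` is declared as `str | None`. An `isinstance(..., dict)` check therefore only fires when an actual `dict` reaches the handler. A dictionary that arrives as its string representation has to be parsed first.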
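
Sometimes the dictionary reaches the handler as text, for example `"{'message': 'Door unlocked'}"`. An `isinstance(..., dict)` check on that string returns `False`, so it has to be parsed before the reply can be extracted. The sketch below shows one way to do that; it is not a LiveKit API. `extract_reply` is a hypothetical helper, and it rests on two assumptions:

- the payload is serialized either as JSON or as a Python dict literal;
- the readable text sits under a `"message"` key, the key this article's workaround reads.

Anything the helper cannot unwrap is returned unchanged.

```python
import ast
import json
from typing import Optional


def extract_reply(text: Optional[str]) -> Optional[str]:
    """Unwrap a stringified tool-output dict into its "message" text.

    Hypothetical helper. Assumes the dict is serialized either as JSON or as a
    Python literal (str(dict)) and that the readable part is under "message".
    """
    if not text:
        return None
    stripped = text.strip()
    if not stripped.startswith("{"):
        return text  # ordinary sentence: pass through unchanged
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        try:
            parsed = ast.literal_eval(stripped)  # e.g. "{'message': 'hi'}"
        except (ValueError, SyntaxError, TypeError):
            return text  # looks like a dict but cannot be parsed: keep as-is
    message = parsed.get("message") if isinstance(parsed, dict) else None
    if isinstance(message, str) and message:
        return message
    return text  # no usable "message" field: fall back to the raw text


print(extract_reply("{'message': 'The door is unlocked.'}"))  # The door is unlocked.
print(extract_reply('{"message": "Welcome!"}'))               # Welcome!
print(extract_reply("Hello there"))                           # Hello there
```

In the handler, `extract_reply(ev.item.text_content)` would replace the raw `text_content` before printing and posting. Returning the original text on any parse failure means ordinary replies that happen to start with `{` are not dropped.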

Here's the updated code snippet:

```python
from livekit.agents import ConversationItemAddedEvent

@session.on("conversation_item_added")
def _on_agent_output(ev: ConversationItemAddedEvent):
    if ev.item.role != "assistant":
        return
    text_content = ev.item.text_content
    # text_content is typed str | None in LiveKit Agents 1.x, so this branch
    # only runs if an actual dict reaches the handler
    if isinstance(text_content, dict):
        # Raw tool output: use its "message" field, falling back to the whole dict
        response = text_content.get("message") or str(text_content)
    else:
        response = text_content
    if not response:
        # No text to forward (text_content can be None)
        return
    print(f"BOT ▶ {response}", flush=True)
    post_conversation_message(
        conversation_id,
        sender="agent",
        text=response,
        building_id=building_id,
        lock_id=lock_id,
    )
```

The updated handler first uses `isinstance` to check whether the text content is a dictionary. If it is, the handler extracts the natural-language reply with `get("message")`; otherwise it uses the text content as-is. Either way, the result is printed and posted with `post_conversation_message`.
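
Factoring the branch into a small pure function makes it testable without a running `AgentSession`. `choose_reply` is a hypothetical name, and the `"message"` key is the same assumption the handler makes. The function behaves as follows:

- it returns the `"message"` field when that field holds a non-empty string;
- it falls back to the dictionary's string form when it does not;
- it returns `None` when there is no text at all, so the caller can skip posting.

```python
from typing import Any, Optional


def choose_reply(text_content: Any) -> Optional[str]:
    """Pick the text to print and post for an assistant item (hypothetical helper)."""
    if isinstance(text_content, dict):
        # Raw tool output: prefer the assumed "message" field ...
        message = text_content.get("message")
        if isinstance(message, str) and message:
            return message
        # ... otherwise fall back to the dict's string form rather than None
        return str(text_content)
    if isinstance(text_content, str) and text_content:
        return text_content
    return None  # None or empty text: nothing worth posting


for sample in ({"message": "Door unlocked"}, {"status": "ok"}, "Hello", None):
    print(repr(choose_reply(sample)))
```

With this helper, the handler reduces to `reply = choose_reply(ev.item.text_content)` followed by an early return when `reply is None`.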

**Conclusion**
----------

The `conversation_item_added` handler prints the raw tool outputs dictionary whenever the agent emits that dictionary as its reply, because the handler forwards `text_content` unchanged. The workaround is to check with `isinstance` whether the content is a dictionary before printing it. If it is, print and post its natural-language `"message"` field instead of the raw dictionary.

**Conversation Item Added Prints Raw Tool Outputs Instead of Conversational Reply: Q&A**
====================================================================================

**Q: What is the `conversation_item_added` event in LiveKit?**
---------------------------------------------------------

A: In LiveKit Agents, `AgentSession` emits `conversation_item_added` whenever a new item is committed to the conversation history, for both user and agent messages. Handlers typically use it to capture the agent's output. That output should be a natural-language reply but can end up being a raw tool output.

**Q: Why is the `conversation_item_added` event printing the raw tool outputs dictionary instead of speaking a natural-language response?**
--------------------------------------------------------------------------------------------------------------------------------

A: The root cause is that the agent emits the raw tool outputs dictionary as its reply instead of a natural-language response. The handler then forwards `ev.item.text_content` unchanged, so the dictionary is printed and posted as-is.

**Q: What is the issue with the raw tool outputs dictionary being printed instead of a natural-language response?**
----------------------------------------------------------------------------------------------------------------

A: A raw tool outputs dictionary is structured data meant for the program, not a reply written for the user. The conversation therefore receives something like `{'message': ...}` instead of a readable sentence that the user can easily understand.

**Q: How can I fix the issue of the `conversation_item_added` event printing the raw tool outputs dictionary instead of speaking a natural-language response?**
-----------------------------------------------------------------------------------------------------------------------------------------

A: Before printing, use the `isinstance` function to check whether the item's text content is a dictionary. If it is, print and post its natural-language response instead of the raw tool outputs dictionary.

**Q: How do I check if the item's text content is a dictionary?**
----------------------------------------------------------------

A: Use the built-in `isinstance` function. It takes the object to check and a class (or a tuple of classes). It returns `True` if the object is an instance of that class and `False` otherwise, so `isinstance(text_content, dict)` tells you whether you are holding a dictionary.
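
A quick illustration of `isinstance`, including the case that matters here. A dictionary that has been converted to a string is no longer a `dict`, so the check returns `False` for it. The payload is made up for illustration.

```python
payload = {"message": "Door unlocked"}  # made-up tool output
as_text = str(payload)                  # what a stringified tool output looks like

print(isinstance(payload, dict))          # True
print(isinstance(as_text, dict))          # False: it is a str now
print(isinstance(as_text, str))           # True
print(isinstance(payload, (dict, list)))  # True: a tuple of classes is also accepted
```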

**Q: How do I get the natural-language response from the raw tool outputs dictionary?**
--------------------------------------------------------------------------------

A: Use the dictionary's `get` method with the key that holds the reply; this example assumes `"message"`. `get` takes the key and an optional default. It returns the value associated with the key if present, and otherwise the default, which is `None` when omitted.
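
A short illustration of `dict.get` with and without a default; the dictionaries are made-up examples. Without a default, a missing key yields `None`. That is why a handler that calls `.get("message")` with no fallback can end up printing `BOT ▶ None`.

```python
tool_output = {"message": "Door unlocked", "status": "ok"}  # made-up payload
no_message = {"status": "ok"}                               # "message" key missing

print(tool_output.get("message"))               # Door unlocked
print(no_message.get("message"))                # None (default when omitted)
print(no_message.get("message", "(no reply)"))  # (no reply)
```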

**Q: What is the code snippet to fix the issue of the `conversation_item_added` event printing the raw tool outputs dictionary instead of speaking a natural-language response?**
-----------------------------------------------------------------------------------------------------------------------------------------

A: The following handler works around the issue:

```python
from livekit.agents import ConversationItemAddedEvent

@session.on("conversation_item_added")
def _on_agent_output(ev: ConversationItemAddedEvent):
    if ev.item.role != "assistant":
        return
    text_content = ev.item.text_content
    # text_content is typed str | None in LiveKit Agents 1.x, so this branch
    # only runs if an actual dict reaches the handler
    if isinstance(text_content, dict):
        # Raw tool output: use its "message" field, falling back to the whole dict
        response = text_content.get("message") or str(text_content)
    else:
        response = text_content
    if not response:
        # No text to forward (text_content can be None)
        return
    print(f"BOT ▶ {response}", flush=True)
    post_conversation_message(
        conversation_id,
        sender="agent",
        text=response,
        building_id=building_id,
        lock_id=lock_id,
    )
```

**Q: What LiveKit version is used in this example?**
--------------------------------------------------

A: LiveKit version 1.0.20.

**Q: What session definition is used in this example?**
-----------------------------------------------------

A: The session is defined as follows:

```python
from livekit.agents import AgentSession
from livekit.plugins import google
from livekit.plugins.turn_detector.english import EnglishModel

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(model="gemini-2.0-flash-exp", voice="Charon"),
    turn_detection=EnglishModel(),
)
```

The `AgentSession` is created with a `RealtimeModel` from the Google plugin's beta realtime module and an `EnglishModel` for turn detection.