How to continue an incomplete response from the OpenAI API


While using the OpenAI API, there might be instances where the returned response appears unfinished because the model hit its token limit. This can pose difficulties, particularly when you're dealing with lengthy discussions or prompts. In this Answer, we'll learn how to detect such truncated responses and continue them.

Understanding the token limit

First, we need to grasp the idea of tokens in relation to OpenAI's models. The models process text in chunks, known as tokens: a token can be a whole word, part of a word, or a single character. A short, common word like "a" is one token, while longer or rarer words may be split into several tokens. The aggregate count of tokens in an API call, encompassing both the input and output tokens, must not surpass the model's maximum context length. For the gpt-3.5-turbo model, this limit is 4,096 tokens.

Here’s an example of how you can count tokens before sending an API request, using the tiktoken library in Python:

import tiktoken
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
text = "This is an example sentence to count tokens."
token_count = len(encoding.encode(text))
print(f"The text contains {token_count} tokens.")

Identifying incomplete responses

The OpenAI API provides a field called finish_reason on each choice in its response. If the response is incomplete, this field is set to length, signifying that the response was cut off because it hit the token limit; a complete response has a finish_reason of stop.
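You can wrap this check in a small helper. The response below is a hand-built dictionary that mimics the shape of a ChatCompletion response, used purely for illustration so no API call is needed:

```python
def is_truncated(response):
    # finish_reason lives on each choice, not at the top level of the response
    return response['choices'][0]['finish_reason'] == 'length'

# Hypothetical truncated response, shaped like the API's output
response = {
    'choices': [{
        'message': {'role': 'assistant', 'content': 'An unfinished answer that stops mid-'},
        'finish_reason': 'length',
    }]
}
print(is_truncated(response))  # True
```

With a real response object, the same check applies once the API call has returned.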

Handling incomplete responses

Upon encountering an incomplete response, you can proceed with the conversation by including the unfinished response as part of the subsequent prompt. However, one must consider the token limit. If the conversation history, along with the new prompt, goes beyond the token limit, you will need to trim or leave out some parts of the discussion.

Here's a Python code snippet demonstrating this:

import openai

def chat_openai(prompt):
    conversation = [{'role': 'user', 'content': prompt}]
    openai.api_key = "{{SECRET_KEY}}"
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo', messages=conversation, temperature=1
    )
    answer = response.choices[0].message.content
    conversation.append({'role': response.choices[0].message.role, 'content': answer})
    return {'answer': answer, 'conversation': conversation, 'response': response}

test = chat_openai("Your initial prompt here.")
print(test['answer'])

In this code, we start off by defining a function named chat_openai that accepts a prompt as input. We initiate a conversation with the prompt and place a call to the OpenAI API. The API's response is then added to the conversation. If the response is incomplete, we can call this function once again with the new prompt and the incomplete response.

Example

Let's work through an example. Assume we have a long conversation that exceeds the token limit. Here's how we can handle it:

import openai

class ConversationAgent:
    def __init__(self, model="gpt-3.5-turbo"):
        self.model = model

    def continue_conversation(self, conversation, next_prompt):
        # Make the first API call
        response = openai.ChatCompletion.create(
            model=self.model, messages=conversation, temperature=1
        )
        choice = response['choices'][0]
        # Check if the response is incomplete; finish_reason lives on the choice
        if choice['finish_reason'] == 'length':
            # Keep the partial answer in the history, then add the next prompt
            conversation.append(choice['message'])
            conversation.append({'role': 'user', 'content': next_prompt})
            # Continue the conversation
            next_response = openai.ChatCompletion.create(
                model=self.model, messages=conversation, temperature=1
            )
            return next_response['choices'][0]['message']['content']
        # The response was already complete
        return choice['message']['content']

# Usage:
# First, set your API key
openai.api_key = "{{SECRET_KEY}}"  # Replace the placeholder with your actual OpenAI API key

# Now you can use the ConversationAgent
agent = ConversationAgent()
conversation = [{'role': 'user', 'content': 'A very long conversation...'}]
next_prompt = 'The next part of the conversation...'
print(agent.continue_conversation(conversation, next_prompt))

Note: This code will only be executable once you enter your own OpenAI API key in place of the placeholder.

Code explanation

  • import openai – Imports the OpenAI library.

  • The ConversationAgent class wraps the chosen language model; its constructor stores the model name, defaulting to gpt-3.5-turbo.

  • The continue_conversation method makes the first API call with the conversation history.

  • It then reads finish_reason from the first choice to check whether the response was truncated.

  • If it was, the partial answer and the next prompt are appended to the conversation, and a second API call continues it; the content of the continued message is returned.

  • If the response was already complete, its content is returned directly.

  • In the usage section, the OpenAI API key is set first.

  • Then an instance of ConversationAgent is created, the conversation and the next prompt are defined, and the continued conversation is printed.

In this example, we first make an API call with a long conversation. If the response comes back as incomplete, we get the subsequent prompt ready and proceed with the conversation by calling the openai.ChatCompletion.create function once more. This approach enables us to manage prolonged conversations that go beyond the token limit.
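The same idea can be extended to a loop that keeps requesting continuations until the model finishes. In the sketch below, create_fn stands in for openai.ChatCompletion.create so the control flow can be demonstrated without a network call; the "Please continue." prompt, the max_rounds cap, and the fake replies are all illustrative assumptions:

```python
def generate_until_complete(conversation, create_fn, max_rounds=5):
    parts = []
    for _ in range(max_rounds):
        response = create_fn(messages=conversation)
        choice = response['choices'][0]
        parts.append(choice['message']['content'])
        if choice['finish_reason'] != 'length':
            break  # the model finished on its own
        # Keep the partial answer in the history and ask the model to go on
        conversation.append(choice['message'])
        conversation.append({'role': 'user', 'content': 'Please continue.'})
    return ''.join(parts)

# Fake create function that truncates once, then finishes
replies = [('Hello, wor', 'length'), ('ld!', 'stop')]
def fake_create(messages):
    content, reason = replies.pop(0)
    return {'choices': [{'message': {'role': 'assistant', 'content': content},
                         'finish_reason': reason}]}

print(generate_until_complete([{'role': 'user', 'content': 'Say hello.'}], fake_create))
# Hello, world!
```

To use this against the real API, you would pass a thin wrapper around openai.ChatCompletion.create as create_fn.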

Conclusion

Managing incomplete responses from the OpenAI API demands careful handling of tokens and a solid understanding of the API's workings. By checking the finish_reason and suitably continuing the conversation, we can assure a seamless and cohesive interaction with the API.

Copyright ©2024 Educative, Inc. All rights reserved