Fix issue when streaming LLM response #1523

Open
MottoX wants to merge 1 commit into main
Conversation

MottoX (Contributor) commented Oct 28, 2024

Problem

Currently, if the stream flag is set to True in the LLM params, an exception is raised from llm.py. The reason is that litellm returns an instance of its CustomStreamWrapper, which cannot be subscripted the way a regular response object can. This is a real issue for our use case, where we stream the LLM response for special handling of lengthy content as well as to optimize application performance.

Proposed Change

To solve this, we can check the stream flag when calling litellm.completion and add separate handling for streaming mode, as sketched below.
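
As a rough illustration only (not the actual patch; the helper name, the structure, and the assumption that litellm yields OpenAI-style chunks exposing choices[0].delta.content are mine), the call could branch on the flag and accumulate the streamed deltas:

import litellm

def call_completion(params: dict) -> str:
    # Streaming mode: litellm.completion returns a CustomStreamWrapper,
    # which has to be iterated chunk by chunk rather than subscripted.
    if params.get("stream", False):
        pieces = []
        for chunk in litellm.completion(**params):
            delta = chunk.choices[0].delta.content
            if delta:
                pieces.append(delta)
        return "".join(pieces)

    # Non-streaming mode: the response can be accessed as before.
    response = litellm.completion(**params)
    return response["choices"][0]["message"]["content"]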

Test Case

The following sample code fails with TypeError: 'CustomStreamWrapper' object is not subscriptable before this change and succeeds after it.

from crewai import Agent, Task, LLM
from crewai import Crew, Process


api_key = "my_key"  # placeholder API key
my_llm = LLM(model="gpt-4o-mini", api_key=api_key, stream=True)  # stream=True reproduces the error without this PR

# Create a researcher agent
researcher = Agent(
    role='Senior Researcher',
    goal='Discover groundbreaking technologies',
    verbose=True,
    llm=my_llm,
    backstory='A curious mind fascinated by cutting-edge innovation and the potential to change the world, you know everything about tech.'
)

# Task for the researcher
research_task = Task(
    description='Identify the next big trend in AI',
    expected_output="A single short sentence.",
    async_execution=False,
    agent=researcher  # Assigning the task to the researcher
)

# Instantiate your crew
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    process=Process.sequential  # Tasks will be executed one after the other
)

# Begin the task execution
crew.kickoff()

@@ -153,7 +153,13 @@ def call(self, messages: List[Dict[str, str]], callbacks: List[Any] = []) -> str
params = {k: v for k, v in params.items() if v is not None}

response = litellm.completion(**params)
return response["choices"][0]["message"]["content"]
if params.get("stream", False):
Collaborator

If you're trying to listen to the stream, wouldn't you want this to be True?

MottoX (Contributor, author) commented Oct 29, 2024
"stream" is set to False in params by default but can be overridden through kwargs.
Despite this, here we pass a default value in get() function, to make the code more readable and independent of preceding code.
So, stream option can be enabled by passing stream=True when creating crewai.LLM instance.

@bhancockio (Collaborator)

Hey @MottoX!

Thank you for submitting this PR! Could you please elaborate more on this use case?

It looks like you are trying to create an LLM that is going to stream a response. However, within crewAI, we don't really support streaming results.

Are you trying to add support for streaming because you're going to use the same LLM with streaming elsewhere?

MottoX (Contributor, author) commented Oct 29, 2024

Hello @bhancockio
Our team is developing applications that use crewAI and in-house LLMs to process extensive data and to generate or proofread articles. In some tasks, particularly those with lengthy request messages, non-streaming requests can lead to server timeouts. Invoking the LLM in a streaming fashion addresses this issue and ensures that sufficient content is generated in the response. We have also implemented engineering optimizations around LLM streaming to improve its efficiency, stability, and fault tolerance.

However, after upgrading to the latest crewAI version, we found that crewAI, now using litellm for LLM invocation, only supports non-streaming requests.

This PR aims to enhance crewAI by enabling users to interact with their LLMs in streaming mode by passing stream=True when constructing crewai.LLM. This feature will benefit users like us and make crewAI more flexible and adaptable.
