
Added streaming langchain example. #68

Open · CoffeeVampir3 wants to merge 33 commits into master
Conversation

CoffeeVampir3

I think adding this as an example makes the most sense; it's a relatively complete example of a conversation-model setup using Exllama and langchain. I've probably made some dumb mistakes, as I'm not very familiar with the inner workings of Exllama, but this is a working example.

I should note that this is meant to serve as an example of streaming: it falls back to generate_simple for non-streaming calls, but that path isn't really meant to be used here.
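
For readers landing here, a minimal sketch of the pattern this PR implements: a custom LangChain LLM that streams tokens from exllama's generator and falls back to `generate_simple` when streaming is off. The class name, field names, and defaults below are illustrative, not the PR's actual code; `generate_simple`, `gen_begin`, `gen_single_token`, and `sequence` are exllama generator calls as I understand them.

```python
from typing import Any, List, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM


class ExllamaLangchainLLM(LLM):  # hypothetical name, not the PR's class
    generator: Any   # an exllama ExLlamaGenerator
    tokenizer: Any   # an exllama ExLlamaTokenizer
    streaming: bool = True
    max_new_tokens: int = 512

    @property
    def _llm_type(self) -> str:
        return "exllama"

    def _call(self, prompt: str, stop: Optional[List[str]] = None,
              run_manager: Optional[CallbackManagerForLLMRun] = None,
              **kwargs: Any) -> str:
        if not self.streaming:
            # Non-streaming fallback: one-shot generation.
            return self.generator.generate_simple(
                prompt, max_new_tokens=self.max_new_tokens)

        # Streaming path: prime the cache with the prompt, then sample one
        # token at a time, pushing each decoded fragment to the callbacks.
        ids = self.tokenizer.encode(prompt)
        self.generator.gen_begin(ids)
        text = ""
        for _ in range(self.max_new_tokens):
            token = self.generator.gen_single_token()
            if token.item() == self.tokenizer.eos_token_id:
                break
            # Decode the whole completion each step so multi-byte tokens
            # render correctly, then emit only the new suffix.
            decoded = self.tokenizer.decode(
                self.generator.sequence[0][ids.shape[-1]:])
            new_text, text = decoded[len(text):], decoded
            if run_manager and new_text:
                run_manager.on_llm_new_token(new_text)
            if stop and any(s in decoded for s in stop):
                break
        return text
```

Re-decoding the full completion every step, rather than decoding token by token, sidesteps tokenizers that split a character across multiple tokens.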

@turboderp (Owner) commented Jun 20, 2023

So, I can't actually get this to produce any output? If I just run it as is, with a prompt of "Hello?" and a breakpoint in the stream() function, the context passed to the model looks like this:

```
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human:
    ### Instruction:
    You are an extremely serious chatbot. Do exactly what is asked of you and absolutely nothing more.
    ### User:
    Hello?
    ### Response:

AI:
```

It looks like there are two nested prompt formats there. I would expect generation to start from "### Response:", following the Alpaca template, and at least with the models I've tried, the model starts by generating " \n ###", which becomes a stop condition.
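
The outer wrapper is ConversationChain's default "friendly conversation" prompt; the inner Alpaca-style block arrives through the `{input}` slot. One way to end up with a single format (a sketch against the stock LangChain API; the template text here is illustrative) is to hand the chain its own PromptTemplate:

```python
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

# One template for everything: history and input slot directly into the
# Alpaca-style format instead of being nested inside the default preamble.
template = """### Instruction:
You are an extremely serious chatbot. Do exactly what is asked of you and absolutely nothing more.

{history}
### User:
{input}
### Response:"""

prompt = PromptTemplate(input_variables=["history", "input"], template=template)
chain = ConversationChain(llm=llm, prompt=prompt)  # `llm`: the exllama wrapper
```

The variable names `history` and `input` match ConversationChain's default memory and input keys, so no other configuration changes.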

@CoffeeVampir3 (Author) commented Jun 20, 2023

I don't know exactly why the model wouldn't generate anything; potentially it was an issue with models being temperamental about the formats.

I made the following changes:

- Added a bunch of debugging outputs and some basic benchmarking.
- Switched the prompt template to an airoboros-vicuna format, following the advice at https://huggingface.co/jondurbin/airoboros-33b-gpt4-1.2, and wired it into the history correctly as well; there should be only one prompt format now.
- Corrected some bugs around the generation length running past the attention cache's maximum size.
- Made stop-string matching case-insensitive (this and the previous two items are sketched below).
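
A rough sketch of what those fixes amount to. The function names and the exact template wording are mine (the template paraphrases the airoboros model card); exllama sizes its attention cache from the config's max_seq_len:

```python
def clamp_new_tokens(requested: int, prompt_len: int, max_seq_len: int) -> int:
    """Keep prompt + generated tokens within the attention cache."""
    return min(requested, max_seq_len - prompt_len)


def hit_stop(decoded: str, stops: list[str]) -> bool:
    """Case-insensitive stop matching against the decoded completion so far."""
    lowered = decoded.lower()
    return any(s.lower() in lowered for s in stops)


# Vicuna-style single-format template (paraphrasing the airoboros card):
TEMPLATE = (
    "A chat between a curious user and an assistant. The assistant gives "
    "helpful, detailed, accurate responses to the user's input.\n"
    "{history}\nUSER: {input}\nASSISTANT:"
)
```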

I'm unsure why nothing was generated, but the models were most likely being confused by the mixed formats. If the issue persists, that's more troubling, as I'd have no idea what else would cause it. I've tested on about 10 models and they're all performing quite well. At any rate, let me know if the issues continue and I'll investigate further.
