
Added streaming langchain example. #68

Open · CoffeeVampir3 wants to merge 33 commits into master
Conversation

CoffeeVampir3

I think adding this as an example makes the most sense; it's a relatively complete example of a conversation-model setup using Exllama and langchain. I've probably made some dumb mistakes, as I'm not very familiar with the inner workings of Exllama, but this is a working example.

I should note that this is meant to serve as an example of streaming: it falls back to generate_simple for non-streaming calls, but that path isn't really meant to be used here.
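
For readers landing here, a minimal sketch of the pattern this PR implements: a custom LangChain LLM that streams tokens from exllama's generator and falls back to `generate_simple` when streaming is off. The class name, field names, and defaults below are illustrative, not the PR's actual code; `generate_simple`, `gen_begin`, `gen_single_token`, and `sequence` are exllama generator calls as I understand them.

```python
from typing import Any, List, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM


class ExllamaLangchainLLM(LLM):  # hypothetical name, not the PR's class
    generator: Any   # an exllama ExLlamaGenerator
    tokenizer: Any   # an exllama ExLlamaTokenizer
    streaming: bool = True
    max_new_tokens: int = 512

    @property
    def _llm_type(self) -> str:
        return "exllama"

    def _call(self, prompt: str, stop: Optional[List[str]] = None,
              run_manager: Optional[CallbackManagerForLLMRun] = None,
              **kwargs: Any) -> str:
        if not self.streaming:
            # Non-streaming fallback: one-shot generation.
            return self.generator.generate_simple(
                prompt, max_new_tokens=self.max_new_tokens)

        # Streaming path: prime the cache with the prompt, then sample one
        # token at a time, pushing each decoded fragment to the callbacks.
        ids = self.tokenizer.encode(prompt)
        self.generator.gen_begin(ids)
        text = ""
        for _ in range(self.max_new_tokens):
            token = self.generator.gen_single_token()
            if token.item() == self.tokenizer.eos_token_id:
                break
            # Decode the whole completion each step so multi-byte tokens
            # render correctly, then emit only the new suffix.
            decoded = self.tokenizer.decode(
                self.generator.sequence[0][ids.shape[-1]:])
            new_text, text = decoded[len(text):], decoded
            if run_manager and new_text:
                run_manager.on_llm_new_token(new_text)
            if stop and any(s in decoded for s in stop):
                break
        return text
```

Re-decoding the full completion every step, rather than decoding token by token, sidesteps tokenizers that split a character across multiple tokens.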

@turboderp (Owner) commented Jun 20, 2023

So, I can't actually get this to produce any output? If I just run it as is, with a prompt of "Hello?" and a breakpoint in the stream() function, the context passed to the model looks like this:

```
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human:
    ### Instruction:
    You are an extremely serious chatbot. Do exactly what is asked of you and absolutely nothing more.
    ### User:
    Hello?
    ### Response:

AI:
```

It looks like there are two nested prompt formats there. I would expect generation to start from "### Response:", following the Alpaca template, and at least with the models I've tried, the model starts by generating " \n ###", which becomes a stop condition.
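
The outer wrapper is ConversationChain's default "friendly conversation" prompt; the inner Alpaca-style block arrives through the `{input}` slot. One way to end up with a single format (a sketch against the stock LangChain API; the template text here is illustrative) is to hand the chain its own PromptTemplate:

```python
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

# One template for everything: history and input slot directly into the
# Alpaca-style format instead of being nested inside the default preamble.
template = """### Instruction:
You are an extremely serious chatbot. Do exactly what is asked of you and absolutely nothing more.

{history}
### User:
{input}
### Response:"""

prompt = PromptTemplate(input_variables=["history", "input"], template=template)
chain = ConversationChain(llm=llm, prompt=prompt)  # `llm`: the exllama wrapper
```

The variable names `history` and `input` match ConversationChain's default memory and input keys, so no other configuration changes.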

@CoffeeVampir3 (Author) commented Jun 20, 2023

I don't know exactly why the model wouldn't generate anything; potentially it was an issue with models being temperamental about the formats.

I made the following changes:

- Added a bunch of debugging outputs and some basic benchmarking.
- Switched the prompt template to an airoboros-vicuna format, following the advice at https://huggingface.co/jondurbin/airoboros-33b-gpt4-1.2, and wired it into the history correctly as well; there should be only one prompt format now.
- Corrected some bugs around the generation length running past the attention cache's maximum size.
- Made stop-string matching case-insensitive (this and the previous two items are sketched below).
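
A rough sketch of what those fixes amount to. The function names and the exact template wording are mine (the template paraphrases the airoboros model card); exllama sizes its attention cache from the config's max_seq_len:

```python
def clamp_new_tokens(requested: int, prompt_len: int, max_seq_len: int) -> int:
    """Keep prompt + generated tokens within the attention cache."""
    return min(requested, max_seq_len - prompt_len)


def hit_stop(decoded: str, stops: list[str]) -> bool:
    """Case-insensitive stop matching against the decoded completion so far."""
    lowered = decoded.lower()
    return any(s.lower() in lowered for s in stops)


# Vicuna-style single-format template (paraphrasing the airoboros card):
TEMPLATE = (
    "A chat between a curious user and an assistant. The assistant gives "
    "helpful, detailed, accurate responses to the user's input.\n"
    "{history}\nUSER: {input}\nASSISTANT:"
)
```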

I'm unsure why nothing was generated, but the models were most likely being confused by the mixed formats. If the issue persists, that's more troubling, as I'd have no idea what else would cause it. I've tested on about 10 models and they're all performing quite well. At any rate, let me know if the issues continue and I'll investigate further.
