
API Server #47

Merged
merged 17 commits into b4rtaz:main on May 19, 2024
Conversation

DifferentialityDevelopment (Contributor)
This pull request introduces API functionality to the distributed llama project. The main addition is the implementation of the chat completion endpoint, following the specifications outlined by OpenAI for chat completions.

Key features of this implementation include streaming support, the capability to terminate generation upon detecting a stop word or end-of-sequence token (EOS), and the ability to dynamically adjust parameters such as temperature, seed, and top probability (top-p) for each request.

The code has undergone significant refactoring to enhance clarity and maintainability. I have tested the changes locally to ensure functionality. Your feedback on the implementation is highly appreciated.
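Since the endpoint follows OpenAI's chat completion specification, a request body would look roughly like the sketch below. The field names come from that specification; the model name, message content, and parameter values are illustrative, not taken from this pull request.

```json
{
  "model": "llama",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.9,
  "seed": 42,
  "stream": true,
  "stop": ["</s>"]
}
```

The `temperature`, `seed`, and `top_p` fields correspond to the per-request parameters mentioned above, and `stop`/`stream` to the stop-word and streaming support.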

Resolved review threads (outdated): server, src/server.cpp, Makefile
DifferentialityDevelopment (Contributor, Author) commented May 12, 2024

I noticed I accidentally renamed ProgramArgs to ServerArgs in main.cpp. This wasn't intentional and will be reverted.

DifferentialityDevelopment (Contributor, Author) commented May 12, 2024

@b4rtaz
I've made the change to use SocketServer instead, and I've added a function on Socket that is meant just for reading a full HTTP request. Using read caused it to never finish, since I didn't know ahead of time how much data was being sent.
I don't want to change the current read function, as it's used by the workers, and I don't want to break anything there.
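The problem described here is that a plain read() on the socket cannot tell where an HTTP request ends. A request-aware reader instead buffers until it sees the end of the headers, then honors Content-Length. A minimal sketch of that framing logic, with hypothetical names (findContentLength, isRequestComplete) that are not the actual identifiers from this pull request:

```cpp
#include <cstring>
#include <string>

// Returns the Content-Length value from a header block, or 0 if absent.
// Simplifications for this sketch: case-sensitive header match, no
// error handling for a malformed (non-numeric) value.
size_t findContentLength(const std::string& headers) {
    const char* key = "Content-Length:";
    size_t pos = headers.find(key);
    if (pos == std::string::npos) return 0;
    return (size_t)std::stoul(headers.substr(pos + std::strlen(key)));
}

// True once the buffer holds a full request: complete headers
// (terminated by a blank line) plus Content-Length bytes of body.
// The reader loops: read() into the buffer, then check this predicate.
bool isRequestComplete(const std::string& buffer) {
    size_t headerEnd = buffer.find("\r\n\r\n");
    if (headerEnd == std::string::npos) return false; // headers still arriving
    size_t bodyLen = findContentLength(buffer.substr(0, headerEnd));
    return buffer.size() >= headerEnd + 4 + bodyLen;
}
```

This is also why the existing read function can stay untouched for the workers: they exchange fixed-size messages, so only the HTTP path needs length-aware framing.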

b4rtaz (Owner) commented May 12, 2024

@DifferentialityDevelopment I need a bit of time to test it; after I release the refactored multihead layers, I'll switch to this (maybe 2-5 days).

@b4rtaz b4rtaz merged commit 0590ece into b4rtaz:main May 19, 2024
2 checks passed