API Server #47
Conversation
Noticed I accidentally renamed ProgramArgs to ServerArgs in main.cpp, @b4rtaz
@DifferentialityDevelopment I need a bit of time to test it; after I release the refactored multi-head layers, I'll switch to this (maybe 2-5 days).
This pull request introduces API functionality to the distributed llama project. The main addition is a chat completion endpoint that follows OpenAI's chat completions specification.
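For illustration, a request to the endpoint might look like the sketch below. The field names follow OpenAI's chat completions schema; exactly which fields this server accepts (beyond the temperature, seed, and top-p parameters mentioned below) is an assumption, not something confirmed in this PR.

```cpp
// Hypothetical request body for the chat completion endpoint, shown as a
// C++ raw string literal. The fields follow OpenAI's chat completions
// schema; which of them this PR actually honors is assumed.
const char* exampleRequest = R"({
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "top_p": 0.9,
    "seed": 12345,
    "stream": true,
    "stop": ["</s>"]
})";
```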
Key features of this implementation include streaming support, termination of generation when a stop word or end-of-sequence (EOS) token is detected, and the ability to adjust sampling parameters such as temperature, seed, and top probability (top-p) per request.
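As a rough sketch of the stop condition described above (the names here are hypothetical, not the PR's actual code), generation can end either when the sampler emits the EOS token or when the detokenized output ends with one of the request's stop words:

```cpp
#include <string>
#include <vector>

// Hypothetical per-step stop check: `token` is the newly sampled token,
// `generatedText` is the detokenized output accumulated so far, and
// `stopWords` comes from the request's "stop" field.
bool shouldStop(int token, int eosTokenId,
                const std::string& generatedText,
                const std::vector<std::string>& stopWords) {
    if (token == eosTokenId) return true; // model signaled end of sequence
    for (const std::string& stop : stopWords) {
        // Stop if the generated text ends with this stop word.
        if (generatedText.size() >= stop.size() &&
            generatedText.compare(generatedText.size() - stop.size(),
                                  stop.size(), stop) == 0)
            return true;
    }
    return false;
}
```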
The code has undergone significant refactoring to enhance clarity and maintainability. I have tested the changes locally to ensure functionality. Your feedback on the implementation is highly appreciated.