- Docker
- A supported GPU: https://huggingface.co/docs/text-generation-inference/en/quicktour#supported-hardware
- 80 GB of disk space for the model weights and Docker image
- Start the service:

```shell
docker compose up
```
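The compose file itself is not shown in the source. A minimal sketch of what it might contain is below; the image tag, the `--model-id` value (assumed to be the Hugging Face repository for the model queried later in this guide), the volume path, and the 8000→80 port mapping are all assumptions, not the project's actual configuration:

```yaml
services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:latest
    # Assumed Hugging Face model ID; replace with the repo you intend to serve.
    command: --model-id aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct
    ports:
      - "8000:80"        # TGI listens on port 80 inside the container
    volumes:
      - ./data:/data     # cache downloaded weights between restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```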
- TGI is deployed as a server that implements the OpenAI API protocol. By default, it listens at http://localhost:8000 and can be queried in the same format as the OpenAI API. For example:
```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b-cpt-sea-lionv2.1-instruct",
    "prompt": "Artificial Intelligence is",
    "max_tokens": 20,
    "temperature": 0.8,
    "repetition_penalty": 1.2
  }'
```
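The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl example's endpoint and parameters; the actual network call is guarded under `__main__` since it requires the server from `docker compose up` to be running:

```python
# Minimal Python client for the local TGI completions endpoint.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/completions"

def build_payload() -> dict:
    # Same parameters as the curl example above.
    return {
        "model": "llama3-8b-cpt-sea-lionv2.1-instruct",
        "prompt": "Artificial Intelligence is",
        "max_tokens": 20,
        "temperature": 0.8,
        "repetition_penalty": 1.2,
    }

def complete(payload: dict) -> dict:
    # POST the JSON body with the same Content-Type header as the curl call.
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    result = complete(build_payload())
    # The response follows the OpenAI completions schema.
    print(result["choices"][0]["text"])
```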