Fast Inference Solutions for BLOOM

This repo provides demos and packages for fast inference with BLOOM. Some of the solutions have their own repos, in which case a link to the corresponding repo is provided instead.

Some of the solutions provide both half-precision and int8-quantized variants.
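As a rough illustration of what these two numeric formats do to model weights, here is a minimal, library-free sketch. It is not the code used by the solutions in this repo, which rely on PyTorch and the libraries listed below. It assumes simple symmetric per-tensor absmax quantization. Production int8 schemes may use finer-grained (e.g. per-row) scales and special handling for outliers. The function names are hypothetical.

```python
import struct
from typing import List, Tuple


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # struct's "e" format packs/unpacks 16-bit half-precision floats.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_absmax_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor absmax quantization to signed 8-bit integers.

    The largest-magnitude weight is mapped to +/-127; every other weight is
    scaled by the same factor and rounded to the nearest integer.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]
```

For example, `quantize_absmax_int8([0.5, -1.27, 0.01, 1.0])` stores each weight as a small integer plus one shared float scale. `dequantize_int8` recovers every value to within half a quantization step, and memory per weight drops from 4 bytes (fp32) or 2 bytes (fp16) to 1 byte.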

Client-side solutions

Solutions developed to perform large batch inference locally:

PyTorch:

JAX:

BLOOM Inference in JAX
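Large-batch local inference with a decoder-only model like BLOOM usually means grouping tokenized prompts into fixed-size batches and left-padding each batch. Left padding keeps the last real token of every prompt in the final position, so newly generated tokens line up across rows. The sketch below shows only that bookkeeping. The pad id `0` and the helper names are hypothetical assumptions, and a real tokenizer defines its own pad token.

```python
from typing import Iterator, List, Tuple

PAD_ID = 0  # hypothetical pad token id; a real tokenizer defines its own


def batched(items: List[List[int]], batch_size: int) -> Iterator[List[List[int]]]:
    """Yield consecutive chunks of at most batch_size tokenized prompts."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def left_pad(
    batch: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad sequences to a common length and build the attention mask.

    Mask entries are 1 for real tokens and 0 for padding, so the model can
    ignore the pad positions.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


prompts = [[5, 6, 7], [8], [9, 10]]
for batch in batched(prompts, batch_size=2):
    ids, mask = left_pad(batch)
    # ids and mask would be converted to tensors and passed to the model here.
```

Sorting prompts by length before batching is a common refinement, because it reduces the amount of padding each batch carries.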

Server solutions

Solutions developed to be used in server mode (i.e., with varying batch sizes and request rates):

PyTorch:

Accelerate and DeepSpeed-Inference based solutions

Rust:

Bloom-server
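A common way for a server to cope with varying request rates is dynamic batching. Incoming requests are queued, and the server groups whatever has arrived into one forward pass, up to a maximum batch size or a short time limit. The following is a minimal sketch of that grouping step, assuming a thread-safe queue of prompt strings. `collect_batch` and its parameters are hypothetical, and this is not the scheduling logic of the servers listed above.

```python
import queue
import time
from typing import List


def collect_batch(
    requests: "queue.Queue[str]", max_batch_size: int, max_wait_s: float
) -> List[str]:
    """Block for the first request, then gather more until the batch is full
    or max_wait_s seconds have passed since the first request was taken."""
    batch = [requests.get()]  # wait (indefinitely) for at least one request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch
```

The two knobs trade latency for throughput. A larger `max_batch_size` or `max_wait_s` lets each forward pass serve more requests, at the cost of making early arrivals wait longer when traffic is light.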

RezaYazdaniAminabadi/transformers-bloom-inference

Folders and files

Latest commit

History

Repository files navigation

Fast Inference Solutions for BLOOM

Client-side solutions

Server solutions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages