Commit e944610
Update README.md
mgoin authored Oct 1, 2024
1 parent 2cd265f · commit e944610
Showing 1 changed file with 2 additions and 2 deletions.
README.md: 2 additions & 2 deletions
```diff
@@ -1,11 +1,11 @@
 # AutoFP8
 
+**ATTENTION: AutoFP8 has been deprecated in favor of [`llm-compressor`](https://github.com/vllm-project/llm-compressor), a library for many forms of model compression in addition to FP8. Check out the [FP8 example here](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).**
+
 Open-source FP8 quantization library for producing compressed checkpoints for running in vLLM; see https://github.com/vllm-project/vllm/pull/4332 for details on the inference implementation. This library focuses on providing quantized weight, activation, and KV cache scales in FP8_E4M3 precision.
 
 [FP8 Model Collection from Neural Magic](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127) with many accurate (<1% accuracy drop) FP8 checkpoints ready for inference with vLLM.
 
-> NOTE: AutoFP8 is in early beta and subject to change
-
 <p align="center">
 <img src="https://github.com/neuralmagic/AutoFP8/assets/3195154/c6bb9ddb-1bc9-48df-bf5f-9d7916dbd1f9" width="40%" />
 <img src="https://github.com/neuralmagic/AutoFP8/assets/3195154/2e30d4c0-340a-4527-8ff7-e8d48a8807ca" width="40%" />
```
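For context, here is roughly how the deprecated library was used to produce an FP8 checkpoint. This is a minimal sketch assuming the `AutoFP8ForCausalLM` / `BaseQuantizeConfig` API from the AutoFP8 README of this period; the model names are illustrative.

```python
# Sketch: quantize a model to FP8 with AutoFP8 (now deprecated in
# favor of llm-compressor). Assumed API; model names are illustrative.
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
quantized_model_dir = "Meta-Llama-3-8B-Instruct-FP8"

# "dynamic" computes activation scales at runtime, so no calibration
# data is needed; weights are quantized offline to FP8_E4M3.
quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="dynamic",
)

model = AutoFP8ForCausalLM.from_pretrained(
    pretrained_model_dir, quantize_config=quantize_config
)
model.quantize([])  # empty calibration set is fine for dynamic scales
model.save_quantized(quantized_model_dir)
```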

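Checkpoints produced this way, like those in the Neural Magic collection linked above, load directly in vLLM (given a build with the FP8 support from the PR referenced in the README). A sketch follows; the model name is illustrative.

```python
# Sketch: run inference on an FP8 checkpoint with vLLM.
# The checkpoint name below is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8")
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```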