Released by andrei-kochin on 23 Sep 08:19, from branch releases/2024/4.
Please check out the latest documentation pages for the new openvino_genai package!
What's Changed
- Support chat conversation for StaticLLMPipeline by @TolyaTalamanov in #580
- Prefix caching by @popovaan in #639
- Allow building GenAI with OpenVINO via extra modules by @ilya-lavrenov in #726
- Simplified partial preemption algorithm by @popovaan in #730
- Add set_chat_template by @Wovchena in #734
- Detect KV cache sequence length axis by @as-suvorov in #744
- Enable u8 KV cache precision for CB by @ilya-lavrenov in #759
- Add test case for native pytorch model by @wgzintel in #722
- Prefix caching improvements by @popovaan in #758
- Add USS metric by @wgzintel in #762
- Prefix caching optimization by @popovaan in #785
- Transition to default int4 compression configs from optimum-intel by @nikita-savelyevv in #689
- Control KV-cache size for StaticLLMPipeline by @TolyaTalamanov in #795
- [2024.4] update optimum intel commit to include mxfp4 conversion by @eaidova in #828
- [2024.4] use perf metrics for genai in llm bench by @eaidova in #830
- Update Pybind to version 13 by @mryzhov in #836
- Introduce stop_strings and stop_token_ids sampling params [2024.4 base] by @mzegla in #817
- StaticLLMPipeline: Handle single element list of prompts by @TolyaTalamanov in #848
- Fix Meta-Llama-3.1-8B-Instruct chat template by @pavel-esir in #846
- Add GPU support for continuous batching [2024.4] by @sshlyapn in #858
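Among the changes above, #817 introduces `stop_strings` and `stop_token_ids` sampling parameters, which halt generation as soon as a matching string or token id appears in the output. As a conceptual sketch only (not the library's actual implementation, which checks incrementally during sampling), trimming decoded text at the earliest stop string can be illustrated in plain Python:

```python
def trim_at_stop_strings(text: str, stop_strings: set[str]) -> str:
    """Return `text` truncated at the earliest occurrence of any stop string.

    Conceptual illustration of stop-string handling; the GenAI pipeline
    performs the equivalent check on the fly as tokens are generated.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Generation output is cut as soon as "###" or "\nUser:" appears.
print(trim_at_stop_strings("Answer: 42\nUser: next question",
                           {"###", "\nUser:"}))
# → Answer: 42
```

In the real API these values are passed via the pipeline's generation config rather than post-processed, so no tokens beyond the stop match are produced.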
Full Changelog: 2024.3.0.0...2024.4.0.0