Released by andrei-kochin on 23 Sep 08:19, from branch releases/2024/4.
Please check out the latest documentation pages for the new openvino_genai package!
What's Changed
- Support chat conversation for StaticLLMPipeline by @TolyaTalamanov in #580
- Prefix caching by @popovaan in #639
- Allow building GenAI with OpenVINO via extra modules by @ilya-lavrenov in #726
- Simplified partial preemption algorithm by @popovaan in #730
- Add set_chat_template by @Wovchena in #734
- Detect KV cache sequence length axis by @as-suvorov in #744
- Enable u8 KV cache precision for CB by @ilya-lavrenov in #759
- Add test case for native pytorch model by @wgzintel in #722
- Prefix caching improvements by @popovaan in #758
- Add USS metric by @wgzintel in #762
- Prefix caching optimization by @popovaan in #785
- Transition to default int4 compression configs from optimum-intel by @nikita-savelyevv in #689
- Control KV-cache size for StaticLLMPipeline by @TolyaTalamanov in #795
- [2024.4] update optimum intel commit to include mxfp4 conversion by @eaidova in #828
- [2024.4] use perf metrics for genai in llm bench by @eaidova in #830
- Update Pybind to version 13 by @mryzhov in #836
- Introduce stop_strings and stop_token_ids sampling params [2024.4 base] by @mzegla in #817
- StaticLLMPipeline: Handle single element list of prompts by @TolyaTalamanov in #848
- Fix Meta-Llama-3.1-8B-Instruct chat template by @pavel-esir in #846
- Add GPU support for continuous batching [2024.4] by @sshlyapn in #858
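Among the changes above, #817 introduces `stop_strings` and `stop_token_ids` sampling parameters, which halt generation as soon as a matching string or token id appears in the output. As a conceptual sketch only (not the library's actual implementation, which checks incrementally during sampling), trimming decoded text at the earliest stop string can be illustrated in plain Python:

```python
def trim_at_stop_strings(text: str, stop_strings: set[str]) -> str:
    """Return `text` truncated at the earliest occurrence of any stop string.

    Conceptual illustration of stop-string handling; the GenAI pipeline
    performs the equivalent check on the fly as tokens are generated.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Generation output is cut as soon as "###" or "\nUser:" appears.
print(trim_at_stop_strings("Answer: 42\nUser: next question",
                           {"###", "\nUser:"}))
# → Answer: 42
```

In the real API these values are passed via the pipeline's generation config rather than post-processed, so no tokens beyond the stop match are produced.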
Full Changelog: 2024.3.0.0...2024.4.0.0