All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Prevent crash if large input is provided. (#23)
- Update QWen to match recent updates to the QWen modeling files. (#33)
- Changed how Attention Sinks are injected into models, allowing `attention_sinks` to be integrated with architectures that aren't in `transformers`. (#16)
- Added support for GPT-J models. (#13)
- Fix `model.generate` for all model architectures. (#6)
- Implemented parity between `attention_sinks` and `transformers==4.34.0` for Falcon and Llama.
- Added support for Mistral models. (#5)
- Added support for GPT-NeoX/Pythia models. (#4)
- Added support for MPT models. (#3)
- Added support for Falcon models. (#2)
- Implement initial working version.