All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Prevent crash if large input is provided. (#23)
- Update QWen to match recent updates to the QWen modeling files. (#33)
- Changed how Attention Sinks are injected into models, allowing `attention_sinks` to be integrated with architectures that aren't in `transformers`. (#16)
- Added support for GPT-J models. (#13)
- Fix `model.generate` for all model architectures. (#6)
- Implemented parity between `attention_sinks` and `transformers==4.34.0` for Falcon and Llama.
- Added support for Mistral models. (#5)
- Added support for GPT-NeoX/Pythia models. (#4)
- Added support for MPT models. (#3)
- Added support for Falcon models. (#2)
- Implement initial working version.