
Releases: huggingface/trl

v0.7.9: Patch release for DPO & SFTTrainer

09 Jan 12:06
7a95cc8

This is a patch release that fixes critical issues with SFTTrainer & DPOTrainer, together with minor fixes for PPOTrainer and DataCollatorForCompletionOnlyLM.

What's Changed

Full Changelog: v0.7.8...v0.7.9

v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO

09 Jan 04:17

Unsloth tag for xxxTrainer

If you use the Unsloth library, the unsloth tag is now automatically added to your models when they are pushed to the Hub.

DPO fixes

Some important fixes for DPO have been introduced to address https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster. A minimal sketch of the adapter-swapping usage follows the list below.

  • Allow separate devices for target/ref models. by @jondurbin in #1190
  • Allow swapping PEFT adapters for target/ref model. by @jondurbin in #1193
  • Change device access order for speedup of calculating metrics in DPOTrainer by @brcps12 in #1154
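
As a rough sketch of how the adapter-swapping option can be used (the base model, adapter names, toy dataset and training arguments below are illustrative, not taken from the PRs):

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "gpt2"  # tiny model purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# One PEFT model carrying two LoRA adapters: "train" is optimized, "reference" stays frozen.
base_model = AutoModelForCausalLM.from_pretrained(model_name)
lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(base_model, lora_config, adapter_name="train")
model.add_adapter("reference", lora_config)

# Toy preference dataset with the "prompt" / "chosen" / "rejected" columns DPOTrainer expects.
dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": [" Paris."],
    "rejected": [" London."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # no separate reference copy is kept in memory
    beta=0.1,
    args=TrainingArguments(
        output_dir="dpo-adapter-swap",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,
    model_adapter_name="train",      # adapter that receives gradient updates
    ref_adapter_name="reference",    # frozen adapter used to compute the reference log-probs
)
trainer.train()
```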

DDPO + PEFT

DDPOTrainer now supports PEFT, so diffusion models can be fine-tuned with lightweight LoRA adapters instead of updating all of the weights.
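
A hedged sketch of enabling the PEFT/LoRA path in DDPO (the prompt and reward callables are toy placeholders, and the pipeline checkpoint and config values are illustrative):

```python
import torch
from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline

def prompt_fn():
    # Return a prompt plus a metadata dict for each sample.
    return "a photo of a cute corgi", {}

def reward_fn(images, prompts, metadata):
    # Toy reward (mean pixel value); replace with a real scorer such as an aesthetic model.
    return torch.stack([image.float().mean() for image in images]), {}

config = DDPOConfig(
    num_epochs=1,
    sample_batch_size=1,
    sample_num_batches_per_epoch=1,
    train_batch_size=1,
)
pipeline = DefaultDDPOStableDiffusionPipeline(
    "runwayml/stable-diffusion-v1-5",
    use_lora=True,  # train lightweight LoRA weights through PEFT instead of the full UNet
)
trainer = DDPOTrainer(config, reward_fn, prompt_fn, pipeline)
trainer.train()
```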

Other fixes

New Contributors

Full Changelog: v0.7.7...v0.7.8

v0.7.7: Patch release for PPO & DDPO tags

26 Dec 09:27

A fix has been introduced to address a breaking change with PPOTrainer.push_to_hub() and DDPOTrainer.push_to_hub().

What's Changed

New Contributors

Full Changelog: v0.7.6...v0.7.7

v0.7.6: Patch release - Multi-tag instead of single tags for `xxxTrainer`

22 Dec 14:10

This is a patch release so that multiple tags (e.g. trl & sft) are pushed instead of a single tag.

What's Changed

Full Changelog: v0.7.5...v0.7.6

v0.7.5: IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`

22 Dec 13:09

Important enhancements for DPOTrainer

This release introduces many new features in TRL for DPOTrainer; a minimal usage sketch follows the list below:

  • IPO loss for better generalization of the DPO algorithm
  • KTO & cDPO loss
  • You can also pass pre-computed logits to DPOTrainer
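
A minimal sketch of how these options surface on DPOTrainer (the model, toy dataset and hyperparameters below are purely illustrative):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "gpt2"  # tiny model purely for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Toy preference dataset with the columns DPOTrainer expects.
dataset = Dataset.from_dict({
    "prompt": ["2 + 2 ="],
    "chosen": [" 4"],
    "rejected": [" 5"],
})

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,
    loss_type="ipo",                 # "sigmoid" is classic DPO; "ipo" and "kto_pair" are the new options
    # label_smoothing=0.1,           # with loss_type="sigmoid", a value > 0 gives the cDPO variant
    precompute_ref_log_probs=True,   # compute the reference log-probs once up front instead of per step
    args=TrainingArguments(
        output_dir="dpo-loss-variants",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```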

Automatic xxxTrainer tagging on the Hub

TRL trainers now automatically push the tags trl-sft, trl-dpo, and trl-ddpo when pushing models to the Hub.

unsloth 🤝 TRL

We encourage users to try out the unsloth library for faster LLM fine-tuning using PEFT together with TRL's SFTTrainer and DPOTrainer.
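
A rough sketch of the typical unsloth + SFTTrainer pattern (the checkpoint, LoRA settings and dataset are illustrative, and the exact unsloth arguments may vary between versions):

```python
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import SFTTrainer

# unsloth loads the base model with its optimized kernels and returns a matching tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # illustrative 4-bit checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters through unsloth's PEFT helper.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()
```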

What's Changed

New Contributors

Full Changelog: v0.7.4...v0.7.5

v0.7.4: Patch Release

10 Nov 15:07

This patch release addresses an issue for users who have TRL installed without PEFT.

What's Changed

Full Changelog: v0.7.3...v0.7.4

v0.7.3: `IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training

10 Nov 15:06

In this release we introduce two new features, IterativeTrainer from @gaetanlop and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that lets you perform custom actions (for example, generation and filtering) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
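
A minimal sketch of the generate-filter-train loop (the model, the placeholder "generation and filtering" step and the training arguments below are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import IterativeSFTTrainer

model_name = "gpt2"  # tiny model purely for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

trainer = IterativeSFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="iterative-sft", per_device_train_batch_size=2),
)

for _ in range(3):
    # 1. Generate candidates with the current model (any sampling strategy you like).
    prompts = ["The capital of France is", "2 + 2 ="]
    # 2. Apply arbitrary custom logic, e.g. filtering or reward-based selection.
    kept_texts = [p + " ..." for p in prompts]  # placeholder for real generation + filtering
    # 3. Run one optimization step on the surviving texts.
    trainer.step(texts=kept_texts)
```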

NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding noise to the embedding vectors during training, which the paper reports can substantially improve instruction fine-tuning results.

Read more about it here
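
In practice, NEFTune is exposed through a single SFTTrainer argument; a minimal sketch (the model and dataset choices below are illustrative):

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")  # any dataset with a text column works

trainer = SFTTrainer(
    "facebook/opt-350m",         # SFTTrainer also accepts a model id string
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    neftune_noise_alpha=5,       # enable NEFTune; alpha controls the scale of the embedding noise
)
trainer.train()
```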

Major bugfixes

Major bugfixes have been introduced to tackle many issues with distributed training and gradient checkpointing; a short gradient-checkpointing configuration sketch follows the list below.

  • [DPO] fix DPO + GC issues by @younesbelkada in #927
  • [core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO by @younesbelkada in #912
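
In practical terms, the fix means that a configuration like the following (values shown purely as an example) should now be forwarded correctly to the model inside SFTTrainer and DPOTrainer:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft-with-gc",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
    # Forwarded to model.gradient_checkpointing_enable(); non-reentrant checkpointing
    # tends to play nicer with DDP and PEFT setups.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```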

DPOTrainer enhancements and fixes

The DPOTrainer now comes with multiple enhancements and bugfixes! Check them out below

What's Changed

New Contributors

Full Changelog: v0.7.2...v0.7.3

v0.7.2: Flash Attention documentation and minor bugfixes

12 Oct 13:32

In this release we provide minor bugfixes and a smoother user experience for all public classes. We also added some clarification to the documentation on how to use Flash Attention with SFTTrainer.

How to use Flash Attention with SFTTrainer:
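
A hedged sketch of the pattern described in the docs: load the base model with Flash Attention enabled and hand it to SFTTrainer (the model and dataset are illustrative; newer transformers versions use `attn_implementation="flash_attention_2"`, older ones used `use_flash_attention_2=True`, and the `flash-attn` package plus a recent GPU are required):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

# Load the base model with Flash Attention 2 kernels.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```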

What's Changed

New Contributors

Full Changelog: v0.7.1...v0.7.2

v0.7.1: Patch release

30 Aug 15:38

Patch release: fix bug with PPOTrainer and log_stats

Fixed a bug with PPOTrainer.log_stats to avoid a breaking change in behaviour.

What's Changed

Full Changelog: v0.7.0...v0.7.1

v0.7.0: Text Environments, Agents & Tools

30 Aug 15:38

Text environments, LLMs with tools and agents!

Text environments provide a learning ground for language agents. They allow a language model to use tools to accomplish a task, such as using a Python interpreter to answer math questions or using a search index for trivia questions. Having access to tools allows language models to solve tasks that would be very hard for the model itself but can be trivial with the appropriate tools.

We are excited to bring to the community a complete set of functionalities and full examples to train LLMs to use tools!

Check out the documentation page here and a few examples below:
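
A compressed, hedged sketch of the tool-use loop, loosely following the calculator example from the documentation (the model, tool repository, prompt and reward function here are illustrative placeholders):

```python
import torch
from transformers import AutoTokenizer, load_tool
from trl import AutoModelForCausalLMWithValueHead, TextEnvironment

model_id = "gpt2"  # tiny model purely for illustration; real runs use a much stronger LLM
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

def exact_match_reward(responses, answers):
    # Reward 1.0 when the tool-augmented response contains the expected answer, else 0.0.
    return [torch.tensor(1.0 if answer in response else 0.0) for response, answer in zip(responses, answers)]

# The prompt normally contains a few-shot demonstration of the <request>/<call>/<response> tool syntax.
prompt = "What is 13 - 3?\n\n<request><SimpleCalculatorTool>13 - 3<call>10<response>\n\nResult = 10 <submit>\n\n"

env = TextEnvironment(
    model,
    tokenizer,
    {"SimpleCalculatorTool": load_tool("ybelkada/simple-calculator")},  # tool repo used in the TRL examples
    exact_match_reward,
    prompt,
    generation_kwargs={"do_sample": True, "max_new_tokens": 32, "pad_token_id": tokenizer.eos_token_id},
)

tasks = ["What is 4 + 3?", "What is 12 - 5?"]
queries, responses, masks, rewards, histories = env.run(tasks, answers=["7", "7"])
```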

What's Changed

Full Changelog: v0.6.0...v0.7.0