trl

Train transformer language models with reinforcement learning.

91 个版本 Python >=3.10

安装

pip install trl

poetry add trl

pipenv install trl

conda install trl

描述

TRL - Transformers Reinforcement Learning

A comprehensive library to post-train foundation models

🎉 What's New

TRL v1: We released TRL v1 — a major milestone that marks a real shift in what TRL is. Read the blog post to learn more.

Overview

TRL is a cutting-edge library designed for post-training foundation models using advanced techniques like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). Built on top of the 🤗 Transformers ecosystem, TRL supports a variety of model architectures and modalities, and can be scaled-up across various hardware setups.

Highlights

Trainers: Various fine-tuning methods are easily accessible via trainers like SFTTrainer, GRPOTrainer, DPOTrainer, RewardTrainer and more.
Efficient and scalable:
- Leverages 🤗 Accelerate to scale from single GPU to multi-node clusters using methods like DDP and DeepSpeed.
- Full integration with 🤗 PEFT enables training on large models with modest hardware via quantization and LoRA/QLoRA.
- Integrates 🦥 Unsloth for accelerating training using optimized kernels.
Command Line Interface (CLI): A simple interface lets you fine-tune with models without needing to write code.

Installation

Python Package

Install the library using pip:

pip install trl

From source

If you want to use the latest features before an official release, you can install TRL from source:

pip install git+https://github.com/huggingface/trl.git

Repository

If you want to use the examples you can clone the repository with the following command:

git clone https://github.com/huggingface/trl.git

Quick Start

For more flexibility and control over training, TRL provides dedicated trainer classes to post-train language models or PEFT adapters on a custom dataset. Each trainer in TRL is a light wrapper around the 🤗 Transformers trainer and natively supports distributed training methods like DDP, DeepSpeed ZeRO, and FSDP.

`SFTTrainer`

Here is a basic example of how to use the SFTTrainer:

from trl import SFTTrainer
from datasets import load_dataset

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
)
trainer.train()

`GRPOTrainer`

GRPOTrainer implements the Group Relative Policy Optimization (GRPO) algorithm that is more memory-efficient than PPO and was used to train Deepseek AI's R1.

from datasets import load_dataset
from trl import GRPOTrainer
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    train_dataset=dataset,
)
trainer.train()

[!NOTE] For reasoning models, use the reasoning_accuracy_reward() function for better results.

`DPOTrainer`

DPOTrainer implements the popular Direct Preference Optimization (DPO) algorithm that was used to post-train Llama 3 and many other models. Here is a basic example of how to use the DPOTrainer:

from datasets import load_dataset
from trl import DPOTrainer

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=dataset,
)
trainer.train()

`RewardTrainer`

Here is a basic example of how to use the RewardTrainer:

from trl import RewardTrainer
from datasets import load_dataset

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
)
trainer.train()

Command Line Interface (CLI)

You can use the TRL Command Line Interface (CLI) to quickly get started with post-training methods like Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO):

SFT:

trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara \
    --output_dir Qwen2.5-0.5B-SFT

DPO:

trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name argilla/Capybara-Preferences \
    --output_dir Qwen2.5-0.5B-DPO

Read more about CLI in the relevant documentation section or use --help for more details.

Development

If you want to contribute to trl or customize it to your needs make sure to read the contribution guide and make sure you make a dev install:

git clone https://github.com/huggingface/trl.git
cd trl/
pip install -e .[dev]

Experimental

A minimal incubation area is available under trl.experimental for unstable / fast-evolving features. Anything there may change or be removed in any release without notice.

Example:

from trl.experimental.new_trainer import NewTrainer

Citation

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}

License

This repository's source code is available under the Apache-2.0 License.

分类

Development Status :: 2 - Pre-Alpha Intended Audience :: Developers Intended Audience :: Science/Research Natural Language :: English Operating System :: OS Independent Programming Language :: Python :: 3 Programming Language :: Python :: 3.10 Programming Language :: Python :: 3.11 Programming Language :: Python :: 3.12 Programming Language :: Python :: 3.13 Programming Language :: Python :: 3.14

版本列表

1.6.0 2026-06-11

1.5.1 2026-05-27

1.5.0 2026-05-25

1.4.0 2026-05-08

1.3.0 2026-04-26

1.2.0 2026-04-17

1.1.0 2026-04-12

1.0.0 2026-03-30

1.0.0rc1 2026-03-20

0.29.1 2026-03-20

0.29.0 2026-02-25

0.28.0 2026-02-10

0.27.2 2026-02-03

0.27.1 2026-01-24

0.27.0 2026-01-16

0.26.2 2025-12-18

0.26.1 2025-12-12

0.26.0 2025-12-09

0.25.1 2025-11-12

0.25.0 2025-11-05

0.24.0 2025-10-16

0.23.1 2025-10-02

0.23.0 2025-09-10

0.22.2 2025-09-03

0.22.1 2025-08-29

0.22.0 2025-08-29

0.21.0 2025-08-05

0.20.0 2025-07-29

0.19.1 2025-07-08

0.19.0 2025-06-20

0.18.2 2025-06-15

0.18.1 2025-05-29

0.18.0 2025-05-28

0.17.0 2025-04-24

0.16.1 2025-04-04

0.16.0 2025-03-22

0.15.2 2025-02-25

0.15.1 2025-02-18

0.15.0 2025-02-13

0.14.0 2025-01-29

0.13.0 2024-12-16

0.12.2 2024-12-06

0.12.1 2024-11-14

0.12.0 2024-11-01

0.11.4 2024-10-15

0.11.3 2024-10-10

0.11.2 2024-10-07

0.11.1 2024-09-24

0.11.0 2024-09-19

0.10.1 2024-08-29

0.9.6 2024-07-08

0.9.4 2024-06-06

0.9.3 2024-06-05

0.9.2 2024-06-05

0.8.6 2024-04-22

0.8.5 2024-04-18

0.8.4 2024-04-17

0.8.3 2024-04-12

0.8.2 2024-04-11

0.8.1 2024-03-20

0.8.0 2024-03-19

0.7.11 2024-02-16

0.7.10 2024-01-19

0.7.9 2024-01-09

0.7.8 2024-01-09

0.7.7 2023-12-26

0.7.6 2023-12-22

0.7.5 2023-12-22

0.7.4 2023-11-08

0.7.3 2023-11-08

0.7.2 2023-10-12

0.7.1 2023-08-30

0.7.0 2023-08-30

0.6.0 2023-08-24

0.5.0 2023-08-02

0.4.7 2023-07-13

0.4.6 2023-06-23

0.4.5 2023-06-23

0.4.4 2023-06-08

0.4.3 2023-06-08

0.4.2 2023-06-07

0.4.1 2023-03-17

0.4.0 2023-03-09

0.3.1 2023-03-02

0.3.0 2023-03-01

0.2.1 2023-01-25

0.2.0 2023-01-25

0.1.0 2022-05-15

0.0.3 2021-02-28

0.0.2 2020-07-17

0.0.1 2020-03-30

trl

安装

描述