[Docs] Update DPO example to use DPOConfig instead of TrainingArguments
#4155 opened on Mar 4, 2026
Description
The current DPO example code in the documentation raises an AttributeError because it passes TrainingArguments from transformers instead of DPOConfig from trl. Recent versions of trl require a DPOConfig so that DPOTrainer can read DPO-specific arguments such as padding_value.
URL: https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/preference-dpo-orpo-and-kto
Error Log
Traceback (most recent call last):
  File "/workspace/work/main.py", line 74, in <module>
    dpo_trainer = DPOTrainer(
                  ^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/unsloth/trainer.py", line 314, in new_init
    original_init(self, *args, **kwargs)
  ...
  File "/workspace/work/unsloth_compiled_cache/UnslothDPOTrainer.py", line 903, in __init__
    if args.padding_value is not None:
       ^^^^^^^^^^^^^^^^^^
AttributeError: 'TrainingArguments' object has no attribute 'padding_value'
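The failure mode can be illustrated without installing either library. The sketch below is only an illustration: GenericArgs and DPOArgs are hypothetical stand-ins for TrainingArguments and DPOConfig (assuming the DPO config extends the generic one with extra fields), and read_padding_value mimics the attribute access in the last frame of the traceback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenericArgs:
    """Hypothetical stand-in for transformers.TrainingArguments."""
    per_device_train_batch_size: int = 4


@dataclass
class DPOArgs(GenericArgs):
    """Hypothetical stand-in for trl.DPOConfig, which adds DPO-specific fields."""
    padding_value: Optional[int] = None


def read_padding_value(args):
    # Same attribute access as the last frame of the traceback.
    if args.padding_value is not None:
        return args.padding_value
    return None


def try_read(args):
    """Return ("ok", value) or ("error", message) instead of raising."""
    try:
        return "ok", read_padding_value(args)
    except AttributeError as exc:
        return "error", str(exc)


print(try_read(DPOArgs()))      # the DPO-style config defines the field
print(try_read(GenericArgs()))  # the generic config does not
```

The generic object fails only because the field is absent, which is why swapping the config class is enough to get past this check.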
Environment
- unsloth: 2026.3.3
- trl: 0.23.1
- transformers: 4.57.1
Suggested Fix
Replacing TrainingArguments with DPOConfig resolves the issue.
Current:
from transformers import TrainingArguments
...
dpo_trainer = DPOTrainer(
    model = model,
    args = TrainingArguments(
        ...
    ),
)
Proposed:
from trl import DPOConfig  # Changed from TrainingArguments
...
dpo_trainer = DPOTrainer(
    model = model,
    args = DPOConfig(  # Use DPOConfig
        per_device_train_batch_size = 4,
        ...
    ),
)
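Until the documentation is updated, a small pre-flight check could surface the mismatch with a clearer message before the trainer is constructed. This is only a sketch: missing_dpo_fields and _PlainArgs are hypothetical names, and padding_value is the only field checked because it is the only one named in the traceback.

```python
def missing_dpo_fields(args, required=("padding_value",)):
    """Return the names in `required` that `args` does not define."""
    return [name for name in required if not hasattr(args, name)]


class _PlainArgs:
    """Hypothetical arguments object lacking DPO-specific fields."""
    per_device_train_batch_size = 4


missing = missing_dpo_fields(_PlainArgs())
if missing:
    print(f"args is missing DPO fields {missing}; pass a DPOConfig instead")
```

In a real script the same function would be called on the object passed as args to DPOTrainer.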
How can I contribute?
I would like to submit a Pull Request to update the documentation if this is acceptable. However, I couldn't find the source files for the documentation (GitBook) in this repository.
Could you please guide me on where the documentation source is located or how I should proceed with a PR?