[Docs] Update DPO example to use DPOConfig instead of TrainingArguments · unslothai/unsloth#4155


Labels: good first issue, help wanted

Description

The current DPO example code in the documentation raises an AttributeError because it passes TrainingArguments from transformers instead of DPOConfig from trl. Recent versions of trl require DPOConfig so that DPOTrainer can read DPO-specific arguments such as padding_value; a plain TrainingArguments object does not define them.

URL: https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/preference-dpo-orpo-and-kto

Error Log

Traceback (most recent call last):
  File "/workspace/work/main.py", line 74, in <module>
    dpo_trainer = DPOTrainer(
                  ^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/unsloth/trainer.py", line 314, in new_init
    original_init(self, *args, **kwargs)
  ...
  File "/workspace/work/unsloth_compiled_cache/UnslothDPOTrainer.py", line 903, in __init__
    if args.padding_value is not None:
       ^^^^^^^^^^^^^^^^^^
AttributeError: 'TrainingArguments' object has no attribute 'padding_value'

Environment

unsloth: 2026.3.3
trl: 0.23.1
transformers: 4.57.1
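To report the environment in the same `package: version` format, a small standard-library script like the one below can read the installed versions. This is a sketch: the helper name `collect_versions` is hypothetical, and the default package list simply mirrors the three packages listed above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def collect_versions(
    packages: Iterable[str] = ("unsloth", "trl", "transformers"),
) -> Dict[str, Optional[str]]:
    """Return {package: installed version}, using None when a package is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            # Reads the version from the installed distribution's metadata.
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print in the same "name: version" layout used in the Environment section.
    for name, ver in collect_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```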

Suggested Fix

Replacing TrainingArguments with DPOConfig resolves the issue.

Current:

from transformers import TrainingArguments
...
dpo_trainer = DPOTrainer(
    model = model,
    args = TrainingArguments(
        ...
    ),
)

Proposed:

from trl import DPOConfig  # Changed from TrainingArguments
...
dpo_trainer = DPOTrainer(
    model = model,
    args = DPOConfig(  # Use DPOConfig
        per_device_train_batch_size = 4,
        ...
    ),
)
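Why the swap works can be sketched with standard-library stand-ins. In trl, DPOConfig subclasses TrainingArguments and adds DPO-only fields, so the trainer's `args.padding_value` check (line 903 of the compiled trainer in the traceback) succeeds only when it receives the subclass. The classes `FakeTrainingArguments` and `FakeDPOConfig` and the function `trainer_init` below are hypothetical stand-ins, not the real transformers or trl APIs; they model only that relationship.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for transformers.TrainingArguments (NOT the real class).
@dataclass
class FakeTrainingArguments:
    output_dir: str = "outputs"
    per_device_train_batch_size: int = 8


# Hypothetical stand-in for trl.DPOConfig: a subclass that adds DPO-only fields.
@dataclass
class FakeDPOConfig(FakeTrainingArguments):
    padding_value: Optional[int] = None  # None: fall back to the tokenizer's pad token
    beta: float = 0.1  # another DPO-specific hyperparameter


def trainer_init(args: FakeTrainingArguments) -> str:
    """Mimic the padding_value check from the traceback."""
    # A plain FakeTrainingArguments has no padding_value attribute, so this
    # attribute access raises AttributeError, as in the reported error.
    if args.padding_value is not None:
        return f"pad with {args.padding_value}"
    return "pad with tokenizer default"


if __name__ == "__main__":
    print(trainer_init(FakeDPOConfig(per_device_train_batch_size=4)))
    try:
        trainer_init(FakeTrainingArguments())
    except AttributeError as err:
        print(f"AttributeError: {err}")
```

Because the DPO config is a subclass, it is still accepted anywhere generic training arguments are expected, which is why replacing the class name (and its import) is enough to fix the example.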

How can I contribute?

I would like to submit a Pull Request to update the documentation if this is acceptable. However, I couldn't find the source files for the documentation (GitBook) in this repository.

Could you please guide me on where the documentation source is located or how I should proceed with a PR?

Contributor guide

Tech stack: python
Domain: documentation
Issue type: documentation
Difficulty: 1
Estimated time: under 1 hour
Activity status: fresh
Clarity: clear
Prerequisites: None
Newbie friendliness: 95
Research direction: The issue describes an AttributeError when using TrainingArguments instead of DPOConfig in the DPO example code. The fix is to replace TrainingArguments with DPOConfig. However, the reporter cannot find the documentation source files in the repository. The contributor should look for documentation source files, possibly in a docs directory or a separate GitBook repository. Check the repository structure or ask maintainers for the exact location of the DPO example documentation.