unslothai/unsloth

How to set "reasoning_effort" of GPT-OSS during GRPO rollouts?

Open

#3949 opened on Jan 29, 2026

help wanted

Description

  1. Did you update? Yes
  2. Colab or Kaggle or local / cloud. Kaggle
  3. Number GPUs used, use nvidia-smi. 1
  4. Which trainer? GRPOTrainer

Question: The official Colab example shows how to set reasoning_effort when running inference manually:

from transformers import TextStreamer

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low", # **NEW!** Set reasoning effort to low, medium or high
).to("cuda")

_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

But how do I set reasoning_effort for the GRPOTrainer rollouts? I could not find this option in the official Colab examples. From my tests, GRPOTrainer appears to use "high" by default. For my use case, "medium" is enough and more time-efficient.

Thanks in advance!
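
One possible workaround, sketched here under assumptions rather than taken from the Unsloth or TRL documentation: render each prompt to a plain string ahead of time with the same apply_chat_template call shown above, passing reasoning_effort = "medium" and tokenize = False, so that the rollouts start from text that already carries the desired effort level. This assumes the trainer uses string prompts as-is and does not re-apply the chat template to them. The helper name render_prompts and the "prompt" column name are hypothetical.

```python
def render_prompts(tokenizer, rows, reasoning_effort="medium"):
    """Return copies of `rows` whose "prompt" is a pre-rendered string.

    Hypothetical helper: `rows` is a list of dicts whose "prompt" value is a
    list of chat messages. `tokenizer` is any object with an
    `apply_chat_template` method, such as a Hugging Face tokenizer whose chat
    template understands the `reasoning_effort` keyword (as GPT-OSS does in
    the inference example above).
    """
    rendered = []
    for row in rows:
        text = tokenizer.apply_chat_template(
            row["prompt"],
            add_generation_prompt=True,
            tokenize=False,  # return the prompt as text, not token ids
            reasoning_effort=reasoning_effort,
        )
        # Copy the row so the original message list is left untouched.
        rendered.append({**row, "prompt": text})
    return rendered
```

The rendered rows could then be wrapped with datasets.Dataset.from_list and passed as the training dataset. With string prompts, reward functions will likely receive completions as plain strings rather than message lists, so they may need adjusting. It is also worth checking whether the installed TRL version's GRPOConfig offers an option for forwarding keyword arguments to the chat template, which would make this pre-rendering step unnecessary.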
