unslothai/unsloth

How to set "reasoning_effort" of GPT-OSS during GRPO rollouts?

Open

#3,949 opened on 2026年1月29日

GitHub で見る
 (3 comments) (0 reactions) (0 assignees)Python (64,271 stars) (5,658 forks)batch import
help wanted

説明

  1. Did you update? Yes
  2. Colab or Kaggle or local / cloud. Kaggle
  3. Number GPUs used, use nvidia-smi. 1
  4. Which trainer? GRPOTrainer

Question: We can see how to set reasoning_effort when manually inference in official colab example:

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low", # **NEW!** Set reasoning effort to low, medium or high
).to("cuda")

_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

But how to set reasoning_effort of GRPO Trainer (rollouts)? I could not find this option in official colab examples. I have tested that maybeGRPOTrainer is using "high" by default . But for me "medium" is enough and more time-efficient.

Thanks in advance!

コントリビューターガイド