How to set "reasoning_effort" of GPT-OSS during GRPO rollouts? · unslothai/unsloth#3949

(3 comments) (0 reactions) (0 assignees)Python (64,271 stars) (5,658 forks)batch import

help wanted

説明

Did you update? Yes
Colab or Kaggle or local / cloud. Kaggle
Number GPUs used, use nvidia-smi. 1
Which trainer? GRPOTrainer

Question: We can see how to set reasoning_effort when manually inference in official colab example:

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low", # **NEW!** Set reasoning effort to low, medium or high
).to("cuda")

_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

But how to set reasoning_effort of GRPO Trainer (rollouts)? I could not find this option in official colab examples. I have tested that maybeGRPOTrainer is using "high" by default . But for me "medium" is enough and more time-efficient.

Thanks in advance!

コントリビューターガイド

技術スタック: pythonpytorch
領域: machine learning
Issue 種別: feature
難度: 2
推定時間: under 1 hour
活動状況: needs maintainer response
明確さ: clear
前提条件: PythonPyTorchGRPO trainingunsloth
初心者向け度: 70
調査方針: Investigate the GRPOTrainer code in the unsloth repository. Check the generation config or the process function to see if reasoning effort is passed to the model. If not, a new parameter needs to be added to the trainer's init and propagate to the generate call. Also review any existing discussions in the issue comments for workarounds.