unslothai/unsloth
在 GitHub 查看How to set "reasoning_effort" of GPT-OSS during GRPO rollouts?
Open
#3,949 建立於 2026年1月29日
help wanted
描述
- Did you update? Yes
ColaborKaggleor local / cloud. Kaggle- Number GPUs used, use
nvidia-smi. 1 - Which trainer? GRPOTrainer
Question: We can see how to set reasoning_effort when manually inference in official colab example:
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt = True,
return_tensors = "pt",
return_dict = True,
reasoning_effort = "low", # **NEW!** Set reasoning effort to low, medium or high
).to("cuda")
_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))
But how to set reasoning_effort of GRPO Trainer (rollouts)? I could not find this option in official colab examples. I have tested that maybeGRPOTrainer is using "high" by default . But for me "medium" is enough and more time-efficient.
Thanks in advance!