verl-project/verl

Support Generative Reward Model (GenRM)

Open

#229 opened on February 9, 2025

17 comments, 0 reactions, 1 assignee. Repository: Python, 21,533 stars, 3,940 forks.
Labels: enhancement, good first issue

Description

According to the documentation, veRL only supports reward models loaded through `AutoModelForSequenceClassification`. What would be the best way to implement a generative reward model (GenRM) in veRL? I tried looking at the FSDP Workers and Megatron-LM Workers referenced there, but they no longer seem to exist.
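To make the request concrete, here is a minimal sketch of what a GenRM does, as opposed to a sequence-classification reward model: instead of reading a scalar from a classification head, it asks a judge LLM to write a verdict and parses the reward out of the generated text. Nothing below is veRL API. `JUDGE_TEMPLATE`, `parse_score`, `genrm_reward`, and the `generate` callable are hypothetical names chosen for illustration, and the 1 to 10 scale is an assumption.

```python
import re
from typing import Callable

# Hypothetical judge prompt; the wording and the 1-10 scale are assumptions.
JUDGE_TEMPLATE = (
    "You are a grader. Rate the response to the question on a scale of 1 to 10.\n"
    "Question: {prompt}\n"
    "Response: {response}\n"
    "End your answer with 'Score: <number>'."
)

# Matches "Score: 8" or "Score: 7.5" in the judge's generated text.
_SCORE_RE = re.compile(r"Score:\s*(\d+(?:\.\d+)?)")


def parse_score(judge_text: str, default: float = 0.0) -> float:
    """Extract the last 'Score: X' from the judge output and map it to (0, 1].

    Returns `default` when the judge did not produce a parsable score.
    """
    matches = _SCORE_RE.findall(judge_text)
    if not matches:
        return default
    score = float(matches[-1])
    # Clamp to the 1-10 scale before normalizing.
    return min(max(score, 1.0), 10.0) / 10.0


def genrm_reward(
    generate: Callable[[str], str], prompt: str, response: str
) -> float:
    """Score one (prompt, response) pair with a generative judge.

    `generate` is any text-in, text-out function (hypothetical here); in
    practice it would wrap an LLM inference call.
    """
    judge_text = generate(JUDGE_TEMPLATE.format(prompt=prompt, response=response))
    return parse_score(judge_text)


def fake_judge(judge_prompt: str) -> str:
    """Stand-in for a real judge model so the sketch runs without a GPU."""
    return "The answer is correct and concise. Score: 8"


reward = genrm_reward(fake_judge, "What is 2 + 2?", "4")
```

The open question is where a function like this should plug into veRL so that the judge model can be served alongside the actor, rather than being loaded as a classification head.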
