verl-project/verl

Support Generative Reward Model (GenRM)

Open

#229 opened on Feb 9, 2025

Labels: enhancement, good first issue

Description

According to the documentation, veRL only supports reward models based on AutoModelForSequenceClassification. What would be the best way to implement a generative reward model (GenRM) in veRL? I tried looking at the FSDP Workers and Megatron-LM Workers, but they no longer exist.
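The question turns on how the two kinds of reward model differ. A sequence-classification reward model returns a scalar directly from its classification head. A generative reward model produces text, such as a verdict, and that text has to be parsed into a scalar reward. Below is a minimal, framework-agnostic sketch of that parsing step. It makes several assumptions. `JUDGE_TEMPLATE`, `parse_score`, `generative_reward`, and the `generate` callable are hypothetical names, not veRL APIs. The "Score: N" output format is also an assumed convention. In a real setup, `generate` would wrap the judge model's inference call.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt; the "Score: N" convention is an assumption, not a veRL format.
JUDGE_TEMPLATE = (
    "You are a grader. Read the question and the response, then output "
    "a line of the form 'Score: <integer from 0 to 10>'.\n\n"
    "Question: {prompt}\nResponse: {response}\n"
)

SCORE_RE = re.compile(r"Score:\s*(\d+)")


def parse_score(judge_text: str, max_score: int = 10) -> Optional[float]:
    """Take the last 'Score: N' in the judge output, clamp to max_score, scale to [0, 1]."""
    matches = SCORE_RE.findall(judge_text)
    if not matches:
        return None  # the judge did not follow the expected format
    score = min(int(matches[-1]), max_score)
    return score / max_score


def generative_reward(
    prompt: str,
    response: str,
    generate: Callable[[str], str],  # hypothetical: wraps the judge model's text generation
    default: float = 0.0,
) -> float:
    """Ask the judge model to grade a response and turn its text output into a scalar reward."""
    judge_prompt = JUDGE_TEMPLATE.format(prompt=prompt, response=response)
    judge_text = generate(judge_prompt)
    score = parse_score(judge_text)
    return default if score is None else score


if __name__ == "__main__":
    # Stub judge standing in for a real model call.
    def fake_generate(text: str) -> str:
        return "The answer is correct.\nScore: 8"

    print(generative_reward("What is 2+2?", "4", fake_generate))  # prints 0.8
```

If the judge's output cannot be parsed, the sketch falls back to a fixed default reward. Whether that is the right policy, or whether such samples should be dropped or re-judged, is a design choice left open here.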
