Labels: enhancement, good first issue
Description
According to the documentation, veRL only supports reward models loaded with AutoModelForSequenceClassification. What would be the best way to implement a generative reward model (GenRM) for veRL? I tried looking at the FSDP Workers and Megatron-LM Workers, but they no longer exist.
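To make the request concrete, here is a minimal sketch of the scoring logic I have in mind. It is not veRL code and uses no veRL API: `JUDGE_TEMPLATE`, `parse_score`, `genrm_reward`, and the `generate` callable are all hypothetical names, and the "Score: <number>" output format and 0 to 10 scale are assumptions made for illustration. The point is that the reward comes from parsing generated text rather than from a classification head.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt; the wording and the 0-10 scale are assumptions.
JUDGE_TEMPLATE = (
    "You are grading a response.\n"
    "Question: {prompt}\n"
    "Response: {response}\n"
    "Explain your reasoning, then give a score from 0 to 10 "
    "in the form 'Score: <number>'."
)


def parse_score(judge_text: str) -> Optional[float]:
    """Extract the first 'Score: <number>' from the judge's output, if any."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", judge_text)
    if match is None:
        return None
    return float(match.group(1))


def genrm_reward(
    prompt: str,
    response: str,
    generate: Callable[[str], str],
    default: float = 0.0,
) -> float:
    """Score a response by asking a generative judge and parsing its text.

    `generate` is a stand-in for whatever runs the judge model (assumed
    interface: prompt string in, completion string out).
    """
    judge_text = generate(JUDGE_TEMPLATE.format(prompt=prompt, response=response))
    score = parse_score(judge_text)
    if score is None:
        # The judge did not follow the output format; fall back to a default.
        return default
    # Clamp to the expected range and normalize to [0, 1].
    return min(max(score, 0.0), 10.0) / 10.0


if __name__ == "__main__":
    # Fake judge for demonstration only; a real one would call an LLM.
    fake_judge = lambda _prompt: "The answer is mostly correct. Score: 7"
    print(genrm_reward("What is 2 + 2?", "4", fake_judge))
```

What I am unsure about is where something like `generate` should live in veRL, so that the judge model is served alongside the actor instead of being loaded as a sequence classifier.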