It seems that the "save_checkpoint" method is not implemented.
Contributor guide
Tech stack
python, pytorch
Domain
backend, machine learning
Issue type
feature
Difficulty (estimated implementation difficulty for a new contributor, from 1 for very small changes to 5 for expert-level work)
3
Estimated time (a rough time range for an experienced contributor to investigate, implement, test, and prepare a pull request)
1-3 hours
Activity status (how available the issue appears right now: fresh, active, stale, blocked, or waiting on maintainer input)
active
Clarity (how clearly the issue explains the expected change, acceptance criteria, and next step)
clear
Prerequisites
Python, PyTorch, Megatron (distributed training)
Newbie friendliness (a 1-100 score estimating how approachable this issue is for first-time contributors)
55
Research direction
Investigate the missing `save_checkpoint` method in `verl/workers/megatron_workers.py` at line 428. Examine the existing checkpoint implementations in other VERL workers (e.g., `fsdp_workers.py`) and Megatron-LM's checkpointing utilities. Implement a method that saves model and optimizer states in a format compatible with `load_checkpoint`. Review the 4 comments on the issue for any additional context or proposed approaches.
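The core requirement is that whatever the save method writes, the load method can read back without loss. A minimal single-process sketch of that round trip in plain PyTorch follows. It is not VERL's or Megatron-LM's actual API: the function signatures, the `rank` argument, and the `rank_{rank}.pt` file layout are all hypothetical, chosen only to illustrate per-rank saving of model and optimizer state.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(model, optimizer, ckpt_dir, rank=0):
    """Write this rank's model and optimizer state into ckpt_dir."""
    os.makedirs(ckpt_dir, exist_ok=True)
    # Hypothetical layout: one file per rank. A real Megatron worker would
    # need to account for tensor/pipeline-parallel sharding.
    path = os.path.join(ckpt_dir, f"rank_{rank}.pt")
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    torch.save(state, path)
    return path


def load_checkpoint(model, optimizer, ckpt_dir, rank=0):
    """Restore state written by save_checkpoint (same keys, same layout)."""
    path = os.path.join(ckpt_dir, f"rank_{rank}.pt")
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])


def roundtrip_ok():
    """Save from one model/optimizer pair, load into a fresh pair, compare."""
    torch.manual_seed(0)
    src = nn.Linear(4, 2)
    src_opt = torch.optim.SGD(src.parameters(), lr=0.1, momentum=0.9)
    # One training step so the optimizer has momentum buffers to save.
    src(torch.randn(3, 4)).sum().backward()
    src_opt.step()

    dst = nn.Linear(4, 2)
    dst_opt = torch.optim.SGD(dst.parameters(), lr=0.1, momentum=0.9)
    with tempfile.TemporaryDirectory() as ckpt_dir:
        save_checkpoint(src, src_opt, ckpt_dir)
        load_checkpoint(dst, dst_opt, ckpt_dir)

    same_weights = all(
        torch.equal(a, b)
        for a, b in zip(src.state_dict().values(), dst.state_dict().values())
    )
    same_momentum = all(
        torch.equal(
            src_opt.state[p]["momentum_buffer"],
            dst_opt.state[q]["momentum_buffer"],
        )
        for p, q in zip(src.parameters(), dst.parameters())
    )
    return same_weights and same_momentum
```

A round-trip check like `roundtrip_ok()` is a reasonable shape for a unit test accompanying the pull request: it fails if the save and load sides ever disagree on keys or file layout.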