Investigate the missing `save_checkpoint` method in `verl/workers/megatron_workers.py` at line 428. Examine existing checkpoint implementations in other VERL workers (e.g., `fsdp_workers.py`) and Megatron-LM's checkpointing utilities. Implement a method that saves model and optimizer states in a format compatible with `load_checkpoint`. Review the 4 comments on the issue for any additional context or proposed approaches.
Megatron cannot save checkpoint? · verl-project/verl#277 | Good First Issue
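
The requirement that saved state be readable by `load_checkpoint` can be illustrated with a minimal round-trip sketch. Everything here is an assumption made for illustration: `ToyStateful`, the `step_<N>/rank_<R>.pkl` layout, and the function signatures are hypothetical and are not VERL's or Megatron-LM's actual API. `pickle` stands in for whatever serializer the real worker uses, and the per-rank file reflects the assumption that each parallel rank holds only its own shard of the model and optimizer.

```python
import os
import pickle
import tempfile


class ToyStateful:
    """Hypothetical stand-in for a model or optimizer exposing state_dict()."""

    def __init__(self, state):
        self._state = dict(state)

    def state_dict(self):
        return dict(self._state)

    def load_state_dict(self, state):
        self._state = dict(state)


def _rank_path(ckpt_dir, step, rank):
    # Assumed layout: one file per rank inside a per-step directory.
    return os.path.join(ckpt_dir, f"step_{step}", f"rank_{rank}.pkl")


def save_checkpoint(model, optimizer, ckpt_dir, step, rank=0):
    """Write this rank's model and optimizer state; return the file path."""
    path = _rank_path(ckpt_dir, step, rank)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    payload = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a truncated checkpoint under the final name.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(payload, f)
    os.replace(tmp_path, path)
    return path


def load_checkpoint(model, optimizer, ckpt_dir, step, rank=0):
    """Restore this rank's state from the layout written by save_checkpoint."""
    with open(_rank_path(ckpt_dir, step, rank), "rb") as f:
        payload = pickle.load(f)
    model.load_state_dict(payload["model"])
    optimizer.load_state_dict(payload["optimizer"])
    return payload["step"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt_dir:
        model = ToyStateful({"weight": [1.0, 2.0]})
        optim = ToyStateful({"lr": 0.001})
        save_checkpoint(model, optim, ckpt_dir, step=10)

        restored_model, restored_optim = ToyStateful({}), ToyStateful({})
        step = load_checkpoint(restored_model, restored_optim, ckpt_dir, step=10)
        print(step, restored_model.state_dict(), restored_optim.state_dict())
```

The point of the sketch is the contract rather than the serializer: both functions derive the path from the same helper and agree on the payload keys, so any change to the save format is forced to show up on the load side as well.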