verl-project/verl

Additional memory optimization features

Open

#144 opened on January 27, 2025

Labels: call for contribution, enhancement, good first issue

Description

  • Activation offloading (see implementation here)
  • Fusing optimizer step into backward pass (see implementation here)
  • Use full_shard reshard_after_forward (see here). I wasn't 100% sure whether this is already implemented in veRL.

These optimizations largely trade additional compute for lower peak memory usage, so they may only be useful for training larger models and in GPU-constrained settings.
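To give a rough sense of the memory side for the second item (fusing the optimizer step into the backward pass), here is a toy accounting sketch in plain Python. It is not veRL code and not the linked implementation: the layer sizes, the fp32 gradient assumption, and the `simulate_backward` helper are all hypothetical, and it only tracks gradient bytes, ignoring activations, parameters, and optimizer state. In PyTorch, this kind of fusion is typically built on per-parameter hooks such as `Tensor.register_post_accumulate_grad_hook`.

```python
# Toy accounting of peak *gradient* memory during backward, in bytes.
# Assumptions (hypothetical, for illustration only):
#   - gradients are stored in fp32 (4 bytes per element)
#   - layer sizes below are made up; real models differ
BYTES_PER_GRAD = 4


def simulate_backward(layer_params, fused):
    """Walk layers in reverse order and return the peak live gradient bytes.

    fused=False: every gradient stays alive until a separate optimizer step.
    fused=True:  each gradient is consumed by the optimizer and freed as soon
                 as it is produced, so only one layer's gradient is live.
    """
    live = 0
    peak = 0
    for n in reversed(layer_params):
        live += n * BYTES_PER_GRAD  # gradient for this layer is produced
        peak = max(peak, live)
        if fused:
            live -= n * BYTES_PER_GRAD  # optimizer step ran; gradient freed
    return peak


# Hypothetical parameter counts per layer.
layers = [50_000_000, 200_000_000, 200_000_000, 50_000_000]

standard = simulate_backward(layers, fused=False)
fused = simulate_backward(layers, fused=True)
print(f"standard: {standard / 1e9:.1f} GB, fused: {fused / 1e9:.1f} GB")
```

Under these assumptions the peak gradient footprint drops from the sum of all layers to the size of the largest single layer. The catch, as far as I understand it, is that anything needing all gradients at once (global gradient-norm clipping, gradient accumulation across micro-batches) no longer works in its usual form once gradients are freed during backward.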
