Resume in the pretraining code · Lightning-AI/lit-llama#359

(2 comments) (1 reaction) (0 assignees) Python (473 forks) batch import

help wanted

Repository metrics

Stars: 5,533
PR merge metrics: (no PRs merged in the last 30 days)

Description

I would like to request a new feature in the code: the ability to resume training from a checkpoint.

Currently, the code can save a checkpoint of the model's state at any point during training. However, there is no way to resume training from a checkpoint.

To support resuming, the code could save two things along with the model state_dict: 1) the optimizer state, and 2) the ID of the last example it has seen (assuming the data is fed to the model sequentially, not randomly).
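
A minimal sketch of the idea, assuming plain PyTorch rather than the repository's actual pretraining script. The function names (`save_checkpoint`, `load_checkpoint`, `train`), the checkpoint keys, and the toy MSE loss are hypothetical and chosen only for illustration. The sketch also assumes the examples are visited in a fixed, sequential order.

```python
import itertools

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, last_example_id):
    """Save model weights, optimizer state, and the data position together."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "last_example_id": last_example_id,  # hypothetical key name
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return the last example id seen."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["last_example_id"]


def train(model, optimizer, examples, start_id=0, stop_after=None):
    """Train on examples in order, from index start_id up to (not including) stop_after.

    Returns the id of the last example processed, or start_id - 1 if none were.
    """
    last_id = start_id - 1
    # Skip the examples that were already seen before the checkpoint.
    remaining = itertools.islice(examples, start_id, stop_after)
    for example_id, (x, y) in enumerate(remaining, start=start_id):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)  # toy loss for the sketch
        loss.backward()
        optimizer.step()
        last_id = example_id
    return last_id
```

On resume, the training loop would be called with `start_id=last_example_id + 1`, so the example recorded in the checkpoint is not replayed. Restoring the optimizer state matters as well as the weights: with momentum or Adam-style optimizers, a fresh optimizer would take different steps than the interrupted run would have.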

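Contributor guide

Approach: Implement checkpoint loading to restore the model, optimizer, and data state, then continue the training loop.
Tech stack: Python
Domain: AI
Issue type: Feature
Difficulty: 3
Estimated time: half a day
Activity status: newly open for contribution
Clarity: clear
Prerequisites: Python, PyTorch
Beginner friendliness: 65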
