Resume in the pretraining code · Lightning-AI/lit-llama#359

(2 comments) (1 reaction) (0 assignees) Python (473 forks)

help wanted

Repository metrics

Stars: 5,533
PR merge metrics: no PRs merged in the last 30 days

Description

I would like to request a new feature in the code: the ability to resume training from a checkpoint.

Currently, the code can save a checkpoint of the model's state at any point during training. However, there is no way to resume training from a checkpoint.

To support resuming, the code could save two things along with the model state_dict: 1) the optimizer state, and 2) the ID of the last example seen (assuming the data is fed to the model sequentially, not randomly).
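
A minimal sketch of what this request could look like in plain PyTorch, shown only to make the idea concrete. It is not taken from the lit-llama pretraining script: the helper names `save_checkpoint`, `load_checkpoint`, and `train`, the checkpoint key `last_example_id`, and the toy linear model are all assumptions made for illustration. It also assumes the data is read sequentially by index, as described above.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, last_example_id):
    """Save what is needed to resume: weights, optimizer state, data position."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "last_example_id": last_example_id,  # hypothetical key name
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return the index of the next example."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["last_example_id"] + 1


def train(model, optimizer, data, targets, start=0, stop=None):
    """Feed examples sequentially from `start`; return the last index seen."""
    stop = len(data) if stop is None else stop
    last_seen = start - 1
    for i in range(start, stop):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(data[i]), targets[i])
        loss.backward()
        optimizer.step()
        last_seen = i
    return last_seen


torch.manual_seed(0)
data, targets = torch.randn(8, 4), torch.randn(8, 1)

model = nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

# Train on the first half of the data, then write a checkpoint.
last_seen = train(model, optimizer, data, targets, stop=4)
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
save_checkpoint(path, model, optimizer, last_seen)

# Resume in freshly constructed objects, continuing from the next example.
resumed_model = nn.Linear(4, 1)
resumed_optimizer = torch.optim.AdamW(resumed_model.parameters(), lr=1e-2)
next_example = load_checkpoint(path, resumed_model, resumed_optimizer)
final_seen = train(resumed_model, resumed_optimizer, data, targets, start=next_example)
```

Restoring the optimizer state matters for optimizers such as AdamW, which keep per-parameter moment estimates and a step count. Without it, the resumed run would not continue the same trajectory as an uninterrupted one. In the real pretraining script, the save and load calls would presumably go through whatever checkpointing mechanism the script already uses.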

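Contributor guide

Approach: Implement checkpoint loading to restore the model, optimizer, and data state, then continue the training loop.
Tech stack: Python
Domain: AI
Issue type: Feature
Difficulty: 3
Estimated time: half a day
Activity status: recently opened for contribution
Clarity: clear
Prerequisites: Python, PyTorch
Beginner friendliness: 65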
