Resume in the pretraining code · Lightning-AI/lit-llama#359

(2 comments) (1 reaction) (0 assignees) · Python · (473 forks)

help wanted

Repository metrics

Stars: (5,533 stars)
PR merge metrics: (no PRs merged in the last 30 days)

Description

I would like to request a new feature in the code: the ability to resume training from a checkpoint.

Currently, the code can save a checkpoint of the model's state at any point during training. However, there is no way to resume training from a checkpoint.

The code could save two things along with the model state_dict: (1) the optimizer state, and (2) the id of the last example it has seen (assuming the data is fed to the model sequentially, not randomly).
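A minimal sketch of what such a checkpoint might look like in plain PyTorch, assuming a standard `torch.optim` optimizer. The helper names `save_checkpoint` and `load_checkpoint` and the dictionary keys are hypothetical and are not taken from the lit-llama code base, which may organize checkpointing differently.

```python
import torch


def save_checkpoint(path, model, optimizer, last_example_id):
    """Store weights, optimizer state, and the data position in one file.

    Hypothetical helper: the key names below are illustrative only.
    """
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "last_example_id": last_example_id,
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return the last example id seen."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["last_example_id"]
```

Restoring the optimizer state matters for optimizers such as AdamW, whose moment estimates would otherwise restart from zero after a resume.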

Contributor guide

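Investigation approach: Implement checkpoint loading to restore the model, optimizer, and data state, then resume the training loop.
Tech stack: python
Area: ai
Issue type: Feature
Difficulty: 3
Estimated time: half a day
Activity: new
Clarity: clear
Prerequisites: Python, PyTorch
Beginner-friendliness: 65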
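For the data-state part of the investigation approach, one possible way to resume a sequential stream is to skip everything up to the saved example id. This is only a sketch: the function `resume_examples` is hypothetical, and it assumes ids are 0-based positions in a fixed, unshuffled stream, as the issue description suggests.

```python
from itertools import islice


def resume_examples(examples, last_example_id):
    """Yield (example_id, example) pairs that come after last_example_id.

    Assumes ids are 0-based positions in a fixed, sequential stream.
    Pass last_example_id=-1 to start from the beginning.
    """
    start = last_example_id + 1
    # islice skips the first `start` items without materializing the stream.
    for example_id, example in enumerate(islice(examples, start, None), start=start):
        yield example_id, example
```

The training loop would then iterate over `resume_examples(...)` and pass the current `example_id` along whenever it saves a checkpoint. For very large datasets, skipping by iteration can be slow, so a data loader that supports seeking directly to an offset may be preferable.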
