Lightning-AI/lit-llama

Resume in the pretraining code

Open

#359 opened on June 2, 2023

(2 comments) (1 reaction) (0 assignees) Python (5,533 stars) (473 forks) batch import
help wanted

Description

I would like to request a new feature in the code: the ability to resume training from a checkpoint.

Currently, the code can save a checkpoint of the model's state at any point during training. However, there is no way to resume training from a checkpoint.

The code could save two things along with the model state_dict: 1) the optimizer state, and 2) the id of the last example it has seen (assuming the data is fed to the model sequentially, not randomly).
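
To make the request concrete, here is a rough sketch of what saving and resuming could look like. It is not taken from the lit-llama code: `save_checkpoint`, `load_checkpoint`, `resume_stream`, and `ToyState` are hypothetical names, and the only assumption about the model and optimizer is that they expose `state_dict()` and `load_state_dict()`, as PyTorch modules and optimizers do. `pickle` is used so the sketch runs without PyTorch; in the real pretraining script `torch.save` and `torch.load` would be the more likely choice.

```python
import itertools
import pickle


class ToyState:
    """Hypothetical stand-in for a model or optimizer (anything with state_dict/load_state_dict)."""

    def __init__(self, state):
        self._state = dict(state)

    def state_dict(self):
        return dict(self._state)

    def load_state_dict(self, state):
        self._state = dict(state)


def save_checkpoint(path, model, optimizer, last_example_id):
    """Write model state, optimizer state, and the data position to a single file."""
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        # 0-based position of the last example consumed from a sequential stream
        "last_example_id": last_example_id,
    }
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place and return the saved data position."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["last_example_id"]


def resume_stream(examples, last_example_id):
    """Skip every example up to and including last_example_id.

    Only valid when the data is read in the same fixed order on every run.
    """
    return itertools.islice(examples, last_example_id + 1, None)
```

On restart, the training loop would call `load_checkpoint` once and then iterate over `resume_stream(dataset, last_example_id)` instead of the full dataset. Skipping by position only works under the sequential-order assumption above; with shuffled data the sampler or RNG state would have to be saved as well.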
