Lightning-AI/pytorch-lightning

Strict warning about using optimizers manually would be needed in the docs

Open

#17281 opened on Apr 5, 2023

docs, help wanted

Description

📚 Documentation

# Correct: step the optimizers returned by self.optimizers()
self.optimizers()[0].zero_grad()
self.optimizers()[1].zero_grad()
self.manual_backward(loss)
self.optimizers()[0].step()
self.optimizers()[1].step()

# Wrong: step optimizer objects referenced directly (global_step does not increase)
self.optimizer0.zero_grad()
self.optimizer1.zero_grad()
self.manual_backward(loss)
self.optimizer0.step()
self.optimizer1.step()

I struggled with ModelCheckpoint not saving checkpoint files to the designated directory. I just found out from this issue that the difference between the two snippets above can stop global_step from increasing, because global_step refers to the sum of all optimizer.step() calls, and apparently only calls made through self.optimizers() are counted. As a result, the check self._last_global_step_saved == trainer.global_step in the _should_skip_saving_checkpoint() function is always True, so no checkpoint is ever saved.

This is very frustrating for newcomers like me, because the connection between how you access optimizers and the global step is hard to discover. I suggest either making global_step more intuitive (for example, the number of completed training_step calls) or adding an explanation to the docs of why optimizers must be accessed through self.optimizers() rather than directly.
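To make the mechanism concrete, here is a toy model in plain Python. It is not Lightning's implementation: `RawOptimizer`, `CountingOptimizer`, and `ToyTrainer` are hypothetical names made up for illustration, and the sketch assumes only what is described above, namely that the step counter advances when `step()` goes through a wrapper and that saving is skipped while the counter equals the last saved step.

```python
class RawOptimizer:
    """Stand-in for a plain optimizer; it knows nothing about the trainer."""

    def __init__(self):
        self.steps_taken = 0

    def step(self):
        self.steps_taken += 1


class ToyTrainer:
    """Hypothetical trainer that only sees steps reported by its wrappers."""

    def __init__(self):
        self.global_step = 0
        self.last_global_step_saved = 0

    def should_skip_saving_checkpoint(self):
        # Same comparison as quoted above: skip saving while the step
        # counter has not moved since the last save.
        return self.last_global_step_saved == self.global_step


class CountingOptimizer:
    """Hypothetical wrapper: forwards step() and bumps the trainer's counter."""

    def __init__(self, optimizer, trainer):
        self._optimizer = optimizer
        self._trainer = trainer

    def step(self):
        self._optimizer.step()
        self._trainer.global_step += 1


trainer = ToyTrainer()
raw = RawOptimizer()
wrapped = CountingOptimizer(raw, trainer)

# Analogous to self.optimizer0.step(): the update happens, the counter stays put.
raw.step()
skip_after_raw = trainer.should_skip_saving_checkpoint()

# Analogous to self.optimizers()[0].step(): the counter advances.
wrapped.step()
skip_after_wrapped = trainer.should_skip_saving_checkpoint()

print(raw.steps_taken, trainer.global_step)  # 2 1
print(skip_after_raw, skip_after_wrapped)    # True False
```

In this toy model both calls update the underlying optimizer, but only the wrapped call is visible to the trainer, which is why the checkpoint check keeps skipping when the raw object is stepped directly.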

cc @borda
