The docs need a strict warning about using optimizers manually
#17281 opened on Apr 5, 2023
📚 Documentation
# Correct: access the optimizers through self.optimizers()
self.optimizers()[0].zero_grad()
self.optimizers()[1].zero_grad()
self.manual_backward(loss)
self.optimizers()[0].step()
self.optimizers()[1].step()
# Wrong: access the optimizers directly as attributes of the module
self.optimizer0.zero_grad()
self.optimizer1.zero_grad()
self.manual_backward(loss)
self.optimizer0.step()
self.optimizer1.step()
I've been struggling with ModelCheckpoint not saving checkpoint files to the designated directory. I just found out from this issue that the difference between the two snippets above can keep global_step from increasing, because global_step is the total number of optimizer.step() calls, and apparently only calls made through the optimizers returned by self.optimizers() are counted. As a result, the check self._last_global_step_saved == trainer.global_step in _should_skip_saving_checkpoint() is always True, so no checkpoint is ever saved.

This is very frustrating for newcomers like me, because it is hard to see any connection between how the optimizers are accessed and the global step. So I suggest either making global_step more intuitive (for example, the number of completed training_step calls) or explaining in the docs why optimizers must be accessed through self.optimizers() rather than directly.
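To make the connection easier to see, here is a rough toy model of the behaviour as I understand it. It does not use Lightning at all: RawOptimizer, CountingOptimizer, ToyTrainer and run are hypothetical names, and the assumption is only that the objects returned by self.optimizers() are wrappers that report each step() to the trainer, while a directly stored optimizer does not.

```python
# Toy model only, NOT Lightning's implementation. All names are hypothetical.


class RawOptimizer:
    """Stands in for a plain optimizer stored directly on the module."""

    def __init__(self):
        self.steps = 0

    def zero_grad(self):
        pass

    def step(self):
        self.steps += 1  # the trainer never hears about this


class ToyTrainer:
    """Holds the step counter and the checkpoint-skip check."""

    def __init__(self):
        self.global_step = 0
        # Simplification: kept on the trainer here instead of the callback.
        self._last_global_step_saved = 0

    def should_skip_saving_checkpoint(self):
        # Same comparison as the one quoted above.
        return self._last_global_step_saved == self.global_step


class CountingOptimizer:
    """Stands in for what self.optimizers() is assumed to return."""

    def __init__(self, optimizer, trainer):
        self._optimizer = optimizer
        self._trainer = trainer

    def zero_grad(self):
        self._optimizer.zero_grad()

    def step(self):
        self._optimizer.step()
        self._trainer.global_step += 1  # every wrapped step is counted


def run(use_wrappers, num_batches=3):
    trainer = ToyTrainer()
    raw = [RawOptimizer(), RawOptimizer()]
    wrapped = [CountingOptimizer(opt, trainer) for opt in raw]
    optimizers = wrapped if use_wrappers else raw
    for _ in range(num_batches):
        for opt in optimizers:
            opt.zero_grad()
        # (the backward pass would go here)
        for opt in optimizers:
            opt.step()
    return trainer.global_step, trainer.should_skip_saving_checkpoint()


print(run(use_wrappers=True))   # (6, False): two optimizers, three batches
print(run(use_wrappers=False))  # (0, True): checkpoint saving always skipped
```

In both runs the underlying optimizers step the same number of times, so training itself looks fine. Only the counter differs, which is why the missing checkpoints are so hard to trace back to how the optimizers were accessed.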
cc @borda