Strict warning about using optimizers manually would be needed in the docs · Lightning-AI/pytorch-lightning#17281

Repository metrics

Stars: 26,687
PR merge metrics: average merge time 9d 15h; 3 PRs merged in the last 30 days

Description

📚 Documentation

# Correct: step the optimizers returned by self.optimizers()
self.optimizers()[0].zero_grad()
self.optimizers()[1].zero_grad()
self.manual_backward(loss)
self.optimizers()[0].step()
self.optimizers()[1].step()

# Wrong: step the raw optimizers stored as attributes on the module
self.optimizer0.zero_grad()
self.optimizer1.zero_grad()
self.manual_backward(loss)
self.optimizer0.step()
self.optimizer1.step()

I struggled with ModelCheckpoint not saving checkpoint files to the designated directory. I just found out from another issue that the difference between the two snippets above can keep global_step from increasing: global_step is the total number of optimizer.step() calls made through the optimizers returned by self.optimizers(), so stepping the raw optimizers directly never updates it. As a result, the check self._last_global_step_saved == trainer.global_step in _should_skip_saving_checkpoint() is always True, and no checkpoint is ever saved.

This is very frustrating for newcomers like me, because the connection between how you access the optimizers and the global step is far from obvious. So I suggest either making global_step more intuitive (for example, the number of completed training_step calls) or explaining in the docs why optimizers must be accessed through self.optimizers() rather than directly.
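To make the mechanism concrete, here is a toy model in plain Python. It is not Lightning's actual implementation: ToyTrainer, WrappedOptimizer, RawOptimizer and run are hypothetical names invented for this sketch. It only assumes what is described above, namely that the step counter is advanced by a wrapper around the optimizer and that checkpointing is skipped while the counter has not moved.

```python
class RawOptimizer:
    """Hypothetical stand-in for a plain optimizer; it knows nothing about the trainer."""

    def zero_grad(self):
        pass

    def step(self):
        pass


class ToyTrainer:
    """Hypothetical trainer that tracks global_step and the last saved step."""

    def __init__(self):
        self.global_step = 0
        self.last_global_step_saved = 0

    def should_skip_saving_checkpoint(self):
        # Mirrors the comparison quoted in this issue: skip while nothing has changed.
        return self.last_global_step_saved == self.global_step


class WrappedOptimizer:
    """Hypothetical wrapper: forwards to the raw optimizer and bumps global_step."""

    def __init__(self, optimizer, trainer):
        self._optimizer = optimizer
        self._trainer = trainer

    def zero_grad(self):
        self._optimizer.zero_grad()

    def step(self):
        self._optimizer.step()
        self._trainer.global_step += 1  # only the wrapper updates the counter


def run(use_wrappers, num_batches=3):
    """Simulate a few manual-optimization steps with two optimizers."""
    trainer = ToyTrainer()
    raw = [RawOptimizer(), RawOptimizer()]
    wrapped = [WrappedOptimizer(opt, trainer) for opt in raw]
    optimizers = wrapped if use_wrappers else raw
    for _ in range(num_batches):
        for opt in optimizers:
            opt.zero_grad()
        # the backward pass would happen here
        for opt in optimizers:
            opt.step()
    return trainer.global_step, trainer.should_skip_saving_checkpoint()


print(run(use_wrappers=True))   # (6, False): 3 batches x 2 optimizers, checkpointing proceeds
print(run(use_wrappers=False))  # (0, True): counter never moves, checkpointing is always skipped
```

In this sketch the raw optimizers still update the model, which is why training appears to work while the checkpoint logic silently does nothing.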

cc @borda

Contributor guide

Research direction: Review the existing documentation at the URL provided, locate the section about manual optimization, and add a warning that direct access to optimizers (e.g., self.optimizer0) may cause issues with global step. Reference issue #17281.
Tech stack: Python, PyTorch
Domain: documentation, developer experience
Issue type: Documentation
Difficulty: 1
Estimated time: Under 1 hour
Activity status: Fresh
Clarity: Clear
Prerequisites: Python, Git
Newbie friendliness: 90
