Labels: enhancement, good first issue
Description
Summary
Follow-up promised in #906.
Add tests asserting that the loss on the same mini-batch is equal
- with and without gradient accumulation
- across all the different loss aggregation modes
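
To make the intended check concrete, here is a minimal, framework-free sketch of the invariant. It is not the project's code: the mode names `"sum"` and `"mean"`, the helper names, and the assumption that the mini-batch splits evenly into micro-batches are all hypothetical. The real tests would go through the Trainer and its actual aggregation modes.

```python
import math


def aggregate(losses, mode):
    """Reduce per-sample losses. The mode names are illustrative assumptions."""
    if mode == "sum":
        return sum(losses)
    if mode == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unknown aggregation mode: {mode}")


def loss_full_batch(per_sample, mode):
    """Loss computed on the whole mini-batch in one step."""
    return aggregate(per_sample, mode)


def loss_accumulated(per_sample, mode, accum_steps):
    """Loss accumulated over equally sized micro-batches of the same mini-batch."""
    if len(per_sample) % accum_steps != 0:
        raise ValueError("mini-batch must split evenly into micro-batches")
    size = len(per_sample) // accum_steps
    total = 0.0
    for i in range(accum_steps):
        micro = aggregate(per_sample[i * size:(i + 1) * size], mode)
        # In "mean" mode each micro-batch mean is weighted by 1 / accum_steps;
        # in "sum" mode the micro-batch sums simply add up.
        total += micro / accum_steps if mode == "mean" else micro
    return total


def check_losses_match():
    """Assert both paths agree for every mode and accumulation setting."""
    per_sample = [0.5, 1.25, 2.0, 0.75, 3.0, 1.5, 0.25, 4.0]
    for mode in ("sum", "mean"):
        expected = loss_full_batch(per_sample, mode)
        for accum_steps in (1, 2, 4, 8):
            actual = loss_accumulated(per_sample, mode, accum_steps)
            assert math.isclose(actual, expected, rel_tol=1e-9), (mode, accum_steps)


check_losses_match()
```

The sketch only covers equal-sized micro-batches. With uneven splits, a "mean" mode needs per-sample weighting, which is the kind of case the real tests should exercise.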
Plan
We might want to test end-to-end so that the tests also guard against side effects from any other changes.
| Plan | Pro |
|---|---|
| A) Inherit a new Trainer class to test | Standalone without affecting the main config |
| B) Add a config term and modify `update_*` to test | 1) Minimal code modification 2) Compatibility with any Trainer |