Investigate how fairseq's --finetune-from-model option resets the learning rate when training starts from a pretrained checkpoint. Consider adding a command-line flag (e.g., --preserve-lr) that retains the learning rate stored in the checkpoint instead of resetting it. To implement the option, look at the training loop in fairseq/trainer.py and the relevant configuration in fairseq/optim/.
Finetune without resetting learning rate · facebookresearch/fairseq#5250 | Good First Issue
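
As a rough illustration of the behavior the issue asks for, the sketch below shows how a flag of this kind could choose between the learning rate saved in a checkpoint and the one given on the command line. It is a stand-alone example and does not use fairseq's code: the flag name --preserve-lr, the helper initial_lr, and the checkpoint layout (a plain dict with an "lr" key) are all assumptions made for illustration, and the real change would need to follow how fairseq/trainer.py actually restores optimizer and scheduler state.

```python
import argparse


def build_parser():
    """Stand-alone parser; it only imitates the options discussed in the issue."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--finetune-from-model", type=str, default=None)
    parser.add_argument("--lr", type=float, default=1e-4)
    # Hypothetical flag proposed in the issue; it is not an existing fairseq option.
    parser.add_argument("--preserve-lr", action="store_true")
    return parser


def initial_lr(args, checkpoint_state):
    """Pick the learning rate to start finetuning with.

    `checkpoint_state` is assumed to be a dict that may hold the learning
    rate in effect when the checkpoint was saved under the key "lr".
    This layout is an assumption, not fairseq's checkpoint format.
    """
    saved_lr = checkpoint_state.get("lr")
    if args.preserve_lr and saved_lr is not None:
        # Keep the learning rate the checkpoint was saved with.
        return saved_lr
    # Otherwise fall back to the value configured for the new run.
    return args.lr


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--finetune-from-model", "model.pt", "--preserve-lr"]
    )
    print(initial_lr(args, {"lr": 3e-5}))
```

Making the flag opt-in keeps the existing reset behavior as the default, so current finetuning commands would be unaffected unless the flag is passed.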