good first issue
Description
The checkpointing code at https://github.com/pytorch/examples/blob/7d0d413425e2ee64fcd0e0de1b11c5cca1f79f4d/word_language_model/main.py#L171 doesn't look like it is actually saving the best model. As far as I can tell, it only saves the current model?
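To make the expected behavior concrete, here is a minimal sketch of what I would understand by "saving the best model": a snapshot is taken only when the validation loss improves, and it is stored as a copy so later training steps cannot change it. This is not the code from `main.py`. The `BestCheckpoint` class, the loss values, and the plain dict standing in for the model weights are all hypothetical, and the sketch uses only the standard library so it runs without PyTorch.

```python
import copy
import math


class BestCheckpoint:
    """Keeps a snapshot of the state with the lowest validation loss seen so far."""

    def __init__(self):
        self.best_val_loss = math.inf
        self.best_state = None

    def update(self, val_loss, state):
        """Snapshot `state` only if `val_loss` improves on the best so far.

        Returns True when a new snapshot was taken.
        """
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            # Deep copy so later training steps cannot mutate the snapshot.
            # With a real PyTorch model, this is where one would instead
            # write the weights to disk, e.g. with torch.save.
            self.best_state = copy.deepcopy(state)
            return True
        return False


# Hypothetical validation losses; the dict stands in for the model weights.
checkpoint = BestCheckpoint()
state = {"epoch": 0}
for epoch, val_loss in enumerate([5.2, 4.8, 4.9, 4.7, 5.0], start=1):
    state["epoch"] = epoch  # stands in for a training step changing the weights
    checkpoint.update(val_loss, state)

print(checkpoint.best_val_loss, checkpoint.best_state)
```

In this sketch the live `state` keeps changing after the best epoch, while the stored snapshot stays at the epoch with the lowest validation loss. That is the difference between "best" and "current" that I am asking about.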