Labels: docs, good first issue, nlp
Description
The torch.nn.modules.transformer documentation says the word_language_model example in this repo demonstrates its use. However, the example appears to implement its own transformer rather than using that module. Is this intentional? I would offer to rewrite the example on top of torch.nn.modules.transformer, but I came here to learn how to use it myself.