codertimo/BERT-pytorch

Question about the loss of Masked LM

Open

#49 opened on Dec 7, 2018

good first issue

Description

Thank you very much for this great contribution. I found that the masked LM loss stops decreasing once it reaches a value of around 7, whereas in the official TensorFlow implementation the MLM loss easily decreases to 1. I think something is wrong in your implementation. In addition, I found that the model cannot predict the next sentence correctly. I think the reason is self.criterion = nn.NLLLoss(ignore_index=0). It cannot be used as the criterion for next sentence prediction: the sentence label is either 1 or 0, so every sample whose label is 0 is skipped by the loss and never trained. We should remove ignore_index=0 for next sentence prediction. I am looking forward to your reply~
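The effect of ignore_index=0 on a two-class target can be sketched without PyTorch. The function below is a hypothetical pure-Python stand-in that mirrors the documented behavior of nn.NLLLoss with the default reduction='mean' and no class weights: targets equal to ignore_index are dropped from both the sum and the count. The batch values are made up for illustration.

```python
import math
from typing import List, Optional


def nll_loss(log_probs: List[List[float]], targets: List[int],
             ignore_index: Optional[int] = None) -> float:
    """Hypothetical stand-in for nn.NLLLoss(reduction='mean', weight=None).

    Each row of log_probs holds log-probabilities over classes; samples whose
    target equals ignore_index are excluded from the sum and from the count.
    """
    kept = [(row, t) for row, t in zip(log_probs, targets)
            if ignore_index is None or t != ignore_index]
    if not kept:
        return float("nan")  # every target ignored: the mean is undefined
    return -sum(row[t] for row, t in kept) / len(kept)


# Made-up next-sentence-prediction batch: labels 0 and 1 are BOTH real classes.
nsp_log_probs = [
    [math.log(0.9), math.log(0.1)],  # label 0, predicted well
    [math.log(0.2), math.log(0.8)],  # label 1, predicted well
    [math.log(0.3), math.log(0.7)],  # label 0, predicted badly
    [math.log(0.6), math.log(0.4)],  # label 1, predicted badly
]
nsp_targets = [0, 1, 0, 1]

# With ignore_index=0, only the two label-1 samples are scored.
with_ignore = nll_loss(nsp_log_probs, nsp_targets, ignore_index=0)
# Without it, all four samples contribute to the loss.
without_ignore = nll_loss(nsp_log_probs, nsp_targets)

print(f"ignore_index=0: {with_ignore:.4f}")
print(f"no ignore_index: {without_ignore:.4f}")
```

Under ignore_index=0 the third sample, which puts only 0.3 on its true label 0, contributes nothing, so the model is never pushed toward predicting label 0. In PyTorch terms, the suggested fix amounts to two criteria, for example nn.NLLLoss(ignore_index=0) for the MLM output (assuming label 0 marks unmasked or padded positions there) and a plain nn.NLLLoss() for next sentence prediction.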
