Inconsistency in Evaluation of Word Language Model · pytorch/examples#214

Repository metrics

Stars: 21,634
PR merge metric: no PRs merged in the last 30 days

Description

I was looking at the main.py code for word-level language modeling and noticed a possible inconsistency. The final evaluation loss is intended to be the mean of the individual losses. It is implemented as a weighted mean over the batches, with each batch weighted by its sequence length.

https://github.com/pytorch/examples/blob/930ae27d64ceae1c77bbf616e713bc4b7c403849/word_language_model/main.py#L116

There are len(data_source)-1 such losses.

https://github.com/pytorch/examples/blob/930ae27d64ceae1c77bbf616e713bc4b7c403849/word_language_model/main.py#L112

In the end, however, the sum is divided by len(data_source) rather than len(data_source)-1, causing an inconsistency.

https://github.com/pytorch/examples/blob/930ae27d64ceae1c77bbf616e713bc4b7c403849/word_language_model/main.py#L118

A similar issue also arises with the bookkeeping for the training loss. If this is correct, the fix should be straightforward: keep track of total_seen and divide by that, instead of by some pre-determined quantity, in both the training and evaluation cases.
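
To make the discrepancy concrete, here is a minimal pure-Python sketch of the two ways of normalizing. It is not the code from main.py: the helper names (batch_lengths, mean_loss_fixed_divisor, mean_loss_total_seen) and the toy numbers are assumptions chosen for illustration, and the per-batch losses are plain floats rather than tensors.

```python
def batch_lengths(num_tokens, bptt):
    """Sequence length of each evaluation batch (hypothetical helper).

    Assumes a loop of the form range(0, num_tokens - 1, bptt), where the
    last position has no next-token target and is therefore never scored.
    """
    return [min(bptt, num_tokens - 1 - i) for i in range(0, num_tokens - 1, bptt)]


def mean_loss_fixed_divisor(losses, lengths, num_tokens):
    """Length-weighted sum divided by a pre-determined count."""
    total_loss = sum(n * loss for n, loss in zip(lengths, losses))
    return total_loss / num_tokens


def mean_loss_total_seen(losses, lengths):
    """Length-weighted sum divided by the number of positions actually scored."""
    total_loss = 0.0
    total_seen = 0
    for n, loss in zip(lengths, losses):
        total_loss += n * loss
        total_seen += n
    return total_loss / total_seen


# Toy setup: 11 positions, bptt of 4, and a constant per-batch mean loss.
lengths = batch_lengths(num_tokens=11, bptt=4)  # [4, 4, 2]
losses = [2.0] * len(lengths)

fixed = mean_loss_fixed_divisor(losses, lengths, num_tokens=11)
seen = mean_loss_total_seen(losses, lengths)
print(fixed, seen)
```

With a constant per-batch loss, dividing by total_seen recovers that constant exactly, while dividing by the pre-determined length gives a slightly smaller value. The gap shrinks as the data gets longer, so the effect on reported numbers is likely small, but tracking total_seen removes it in both the training and evaluation paths.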

Tagging: @Smerity

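Contributor guide

Research direction: Inspect the evaluation code in word_language_model/main.py, confirm the inconsistency in the loss computation, and implement a fix by tracking total_seen instead of dividing by len(data_source).
Tech stack: Python
Domain: machine learning
Issue type: bug
Difficulty: 2
Estimated time: 1-3 hours
Activity: active
Clarity: clear
Prerequisites: Python, PyTorch
Beginner friendliness: 75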
