Inconsistency in Evaluation of Word Language Model · pytorch/examples#214

Repository metrics

Stars: 21,634
PR merge metrics: (no PRs merged in the last 30 days)

Description

I was looking at the main.py code for word-level language modeling and noticed a possible inconsistency. The final evaluation loss is intended to be the mean of the individual losses, implemented as a weighted mean over the batches in which each batch's weight is its sequence length.

https://github.com/pytorch/examples/blob/930ae27d64ceae1c77bbf616e713bc4b7c403849/word_language_model/main.py#L116

There are len(data_source)-1 such losses.

https://github.com/pytorch/examples/blob/930ae27d64ceae1c77bbf616e713bc4b7c403849/word_language_model/main.py#L112

In the end, however, the accumulated sum is divided by len(data_source) rather than len(data_source)-1, causing an inconsistency.

https://github.com/pytorch/examples/blob/930ae27d64ceae1c77bbf616e713bc4b7c403849/word_language_model/main.py#L118
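
To make the size of the discrepancy concrete, here is a short worked derivation. The symbols are my own shorthand, not names from the script: N stands for len(data_source), \ell_b for the sequence length of batch b, and L_b for the mean loss on batch b. It assumes, as stated above, that the batch lengths sum to N-1.

```latex
\[
\hat{L}_{\text{reported}} = \frac{1}{N}\sum_{b} \ell_b \, L_b,
\qquad
\bar{L} = \frac{\sum_{b} \ell_b \, L_b}{\sum_{b} \ell_b}
        = \frac{1}{N-1}\sum_{b} \ell_b \, L_b
\quad\Longrightarrow\quad
\hat{L}_{\text{reported}} = \frac{N-1}{N}\,\bar{L}.
\]
```

Under that assumption the reported loss is the true weighted mean scaled by (N-1)/N, so the effect is small for large N but still systematic.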

A similar issue also arises with the book-keeping in the training loss. If this is true, the fix should be straightforward: keep track of total_seen and divide by that, instead of by some predetermined quantity, in both the training and evaluation cases.
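
A minimal sketch of the proposed book-keeping, in plain Python so it runs without PyTorch. It is not the code from main.py: the helper names batch_lengths and weighted_mean_loss are hypothetical, the per-batch losses are made up, and the way batches are cut (the last row has no target, so lengths sum to num_rows - 1) is my assumption about how the script slices its data.

```python
def batch_lengths(num_rows, bptt):
    """Sequence length of each batch, assuming the last row has no target."""
    return [min(bptt, num_rows - 1 - i) for i in range(0, num_rows - 1, bptt)]


def weighted_mean_loss(batch_losses, lengths):
    """Length-weighted mean that divides by the positions actually seen."""
    total_loss = 0.0
    total_seen = 0
    for loss, seq_len in zip(batch_losses, lengths):
        total_loss += seq_len * loss
        total_seen += seq_len  # track the real denominator as we go
    return total_loss / total_seen


num_rows = 11  # stands in for len(data_source)
bptt = 4       # illustrative sequence length per batch
lengths = batch_lengths(num_rows, bptt)
losses = [2.0] * len(lengths)  # constant per-batch loss, for illustration only

# Dividing by total_seen recovers the per-batch loss exactly.
fixed = weighted_mean_loss(losses, lengths)

# Dividing by the pre-determined num_rows underestimates it.
biased = sum(loss * n for loss, n in zip(losses, lengths)) / num_rows
```

With a constant per-batch loss, fixed equals that loss while biased comes out smaller by a factor of (num_rows - 1) / num_rows. The same accumulator pattern would apply to the training-loop logging, where the divisor would be the positions seen since the last report.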

Tagging: @Smerity

Contributor guide

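Investigation approach: Examine the evaluation code in word_language_model/main.py, confirm the inconsistency in the loss computation, and implement a fix that tracks total_seen and divides by it rather than by len(data_source).
Tech stack: Python
Domain: machine learning
Issue type: bug
Difficulty: 2
Estimated time: 1-3 hours
Activity status: active
Clarity: clear
Prerequisites: Python, PyTorch
Beginner-friendliness: 75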
