Ambiguous code in reinforce · pytorch/examples#297

2 comments · 0 reactions · 0 assignees · Python · 9,429 forks

good first issue

Repository metrics

Stars: 21,634
PR merge metrics: no PRs merged in the last 30 days

Description

In /reinforcement_learning/reinforce.py, line 91:

running_reward = running_reward * 0.99 + t * 0.01

The variable running_reward seems to be used to record the average episodic reward (not actually an average, but I think the concept is similar), 0.01 is the weight used to update that average, and t is the step at which the episode finished. Adding a comment or refactoring the naming may help beginners understand this example.
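To make the update rule concrete, here is a minimal sketch of the same formula written as an exponential moving average. The names `SMOOTHING`, `update_running_reward`, and `episode_length` are hypothetical and do not appear in the original script, and the starting value and episode lengths are made up. It also assumes each step yields a reward of 1, so the step count `t` can stand in for the episode's total reward.

```python
# Hypothetical names for illustration; the original script uses `running_reward` and `t`.
SMOOTHING = 0.01  # weight given to the newest episode (the 0.01 in the issue)


def update_running_reward(running_reward: float, episode_length: int,
                          smoothing: float = SMOOTHING) -> float:
    """Exponential moving average update.

    Keeps (1 - smoothing) of the previous estimate and blends in
    `smoothing` of the newest episode's length.
    """
    return running_reward * (1 - smoothing) + episode_length * smoothing


if __name__ == "__main__":
    running_reward = 10.0  # illustrative starting value (an assumption)
    for episode_length in [20, 35, 50, 200]:  # made-up episode lengths
        running_reward = update_running_reward(running_reward, episode_length)
        print(f"episode length {episode_length:3d} -> running reward {running_reward:.3f}")
```

With a weight of 0.01, a single unusually long or short episode moves the estimate only slightly, which is why the value behaves like a smoothed average rather than a true arithmetic mean.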

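Contributor guide

Approach: This issue points to line 91 of `/reinforcement_learning/reinforce.py`. The variable `running_reward` and its update formula `running_reward * 0.99 + t * 0.01` are confusing. Adding a comment explaining that this computes an exponential moving average of episode rewards, or renaming `running_reward` to something like `avg_reward`, would clarify the code. Check the existing comments and coding style in the file. There are no linked pull requests or maintainer responses. Make sure the change stays consistent with the other examples in the repository.
Tech stack: Python
Domain: machine learning
Issue type: documentation
Difficulty: 1
Estimated time: under 1 hour
Activity status: recently open to contribution
Clarity: clear
Prerequisites: Python, reinforcement learning basics
Beginner friendliness: 90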
