pytorch/examples

Ambiguous code in reinforce

Open

#297 opened on February 2, 2018

good first issue

Description

In /reinforcement_learning/reinforce.py, line 91:

running_reward = running_reward * 0.99 + t * 0.01

The variable running_reward seems to be used to record the average episodic reward (it is not a true average, but I think the concept is similar). Here 0.01 is the scalar weight used to update that average, and t is the number of steps completed in the episode. Adding a comment or refactoring the names may help beginners understand this example.
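One possible way to make the intent explicit is sketched below. It assumes the line is meant as an exponential moving average, and that the step count t stands in for the episode's total reward (which holds only if every step yields a reward of 1). The names update_running_reward, episode_length, and smoothing are hypothetical and do not come from reinforce.py.

```python
def update_running_reward(running_reward, episode_length, smoothing=0.01):
    """Return an exponentially weighted moving average of episode length.

    Hypothetical helper, not part of reinforce.py. With smoothing=0.01 it
    computes the same value as: running_reward * 0.99 + t * 0.01
    """
    # Keep (1 - smoothing) of the old estimate and blend in the new episode.
    return (1.0 - smoothing) * running_reward + smoothing * episode_length


# Illustrative values only: three made-up episode lengths.
running_reward = 10.0
for episode_length in [20, 35, 50]:
    running_reward = update_running_reward(running_reward, episode_length)
```

A small smoothing value means each new episode moves the estimate only slightly, so the result tracks the recent trend rather than any single episode.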
