pytorch/examples

Ambiguous code in reinforce

Open

#297 created on February 2, 2018

(2 comments) (0 reactions) (0 assignees) Python (21,634 stars) (9,429 forks)
good first issue

Description

In /reinforcement_learning/reinforce.py, line 91:

running_reward = running_reward * 0.99 + t * 0.01

The variable running_reward seems to be used to record the average episodic reward. Strictly speaking it is not an average but an exponential moving average, though the idea is similar. Here 0.01 is the weight used to fold each new episode into that estimate, and t is the step at which the episode ended (done). Adding a comment or renaming the variables would help beginners understand this example.
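One possible refactor is sketched below. It names the 0.01 smoothing factor, makes explicit that 0.99 is simply 1 - 0.01, and keeps the arithmetic identical to line 91. The names EMA_SMOOTHING and update_running_reward are hypothetical (not part of pytorch/examples), and the starting value and episode results in the demo are made up for illustration. The docstring's remark about reward 1 per step assumes a CartPole-style environment.

```python
# Hypothetical refactor of:
#     running_reward = running_reward * 0.99 + t * 0.01
# None of these names exist in pytorch/examples; they are for illustration.

EMA_SMOOTHING = 0.01  # weight given to the newest episode; 1 - 0.01 = 0.99 keeps the history


def update_running_reward(running_reward: float, last_step: int,
                          smoothing: float = EMA_SMOOTHING) -> float:
    """Return an exponential moving average (EMA) of episode lengths.

    last_step is the step index at which `done` became True. Assuming a
    CartPole-style environment where every step yields reward 1, this tracks
    the episode's return (off by one, since range() starts counting at 0).
    """
    return (1.0 - smoothing) * running_reward + smoothing * last_step


if __name__ == "__main__":
    running_reward = 10.0            # hypothetical starting estimate
    for last_step in [20, 35, 50]:   # hypothetical episode results
        running_reward = update_running_reward(running_reward, last_step)
        print(f"last_step={last_step:3d}  running_reward={running_reward:.3f}")
```

With a smoothing factor of 0.01, the estimate is dominated by roughly the last 1/0.01 = 100 episodes. A single unusually long or short episode therefore moves it only slightly, which makes it a smoother progress signal than the raw t.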
