pytorch/examples

Ambiguous code in reinforce

Open

#297 opened on Feb 2, 2018

good first issue

Description

In /reinforcement_learning/reinforce.py, line 91:

running_reward = running_reward * 0.99 + t * 0.01

The variable running_reward seems to be used to record the average episodic reward (not a true average, more of an exponential moving average, but I think the concept is similar). Here 0.01 is the weight used to update that average, and t is the number of steps completed when the episode ended. Adding a comment or refactoring the names may help beginners understand this example.
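
One possible way to make the intent clearer is sketched below. This is not the code in reinforce.py: the names SMOOTHING, update_running_reward and episode_length are hypothetical, chosen only for illustration, and the sketch assumes the environment gives a reward of 1 per step, so that the step count t doubles as the episode's total reward.

```python
# Hypothetical refactoring sketch; these names do not appear in reinforce.py.
SMOOTHING = 0.01  # weight given to the most recent episode


def update_running_reward(running_reward, episode_length, smoothing=SMOOTHING):
    """Return the exponential moving average after one more episode.

    Assumes a reward of 1 per step, so episode_length stands in for
    the episode's total reward.
    """
    return (1.0 - smoothing) * running_reward + smoothing * episode_length


# Usage: feed in the length of each finished episode.
running_reward = 10.0
for episode_length in [20, 30, 40]:  # made-up episode lengths
    running_reward = update_running_reward(running_reward, episode_length)
```

With the default smoothing of 0.01 this computes the same value as the original line, but the named constant and the docstring say what the 0.99 and 0.01 are for.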
