Ambiguous code in reinforce · pytorch/examples#297

(2 comments) (0 reactions) (0 assignees) Python (9,429 forks)

good first issue

Repository metrics

Stars: (21,634 stars)
PR merge metrics: (No merged PRs in 30d)

Description

In /reinforcement_learning/reinforce.py, line 91:

running_reward = running_reward * 0.99 + t * 0.01

The variable running_reward seems to be used to record the average episodic reward (not a true average, but I think the concept is similar). The constant 0.01 is the weight given to the newest episode when updating that average, and t is the step at which the episode finished. Adding a comment or renaming the variables may help beginners understand this example.
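For illustration, the update in question can be read as an exponential moving average. The sketch below is not code from the repository: the function name `update_running_reward` and the parameter names `episode_length` and `smoothing` are hypothetical, chosen only to show what clearer naming might look like. It assumes the environment gives a reward of +1 per step, so that the final step count `t` stands in for the episode's total reward.

```python
def update_running_reward(running_reward, episode_length, smoothing=0.01):
    """Exponential moving average of the episode length.

    Hypothetical helper, not part of reinforce.py. With smoothing=0.01 it
    computes the same value as:
        running_reward = running_reward * 0.99 + t * 0.01
    """
    # Keep (1 - smoothing) of the old estimate and blend in the newest episode.
    return running_reward * (1 - smoothing) + episode_length * smoothing


# Usage: feed in the step count at which each episode ended.
running_reward = 10.0
for episode_length in [20, 20, 20]:
    running_reward = update_running_reward(running_reward, episode_length)
```

Because each new episode contributes only 1% of the estimate, the value moves slowly toward recent episode lengths, which is why it behaves like a smoothed average rather than a true mean.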

Contributor guide

Research direction: Add comments to explain the running reward calculation or rename the variable to clarify its purpose.
Tech stack: python
Domain: machine learning
Issue type: Documentation
Difficulty: 1
Estimated time: Under 1 hour
Activity status: Fresh
Clarity: Clear
Prerequisites: Python, reinforcement learning basics
Newbie friendliness: 90
