pytorch/examples

Actor critic example not using discount rate properly

Open

#744 opened on Mar 27, 2020

good first issue, triaged

Description

The Actor Critic example (which, as pointed out in https://github.com/pytorch/examples/issues/573, is actually an implementation of REINFORCE with baseline) does not apply the discount rate correctly.

The policy loss at time step $t$ should be multiplied by $\gamma^t$, as shown in the box on page 330 of Sutton & Barto:
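For reference, the per-step updates in that REINFORCE-with-baseline box read roughly as follows (transcribed from memory in Sutton & Barto's notation, so worth checking against the book). The factor $\gamma^t$ appears only in the policy-parameter update:

```latex
\begin{aligned}
G &\leftarrow \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k \\
\delta &\leftarrow G - \hat{v}(S_t, \mathbf{w}) \\
\mathbf{w} &\leftarrow \mathbf{w} + \alpha^{\mathbf{w}} \, \delta \, \nabla \hat{v}(S_t, \mathbf{w}) \\
\boldsymbol{\theta} &\leftarrow \boldsymbol{\theta} + \alpha^{\boldsymbol{\theta}} \, \gamma^{t} \, \delta \, \nabla \ln \pi(A_t \mid S_t, \boldsymbol{\theta})
\end{aligned}
```

Written as a loss minimized by gradient descent, the step-$t$ policy term is therefore $-\gamma^t \delta \ln \pi(A_t \mid S_t, \boldsymbol{\theta})$, with $\delta$ treated as a constant.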

[Image: the pseudocode box from page 330 of Sutton & Barto]
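A minimal sketch of the proposed weighting, assuming a single episode whose rewards and baseline values are plain Python floats. The function names `discounted_returns` and `policy_loss_coefficients` are hypothetical and not part of pytorch/examples. The sketch uses only the standard library so it runs without PyTorch. For each step $t$ it computes the coefficient $\gamma^t \delta_t$ that should multiply $-\log \pi(A_t \mid S_t)$ in the policy loss:

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t for every step, where rewards[t] is R_{t+1}.

    G_t = sum_{k=t+1}^{T} gamma^(k-t-1) R_k, accumulated backwards.
    """
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def policy_loss_coefficients(
    rewards: List[float], baselines: List[float], gamma: float
) -> List[float]:
    """Coefficient that multiplies -log pi(A_t | S_t) at each step t.

    REINFORCE with baseline scales the step-t policy update by
    gamma^t * delta_t, where delta_t = G_t - v(S_t). Omitting gamma^t
    (the issue reported here) gives late steps the same weight as early ones.
    """
    returns = discounted_returns(rewards, gamma)
    return [
        (gamma ** t) * (g - b)
        for t, (g, b) in enumerate(zip(returns, baselines))
    ]


if __name__ == "__main__":
    # Three rewards of 1.0, zero baseline, gamma = 0.5.
    print(policy_loss_coefficients([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.5))
    # prints [1.75, 0.75, 0.25]
```

In a PyTorch training loop, each coefficient would multiply the corresponding `-log_prob` term before the terms are summed. The baseline would be detached (for example with `value.detach()`) so that the policy loss does not backpropagate into the value head.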
