pytorch/examples

Actor critic example not using discount rate properly

Open

#744 opened on March 27, 2020

3 comments, 0 reactions, 0 assignees · Python · 21,634 stars · 9,429 forks · batch import
good first issue, triaged

Description

The Actor Critic example (which is actually an implementation of REINFORCE-with-baseline, as pointed out in https://github.com/pytorch/examples/issues/573) does not use the discount rate properly.

The loss should include a factor of \gamma^t, as shown in the box on page 330 of Sutton & Barto:

[image: box from page 330 of Sutton & Barto]
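To make the point concrete, here is a minimal sketch in plain Python of a per-step policy loss that carries the \gamma^t factor. It is not the code from the example; the function names `discounted_returns` and `policy_loss_terms` and the list-of-floats inputs are assumptions made for illustration, and it only mirrors the REINFORCE-with-baseline update, where the policy step is scaled by \gamma^t times (G_t minus the baseline).

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum over k >= t of gamma^(k - t) * r_k, for every step t."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def policy_loss_terms(log_probs, values, rewards, gamma):
    """Per-step policy loss terms with the extra gamma^t factor.

    log_probs: log pi(A_t | S_t) for each step (hypothetical plain floats)
    values:    baseline estimates v(S_t) for each step
    """
    returns = discounted_returns(rewards, gamma)
    terms = []
    for t, (log_p, v, g) in enumerate(zip(log_probs, values, returns)):
        delta = g - v  # return minus baseline
        # The gamma ** t factor is what the issue says is missing.
        terms.append(-(gamma ** t) * delta * log_p)
    return terms


if __name__ == "__main__":
    print(policy_loss_terms([-1.0, -1.0, -1.0], [0.0, 0.0, 0.0],
                            [1.0, 1.0, 1.0], gamma=0.5))
```

In a PyTorch version, `log_p` would presumably be a tensor attached to the policy network and `delta` would be treated as a constant, so that gradients flow only through the log-probability; summing the terms would then give the policy part of the loss.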
