pytorch/examples

Actor critic example not using discount rate properly

Open

Opened on 27 Mar 2020

3 comments, 0 reactions, 0 assignees. Python, 21,634 stars, 9,429 forks.
good first issue, triaged

Description

The Actor Critic example (which is actually an implementation of REINFORCE-with-baseline, as pointed out in https://github.com/pytorch/examples/issues/573) does not use the discount rate properly.

The loss term for time step t should include a factor of \gamma^t, as shown in the box on page 330 of Sutton & Barto:

[image: screenshot of the box referenced above]
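
To make the point concrete, here is a minimal sketch of how the \gamma^t factor could enter the per-step policy loss. It is plain Python on lists of floats, not the code from the example script; the names `discounted_returns` and `policy_loss_terms` and the toy inputs are my own assumptions for illustration. In the real example the same scaling would presumably be applied to the log-probability and value tensors.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum_k gamma^k * r_{t+k} for every time step t."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    return returns


def policy_loss_terms(log_probs, values, rewards, gamma):
    """Per-step policy loss terms for REINFORCE with baseline (hypothetical helper).

    Each term is -gamma^t * (G_t - v_t) * log pi(a_t | s_t): the usual
    advantage-weighted log-probability, scaled by the extra gamma^t factor.
    """
    returns = discounted_returns(rewards, gamma)
    terms = []
    for t, (log_prob, value, g) in enumerate(zip(log_probs, values, returns)):
        advantage = g - value
        terms.append(-(gamma ** t) * advantage * log_prob)
    return terms


# Toy episode (made-up numbers, only to show the effect of gamma^t).
rewards = [1.0, 1.0, 1.0]
log_probs = [-0.5, -0.5, -0.5]
values = [0.0, 0.0, 0.0]
terms = policy_loss_terms(log_probs, values, rewards, gamma=0.5)
print(terms)
```

Without the `gamma ** t` factor, later time steps would be weighted as heavily as the first one; with it, their contribution shrinks geometrically with t.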
