pytorch/examples

Actor critic example not using discount rate properly

Open

#744 opened on March 27, 2020

Labels: good first issue, triaged

Description

The Actor Critic example (which is actually an implementation of REINFORCE-with-baseline, as pointed out in https://github.com/pytorch/examples/issues/573) does not use the discount rate properly.

The loss should include a factor of $\gamma^t$, as shown in the box on page 330 of Sutton & Barto:

[image: the box from page 330 of Sutton & Barto]
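To make the point concrete, here is a rough sketch of how the per-step policy loss might be weighted by $\gamma^t$. It is plain Python on floats rather than the example's actual PyTorch code, and the function names `discounted_returns` and `policy_loss_terms` are hypothetical, chosen only for illustration. It assumes an episode is given as per-step lists of log-probabilities, baseline value estimates, and rewards.

```python
def discounted_returns(rewards, gamma):
    """Return G_t for every step t, where G_t = r_t + gamma * G_{t+1}."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def policy_loss_terms(log_probs, values, rewards, gamma):
    """Per-step policy loss: -gamma^t * (G_t - v_t) * log pi(a_t | s_t).

    Hypothetical helper: the gamma^t factor is the piece this issue
    says is missing from the example.
    """
    returns = discounted_returns(rewards, gamma)
    terms = []
    discount = 1.0  # gamma^0 at the first step
    for log_prob, value, g in zip(log_probs, values, returns):
        delta = g - value  # return minus baseline
        terms.append(-discount * delta * log_prob)
        discount *= gamma  # becomes gamma^(t+1) for the next step
    return terms


if __name__ == "__main__":
    terms = policy_loss_terms(
        log_probs=[-1.0, -1.0, -1.0],
        values=[0.0, 0.0, 0.0],
        rewards=[1.0, 1.0, 1.0],
        gamma=0.5,
    )
    print(terms)
```

Without the `discount` factor, every step would contribute with equal weight. With it, later steps are scaled down by $\gamma^t$, matching the update in the box. In a PyTorch version the `delta` term would presumably need to be treated as a constant with respect to the policy parameters.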
