pytorch/examples

Actor critic example not using discount rate properly

Open

#744 opened on March 27, 2020

good first issue, triaged

Description

The Actor Critic example (which is actually an implementation of REINFORCE-with-baseline, as pointed out in https://github.com/pytorch/examples/issues/573) does not use the discount rate properly.

The loss should include a factor of $\gamma^t$, as shown in the box on page 330 of Sutton & Barto:

[Image: algorithm box from page 330 of Sutton & Barto]
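To make the point concrete, here is a minimal sketch of how the per-step policy loss could carry the $\gamma^t$ factor. It is not the code from the example script. The helper names `discounted_returns` and `policy_loss_terms` are hypothetical, and plain Python floats stand in for tensors so the arithmetic is easy to check. It assumes `rewards[t]` is the reward received after the action at step `t`, and that `values[t]` is the baseline estimate for the state at step `t`.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = sum over k >= t of gamma^(k - t) * rewards[k], for every t."""
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_loss_terms(
    log_probs: Sequence[float],
    values: Sequence[float],
    rewards: Sequence[float],
    gamma: float,
) -> List[float]:
    """Per-step policy loss terms: -gamma^t * (G_t - v_t) * log pi(a_t | s_t)."""
    returns = discounted_returns(rewards, gamma)
    terms: List[float] = []
    for t, (log_p, v, g) in enumerate(zip(log_probs, values, returns)):
        delta = g - v  # return minus baseline
        # The gamma ** t factor is the piece this issue says is missing.
        terms.append(-(gamma ** t) * delta * log_p)
    return terms
```

In a PyTorch version, the same `gamma ** t` weight would multiply each `-log_prob * advantage` term before the terms are summed into the policy loss. With `gamma = 1` the factor has no effect, so the difference only shows up for discounted tasks.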
