thu-ml/tianshou

Extend and fix reward/return normalizations

Open

#927 opened on August 29, 2023

Labels: enhancement, good first issue

Description

Reward/return/value normalization could be improved in several ways, but the most serious problem is the following:

Currently PGPolicy instantiates self.ret_rms = RunningMeanStd(), and RunningMeanStd has a default value of clip_max=10. This cannot be adjusted by users! (except through monkey-patching, ofc)

This might work well for some standard envs, but the clipping value is arbitrary, and making it non-configurable is a major hindrance for users, who are most likely unaware of it.
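To illustrate why a fixed clip matters, here is a minimal self-contained sketch. RunningMeanStdSketch is a hypothetical stand-in written for this issue, not tianshou's actual class; it only assumes the behavior described above (running statistics plus a clip_max that defaults to 10) and adds the option of passing clip_max=None to disable clipping.

```python
import numpy as np


class RunningMeanStdSketch:
    """Hypothetical stand-in for illustration, NOT tianshou's RunningMeanStd."""

    def __init__(self, clip_max=10.0, epsilon=1e-8):
        self.mean = 0.0
        self.var = 1.0
        self.count = 0
        self.clip_max = clip_max  # assumption: None disables clipping
        self.epsilon = epsilon

    def update(self, x):
        # Merge batch statistics into the running mean and variance.
        x = np.asarray(x, dtype=np.float64)
        batch_mean, batch_var, batch_count = x.mean(), x.var(), x.size
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def norm(self, x):
        # Standardize, then clip to [-clip_max, clip_max] if clipping is enabled.
        x = np.asarray(x, dtype=np.float64)
        out = (x - self.mean) / np.sqrt(self.var + self.epsilon)
        if self.clip_max is not None:
            out = np.clip(out, -self.clip_max, self.clip_max)
        return out


# Sparse returns: mostly zero, with one rare large return.
returns = np.concatenate([np.zeros(999), [1000.0]])

clipped_rms = RunningMeanStdSketch(clip_max=10.0)
clipped_rms.update(returns)
unclipped_rms = RunningMeanStdSketch(clip_max=None)
unclipped_rms.update(returns)

clipped = clipped_rms.norm(returns)
unclipped = unclipped_rms.norm(returns)
print(clipped.max(), unclipped.max())
```

With the default clip, the rare large return is capped at 10 after normalization, while the unclipped version keeps its full magnitude. Whether that capping helps or hurts depends on the environment, which is why the value should be exposed to users.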

Generally, how best to normalize rewards, returns, and values in RL is an active discussion, and normalization can play an important role in performance. I believe tianshou should be extended to accommodate various normalization strategies.
