thu-ml/tianshou

Extend and fix reward/return normalizations

Open

#927 opened on August 29, 2023

0 comments, 0 reactions, 0 assignees, Python, 7,121 stars, 1,072 forks
enhancement, good first issue

Description

Reward/return/value normalization could be improved in several ways, but the most drastic problem is the following:

Currently PGPolicy instantiates self.ret_rms = RunningMeanStd(), and RunningMeanStd defaults to clip_max=10. Users cannot adjust this (except through monkey-patching, of course).

This might work well for some standard envs, but the clipping value is arbitrary, and making it non-configurable is a major hindrance for users, who are most probably not aware of it.
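To illustrate why a fixed clip matters, here is a minimal standalone sketch of a running mean/std normalizer with a configurable clip. It is not tianshou's RunningMeanStd: the class name RunningMeanStdSketch, its method names, and the example data are assumptions made purely for illustration.

```python
import math
from typing import List, Optional, Sequence


class RunningMeanStdSketch:
    """Hypothetical stand-in for a running normalizer; NOT tianshou's class."""

    def __init__(self, clip_max: Optional[float] = 10.0, epsilon: float = 1e-8) -> None:
        self.mean = 0.0
        self.var = 1.0
        self.count = 0
        self.clip_max = clip_max  # None disables clipping entirely
        self.eps = epsilon

    def update(self, batch: Sequence[float]) -> None:
        """Merge a batch into the running statistics (parallel variance merge)."""
        n = len(batch)
        if n == 0:
            return
        b_mean = sum(batch) / n
        b_var = sum((x - b_mean) ** 2 for x in batch) / n
        total = self.count + n
        delta = b_mean - self.mean
        m2 = self.var * self.count + b_var * n + delta ** 2 * self.count * n / total
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total

    def norm(self, batch: Sequence[float]) -> List[float]:
        """Standardize the batch, then clip to [-clip_max, clip_max] if set."""
        std = math.sqrt(self.var + self.eps)
        out = [(x - self.mean) / std for x in batch]
        if self.clip_max is not None:
            out = [max(-self.clip_max, min(self.clip_max, v)) for v in out]
        return out


# Made-up returns: 400 zeros and a single large outlier.
returns = [0.0] * 400 + [1000.0]

default_rms = RunningMeanStdSketch()            # clip_max=10, as in the issue
default_rms.update(returns)
clipped = default_rms.norm(returns)

free_rms = RunningMeanStdSketch(clip_max=None)  # user opts out of clipping
free_rms.update(returns)
unclipped = free_rms.norm(returns)
```

With the default, the outlier's normalized value is capped at the clip, whereas with clip_max=None it keeps its full z-score. Exposing the parameter lets users decide which behavior suits their environment.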

More generally, how best to normalize quantities in RL is an active discussion, and normalization can play an important role in performance. I believe tianshou should be extended to accommodate various normalization strategies.
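One possible shape for such an extension is to let users pass the normalization strategy in as a callable. The sketch below is only an assumption about how this could look: PolicySketch, identity, standardize, and with_clip are hypothetical names and do not exist in tianshou.

```python
import math
from typing import Callable, List, Sequence

# A strategy is any function mapping a batch of returns to normalized returns.
Normalizer = Callable[[Sequence[float]], List[float]]


def identity(returns: Sequence[float]) -> List[float]:
    """No normalization at all."""
    return list(returns)


def standardize(returns: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Subtract the batch mean and divide by the batch standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    std = math.sqrt(var + eps)
    return [(r - mean) / std for r in returns]


def with_clip(inner: Normalizer, clip_max: float) -> Normalizer:
    """Wrap any strategy so its output is clipped to [-clip_max, clip_max]."""
    def clipped(returns: Sequence[float]) -> List[float]:
        return [max(-clip_max, min(clip_max, v)) for v in inner(returns)]
    return clipped


class PolicySketch:
    """Hypothetical policy that receives its return normalizer from the user."""

    def __init__(self, normalize_returns: Normalizer = identity) -> None:
        self.normalize_returns = normalize_returns

    def process_returns(self, returns: Sequence[float]) -> List[float]:
        return self.normalize_returns(returns)


# Users choose the strategy and the clip value explicitly.
plain = PolicySketch().process_returns([1.0, 2.0, 3.0])
tight = PolicySketch(with_clip(standardize, clip_max=1.0)).process_returns([1.0, 2.0, 3.0])
```

The design choice here is composition: clipping is a wrapper rather than a hard-coded default, so it is visible at the call site and can be changed or removed without monkey-patching.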
