Extend and fix reward/return normalizations · thu-ml/tianshou#927

0 comments · 0 reactions · 0 assignees · Python · 1,072 forks

enhancement, good first issue

Repository metrics

Stars: 7,121
PR merge metrics: no PRs merged in the last 30 days

Description

There are several ways in which reward/return/value normalization could be improved, but the most drastic issue is the following:

Currently PGPolicy instantiates self.ret_rms = RunningMeanStd(), and RunningMeanStd has a default value of clip_max=10. This cannot be adjusted by users! (except through monkey-patching, ofc)

This might work well for some standard envs, but the clipping value is arbitrary, and making it non-configurable is a major hindrance for users, who are most probably not aware of it.
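To make the request concrete, here is a minimal sketch of what a configurable clip value could look like. The RunningMeanStd below is a simplified stand-in written for this example, not tianshou's actual class, and SketchPolicy with its ret_clip_max parameter is a hypothetical name used only for illustration.

```python
from typing import Optional

import numpy as np


class RunningMeanStd:
    """Simplified running mean/variance tracker (stand-in, not tianshou's class)."""

    def __init__(self, clip_max: Optional[float] = 10.0, epsilon: float = 1e-8) -> None:
        self.mean, self.var, self.count = 0.0, 1.0, 0
        self.clip_max = clip_max  # None disables clipping entirely
        self.eps = epsilon

    def update(self, x: np.ndarray) -> None:
        # Merge the batch statistics into the running statistics.
        batch_mean, batch_var, batch_count = float(np.mean(x)), float(np.var(x)), len(x)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def norm(self, x: np.ndarray) -> np.ndarray:
        out = (x - self.mean) / np.sqrt(self.var + self.eps)
        if self.clip_max is not None:
            out = np.clip(out, -self.clip_max, self.clip_max)
        return out


class SketchPolicy:
    """Hypothetical policy that forwards a user-chosen clip value to the normalizer."""

    def __init__(self, ret_clip_max: Optional[float] = 10.0) -> None:
        self.ret_rms = RunningMeanStd(clip_max=ret_clip_max)


# One large outlier among many zero returns.
returns = np.array([0.0] * 999 + [1e6])

default_policy = SketchPolicy()                 # clips normalized returns to [-10, 10]
unclipped_policy = SketchPolicy(ret_clip_max=None)
for policy in (default_policy, unclipped_policy):
    policy.ret_rms.update(returns)

clipped = default_policy.ret_rms.norm(returns)
unclipped = unclipped_policy.ret_rms.norm(returns)
```

With the default, the outlier's normalized value is silently capped at the clip limit, while passing None keeps its full magnitude. Exposing the value as a constructor argument lets users make that choice per environment.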

Generally, how best to normalize things in RL is an active topic of discussion, and normalization can play an important role in performance. I believe tianshou should be extended to accommodate various normalization strategies.

Contributor guide

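Investigation approach: Examine the current RunningMeanStd class and how its clip_max is used. Consider making clip_max a parameter of PGPolicy and other policies. Also explore other normalization methods such as PopArt and adaptive normalization.
Tech stack: Python
Area: backend
Issue type: Feature
Difficulty: 2
Estimated time: 1-3 hours
Activity status: New
Clarity: Clear
Prerequisites: Python, PyTorch, Git
Beginner-friendliness: 70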
