[Enhancement]: Wrong gains for weight initialization · DLR-RM/stable-baselines3#1559

2 comments · 0 reactions · 1 assignee · Python · 6,550 stars · 1,407 forks

enhancement · help wanted

Description

Enhancement

The recommended gain for weight initialization depends on the activation function used (see the torch docs). However, the gains are currently hard-coded in ActorCriticPolicies and are always the same, regardless of the activation function. See here.
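For reference, here is a minimal sketch of how the recommended gains differ per activation, using `torch.nn.init.calculate_gain`. The selection of activations is only illustrative; it is not taken from the SB3 code.

```python
import torch.nn as nn

# Query PyTorch's recommended gain for a few common nonlinearities.
# The names are the string identifiers accepted by calculate_gain.
recommended_gains = {
    name: nn.init.calculate_gain(name)
    for name in ("linear", "sigmoid", "tanh", "relu")
}

for name, gain in recommended_gains.items():
    print(f"{name:8s} gain = {gain:.4f}")
```

Per the PyTorch documentation, the recommended gain is 1 for linear/sigmoid, 5/3 for tanh and sqrt(2) for ReLU, so a single static value cannot match all of them.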

I recommend making the gains depend on the activation function used (probably mainly ReLU and tanh).
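A rough sketch of what I have in mind, assuming orthogonal initialization of the linear layers. The names `gain_for_activation`, `init_weights` and `_ACTIVATION_NAMES` are hypothetical helpers written for this illustration, not the existing SB3 API, and the fallback gain of 1.0 is an arbitrary choice.

```python
from functools import partial

import torch.nn as nn

# Hypothetical mapping from activation classes to the string names
# understood by torch.nn.init.calculate_gain.
_ACTIVATION_NAMES = {
    nn.ReLU: "relu",
    nn.Tanh: "tanh",
    nn.Sigmoid: "sigmoid",
    nn.Identity: "linear",
}


def gain_for_activation(activation_fn, default=1.0):
    """Return PyTorch's recommended gain for an activation class.

    Falls back to `default` (an assumption) for unknown activations.
    """
    name = _ACTIVATION_NAMES.get(activation_fn)
    if name is None:
        return default
    return nn.init.calculate_gain(name)


def init_weights(module, gain=1.0):
    """Orthogonally initialize linear/conv layers and zero their biases."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.orthogonal_(module.weight, gain=gain)
        if module.bias is not None:
            module.bias.data.fill_(0.0)


# Usage: the gain follows the activation instead of being fixed.
activation_fn = nn.Tanh
net = nn.Sequential(
    nn.Linear(4, 64), activation_fn(),
    nn.Linear(64, 64), activation_fn(),
)
net.apply(partial(init_weights, gain=gain_for_activation(activation_fn)))
```

Output layers (for example the action and value heads) would probably keep their own gains, since no hidden activation follows them; only the hidden layers would switch to the activation-dependent value.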

If you agree with this, I would like to implement it myself and open a PR.

Thanks and a good day!

To Reproduce

Relevant log output / Error message

--

System Info

Checklist

I have checked that there is no similar issue in the repo
I have read the documentation
I have provided a minimal working example to reproduce the bug
I've used the markdown code blocks for both code and stack traces.

Contributor guide

Tech stack: Python, PyTorch
Domain: machine learning
Issue type: feature
Difficulty: 3
Estimated time: 1-3 hours
Activity status: needs maintainer response
Clarity: clear
Prerequisites: basic knowledge of PyTorch; understanding of weight initialization; familiarity with the Stable-Baselines3 codebase
Beginner friendliness: 60
Research direction: The issue is in stable_baselines3/common/policies.py, around line 589, where the gains are set statically. The goal is to set the gains dynamically based on the activation function, following PyTorch's recommendations. The contributor should look at how activation functions are selected in the policy network and adjust the initialization accordingly. No linked PRs or assignee comments are visible.