DLR-RM/stable-baselines3

[Bug]: Reset options ignored when resetting due to termination / truncation from within wrapper's `step`

Open

#1790 opened on Dec 20, 2023

Labels: bug, documentation, help wanted

Description

🐛 Bug

Given a wrapped env, options passed the recommended way (wrapped_env.set_options) are ignored when the reset is triggered by episode termination or truncation inside the env wrapper's step_wait, rather than by an explicit call to the wrapper's reset.

A (myopic) fix could be to pass self._options[env_idx] and self._seeds[env_idx] to the reset call in the linked code above, or to factor out a single-env reset function that uses both and is called from DummyVecEnv.step_wait and DummyVecEnv.reset.
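The second suggestion (one shared single-env reset helper) can be sketched with a toy model. This is not Stable-Baselines3 code: `EchoEnv`, `TinyVecEnv`, and `_reset_one` are hypothetical names, and the choice to clear the stored seed and options after each use is an assumption. Whether options should instead persist across automatic resets is an open design question.

```python
from typing import Any, Callable, Dict, List, Optional


class EchoEnv:
    """Hypothetical stand-in for an env; records the options each reset receives."""

    def __init__(self) -> None:
        self.seen_options: List[Optional[Dict[str, Any]]] = []

    def reset(self, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None):
        self.seen_options.append(options)
        return 0, {}

    def step(self, action: int):
        # Every step terminates the episode, as in the report's CustomEnv.
        return 0, 0.0, True, False, {}


class TinyVecEnv:
    """Toy vectorized wrapper (NOT the SB3 implementation) with one reset path."""

    def __init__(self, env_fns: List[Callable[[], EchoEnv]]) -> None:
        self.envs = [fn() for fn in env_fns]
        self._seeds: List[Optional[int]] = [None] * len(self.envs)
        self._options: List[Optional[Dict[str, Any]]] = [None] * len(self.envs)

    def set_options(self, options: Optional[Dict[str, Any]]) -> None:
        self._options = [dict(options) if options else None for _ in self.envs]

    def _reset_one(self, env_idx: int):
        # The only place where the stored seed/options are forwarded.
        obs, info = self.envs[env_idx].reset(
            seed=self._seeds[env_idx], options=self._options[env_idx]
        )
        # Assumption: seed and options are consumed by the reset that uses them.
        self._seeds[env_idx] = None
        self._options[env_idx] = None
        return obs, info

    def reset(self):
        return [self._reset_one(i)[0] for i in range(len(self.envs))]

    def step(self, actions: List[int]):
        observations = []
        for i, action in enumerate(actions):
            obs, _reward, terminated, truncated, info = self.envs[i].step(action)
            if terminated or truncated:
                info["terminal_observation"] = obs
                # Automatic reset goes through the same helper as reset().
                obs, _ = self._reset_one(i)
            observations.append(obs)
        return observations


def demo() -> List[Optional[Dict[str, Any]]]:
    venv = TinyVecEnv([EchoEnv])
    venv.set_options({"opt": 1})
    venv.reset()      # explicit reset
    venv.set_options({"opt": 1})
    venv.step([0])    # reset triggered by termination
    return venv.envs[0].seen_options
```

In this toy model both reset paths receive the options, because neither `reset` nor `step` calls the env's `reset` directly.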

Apologies if this is expected behavior. If so, what is the recommended way to pass reset options that affect the above resetting scenario?
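Until there is an answer, one possible workaround (not a confirmed recommendation) is to handle this on the env side: wrap each env so that a reset called without options reuses the most recent options it was given. The sketch below is duck-typed and self-contained. `StickyOptions` and `RecordingEnv` are hypothetical names, and in real use the wrapper would more likely subclass `gymnasium.Wrapper`.

```python
from typing import Any, Dict, List, Optional


class RecordingEnv:
    """Hypothetical stand-in env that records the options passed to reset."""

    def __init__(self) -> None:
        self.seen_options: List[Optional[Dict[str, Any]]] = []

    def reset(self, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None):
        self.seen_options.append(options)
        return 0, {}

    def step(self, action: int):
        return 0, 0.0, True, False, {}


class StickyOptions:
    """Reuses the last non-None options whenever reset() is called without any."""

    def __init__(self, env: RecordingEnv) -> None:
        self.env = env
        self._last_options: Optional[Dict[str, Any]] = None

    def reset(self, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None):
        if options is not None:
            self._last_options = options
        return self.env.reset(seed=seed, options=self._last_options)

    def step(self, action: int):
        return self.env.step(action)


def demo_sticky() -> List[Optional[Dict[str, Any]]]:
    inner = RecordingEnv()
    env = StickyOptions(inner)
    env.reset(options={"opt": 1})  # explicit reset with options
    env.reset()                    # bare reset, as an automatic reset would issue
    return inner.seen_options
```

The limitation is that options set on the vectorized wrapper after the last explicit reset never reach the inner env, so this only covers the case where the same options should apply to every episode.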

To Reproduce

from stable_baselines3.common.vec_env import DummyVecEnv
import gymnasium as gym
import numpy as np


# dummy env
class CustomEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.action_space = gym.spaces.Discrete(3)
        self.observation_space = gym.spaces.Box(low=0, high=5, shape=(5,), dtype=np.uint8)

    # step terminates the episode
    def step(self, action):
        terminated = True
        truncated = False
        return np.zeros(5, dtype=np.uint8), 0.0, terminated, truncated, {}

    # reset prints the options, if provided
    def reset(self, seed=None, options=None):
        if options is not None:
            print(" -- Options supplied:", options)
        return np.zeros(5, dtype=np.uint8), {}

# make and wrap the env
# build the env inside the factory so the lambda does not depend on the name `env`, which is rebound
env = DummyVecEnv([lambda: CustomEnv()])

# resetting by invoking the wrapper function -- options are passed
print("Resetting by invoking DummyVecEnv.reset() :")
env.set_options({'opt': 1})
env.reset()

# reset by a terminating environment step:
# the wrapper step function calls self.envs[env_idx].reset(), which ignores self._options
print("Resetting by an episode-terminating invocation of DummyVecEnv.step() :")
env.set_options({'opt': 1})
env.step([0])

print("Done.")

Relevant log output / Error message

Resetting by invoking DummyVecEnv.reset() :
 -- Options supplied: {'opt': 1}
Resetting by an episode-terminating invocation of DummyVecEnv.step() :
Done.

System Info

  • OS: Linux-6.6.7-arch1-1-x86_64-with-glibc2.34 #1 SMP PREEMPT_DYNAMIC Thu, 14 Dec 2023 03:45:42 +0000
  • Python: 3.8.15
  • Stable-Baselines3: 2.2.1
  • PyTorch: 1.13.1+cu117
  • GPU Enabled: True
  • Numpy: 1.24.1
  • Cloudpickle: 2.2.1
  • Gymnasium: 0.28.1

Checklist

  • My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal and working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.
