Allow finding the corresponding episode from a sample in the replay buffer
#881 opened on June 1, 2023
Description
- I have marked all applicable categories:
- exception-raising bug
- RL algorithm bug
- documentation request (i.e. "X is missing from the documentation.")
- new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
I'm trying to save the contents of a PrioritizedVectorReplayBuffer episode by episode, but I am not sure whether there is a key associated with the episode. Is there any way to find the episode that a sample belongs to?
PrioritizedVectorReplayBuffer(
    info: Batch(
        env_id: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        players: Batch(
            env_id: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        ),
        lives: array([5, 5, 5, ..., 0, 0, 0], dtype=int32),
        reward: array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
        terminated: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        elapsed_step: array([1, 2, 3, ..., 0, 0, 0], dtype=int32),
    ),
    act: array([2, 2, 2, ..., 0, 0, 0]),
    done: array([False, False, False, ..., False, False, False]),
    policy: Batch(),
    rew: array([0., 0., 0., ..., 0., 0., 0.]),
    terminated: array([False, False, False, ..., False, False, False]),
    obs: array([[]]),
    truncated: array([False, False, False, ..., False, False, False]),
)
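
To illustrate what I'm after: the buffer above has no explicit episode key, but the `done` flags might be enough to derive one. Below is a minimal sketch in plain NumPy, not Tianshou API. It assumes the transitions of one environment are stored in order in a contiguous slice that has not wrapped around, and the helper names `episode_ids_from_done` and `episode_bounds` are hypothetical.

```python
import numpy as np


def episode_ids_from_done(done: np.ndarray) -> np.ndarray:
    """Assign an episode number to every transition in one contiguous slice.

    Assumption: transitions are stored in time order and the slice has not
    wrapped around, so a True in `done` marks the last step of an episode.
    """
    done = np.asarray(done, dtype=bool)
    ids = np.zeros(len(done), dtype=np.int64)
    # A transition's episode number is the count of `done` flags before it.
    ids[1:] = np.cumsum(done[:-1])
    return ids


def episode_bounds(done: np.ndarray, index: int) -> tuple[int, int]:
    """Return (first, last) indices of the episode containing `index`."""
    ids = episode_ids_from_done(done)
    members = np.flatnonzero(ids == ids[index])
    return int(members[0]), int(members[-1])


# Toy example: two finished episodes followed by one unfinished step.
done = np.array([False, False, True, False, True, False])
ids = episode_ids_from_done(done)
bounds = episode_bounds(done, 3)
print(ids, bounds)
```

With a vector buffer, the same idea would presumably have to be applied separately to the slice belonging to each environment, and an episode whose last `done` flag has not been written yet is still in progress.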