thu-ml/tianshou

Allow finding the corresponding episode from a sample in replay buffer

Open

#881 opened on Jun 1, 2023

Labels: batch import, RNN, enhancement, good first issue

Description

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
    

I'm trying to save the PrioritizedVectorReplayBuffer episode by episode, but I'm not sure whether there is a key associated with each episode. Is there any way to find the episode that a sample belongs to?

PrioritizedVectorReplayBuffer(
    info: Batch(
        env_id: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        players: Batch(
            env_id: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        ),
        lives: array([5, 5, 5, ..., 0, 0, 0], dtype=int32),
        reward: array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
        terminated: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        elapsed_step: array([1, 2, 3, ..., 0, 0, 0], dtype=int32),
    ),
    act: array([2, 2, 2, ..., 0, 0, 0]),
    done: array([False, False, False, ..., False, False, False]),
    policy: Batch(),
    rew: array([0., 0., 0., ..., 0., 0., 0.]),
    terminated: array([False, False, False, ..., False, False, False]),
    obs: array([[]]),
    truncated: array([False, False, False, ..., False, False, False]),
)
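The buffer shown above has no explicit episode key, but its `done` flags and `info.env_id` values are enough to reconstruct one. The sketch below is a hypothetical helper, not part of tianshou: it labels every transition with an `(env_id, episode_index)` pair and then groups indices by that label. It assumes the arrays contain only filled slots, listed in insertion order for each environment, and that the buffer has not wrapped around; a circular buffer that has overwritten old data would need its indices reordered first. It uses `done` rather than `terminated` so that truncated episodes also end a group.

```python
from typing import Dict, List, Sequence, Tuple

EpisodeKey = Tuple[int, int]  # (env_id, episode index within that env)


def episode_ids(done: Sequence[bool], env_id: Sequence[int]) -> List[EpisodeKey]:
    """Label each transition with the episode it belongs to.

    Hypothetical helper (not a tianshou API). Assumes transitions appear in
    insertion order per environment and that the buffer has not wrapped around.
    """
    next_episode: Dict[int, int] = {}  # per-env index of the episode currently being filled
    labels: List[EpisodeKey] = []
    for d, env in zip(done, env_id):
        env = int(env)  # normalize NumPy integers to plain ints for dict keys
        ep = next_episode.get(env, 0)
        labels.append((env, ep))
        if d:
            # This transition closes the episode (terminated or truncated),
            # so the next transition from the same env starts a new one.
            next_episode[env] = ep + 1
    return labels


def group_by_episode(labels: Sequence[EpisodeKey]) -> Dict[EpisodeKey, List[int]]:
    """Map each episode key to the buffer indices of its transitions."""
    groups: Dict[EpisodeKey, List[int]] = {}
    for idx, key in enumerate(labels):
        groups.setdefault(key, []).append(idx)
    return groups


if __name__ == "__main__":
    # Toy data: env 0 finishes one episode and starts another; env 1 finishes one.
    done = [False, True, False, False, True, False]
    env = [0, 0, 1, 1, 1, 0]
    labels = episode_ids(done, env)
    print(labels)
    print(group_by_episode(labels))
```

With real data, the two input sequences would be read from the buffer's `done` and `info.env_id` fields, and each group of indices could then be used to slice out and save one episode at a time. Keeping the env id in the key matters for a vector buffer, because transitions from different environments are stored separately and an episode never spans two of them.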
