Allow finding the corresponding episode from a sample in replay buffer · thu-ml/tianshou#881

1 comment · 1 reaction · 0 assignees · Python · 7,121 stars · 1,072 forks

Labels: RNN, enhancement, good first issue

Description

I have marked all applicable categories:
- exception-raising bug
- RL algorithm bug
- documentation request (i.e. "X is missing from the documentation.")
- new feature request
I have visited the source website
I have searched through the issue tracker for duplicates

I have mentioned version numbers, operating system and environment, where applicable:

import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

I'm trying to save the contents of a PrioritizedVectorReplayBuffer episode by episode, but I am not sure whether there is a key associated with the episode. Is there any way to find the episode that a sample belongs to?

PrioritizedVectorReplayBuffer(
    info: Batch(
        env_id: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        players: Batch(
            env_id: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        ),
        lives: array([5, 5, 5, ..., 0, 0, 0], dtype=int32),
        reward: array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
        terminated: array([0, 0, 0, ..., 0, 0, 0], dtype=int32),
        elapsed_step: array([1, 2, 3, ..., 0, 0, 0], dtype=int32),
    ),
    act: array([2, 2, 2, ..., 0, 0, 0]),
    done: array([False, False, False, ..., False, False, False]),
    policy: Batch(),
    rew: array([0., 0., 0., ..., 0., 0., 0.]),
    terminated: array([False, False, False, ..., False, False, False]),
    obs: array([[]]),
    truncated: array([False, False, False, ..., False, False, False]),
)

Contributor guide

Tech stack: Python, PyTorch
Domain: machine learning, data
Issue type: feature
Difficulty: 3
Estimated time: 1-3 hours
Activity status: needs maintainer response
Clarity: mostly clear
Prerequisites: familiarity with Tianshou's replay buffer architecture; basic Python
Beginner friendliness: 60
Suggested investigation: Examine the PrioritizedVectorReplayBuffer class, likely in `tianshou/data/buffer/prioritized.py`. Check whether episode membership can be read from the `elapsed_step` field or a similar one. The goal is to add a method (e.g., `get_episode_id(sample_indices)`) that returns the episode number for each sample. Determine whether a mapping from sample to episode already exists or whether it has to be reconstructed from the `done` and `terminated` flags. Consider how episodes are tracked in the vectorized buffer, which stores transitions from multiple environments. The issue has no maintainer response yet, so first clarify the expected API and any edge cases.
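As a starting point for that discussion, the reconstruction from `done` flags could look roughly like the sketch below. It is not Tianshou's API: the function name `episode_ids_from_done` is hypothetical, and the code assumes that the flat storage is split into `buffer_num` equally sized sub-buffers (one per environment), that each sub-buffer is filled in order, and that no sub-buffer has wrapped around and overwritten old data. Unfilled slots at the end of a sub-buffer are not detected and simply inherit the number of the last unfinished episode, so callers should restrict lookups to valid indices.

```python
from typing import List, Sequence


def episode_ids_from_done(done: Sequence[bool], buffer_num: int = 1) -> List[int]:
    """Assign a running episode number to every slot of a flat buffer.

    Hypothetical helper, not part of Tianshou. Assumes `buffer_num`
    equally sized sub-buffers laid out back to back, with no wrap-around.
    """
    total = len(done)
    if buffer_num < 1 or total % buffer_num != 0:
        raise ValueError("len(done) must be a positive multiple of buffer_num")
    sub_size = total // buffer_num

    ids: List[int] = []
    episode = 0
    for i in range(total):
        # Crossing into the next sub-buffer means a different environment,
        # so an unfinished episode must not continue across the boundary.
        if i > 0 and i % sub_size == 0 and not done[i - 1]:
            episode += 1
        ids.append(episode)
        # A done flag closes the current episode; the next slot starts a new one.
        if done[i]:
            episode += 1
    return ids


# Two sub-buffers of six slots each.
done_flags = [False, False, True, False, True, False,
              False, True, False, False, False, False]
ids = episode_ids_from_done(done_flags, buffer_num=2)
print(ids)  # [0, 0, 0, 1, 1, 2, 3, 3, 4, 4, 4, 4]
```

With a lookup list like `ids`, the episode of each sampled transition would be `[ids[i] for i in indices]`, where `indices` are the positions returned alongside a sampled batch. Whether a real implementation should precompute this list, store an episode counter at insertion time, or handle wrap-around explicitly is exactly the kind of API question to settle with the maintainers first.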