Lightning-AI/pytorch-lightning

Documentation: writing custom samplers compatible with multi GPU training

Open

#19,964 opened on June 10, 2024

Labels: docs, help wanted

Description

📚 Documentation

Hi,

I'm trying to run distributed training with a custom sampler for the first time. The idea is rather simple (a fixed budget of samples for each class) and it works fine on a single GPU. When moving to multiple GPUs, unsurprisingly, I get an error message telling me that I should subclass `BatchSampler`:

TypeError:  Lightning can't inject a (distributed) sampler into your batch sampler, because it doesn't subclass PyTorch's `BatchSampler`. To mitigate this, either follow the API of `BatchSampler` or set `Trainer(use_distributed_sampler=False)`. If you choose the latter, you will be responsible for handling the distributed sampling within your batch sampler.
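For the second option named in the message (`Trainer(use_distributed_sampler=False)`), the batch sampler has to split the work across processes itself. Below is a minimal sketch of how that could look for a per-class budget, written in plain Python so it runs without PyTorch. The class name `ClassBudgetBatchSampler` and its parameters (`labels`, `budget`, `num_replicas`, `rank`, `seed`) are hypothetical and not part of Lightning or PyTorch. It assumes every rank is constructed with the same `labels` and `seed`, so all ranks build the same global batch list and each takes a disjoint slice of it.

```python
import random
from collections import defaultdict


class ClassBudgetBatchSampler:
    """Hypothetical batch sampler: every batch holds `budget` indices per class.

    Each rank iterates over every `num_replicas`-th batch of one shared,
    identically seeded global batch list, so ranks never overlap.
    """

    def __init__(self, labels, budget, num_replicas=1, rank=0, seed=0):
        self.by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)
        self.budget = budget
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Assumption: the training loop calls this itself to reshuffle per epoch.
        self.epoch = epoch

    def _num_global_batches(self):
        # The smallest class limits how many full batches exist.
        smallest = min(len(pool) for pool in self.by_class.values())
        n = smallest // self.budget
        # Drop the tail so every rank receives the same number of batches.
        return n - n % self.num_replicas

    def __iter__(self):
        # Same seed and epoch on every rank -> identical shuffles everywhere.
        rng = random.Random(self.seed + self.epoch)
        pools = {}
        for label in sorted(self.by_class):
            pool = list(self.by_class[label])
            rng.shuffle(pool)
            pools[label] = pool
        # Rank r takes global batches r, r + num_replicas, r + 2 * num_replicas, ...
        for b in range(self.rank, self._num_global_batches(), self.num_replicas):
            lo, hi = b * self.budget, (b + 1) * self.budget
            yield [i for pool in pools.values() for i in pool[lo:hi]]

    def __len__(self):
        return self._num_global_batches() // self.num_replicas
```

An object like this would be passed as `DataLoader(dataset, batch_sampler=...)`. The rank and world size have to come from the environment, for example `torch.distributed.get_rank()` and `torch.distributed.get_world_size()` once the process group is initialized. Dropping the tail batches is a design choice here: ranks that see different numbers of batches can stall distributed training.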

As I understand it, torch's `BatchSampler` wraps one single-index `Sampler` and draws from it repeatedly until the batch is full. Are there any guidelines for how samplers should be built to be compatible with the sampler injection? I can't seem to find any in the docs.
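To make the question concrete, here is a plain-Python sketch of the contract as I understand it. `torch.utils.data.BatchSampler` is constructed as `BatchSampler(sampler, batch_size, drop_last)`. The stand-in below, `SimpleBatchSampler`, mirrors that signature without importing PyTorch. `EveryNth` is a hypothetical stand-in for a distributed sampler that hands each rank a strided subset of the indices. Whether Lightning's injection works exactly by swapping the inner sampler like this is an assumption based on the error message, not something confirmed by the docs.

```python
class SimpleBatchSampler:
    """Plain-Python stand-in mirroring the (sampler, batch_size, drop_last) contract."""

    def __init__(self, sampler, batch_size, drop_last):
        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self):
        batch = []
        # Pull single indices from the inner sampler until a batch is full.
        for idx in self.sampler:
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        # Emit the incomplete final batch unless asked to drop it.
        if batch and not self.drop_last:
            yield batch

    def __len__(self):
        n = len(self.sampler)
        if self.drop_last:
            return n // self.batch_size
        return (n + self.batch_size - 1) // self.batch_size


class EveryNth:
    """Hypothetical distributed-sampler stand-in: rank r sees r, r + world, ..."""

    def __init__(self, num_samples, num_replicas, rank):
        self.indices = list(range(rank, num_samples, num_replicas))

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


# Single process: the inner sampler covers the whole dataset.
full = SimpleBatchSampler(range(10), batch_size=4, drop_last=False)

# Two processes: only the inner sampler changes, the batching logic is reused.
rank0 = SimpleBatchSampler(EveryNth(10, 2, 0), batch_size=4, drop_last=False)
rank1 = SimpleBatchSampler(EveryNth(10, 2, 1), batch_size=4, drop_last=False)
```

If this reading is right, any batching logic that only depends on the indices yielded by `self.sampler` stays valid after the inner sampler is replaced. Logic that needs a global view of the dataset, such as a per-class budget, would not fit this shape directly, which is the part I'd like guidance on.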

cc @borda
