Documentation: writing custom samplers compatible with multi GPU training · Lightning-AI/pytorch-lightning#19964

Repository metrics

Stars: 26,687
PR merge metrics: average time to merge 9 days 15 hours; 3 PRs merged in the last 30 days

Description

📚 Documentation

Hi,

I'm trying to run distributed training with a custom sampler for the first time. The idea is rather simple (a fixed budget for each class) and works fine on a single GPU. When moving to multiple GPUs, unsurprisingly, I get an error message telling me that I should subclass BatchSampler.

TypeError:  Lightning can't inject a (distributed) sampler into your batch sampler, because it doesn't subclass PyTorch's `BatchSampler`. To mitigate this, either follow the API of `BatchSampler` or set `Trainer(use_distributed_sampler=False)`. If you choose the latter, you will be responsible for handling the distributed sampling within your batch sampler.
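For the second option in that message, `Trainer(use_distributed_sampler=False)`, the batch sampler has to split the indices across processes itself. Below is a minimal sketch of one way to do that, assuming the same pad-and-stride scheme that `torch.utils.data.DistributedSampler` applies when shuffling is off. `shard_indices` is a hypothetical helper, not part of PyTorch or Lightning.

```python
def shard_indices(indices, rank, world_size):
    """Return the slice of `indices` that process `rank` should see.

    Hypothetical helper. Assumes `indices` is non-empty.
    """
    indices = list(indices)
    # Round up so that every rank receives the same number of indices.
    per_rank = -(-len(indices) // world_size)
    total = per_rank * world_size
    # Pad by repeating indices from the start until `total` is reached.
    padded = (indices * (total // len(indices) + 1))[:total]
    # Rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return padded[rank:total:world_size]


for rank in range(4):
    print(rank, shard_indices(range(10), rank, world_size=4))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6, 0]
# 3 [3, 7, 1]
```

Inside a Lightning run, `rank` and `world_size` could come from `trainer.global_rank` and `trainer.world_size`. Per-epoch shuffling is left out of the sketch.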

It is my understanding that torch's BatchSampler takes one (single-sample) Sampler and draws from it repeatedly until a batch is full. Are there any guidelines for how samplers should be built to be compatible with the sampler injection? I can't seem to find any in the docs.
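The batching behavior described above can be sketched without torch. `batch_indices` is a hypothetical stand-in that mirrors what `torch.utils.data.BatchSampler(sampler, batch_size, drop_last)` is documented to do. It is meant to show the logic, not to replace the real class.

```python
def batch_indices(sampler, batch_size, drop_last):
    """Group single indices from `sampler` into lists of `batch_size`."""
    batch = []
    for idx in sampler:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # The final, shorter batch is kept only when drop_last is False.
    if batch and not drop_last:
        yield batch


print(list(batch_indices(range(10), batch_size=3, drop_last=False)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(batch_indices(range(10), batch_size=3, drop_last=True)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Because the batch sampler only consumes whatever indices the inner sampler yields, replacing that inner sampler with a per-rank one is enough to shard the batches.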

cc @borda

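Contributor guide

Research direction: Review the part of the PyTorch Lightning documentation that covers custom samplers, focusing on subclassing BatchSampler. Read the source code of DistributedSampler and related classes. Implement a simple custom batch sampler that subclasses BatchSampler and test it on multiple GPUs with Trainer(use_distributed_sampler=True).
Tech stack: python
Domain: machine learning
Issue type: documentation
Difficulty: 2
Estimated time: 1-3 hours
Activity status: active
Clarity: clear
Prerequisites: PyTorchLightning
Beginner friendliness: 75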
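
The custom batch sampler suggested in the research direction could look roughly like the sketch below. It assumes that "following the API of `BatchSampler`" means accepting `sampler`, `batch_size` and `drop_last` as the first constructor arguments, so that an injected per-rank sampler can take the place of `sampler`. That reading, and how extra arguments such as `labels` are forwarded on re-instantiation, should be checked against the Lightning version in use. `ClassBudgetBatchSampler` and `labels` are hypothetical names. The `try`/`except` exists only so the sketch also runs where torch is not installed.

```python
from collections import defaultdict

try:
    from torch.utils.data import BatchSampler
except ImportError:
    class BatchSampler:  # minimal stand-in, used only when torch is missing
        def __init__(self, sampler, batch_size, drop_last):
            self.sampler = sampler
            self.batch_size = batch_size
            self.drop_last = drop_last


class ClassBudgetBatchSampler(BatchSampler):
    """Hypothetical sampler: each full batch holds an equal share per class."""

    def __init__(self, sampler, batch_size, drop_last, labels):
        super().__init__(sampler, batch_size, drop_last)
        self.labels = list(labels)            # labels[i] is the class of sample i
        self.num_classes = len(set(self.labels))
        self.per_class = batch_size // self.num_classes

    def __iter__(self):
        pools = defaultdict(list)             # class -> indices waiting for a batch
        # Draw only from self.sampler, so a per-rank sampler shards the batches.
        for idx in self.sampler:
            pools[self.labels[idx]].append(idx)
            ready = len(pools) == self.num_classes and all(
                len(p) >= self.per_class for p in pools.values()
            )
            if ready:
                batch = []
                for cls in sorted(pools):
                    batch.extend(pools[cls][: self.per_class])
                    del pools[cls][: self.per_class]
                yield batch
        if not self.drop_last:
            # Leftovers no longer respect the per-class budget.
            rest = [i for cls in sorted(pools) for i in pools[cls]]
            for start in range(0, len(rest), self.batch_size):
                yield rest[start : start + self.batch_size]


labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(list(ClassBudgetBatchSampler(range(8), 4, True, labels)))
# [[0, 1, 4, 5], [2, 3, 6, 7]]
```

The sketch does not override `__len__`, so the number of batches it reports may differ from the number it yields when classes are imbalanced.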
