Repository issues
Lightning-AI/pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
Issues
Open
NCCL timeout while doing multi gpu training
bug, distributed, help wanted, repro needed, ver: 2.4.x
4 comments, 0 reactions, 0 assignees
Open
Make logging_mode (`on_step` or `on_epoch`) available to loggers
feature, help wanted
5 comments, 0 reactions, 0 assignees
Open
Confusing recommendation to use sync_dist=True even with TorchMetrics
bug, help wanted, logging, ver: 2.2.x
17 comments, 0 reactions, 0 assignees
Open
training time increase epoch by epoch
bug, help wanted, performance, repro needed, ver: 2.2.x
2 comments, 0 reactions, 0 assignees
Open
enable loading `universal checkpointing` checkpoint in `DeepSpeedStrategy`
feature, help wanted, strategy: deepspeed
1 comment, 0 reactions, 0 assignees
Open
trainer.test() with given checkpoint logs last epoch instead of checkpoint epoch
bug, help wanted, repro needed
1 comment, 1 reaction, 0 assignees
Open
ModelCheckpoint could not find key in returned metrics
bug, callback: model checkpoint, help wanted, ver: 2.1.x
4 comments, 3 reactions, 0 assignees
Open
[Fabric Lightning] Named barriers
distributed, feature, help wanted
1 comment, 2 reactions, 0 assignees
Open
1 comment, 0 reactions, 0 assignees
Open
Returning num_replicas=world_size when using distributed sampler in ddp
distributed, duplicate, feature, help wanted, strategy: ddp
3 comments, 1 reaction, 0 assignees
Open
neptune.ai logger produces lots of errors when logging "training/epoch"
bug, help wanted, logger: neptune
4 comments, 3 reactions, 0 assignees
Open
`configure_model` is incompatible with the `BaseFinetuning` behavior when fitting
bug, callback: finetuning, help wanted, ver: 2.1.x
0 comments, 3 reactions, 0 assignees
Open
3 comments, 0 reactions, 0 assignees
Open
`batch_sampler.batch_size` is None with deepspeed and `DataLoader(batch_size=None)`
bug, help wanted, strategy: deepspeed
4 comments, 0 reactions, 0 assignees
Open
Potential off by 1 error when resuming training of mid-epoch checkpoint
bug, help wanted, loops, ver: 2.1.x
1 comment, 0 reactions, 0 assignees
Open
Support skipping the first iteration time in process bar
feature, help wanted, progress bar: tqdm, torch.compile
1 comment, 1 reaction, 0 assignees
Open
Support `DDP(static_graph=True)` and gradient accumulation
help wanted, strategy: ddp
3 comments, 3 reactions, 1 assignee
Open
ignore_modules in Quantization via Bitsandbytes
feature, help wanted, precision: bnb
4 comments, 0 reactions, 0 assignees
Open
Loggers fails to create metrics.csv file when running on multiple TPU cores
bug, help wanted, strategy: xla, ver: 2.2.x
3 comments, 0 reactions, 0 assignees
Open
Downloading artifacts with wandblogger in DDP case failing on non-zero rank processes
bug, help wanted, logger: wandb, ver: 2.1.x
3 comments, 1 reaction, 0 assignees