Slower Multi-GPU training with 2x the number of GPUs and 4x the amount of VRAM · RVC-Project/Retrieval-based-Voice-Conversion-WebUI#244

(1 comment) (1 reaction) (0 assignees)Python (2,849 forks)batch import

help wantedquestion

Repository metrics

I have two systems training on identical datasets

System A has 4 x NVIDIA RTX A5000 (24GB VRAM per GPU), and a batch size of 12 per GPU.

System B has 7 x NVIDIA RTX A6000 (48GB VRAM per GPU), and a batch size of 18 per GPU.

I would expect System B to train much faster. However...

I'm wondering if this is down to the overhead of multi-GPU training, or if there's something I'm missing here?

Thank you

Research direction: Investigate the training loop and data loading for potential bottlenecks in multi GPU communication.
Tech stack: python
Domain: machine learning
Issue type: Performance
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: PythonPyTorchmulti GPU training
Newbie friendliness: 40