RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Slower Multi-GPU training with 2x the number of GPUs and 4x the amount of VRAM

Open

#244 opened on May 7, 2023

Labels: help wanted, question


Description

I have two systems training on identical datasets:

System A has 4 x NVIDIA RTX A5000 (24GB VRAM per GPU), and a batch size of 12 per GPU.

System B has 7 x NVIDIA RTX A6000 (48GB VRAM per GPU), and a batch size of 18 per GPU.

I would expect System B to train much faster. However...

  • System A (96GB total VRAM, batch size 12) takes 11 seconds per epoch.

  • System B (336GB total VRAM, batch size 18) takes 13 seconds per epoch.

I'm wondering if this is down to the overhead of multi-GPU training, or if there's something I'm missing here?
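
For reference, here is a rough sketch of the arithmetic behind the comparison. It assumes each epoch passes over the dataset once and that every optimizer step consumes `num_gpus * batch_per_gpu` samples (the usual data-parallel setup; I have not confirmed this is exactly how the training script shards data). `DATASET_SIZE` is a made-up placeholder, since I have not stated the real sample count, and `epoch_stats` is just a hypothetical helper name.

```python
import math


def epoch_stats(num_gpus, batch_per_gpu, epoch_seconds, dataset_size):
    """Derive per-step figures from per-epoch timings.

    Assumes one full pass over the dataset per epoch and a global batch
    of num_gpus * batch_per_gpu samples per optimizer step.
    """
    global_batch = num_gpus * batch_per_gpu
    steps_per_epoch = math.ceil(dataset_size / global_batch)
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "seconds_per_step": epoch_seconds / steps_per_epoch,
        "samples_per_second": dataset_size / epoch_seconds,
    }


# Hypothetical dataset size, for illustration only.
DATASET_SIZE = 1000

# GPU counts, batch sizes, and epoch times are the ones reported above.
system_a = epoch_stats(num_gpus=4, batch_per_gpu=12, epoch_seconds=11.0,
                       dataset_size=DATASET_SIZE)
system_b = epoch_stats(num_gpus=7, batch_per_gpu=18, epoch_seconds=13.0,
                       dataset_size=DATASET_SIZE)

for name, stats in (("System A", system_a), ("System B", system_b)):
    print(name, stats)
```

Under these assumptions System B runs far fewer steps per epoch than System A (its global batch is 126 versus 48), so each of its steps takes noticeably longer. If that is right, fixed per-step costs such as gradient synchronization across 7 GPUs or data loading could be what dominates, rather than raw GPU compute.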

Thank you
