yl4579/StyleTTS2

h100: Worse output & 20x slower inference?

Open

#89 opened on Nov 26, 2023

Description

We're testing finetuning on an h100 and 4090, here are the results:

4090: https://voca.ro/11mtxzLHzzih

h100: https://voca.ro/15QldVjuG7nu

The two finetunes are almost identical, but the h100 output is SIGNIFICANTLY worse. It isn't a config issue, and we've replicated it twice with LJSpeech as well.

The 4090 is also faster during training and considerably faster during inference, almost 20x faster than the h100:

Screenshot_2023-11-26_at_5 01 16_PM

h100:

Screenshot_2023-11-24_at_3 46 08_PM

And during training, one epoch took the 4090 about 3 minutes, while the h100 took 4.12 minutes.
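For anyone trying to reproduce these timings, here is a minimal sketch of how the two cards could be compared on equal footing. It is not code from StyleTTS2: `time_call` and `slowdown` are hypothetical helpers, and the `sync` argument is an assumption meant for something like `torch.cuda.synchronize`, since GPU work is asynchronous and unsynchronized timings can be misleading. The only numbers used are the epoch times reported above.

```python
import statistics
import time
from typing import Callable, Optional


def time_call(fn: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              warmup: int = 3,
              repeats: int = 10) -> float:
    """Return the median wall-clock seconds for one call of fn.

    sync is an optional barrier (hypothetical usage: torch.cuda.synchronize)
    called before and after each timed call so queued GPU work is counted.
    """
    # Warm-up runs are not timed; they absorb one-off costs such as
    # kernel compilation or cache population.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """How many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


# Epoch times reported in this issue: 4.12 min (h100) vs about 3 min (4090).
epoch_ratio = slowdown(4.12, 3.0)
print(f"h100 epoch time is {epoch_ratio:.2f}x the 4090's")
```

Running the same inference call through `time_call` on both machines, with identical inputs and a synchronization barrier, would show whether the roughly 20x inference gap persists after warm-up or comes from one-time setup cost.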

Does anyone know what could be going on here? We've never seen an issue like this on an h100 before with a diffusion-like model. Thanks.
