Multi-GPU training capability for the PyTorch Transformer LM training script - https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/local/pytorchnn/run_nnlm.sh · kaldi-asr/kaldi#4699

(3 comments) (0 reactions) (0 assignees) Shell (5,359 forks) batch import

enhancement, help wanted, stale-exclude


I used the script https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/local/pytorchnn/run_nnlm.sh, but I could not figure out how to distribute the training of the Transformer-based LM across multiple GPUs in order to speed up the PyTorch training. Please suggest if there is any way to do so.

Thanks!

Research direction: Investigate PyTorch DistributedDataParallel (DDP) or DataParallel to parallelize the Transformer LM training across multiple GPUs. Modify the training script to wrap the model with DDP, use a DistributedSampler for data loading, and launch with torch.distributed.launch (or torchrun, which supersedes it in recent PyTorch releases).
Tech stack: pytorch
Domain: machine learning
Issue type: Research
Difficulty: 2
Estimated time: Half day
Activity status: Active
Clarity: Clear
Prerequisites: PyTorch, CUDA, Python
Newbie friendliness: 65
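
As a rough illustration of the "distributed sampler" step in the research direction above, the sketch below shows in plain Python how a dataset's indices could be split so that each GPU process (rank) sees a disjoint, equally sized slice per epoch. It is meant to mirror the non-shuffling behaviour of torch.utils.data.distributed.DistributedSampler, but it is only a stand-in: the function name shard_indices is hypothetical, and the issue does not say whether the data pipeline in run_nnlm.sh maps cleanly onto a sampler. In an actual change, the model would additionally be wrapped in torch.nn.parallel.DistributedDataParallel and one process per GPU would be started by the launcher.

```python
import math
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the sample indices one rank would process in an epoch.

    Hypothetical helper for illustration only; a real training script
    would normally rely on PyTorch's own DistributedSampler instead.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")

    # Every rank must take the same number of steps, otherwise gradient
    # synchronisation would stall, so round the per-rank count up.
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size

    indices = list(range(num_samples))
    padding = total - num_samples
    if padding:
        # Pad by wrapping around to the start of the dataset.
        repeats = math.ceil(padding / num_samples)
        indices += (indices * repeats)[:padding]

    # Rank r takes every world_size-th index, starting at offset r.
    return indices[rank:total:world_size]


if __name__ == "__main__":
    # Assumed setup: 10 training samples shared by 4 GPU processes.
    for rank in range(4):
        print(rank, shard_indices(10, world_size=4, rank=rank))
```

With 10 samples and 4 ranks, each rank receives 3 indices, and two samples (0 and 1) are reused as padding. Because each process then handles roughly a quarter of the data per epoch, the effective batch size grows with the number of GPUs, so the learning rate may need retuning.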