RVC-Project/Retrieval-based-Voice-Conversion-WebUI

[Bug] RuntimeError: CUDA error: an illegal instruction was encountered

Open

#215 opened on May 2, 2023

Labels: bug, help wanted, question

Description

I'm running Ubuntu with two 3060 GPUs (12 GB each) and the latest version of RVC (commit c4a1810).

During training, after a few epochs complete, a CUDA error is thrown:


INFO:user-test-3:====> Epoch: 1
INFO:user-test-3:Train Epoch: 2 [11%]
INFO:user-test-3:[200, 9.99875e-05]
INFO:user-test-3:loss_disc=3.124, loss_gen=2.644, loss_fm=8.702,loss_mel=19.773, loss_kl=1.555
INFO:user-test-3:====> Epoch: 2
INFO:user-test-3:Train Epoch: 3 [22%]
INFO:user-test-3:[400, 9.99750015625e-05]
INFO:user-test-3:loss_disc=3.009, loss_gen=2.687, loss_fm=8.580,loss_mel=19.066, loss_kl=1.653
INFO:user-test-3:====> Epoch: 3
INFO:user-test-3:Train Epoch: 4 [33%]
INFO:user-test-3:[600, 9.996250468730469e-05]
INFO:user-test-3:loss_disc=3.033, loss_gen=2.489, loss_fm=7.798,loss_mel=18.964, loss_kl=1.770
INFO:user-test-3:====> Epoch: 4
INFO:user-test-3:Train Epoch: 5 [44%]
INFO:user-test-3:[800, 9.995000937421877e-05]
INFO:user-test-3:loss_disc=2.957, loss_gen=2.745, loss_fm=7.675,loss_mel=18.730, loss_kl=1.756
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f37fab9e4d7 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f37fab6836b in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f38008b6fa8 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0xdf9d4e (0x7f378a7f9d4e in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x4ccea6 (0x7f37c90ccea6 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x3ee77 (0x7f37fab83e77 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #6: c10::TensorImpl::~TensorImpl() + 0x1be (0x7f37fab7c69e in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f37fab7c7b9 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: <unknown function> + 0x752458 (0x7f37c9352458 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #9: THPVariable_subclass_dealloc(_object*) + 0x305 (0x7f37c93527e5 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x12c1dc (0x55db69db61dc in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #11: <unknown function> + 0x154b6f (0x55db69ddeb6f in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #12: <unknown function> + 0x167367 (0x55db69df1367 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #13: <unknown function> + 0x167394 (0x55db69df1394 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #14: <unknown function> + 0x167394 (0x55db69df1394 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #15: <unknown function> + 0x171a2c (0x55db69dfba2c in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #16: <unknown function> + 0x132719 (0x55db69dbc719 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #17: <unknown function> + 0x272015 (0x55db69efc015 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x5ae7 (0x55db69dd79e7 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #19: _PyFunction_Vectorcall + 0x79 (0x55db69de7ff9 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x8c2 (0x55db69dd27c2 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #21: _PyFunction_Vectorcall + 0x79 (0x55db69de7ff9 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x6d0 (0x55db69dd25d0 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #23: _PyFunction_Vectorcall + 0x79 (0x55db69de7ff9 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x197b (0x55db69dd387b in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #25: <unknown function> + 0x144cb4 (0x55db69dcecb4 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #26: PyEval_EvalCode + 0x86 (0x55db69ebb266 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #27: <unknown function> + 0x25d497 (0x55db69ee7497 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #28: <unknown function> + 0x25645e (0x55db69ee045e in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #29: PyRun_StringFlags + 0x81 (0x55db69ed8a71 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #30: PyRun_SimpleStringFlags + 0x3c (0x55db69ed894c in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #31: Py_RunMain + 0x377 (0x55db69ed7ae7 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #32: Py_BytesMain + 0x2b (0x55db69eaf38b in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)
frame #33: <unknown function> + 0x23510 (0x7f38a3823510 in /lib/x86_64-linux-gnu/libc.so.6)
frame #34: __libc_start_main + 0x89 (0x7f38a38235c9 in /lib/x86_64-linux-gnu/libc.so.6)
frame #35: _start + 0x25 (0x55db69eaf285 in /home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/bin/python)

Traceback (most recent call last):
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 534, in <module>
    main()
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
    mp.spawn(
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 202, in run
    train_and_evaluate(
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 389, in train_and_evaluate
    wave = commons.slice_segments(
  File "/home/user/rvc-test/Retrieval-based-Voice-Conversion-WebUI/infer_pack/commons.py", line 49, in slice_segments
    ret[i] = x[i, :, idx_str:idx_end]
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
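
As the error message itself notes, CUDA errors can be reported asynchronously, so the `slice_segments` frame above may not be the real failure site. A minimal sketch of re-running the training script with `CUDA_LAUNCH_BLOCKING=1` follows. The helper names `blocking_env` and `rerun` are hypothetical, and `train_args` is a placeholder for the arguments used in the original run, which are not shown in this report.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this set, each kernel launch completes before the next call starts,
    # so the Python traceback should point at the operation that failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun(train_args):
    """Re-launch the training script (name taken from the traceback) for debugging.

    train_args is a placeholder: pass the same arguments as the failing run.
    """
    cmd = [sys.executable, "train_nsf_sim_cache_sid_load_pretrain.py", *train_args]
    return subprocess.run(cmd, env=blocking_env()).returncode
```

Training will likely run slower in this mode, so it is only meant for reproducing the crash and collecting a more trustworthy traceback.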
