JuliaGPU/CUDA.jl

Base.stack is underperforming.

Open

#2248 opened on January 21, 2024

Labels: good first issue, performance

Description

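Describe the bug

Stacking a vector of CuArrays with Base.stack is slow. In the example below it is over 200 times slower than stacking the equivalent CPU arrays, and about 50 times slower than copying the data to the host, stacking it there, and uploading the result.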

To reproduce

The Minimal Working Example (MWE) for this bug:

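using BenchmarkTools, CUDA;
N=100;
M=1000;
x=randn(N);              # Float64 vector on the CPU
x_cu=cu(x);              # GPU copy; cu converts Float64 to Float32 by default
@btime stack(fill($x,M));                    # stack M CPU vectors
@btime stack(fill($x_cu,M));                 # stack M CuArrays directly
@btime cu(stack(fill(collect($x_cu),M)));    # copy to host, stack on the CPU, upload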

Timings I get, in the order of the three @btime calls above:

70.800 μs (3 allocations: 789.23 KiB)
15.774 ms (8 allocations: 8.19 KiB)
318.900 μs (12 allocations: 399.83 KiB)
These timings were measured with CUDA.jl v5.1.2.
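For reference, stack(fill(x, M)) builds an N×M matrix whose columns are all copies of x. The CPU-only sketch below (it assumes Julia 1.9 or later, where Base.stack was introduced, and uses small illustrative sizes and a deterministic stand-in for randn) checks that reduce(hcat, ...) and repeat(x, 1, M) build the same matrix, so they are candidate formulations to time against the CuArray case. Whether either one is actually faster on the GPU has not been measured here.

```julia
# CPU-only sketch: what stack(fill(x, M)) computes, plus two equivalent formulations.
# Small sizes for illustration; the issue itself uses N = 100, M = 1000.
N, M = 4, 3
x = collect(1.0:N)        # deterministic stand-in for randn(N)
xs = fill(x, M)           # M references to the same vector x

A = stack(xs)             # N×M matrix; column j is xs[j] (requires Julia 1.9+)
B = reduce(hcat, xs)      # horizontal concatenation of the same vectors
C = repeat(x, 1, M)       # valid here only because every column is the same x

println(size(A))          # (4, 3)
println(A == B && A == C) # true
```

Note that repeat is only a substitute for this particular MWE, where fill makes every slice identical; for distinct vectors, stack or reduce(hcat, ...) is the general form.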

Version info

Details on Julia: 1.10

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

Details on CUDA:

CUDA runtime 12.3, artifact installation
CUDA driver 12.0
Unknown NVIDIA driver

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: missing

Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0

Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce MX150 (sm_61, 1.491 GiB / 2.000 GiB available)
