CUSPARSE: Float16 sparse arrays fail on arithmetic of opposite transpositions
#2,284 opened on 2024年3月8日
説明
Description
Basic arithmetic of transposed sparse Float16 matrices, including addition, throws an exception during kernel execution (InvalidIRError). This happens when computing A + B' expressions, even for matching element types and positions of non-zero elements.
This happens specifically when one side remains in the original order, while the other side is transposed. It is also specific to Float16 elements, as using Float32 or Float64 instead causes no such issues.
CUSPARSE.CuSparseMatrixCSR{T} types are generally usable for T being Float32 or Float64, including the mentioned operation. As such, different behavior for Float16 specialization classifies as a bug.
Example
The following code:
using SparseArrays
using CUDA
T = Float16
A = CUSPARSE.CuSparseMatrixCSR{T}(sparse(T[0 1 0; 1 0 0; 0 0 1]))
B = CUSPARSE.CuSparseMatrixCSR{T}(sparse(T[0 1 0; 1 0 0; 0 0 1]))
CUDA.@allowscalar begin
@show A + B
@show A' + B'
@show A + B'
end
Throws an error while computing contents of the last @show invocation. The crucial messages are:
A + B = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A' + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
ERROR: a exception was thrown during kernel execution.
(...)
ERROR: LoadError: InvalidIRError: compiling MethodInstance for CUDA.CUSPARSE.sparse_to_dense_broadcast_kernel(::Type{CUDA.CUSPARSE.CuSparseMatrixCSR{Float16, Int32}}, ::typeof(+), ::CuDeviceMatrix{Float16, 1}, ::CUDA.CUSPARSE.CuSparseDeviceMatrixCSR{Float16, Int32, 1}, ::LinearAlgebra.Adjoint{Float16, CUDA.CUSPARSE.CuSparseDeviceMatrixCSR{Float16, Int32, 1}}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to julia.new_gc_frame)
Stacktrace:
[1] ntuple
@ ./ntuple.jl:49
[2] sparse_to_dense_broadcast_kernel
@ ~/.julia/packages/CUDA/htRwP/lib/cusparse/broadcast.jl:429
Computing A' + B yields the same error. Setting T = Float32 or T = Float64 makes the snippet execute properly.
Similar behavior can be observed when the same matrix is used on both sides, i.e. on computing A + A'. This breaks more common operations, such as LinearAlgebra.issymmetric.
Expected behavior
The expected output, analogous to other types, would be:
A + B = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A' + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
A + B' = Float16[0.0 2.0 0.0; 2.0 0.0 0.0; 0.0 0.0 2.0]
Version info
Details on Julia:
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Details on CUDA:
CUDA runtime 12.3, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.12
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+535.104.12
Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0
Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
2 devices:
0: NVIDIA GeForce RTX 4090 (sm_89, 23.643 GiB / 23.988 GiB available)
1: NVIDIA GeForce RTX 4090 (sm_89, 23.642 GiB / 23.988 GiB available)