AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
(4,370 stars) (351 forks) (0 indexed issues) (0 open good first issues)