📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
倉庫
xlite-dev 的倉庫
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime、MNN、NCNN and TNN, via lite.ai.toolkit.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without lossing end-to-end metrics across various models.
🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.
FlashInfer: Kernel Library for LLM Serving
FSANet: 1 Mb!! Head Pose Estimation with MNN、TNN and ONNXRuntime C++.
📚《统计学习方法-李航: 笔记》 200页PDF,公式细节讲解🎉
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
MGMatting with MNN/TNN/ONNXRuntime C++, GPU/CPU, support dynamic shape.
NanoDet、NanoDet-Plus with ONNXRuntime/MNN/TNN/NCNN C++.
☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.
Super fast accurate face detector ! SCRFD(CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.
SSRNet: 190 Kb!! Super fast Age Estimation with MNN/TNN/ONNXRuntime C++.
💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉
YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime