xlite-dev Repositories

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Last commit Mar 19, 2026

(562 stars) (26 forks) (0 indexed issues) (0 open good first issues)

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Last commit Apr 20, 2026

(5,277 stars) (384 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/HGEMM (Cuda)

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.

Last commit May 10, 2025

(155 stars) (9 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/LeetCUDA (Cuda)

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Last commit May 17, 2026

(11,209 stars) (1,142 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/RVM-Inference (C++)

🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN and TNN, via lite.ai.toolkit.

Last commit Jul 29, 2024

(142 stars) (27 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/SageAttention (Cuda)

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Last commit Jan 17, 2026

(0 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/ffpa-attn (Python)

🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.

Last commit Jun 6, 2026

(306 stars) (20 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/flashinfer (Python)

FlashInfer: Kernel Library for LLM Serving

Last commit May 1, 2026

(0 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/fsanet-toolkit (C++)

FSANet: 1 Mb!! Head Pose Estimation with MNN, TNN and ONNXRuntime C++.

Last commit Feb 4, 2022

(17 stars) (2 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/lihang-notes (Shell)

📚Notes on Li Hang's "Statistical Learning Methods" (《统计学习方法》): a 200-page PDF with detailed explanations of the formulas.🎉

Last commit Jul 13, 2025

(495 stars) (61 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/lite.ai.toolkit (C++)

🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

Last commit Mar 19, 2026

(4,412 stars) (781 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/mgmatting-toolkit (C++)

MGMatting with MNN/TNN/ONNXRuntime C++, GPU/CPU, supports dynamic shapes.

Last commit Feb 3, 2022

(8 stars) (2 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/nanodet-toolkit (C++)

NanoDet, NanoDet-Plus with ONNXRuntime/MNN/TNN/NCNN C++.

Last commit Dec 27, 2021

(30 stars) (7 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/netron-vscode-extension (TypeScript)

☕️ A VS Code extension for Netron; supports *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.

Last commit Jun 4, 2023

(14 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/scrfd-toolkit (C++)

Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.

Last commit Jan 12, 2022

(20 stars) (4 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/ssrnet-toolkit (C++)

SSRNet: 190 Kb!! Super fast Age Estimation with MNN/TNN/ONNXRuntime C++.

Last commit Feb 4, 2022

(3 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/torchlm (Python)

💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉

Last commit Jul 16, 2025

(271 stars) (29 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/yolov5face-toolkit (C++)

YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime

Last commit Apr 21, 2023

(61 stars) (8 forks) (0 indexed issues) (0 open good first issues)
