xlite-dev repositories

18 supported repositories

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Last commit Mar 19, 2026

 (562 stars) (26 forks) (0 indexed issues) (0 open good first issues)

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Last commit Apr 20, 2026

 (5,277 stars) (384 forks) (0 indexed issues) (0 open good first issues)

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.

Last commit May 10, 2025

 (155 stars) (9 forks) (0 indexed issues) (0 open good first issues)

📚LeetCUDA: Modern CUDA Learning Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Last commit May 17, 2026

 (11,209 stars) (1,142 forks) (0 indexed issues) (0 open good first issues)

🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN and TNN, via lite.ai.toolkit.

Last commit Jul 29, 2024

 (142 stars) (27 forks) (0 indexed issues) (0 open good first issues)

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Last commit Jan 17, 2026

 (0 stars) (0 forks) (0 indexed issues) (0 open good first issues)

🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.

Last commit Jun 6, 2026

 (306 stars) (20 forks) (0 indexed issues) (0 open good first issues)

FlashInfer: Kernel Library for LLM Serving

Last commit May 1, 2026

 (0 stars) (0 forks) (0 indexed issues) (0 open good first issues)

FSANet: 1 Mb!! Head Pose Estimation with MNN, TNN and ONNXRuntime C++.

Last commit Feb 4, 2022

 (17 stars) (2 forks) (0 indexed issues) (0 open good first issues)

📚Notes on "Statistical Learning Methods" by Li Hang: a 200-page PDF with detailed explanations of the formulas.🎉

Last commit Jul 13, 2025

 (495 stars) (61 forks) (0 indexed issues) (0 open good first issues)

🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

Last commit Mar 19, 2026

 (4,412 stars) (781 forks) (0 indexed issues) (0 open good first issues)

MGMatting with MNN/TNN/ONNXRuntime C++, GPU/CPU, supports dynamic shapes.

Last commit Feb 3, 2022

 (8 stars) (2 forks) (0 indexed issues) (0 open good first issues)

NanoDet, NanoDet-Plus with ONNXRuntime/MNN/TNN/NCNN C++.

Last commit Dec 27, 2021

 (30 stars) (7 forks) (0 indexed issues) (0 open good first issues)

☕️ A VS Code extension for Netron, supporting *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.

Last commit Jun 4, 2023

 (14 stars) (0 forks) (0 indexed issues) (0 open good first issues)

Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.

Last commit Jan 12, 2022

 (20 stars) (4 forks) (0 indexed issues) (0 open good first issues)

SSRNet: 190 Kb!! Super fast Age Estimation with MNN/TNN/ONNXRuntime C++.

Last commit Feb 4, 2022

 (3 stars) (0 forks) (0 indexed issues) (0 open good first issues)

💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉

Last commit Jul 16, 2025

 (271 stars) (29 forks) (0 indexed issues) (0 open good first issues)

YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime

Last commit Apr 21, 2023

 (61 stars) (8 forks) (0 indexed issues) (0 open good first issues)