xlite-dev's repositories

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Last commit: March 19, 2026

(562 stars) (26 forks) (0 indexed issues) (0 open good first issues)

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Last commit: April 20, 2026

(5,277 stars) (384 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/HGEMM (Cuda)

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.

Last commit: May 10, 2025

(155 stars) (9 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/LeetCUDA (Cuda)

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Last commit: May 17, 2026

(11,209 stars) (1,142 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/RVM-Inference (C++)

🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN and TNN, via lite.ai.toolkit.

Last commit: July 29, 2024

(142 stars) (27 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/SageAttention (Cuda)

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Last commit: January 17, 2026

(0 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/ffpa-attn (Python)

🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.

Last commit: June 6, 2026

(306 stars) (20 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/flashinfer (Python)

FlashInfer: Kernel Library for LLM Serving

Last commit: May 1, 2026

(0 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/fsanet-toolkit (C++)

FSANet: 1 Mb!! Head Pose Estimation with MNN, TNN and ONNXRuntime C++.

Last commit: February 4, 2022

(17 stars) (2 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/lihang-notes (Shell)

📚Notes on Li Hang's "Statistical Learning Methods" (《统计学习方法》): a 200-page PDF with detailed explanations of the formulas🎉

Last commit: July 13, 2025

(495 stars) (61 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/lite.ai.toolkit (C++)

🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

Last commit: March 19, 2026

(4,412 stars) (781 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/mgmatting-toolkit (C++)

MGMatting with MNN/TNN/ONNXRuntime C++, GPU/CPU, supports dynamic shapes.

Last commit: February 3, 2022

(8 stars) (2 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/nanodet-toolkit (C++)

NanoDet, NanoDet-Plus with ONNXRuntime/MNN/TNN/NCNN C++.

Last commit: December 27, 2021

(30 stars) (7 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/netron-vscode-extension (TypeScript)

☕️ A VS Code extension for Netron; supports *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.

Last commit: June 4, 2023

(14 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/scrfd-toolkit (C++)

Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.

Last commit: January 12, 2022

(20 stars) (4 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/ssrnet-toolkit (C++)

SSRNet: 190 Kb!! Super fast Age Estimation with MNN/TNN/ONNXRuntime C++.

Last commit: February 4, 2022

(3 stars) (0 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/torchlm (Python)

💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉

Last commit: July 16, 2025

(271 stars) (29 forks) (0 indexed issues) (0 open good first issues)

xlite-dev/yolov5face-toolkit (C++)

YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime

Last commit: April 21, 2023

(61 stars) (8 forks) (0 indexed issues) (0 open good first issues)
