[Kernel] cuDNN attention backend · sgl-project/sglang#2272 | Good First Issue

(5 comments) (1 reaction) (2 assignees) Python (6,216 forks)

enhancement, good first issue, help wanted, high priority, inactive

Repository metrics

Stars: (28,442 stars)
PR merge metrics: (average merge time 2d 1h) (1,000 merged PRs in 30d)

Description

cuDNN provides a very fast attention implementation and is well maintained by NVIDIA. We would like to add a new attention backend based on cuDNN.

Steps

1. Learn the cuDNN paged attention Python API: https://github.com/NVIDIA/cudnn-frontend/blob/v1.8.0/samples/python/52_scaled_dot_product_attention_with_paged_caches.ipynb
2. Add a new attention backend "cudnn" here: https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/layers/attention
3. We should be able to use it with: python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --attention-backend cudnn
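
For readers new to paged caches, here is a minimal pure-Python sketch of what "paged attention" computes. It is not the cuDNN API and not sglang code. The function names, the list-of-pages cache layout, and the single-query, single-head setup are all assumptions made for illustration. The point is that keys and values live in fixed-size pages, a page table maps logical token positions to pages, and the result must match attention over the same tokens stored contiguously.

```python
import math


def gather_paged(cache, page_table, seq_len, page_size):
    """Collect the first seq_len token vectors of one sequence from a paged cache.

    cache: list of pages, each page a list of page_size vectors (assumed layout).
    page_table: page indices owned by this sequence, in logical order.
    """
    rows = []
    for pos in range(seq_len):
        page = page_table[pos // page_size]
        rows.append(cache[page][pos % page_size])
    return rows


def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def paged_attention(q, k_cache, v_cache, page_table, seq_len, page_size):
    """Gather K and V through the page table, then run ordinary attention."""
    keys = gather_paged(k_cache, page_table, seq_len, page_size)
    values = gather_paged(v_cache, page_table, seq_len, page_size)
    return attention(q, keys, values)


# Three tokens, head dim 2, page size 2; the sequence owns pages 2 then 0.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
page_size = 2
page_table = [2, 0]
pad = [0.0, 0.0]  # unused slots in partially filled or foreign pages
k_cache = [[keys[2], pad], [pad, pad], [keys[0], keys[1]]]
v_cache = [[values[2], pad], [pad, pad], [values[0], values[1]]]
q = [0.5, -0.5]

paged = paged_attention(q, k_cache, v_cache, page_table, 3, page_size)
dense = attention(q, keys, values)
```

A slow reference like this can be useful when testing a new backend: the fast kernel's output for a small case can be compared against it within a tolerance.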

Contributor guide

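Investigation approach: Study the existing attention backends in sglang/srt/layers/attention to understand their interface, and explore the cuDNN paged attention Python API linked in step 1. Then implement the new backend following the patterns of the existing backends.
Tech stack: Python
Area: backend, machine learning
Issue type: feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: active
Clarity: clear
Prerequisites: Python, CUDA, cuDNN
Beginner friendliness: 60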
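
As a rough picture of the "follow the existing backend pattern" advice, the sketch below shows one common way a backend is selected by name from a command-line flag. Every name here (`AttentionBackendSketch`, `register_backend`, `create_backend`, `forward_decode`) is hypothetical and chosen for illustration. The actual base class and selection logic in sglang should be read from the repository and may look quite different.

```python
from abc import ABC, abstractmethod


class AttentionBackendSketch(ABC):
    """Hypothetical interface; sglang's real base class may differ."""

    @abstractmethod
    def forward_decode(self, q, k, v):
        """Compute attention for one decode step (illustrative signature)."""


# Maps the value of a flag such as --attention-backend to a backend class.
BACKENDS = {}


def register_backend(name):
    """Class decorator that records a backend under the given name."""
    def wrap(cls):
        BACKENDS[name] = cls
        return cls
    return wrap


@register_backend("reference")
class ReferenceBackend(AttentionBackendSketch):
    def forward_decode(self, q, k, v):
        # Placeholder: a real backend would return the attention output.
        return "reference"


@register_backend("cudnn")
class CudnnBackendSketch(AttentionBackendSketch):
    def forward_decode(self, q, k, v):
        # Assumption: the cuDNN paged attention call would be made here.
        raise NotImplementedError("cuDNN graph call goes here")


def create_backend(name):
    """Instantiate the backend registered under name, or fail clearly."""
    if name not in BACKENDS:
        raise ValueError(
            f"unknown attention backend {name!r}; choose from {sorted(BACKENDS)}"
        )
    return BACKENDS[name]()


backend = create_backend("cudnn")
```

Keeping the new backend behind the same interface as the existing ones means the rest of the server only needs to learn one new name.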