[Kernel] cuDNN attention backend · sgl-project/sglang#2272 | Good First Issue

(5 comments) (1 reaction) (2 assignees) Python (6,216 forks)

enhancement, good first issue, help wanted, high priority, inactive

Repository metrics

Stars: 28,442
PR merge metrics: average time to merge 2 days 1 hour; 1,000 PRs merged in the last 30 days

Description

cuDNN provides a very fast attention implementation and is well maintained by NVIDIA. We would like to add a new attention backend based on cuDNN.

Steps

1. Learn the cuDNN paged attention Python API: https://github.com/NVIDIA/cudnn-frontend/blob/v1.8.0/samples/python/52_scaled_dot_product_attention_with_paged_caches.ipynb
2. Add a new attention backend "cudnn" here: https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/layers/attention
3. We should be able to use it with: python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --attention-backend cudnn
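
To make the idea behind the paged caches in the first step concrete, here is a minimal pure-Python reference of single-head decode attention over a paged KV cache. It is only a sketch of the math, not the cuDNN API: the function name `paged_decode_attention`, its arguments, and the page layout (a list of fixed-size pages plus a page table mapping logical blocks to physical pages) are assumptions chosen for illustration. A reference like this may be useful for checking the output of a real backend on small inputs.

```python
import math


def paged_decode_attention(q, k_pages, v_pages, page_table, seq_len, page_size):
    """Hypothetical reference: one query attending over a paged KV cache.

    q          : query vector (list of floats) of length d
    k_pages    : physical pages, each a list of `page_size` key vectors
    v_pages    : same layout as k_pages, holding value vectors
    page_table : page_table[i] is the physical page storing logical block i
    seq_len    : number of valid tokens (padding slots beyond it are ignored)
    """
    scale = 1.0 / math.sqrt(len(q))
    scores, values = [], []
    for t in range(seq_len):
        # Translate the logical token index into (physical page, slot).
        page = page_table[t // page_size]
        slot = t % page_size
        k = k_pages[page][slot]
        scores.append(scale * sum(a * b for a, b in zip(q, k)))
        values.append(v_pages[page][slot])

    # Numerically stable softmax over the valid tokens only.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)

    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += (w / total) * x
    return out
```

For example, with `page_size=2` and `page_table=[1, 0]`, tokens 0 and 1 are read from physical page 1 and token 2 from physical page 0, so the pages do not need to be contiguous or in order.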

Contributor guide

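Research direction: Study the existing attention backends in sglang/srt/layers/attention, understand their interface, and explore the cuDNN paged attention Python API linked in step 1. Then implement a new backend following the pattern of the existing ones.
Tech stack: Python
Domain: backend, machine learning
Issue type: feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: active
Clarity: clear
Prerequisites: Python, CUDA, cuDNN
Beginner friendliness: 60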
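
As a rough illustration of the "follow the existing backend pattern" approach in the research direction, the sketch below shows one way a new backend could be registered under a name and selected from a command-line flag. Everything here is a hypothetical stand-in: the `AttentionBackend` base class, its method names, the `BACKENDS` registry, and `build_backend` are assumptions made for this example and are not taken from the sglang source. The actual interface should be read from the existing backends in sglang/srt/layers/attention.

```python
import argparse
from abc import ABC, abstractmethod


class AttentionBackend(ABC):
    """Hypothetical stand-in for a backend base class (not sglang's real one)."""

    @abstractmethod
    def init_forward_metadata(self, batch):
        """Prepare per-batch metadata before the forward pass."""

    @abstractmethod
    def forward_extend(self, q, k, v, batch):
        """Attention for the prefill/extend phase."""

    @abstractmethod
    def forward_decode(self, q, k, v, batch):
        """Attention for the single-token decode phase."""


# Hypothetical registry mapping a backend name to its class.
BACKENDS = {}


def register_backend(name):
    def decorator(cls):
        BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("cudnn")
class CudnnAttnBackend(AttentionBackend):
    """Skeleton only: the cuDNN calls themselves are left unimplemented."""

    def init_forward_metadata(self, batch):
        # Assumption: the batch exposes per-request sequence lengths.
        self.metadata = {"seq_lens": list(batch["seq_lens"])}

    def forward_extend(self, q, k, v, batch):
        raise NotImplementedError("build and run the cuDNN prefill graph here")

    def forward_decode(self, q, k, v, batch):
        raise NotImplementedError("build and run the cuDNN paged decode graph here")


def build_backend(argv):
    """Parse an --attention-backend flag and instantiate the chosen backend."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--attention-backend", choices=sorted(BACKENDS), required=True)
    args = parser.parse_args(argv)
    return BACKENDS[args.attention_backend]()
```

Calling `build_backend(["--attention-backend", "cudnn"])` returns a `CudnnAttnBackend` instance. Keeping the compute methods as explicit `NotImplementedError` stubs makes it easy to wire up selection first and fill in the kernels afterwards.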