[Feature]: Fused Kernel for GPT-OSS Router · vllm-project/vllm#28986

(10 comments) (0 reactions) (0 assignees) Python (80,034 stars) (16,816 forks)

feature request, good first issue, help wanted, stale

Description

Write a fused kernel for the GPT-OSS router, like the one we have for DeepSeek grouped_topk.

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Tech stack: Python, C, C++
Domain: backend, performance
Issue type: feature
Difficulty: 5
Estimated time: over 1 week
Activity status: active
Clarity: mostly clear
Prerequisites: GPU programming, CUDA/Triton kernel development, vLLM architecture, attention mechanism
Beginner friendliness: 8
Research directions: Examine the existing DeepSeek grouped_topk fused kernel in the vLLM repository (likely in csrc/ or kernels/). Look at the current GPT-OSS router implementation (e.g., in vllm/model_executor/layers/fused_moe/ or similar). Identify the unfused operations and consider how to fuse them. Review the issue comments for any initial design suggestions or performance profiling data. Refer to the alternatives mentioned (Triton vs. CUDA) and evaluate the trade-offs. Consider writing a prototype in Triton for rapid iteration.
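
To make "identify the unfused operations" concrete, here is a minimal pure-Python sketch, not taken from the vLLM codebase. It assumes the GPT-OSS router picks the top-k experts from the gate logits and then applies softmax over only those k logits; that assumption should be checked against the actual model code. The names `router_unfused` and `router_fused` are hypothetical and exist only for this illustration. The first function mirrors three separate passes (select, gather, softmax), while the second does selection and normalization in one loop per token, which is roughly the structure a fused kernel would have.

```python
import math


def router_unfused(logits, k):
    """Reference path: three separate passes per token (assumed router math)."""
    out = []
    for row in logits:
        # Pass 1: top-k selection, ties broken by the lower expert index.
        idx = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
        # Pass 2: gather the selected logits.
        vals = [row[i] for i in idx]
        # Pass 3: numerically stable softmax over the k selected logits.
        m = max(vals)
        exps = [math.exp(v - m) for v in vals]
        s = sum(exps)
        out.append((idx, [e / s for e in exps]))
    return out


def router_fused(logits, k):
    """Fused path: selection and softmax share one loop per token.

    Assumes k <= number of experts. The first selected logit is the row
    maximum, so it doubles as the softmax stabilizer and no extra pass
    over the row is needed.
    """
    out = []
    for row in logits:
        remaining = list(row)
        idx, exps = [], []
        top = None
        for _ in range(k):
            # Largest remaining logit; ties go to the lower expert index.
            best = max(range(len(remaining)), key=lambda i: (remaining[i], -i))
            if top is None:
                top = remaining[best]
            idx.append(best)
            exps.append(math.exp(remaining[best] - top))
            remaining[best] = -math.inf  # mask out the chosen expert
        s = sum(exps)
        out.append((idx, [e / s for e in exps]))
    return out


# Hypothetical gate logits: 2 tokens, 4 experts, top-2 routing.
example_logits = [[0.1, 2.0, -1.0, 2.0], [3.0, 0.0, 1.0, -2.0]]
unfused_result = router_unfused(example_logits, 2)
fused_result = router_fused(example_logits, 2)
```

A reference like this is mainly useful as a correctness oracle: a Triton or CUDA prototype can be compared against it on random logits before any performance tuning.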