Labels: feature request, good first issue, help wanted, stale
Description
🚀 The feature, motivation and pitch
- Right now, expert selection accounts for ~3.5% of the layer's runtime
- The operation is unfused, so it runs as several separate ops
Write a fused kernel, like the one we already have for DeepSeek's grouped_topk
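
To make the target concrete, here is a minimal pure-Python reference for what a fused kernel would need to reproduce. It is only a sketch under assumptions: the issue does not say which routing scheme is used, so this assumes softmax over the router logits, then top-k, then renormalization of the selected weights. The function names `select_experts_unfused` and `select_experts_fused` are hypothetical and are not taken from the codebase.

```python
import math


def select_experts_unfused(logits, k):
    """Assumed unfused chain: softmax -> top-k -> renormalize, as three passes."""
    out = []
    for row in logits:
        # Pass 1: numerically stable softmax over all experts.
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Pass 2: top-k by probability (ties go to the lower expert index).
        ids = sorted(range(len(row)), key=lambda i: (-probs[i], i))[:k]
        # Pass 3: renormalize the selected weights so they sum to 1.
        picked = [probs[i] for i in ids]
        s = sum(picked)
        out.append((ids, [p / s for p in picked]))
    return out


def select_experts_fused(logits, k):
    """Same result in one step per token: top-k on raw logits, softmax over k only."""
    out = []
    for row in logits:
        # Softmax is monotonic, so ranking raw logits gives the same experts.
        ids = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
        # Softmax restricted to the k winners equals softmax + renormalize.
        m = row[ids[0]]
        exps = [math.exp(row[i] - m) for i in ids]
        s = sum(exps)
        out.append((ids, [e / s for e in exps]))
    return out
```

A reference like this could serve as a correctness oracle when testing a real kernel: both functions should return the same expert indices and, up to floating-point tolerance, the same weights.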
Alternatives
- torch.compile
- Triton
- CUDA