vllm-project/vllm
[Feature]: Integrate fused `kMoEFinalizeARResidualRMSNorm` from FlashInfer
Open
#40544 opened on Apr 21, 2026
feature request, help wanted
Description
🚀 The feature, motivation and pitch
FlashInfer 0.6.8 ships a fused MoE Finalize + ResidualAdd + AllReduce + RMSNorm kernel (`kMoEFinalizeARResidualRMSNorm`) that we can leverage: https://github.com/flashinfer-ai/flashinfer/pull/2982.
This should be integrated as a new `torch.compile` custom pass. I think `moe_finalize` may still live inside the wrapped `fused_moe` op, so the pass might first require pulling it out.
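For reference, here is a rough, unfused sketch of the sequence the fused kernel would replace, in plain Python so the math is easy to follow. This is only an illustration under assumptions: the step order (finalize, then all-reduce, then residual add, then RMSNorm), the returned pair (normalized output, updated residual), and every function name below are hypothetical and not taken from FlashInfer or vLLM. The all-reduce is simulated by summing per-rank partial results.

```python
import math


def moe_finalize(expert_out, topk_weights):
    """Combine each token's top-k expert outputs using its routing weights.

    expert_out[t][k] is the hidden vector from the k-th selected expert for
    token t; topk_weights[t][k] is the matching routing weight.
    """
    return [
        [sum(w * vec[i] for w, vec in zip(weights, outs)) for i in range(len(outs[0]))]
        for outs, weights in zip(expert_out, topk_weights)
    ]


def all_reduce_sum(per_rank):
    """Simulated all-reduce: element-wise sum of every rank's partial result."""
    num_tokens, hidden = len(per_rank[0]), len(per_rank[0][0])
    return [
        [sum(rank[t][i] for rank in per_rank) for i in range(hidden)]
        for t in range(num_tokens)
    ]


def rms_norm(x, gamma, eps=1e-6):
    """RMSNorm over the hidden dimension of one token."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gamma, x)]


def finalize_ar_residual_rmsnorm(per_rank_expert_out, per_rank_topk_weights,
                                 residual, gamma, eps=1e-6):
    """Hypothetical unfused reference: finalize -> all-reduce -> add -> norm."""
    # 1. Each rank finalizes its local expert outputs.
    partial = [moe_finalize(o, w)
               for o, w in zip(per_rank_expert_out, per_rank_topk_weights)]
    # 2. Sum the partial results across ranks (assumed order).
    reduced = all_reduce_sum(partial)
    # 3. Add the residual stream.
    new_residual = [[a + b for a, b in zip(row, res)]
                    for row, res in zip(reduced, residual)]
    # 4. Normalize the updated residual.
    normed = [rms_norm(row, gamma, eps) for row in new_residual]
    return normed, new_residual


# Example: 2 ranks, 1 token, top-2 routing, hidden size 2.
normed, new_residual = finalize_ar_residual_rmsnorm(
    per_rank_expert_out=[[[[1.0, 2.0], [3.0, 4.0]]], [[[1.0, 0.0], [0.0, 1.0]]]],
    per_rank_topk_weights=[[[0.5, 0.5]], [[1.0, 1.0]]],
    residual=[[0.0, 0.0]],
    gamma=[1.0, 1.0],
)
```

A custom pass would need to match this whole chain in the graph, which is why `moe_finalize` has to be visible as its own op rather than hidden inside `fused_moe`.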
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.