[Feature]: Batch Invariant Feature and Performance Optimization · vllm-project/vllm#27433


There is still some work to be done, so this issue tracks the remaining items:

- Prefix caching support
- NVFP4 support
- AMD testing/support
- Speculative decoding support (this might be hard)
- vLLM support for generic model definitions @bwasti https://github.com/vllm-project/vllm/issues/28326
- (Out of scope) DP support https://github.com/vllm-project/vllm/issues/30321
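For context, batch invariance means a request's output does not change with how many other requests share its batch. One common way invariance breaks is a kernel that picks its reduction split from the batch size, since floating-point addition is not associative. The pure-Python sketch below illustrates this; the split heuristic in `row_sums_variant` is hypothetical and is not vLLM's actual kernel logic.

```python
import random

def split_sum(row, num_splits):
    # Sum `row` in `num_splits` contiguous chunks, then add the partial sums.
    # Because float addition is not associative, the result can depend on num_splits.
    n = len(row)
    chunk = -(-n // num_splits)  # ceiling division
    partials = [sum(row[i:i + chunk]) for i in range(0, n, chunk)]
    return sum(partials)

def row_sums_variant(batch):
    # Hypothetical heuristic: smaller batches use more splits to keep the GPU busy,
    # so the reduction order for a given row changes with batch size.
    num_splits = max(1, 8 // len(batch))
    return [split_sum(row, num_splits) for row in batch]

def row_sums_invariant(batch, num_splits=4):
    # Fixed split count: every row is reduced in the same order regardless of batch size.
    return [split_sum(row, num_splits) for row in batch]

random.seed(0)
# A row with a wide dynamic range makes reduction-order effects easy to trigger.
row = [random.uniform(-1, 1) * 10 ** random.randint(-8, 8) for _ in range(1024)]
batch = [row] + [[random.random() for _ in range(1024)] for _ in range(7)]

# The fixed-order reduction gives bitwise-identical results alone and in a batch.
assert row_sums_invariant([row])[0] == row_sums_invariant(batch)[0]

# The batch-size-dependent version uses 8 splits alone and 1 split in the batch,
# so its two results are not guaranteed to match.
print(row_sums_variant([row])[0] == row_sums_variant(batch)[0])
```

The trade-off is that a fixed reduction order can leave performance on the table for small batches, which is why performance optimization is tracked alongside the feature itself.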

🙋 Help needed: validating more models.

1. Test a model using the script in https://github.com/vllm-project/vllm/tree/main/tests/v1/determinism
2. Submit a PR updating the documentation
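At its core, validating a model means checking that each prompt yields the same output whether it runs alone or inside larger batches. The sketch below shows that comparison under stated assumptions. `generate` is a hypothetical stand-in for the engine call, for example vLLM's `LLM.generate` with greedy sampling and its results mapped to text. This is not the actual script in `tests/v1/determinism`.

```python
from typing import Callable, Sequence

# Hypothetical engine interface: takes a batch of prompts, returns one output per prompt.
GenerateFn = Callable[[Sequence[str]], list[str]]

def check_batch_invariance(
    generate: GenerateFn,
    prompts: Sequence[str],
    batch_sizes: Sequence[int] = (1, 2, 4, 8),
) -> list[str]:
    """Return the prompts whose output differs from their batch-size-1 reference."""
    # Reference: each prompt run on its own.
    reference = {p: generate([p])[0] for p in prompts}
    mismatches: list[str] = []
    for size in batch_sizes:
        # Re-run the prompts grouped into batches of `size`.
        for start in range(0, len(prompts), size):
            chunk = list(prompts[start:start + size])
            outputs = generate(chunk)
            for prompt, out in zip(chunk, outputs):
                # Exact string comparison: batch invariance demands identical output.
                if out != reference[prompt] and prompt not in mismatches:
                    mismatches.append(prompt)
    return mismatches

def toy_invariant(batch):
    # Output depends only on the prompt itself.
    return [p.upper() for p in batch]

def toy_leaky(batch):
    # Output leaks the batch size, so it breaks invariance.
    return [f"{p}-{len(batch)}" for p in batch]

print(check_batch_invariance(toy_invariant, ["a", "b", "c"]))  # []
print(check_batch_invariance(toy_leaky, ["a", "b", "c"]))      # ['a', 'b', 'c']
```

Comparing against a batch-size-1 reference keeps the check simple. A real harness would also vary prompt order and the mix of other requests in each batch.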

Research direction: Investigate the FlashInfer MLA support task. Study the existing FlashInfer integration in the codebase and the linked issue flashinfer-ai/flashinfer#2107. Understand the MLA kernel requirements, then implement support by following the pattern used by other backends such as FlashInfer and Triton.
Tech stack: Python, PyTorch
Domain: backend
Issue type: Feature
Difficulty: 4
Estimated time: Over 1 week
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, vLLM, CUDA, PyTorch
Newbie friendliness: 30