vllm-project/vllm
[Feature]: Extract KV-Cache update from all attention backends
Open
#32335 opened on Jan 14, 2026
feature request, good first issue, help wanted
Description
🚀 The feature, motivation and pitch
Extract the KV-cache update from every attention backend, similar to how https://github.com/vllm-project/vllm/pull/25954 extracts it from FlashAttn. Ideally, we want to cover every backend in v1/attention/backends that performs a KV-cache update.
Backends:
- FlashAttention
- FlashInfer
- AiterFlashAttention (in progress)
- RocmAiterUnifiedAttention
- RocmAttention
- TritonAttention
- FlashAttentionDiffKV
- FlexAttention
- TreeAttention
MLA Backends:
- FlashAttnMLA
- FlashInferMLA
- FlashMLASparse
- FlashMLA
- AiterMLA
- ROCMAiterMLASparse
- CutlassMLA
- TritonMLA
After all backends are supported, we can remove slot_mapping from attention metadata.
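
To make the intended split concrete, here is a minimal pure-Python sketch of the idea, not the actual vLLM code. The names `ToyKVCache`, `do_kv_cache_update`, `attention_forward`, and `context_slots` are hypothetical and chosen only for illustration, and treating a negative slot as padding is an assumption of this toy. The point it shows: once the cache write is its own step that takes slot_mapping directly, the attention step no longer needs slot_mapping at all.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyKVCache:
    """Hypothetical flat cache: one key and one value vector per slot."""
    keys: list    # keys[slot] -> key vector, or None if the slot is empty
    values: list  # values[slot] -> value vector, or None if the slot is empty


def do_kv_cache_update(key, value, kv_cache, slot_mapping):
    """Standalone cache write: store each new token's K/V in its slot.

    This is the step that a fused backend would otherwise perform inside
    its forward pass. slot_mapping[i] is the cache slot for token i.
    """
    for token_idx, slot in enumerate(slot_mapping):
        if slot < 0:  # assumption for this toy: negative slot means padding
            continue
        kv_cache.keys[slot] = key[token_idx]
        kv_cache.values[slot] = value[token_idx]


def attention_forward(query, kv_cache, context_slots):
    """Toy single-query softmax attention over already-cached entries.

    Note the signature: it reads the cache but takes no slot_mapping.
    """
    scores = [
        sum(q * k for q, k in zip(query, kv_cache.keys[slot]))
        for slot in context_slots
    ]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    dim = len(kv_cache.values[context_slots[0]])
    return [
        sum(w / total * kv_cache.values[slot][d]
            for w, slot in zip(weights, context_slots))
        for d in range(dim)
    ]


# Usage: write two new tokens (the second is padding), then attend.
cache = ToyKVCache(keys=[None] * 4, values=[None] * 4)
do_kv_cache_update(
    key=[[1.0, 0.0], [0.0, 1.0]],
    value=[[2.0, 3.0], [4.0, 5.0]],
    kv_cache=cache,
    slot_mapping=[2, -1],
)
output = attention_forward([1.0, 0.0], cache, context_slots=[2])
```

In this sketch the caller decides when the cache write happens, which is what would let slot_mapping live outside the attention metadata once every backend follows the same pattern.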