[Feature]: Extract KV-Cache update from all attention backends

Open

#32335 opened on Jan 14, 2026

51 comments · 0 reactions · 9 assignees · Python · 16,816 forks

Labels: feature request, good first issue, help wanted


Similar to how https://github.com/vllm-project/vllm/pull/25954 extracts the KV-cache update from the FlashAttention backend, we want to extract it from every backend in v1/attention/backends that performs a KV-cache update.
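A minimal pure-Python sketch of the separation being asked for. This is not vLLM code and does not reproduce the actual change in #25954. The nested-list cache layout and the names `make_kv_cache`, `update_kv_cache` and `attention_from_cache` are hypothetical, and real backends work on GPU tensors with custom kernels. The point it illustrates: the cache write is its own step that consumes slot_mapping, and the attention step only reads from the cache.

```python
import math
from typing import List

Vector = List[float]
# Hypothetical toy layout: kv_cache[0] holds keys, kv_cache[1] holds values,
# each indexed as [block][offset_in_block] -> head_dim vector.
KVCache = List[List[List[Vector]]]


def make_kv_cache(num_blocks: int, block_size: int, head_dim: int) -> KVCache:
    """Allocate a zero-filled toy paged KV cache."""
    return [
        [[[0.0] * head_dim for _ in range(block_size)] for _ in range(num_blocks)]
        for _ in range(2)
    ]


def update_kv_cache(key: List[Vector], value: List[Vector], kv_cache: KVCache,
                    slot_mapping: List[int], block_size: int) -> None:
    """Standalone KV-cache write: scatter each new token's K/V into its slot.

    In this sketch, this is the only function that consumes slot_mapping.
    """
    for token_idx, slot in enumerate(slot_mapping):
        if slot < 0:  # assumed convention here: a negative slot marks padding
            continue
        block, offset = divmod(slot, block_size)
        kv_cache[0][block][offset] = list(key[token_idx])
        kv_cache[1][block][offset] = list(value[token_idx])


def attention_from_cache(query: Vector, kv_cache: KVCache, block_table: List[int],
                         seq_len: int, block_size: int) -> Vector:
    """Read-only attention for one query over one request's cached tokens."""
    keys, values = [], []
    for pos in range(seq_len):
        block = block_table[pos // block_size]  # logical -> physical block
        offset = pos % block_size
        keys.append(kv_cache[0][block][offset])
        values.append(kv_cache[1][block][offset])
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    max_score = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


cache = make_kv_cache(num_blocks=2, block_size=2, head_dim=2)
keys = [[1.0, 0.0], [1.0, 0.0], [9.0, 9.0]]
vals = [[2.0, 4.0], [4.0, 8.0], [7.0, 7.0]]
# Step 1: write K/V (the third token is padding and is skipped).
update_kv_cache(keys, vals, cache, slot_mapping=[2, 3, -1], block_size=2)
# Step 2: attend over the cache without touching slot_mapping.
out = attention_from_cache([0.5, 0.5], cache, block_table=[1], seq_len=2, block_size=2)
print(out)  # [3.0, 6.0]
```

Because the two cached keys are identical, the softmax weights are equal and the output is simply the mean of the two cached values.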

Backends:

MLA Backends:

Once all backends have been migrated, we can remove slot_mapping from the attention metadata.
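A hedged illustration of why slot_mapping becomes removable: once only the cache-update step consumes it, the attention metadata no longer needs to carry it. The sketch computes slot ids with the usual paged-KV formula (slot = physical_block * block_size + offset_in_block), which is assumed here. `compute_slot_mapping`, `AttentionMetadataSketch` and `KVCacheUpdateInputsSketch` are hypothetical names, not vLLM classes.

```python
from dataclasses import dataclass, fields
from typing import List


def compute_slot_mapping(block_tables: List[List[int]],
                         positions: List[List[int]], block_size: int) -> List[int]:
    """Flatten per-request token positions into physical cache slot ids."""
    slots = []
    for block_table, request_positions in zip(block_tables, positions):
        for pos in request_positions:
            physical_block = block_table[pos // block_size]
            slots.append(physical_block * block_size + pos % block_size)
    return slots


@dataclass
class AttentionMetadataSketch:
    """Hypothetical post-migration metadata: only what attention needs to read."""
    seq_lens: List[int]
    block_tables: List[List[int]]
    # No slot_mapping field: it goes to the KV-cache update step instead.


@dataclass
class KVCacheUpdateInputsSketch:
    """Hypothetical inputs for the standalone KV-cache update step."""
    slot_mapping: List[int]


block_tables = [[5, 2], [7]]
# Request 0 writes new tokens at positions 2 and 3; request 1 at position 0.
update_inputs = KVCacheUpdateInputsSketch(
    slot_mapping=compute_slot_mapping(block_tables, [[2, 3], [0]], block_size=2))
metadata = AttentionMetadataSketch(seq_lens=[4, 1], block_tables=block_tables)
print(update_inputs.slot_mapping)  # [4, 5, 14]
print([f.name for f in fields(metadata)])  # ['seq_lens', 'block_tables']
```

The design choice being illustrated is ownership: whichever step writes the cache owns the write indices, while attention keeps only the block tables and sequence lengths it needs for reading.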


Research direction: Use the FlashAttention KV-cache extraction in PR #25954 as a reference, then apply a similar pattern to the AiterFlashAttention backend. Keep in mind the end goal of removing slot_mapping from the attention metadata.
Tech stack: python
Domain: backend
Issue type: Feature
Difficulty: 5
Estimated time: Over 1 week
Activity status: Active
Clarity: Clear
Prerequisites: Python, attention mechanisms
Newbie friendliness: 20
