vllm-project/vllm

[Roadmap]: PD Disaggregation with `NixlConnector` Roadmap

Open

#33702 opened on Feb 3, 2026

feature request, help wanted

Description

🚀 The feature, motivation and pitch

Description

This RFC tracks the current state and planned improvements for Prefill-Decode (P/D) Disaggregation using the NixlConnector, which enables high-performance KV cache transfer between prefill and decode instances using the NIXL library.
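As a rough orientation for readers new to the feature, the sketch below builds the `vllm serve` command line for one prefill and one decode instance using the NixlConnector. It assumes the `--kv-transfer-config` flag with `kv_connector` and `kv_role` keys and the `VLLM_NIXL_SIDE_CHANNEL_PORT` environment variable as described in the vLLM docs at the time of writing. The model name, port numbers, and the `nixl_serve_command` helper are placeholders invented for illustration. Check the current NixlConnector usage guide before relying on any of it.

```python
import json
import shlex


def nixl_serve_command(model: str, port: int, side_channel_port: int) -> str:
    """Build a `vllm serve` command line for one P/D instance.

    Hypothetical helper for illustration only; it does not start a server.
    """
    # Assumption: both prefill and decode instances use the same connector
    # config, and a proxy or router in front decides which role each plays.
    kv_transfer_config = {
        "kv_connector": "NixlConnector",
        "kv_role": "kv_both",
    }
    # Assumption: each instance on a host needs its own side-channel port
    # for the NIXL handshake.
    env = f"VLLM_NIXL_SIDE_CHANNEL_PORT={side_channel_port}"
    args = [
        "vllm", "serve", model,
        "--port", str(port),
        "--kv-transfer-config", json.dumps(kv_transfer_config),
    ]
    return env + " " + shlex.join(args)


# Placeholder model name and ports.
prefill_cmd = nixl_serve_command("Qwen/Qwen3-0.6B", port=8100, side_channel_port=5600)
decode_cmd = nixl_serve_command("Qwen/Qwen3-0.6B", port=8200, side_channel_port=5601)
```

Printing `prefill_cmd` and `decode_cmd` gives two shell-ready command lines; a separate proxy that forwards requests first to the prefill instance and then to the decode instance is still required and is not shown here.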

Currently Supported Features

Core Infrastructure

Async KV Cache Transfers

Multi-Transport Backend Support

Tensor Parallelism

MLA

CPU Host Buffer Transfers

Heterogeneous Configurations

The following also partially enable hybrid hardware deployment, among other use cases.

Reliability & Observability

Deployment Configurations

Guides & Docs

Spec Decoding

SSM

Work in Progress

Planned

  • NIXL + HMA support - Request failure handling
  • Documentation improvements - Clarify PD feature matrix in docs with examples
  • Pipeline parallelism support - P/D disaggregation with pipeline parallelism
  • Multi-backend model support - Models with multiple attention backends (mostly validation of HMA feature coverage)
  • Hybrid hardware deployment - Supported to the extent tested by @xuechendi and team. Another AMD-NVIDIA use case is reported at https://uccl-project.github.io/posts/uccl-ep-full/. This is untested in CI, and we should clarify capabilities and limitations.
  • Mamba1 support
  • FP8 support (attention-dependent for now, depending on how scales are stored) - Support requested in https://github.com/vllm-project/vllm/issues/42179

Backlog

  • HTTP-based handshake endpoint - Replace ZMQ side channel with HTTP for better observability
  • Transfer failure handling for HMA
  • More efficient h2d copy_blocks operations for HMA groups
  • Heterogeneous block size (block_size_ratio > 1) HMA support
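To make the block size ratio in the last backlog item concrete, here is a minimal arithmetic sketch. It assumes the ratio is the local block size divided by the remote block size, so a ratio of 4 means four remote blocks fill one local block. That definition, and both helper functions, are assumptions for illustration and are not taken from the NixlConnector code.

```python
def block_size_ratio(local_block_size: int, remote_block_size: int) -> int:
    """Return local/remote block size ratio (assumed definition, see lead-in)."""
    if local_block_size % remote_block_size != 0:
        raise ValueError("local block size must be a multiple of remote block size")
    return local_block_size // remote_block_size


def group_remote_blocks(remote_block_ids: list[int], ratio: int) -> list[list[int]]:
    """Group consecutive remote block IDs so each group fills one local block.

    The last group may be shorter when the request does not fill a whole
    local block.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    return [
        remote_block_ids[i:i + ratio]
        for i in range(0, len(remote_block_ids), ratio)
    ]


# Example: the local side uses 64-token blocks, the remote side 16-token blocks.
ratio = block_size_ratio(64, 16)                        # 4
groups = group_remote_blocks([7, 3, 9, 12, 5], ratio)   # [[7, 3, 9, 12], [5]]
```

With hybrid models, each HMA group may need its own grouping of this kind, which is presumably part of what makes the ratio > 1 case a separate work item.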

RFCs

Known Issues/Bugs:

Bug Fixes

Related Projects

cc @robertgshaw2-redhat @tlrmchlsmth @markmc @njhill @orozery
