[Docs] Document NIXL KV connector metrics aggregation semantics · vllm-project/vllm#41230

good first issue

Description

Summary

The NIXL KV connector logs transfer metrics periodically:

KV Transfer metrics: Num successful transfers=4, Avg xfer time (ms)=1.381, P90 xfer time (ms)=2.601, Avg post time (ms)=0.672, P90 post time (ms)=0.801, Avg MB per transfer=2.25, Throughput (MB/s)=1629.549, Avg number of descriptors=72.0

Currently there is no documentation explaining what these metrics represent, especially in the context of multi-rank (TP > 1) deployments. This has already caused confusion among users.

Current behavior

All metrics are aggregated across all TP ranks before summary stats are computed:

Each TP rank independently records per-transfer telemetry (transfer_duration, post_duration, bytes_transferred, num_descriptors) via NixlKVConnectorStats.record_transfer() in stats.py.
Stats from all ranks are concatenated by aggregate(), which merges the per-transfer lists with list.extend().
reduce() computes averages, percentiles, and throughput over the combined pool of observations from all ranks.
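The three steps above can be illustrated with a small, self-contained toy class. It only mimics the described behavior: per-rank `record_transfer()`, list concatenation in `aggregate()`, and pooled statistics in `reduce()`. The class name `ToyTransferStats`, its field types, the `MB` constant, and the linear-interpolation percentile are all assumptions made for illustration. This is not vLLM's actual implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

MB = 1024 * 1024  # assumed bytes-per-MB constant; vLLM's exact unit is not confirmed here


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile (the same method as numpy's default)."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


@dataclass
class ToyTransferStats:
    """Hypothetical stand-in for NixlKVConnectorStats: one list entry per transfer."""

    transfer_duration: list[float] = field(default_factory=list)  # seconds
    post_duration: list[float] = field(default_factory=list)  # seconds
    bytes_transferred: list[int] = field(default_factory=list)
    num_descriptors: list[int] = field(default_factory=list)

    def record_transfer(self, duration_s: float, post_s: float, nbytes: int, ndescs: int) -> None:
        # Runs independently on each TP rank for every transfer that rank completes.
        self.transfer_duration.append(duration_s)
        self.post_duration.append(post_s)
        self.bytes_transferred.append(nbytes)
        self.num_descriptors.append(ndescs)

    def aggregate(self, other: ToyTransferStats) -> ToyTransferStats:
        # Concatenate another rank's observations. The rank that produced
        # each entry is not recorded.
        self.transfer_duration.extend(other.transfer_duration)
        self.post_duration.extend(other.post_duration)
        self.bytes_transferred.extend(other.bytes_transferred)
        self.num_descriptors.extend(other.num_descriptors)
        return self

    def reduce(self) -> dict[str, float]:
        # Every statistic below is computed over the pooled observations of all ranks.
        n = len(self.transfer_duration)
        if n == 0:
            return {"num_successful_transfers": 0}
        total_mb = sum(self.bytes_transferred) / MB
        return {
            "num_successful_transfers": n,  # total over all ranks
            "avg_xfer_time_ms": 1e3 * sum(self.transfer_duration) / n,
            "p90_xfer_time_ms": 1e3 * percentile(self.transfer_duration, 90),
            "avg_post_time_ms": 1e3 * sum(self.post_duration) / n,
            "p90_post_time_ms": 1e3 * percentile(self.post_duration, 90),
            "avg_mb_per_transfer": total_mb / n,  # average rank-level shard size
            "throughput_mb_s": total_mb / sum(self.transfer_duration),  # summed rank time, not wall clock
            "avg_num_descriptors": sum(self.num_descriptors) / n,
        }


# Two TP ranks each record one transfer of their own shard.
rank0, rank1 = ToyTransferStats(), ToyTransferStats()
rank0.record_transfer(0.010, 0.001, 10 * MB, 64)
rank1.record_transfer(0.010, 0.001, 10 * MB, 64)
summary = ToyTransferStats().aggregate(rank0).aggregate(rank1).reduce()
print(summary)
```

In this toy run, `summary` reports 2 successful transfers of 10 MB each. A reader might instead think of the two shards as a single 20 MB KV cache transfer.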

This means:

"Num successful transfers" is the total count across all ranks, not per-rank.
"Avg MB per transfer" is the average over all individual rank-level transfers, not the total bytes moved for a single KV cache transfer operation.
"Throughput (MB/s)" is total_MB_all_ranks / total_time_all_ranks — effectively an average per-rank throughput, not the aggregate system throughput.
Percentiles (P90) are computed over the combined distribution of all ranks' transfer times.
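A worked example makes the throughput point concrete. The scenario is an assumption chosen for round numbers: TP=4, a single logical KV cache transfer split evenly across ranks, and each rank moving 2.5 MB in 2 ms, with all four ranks running over the same 2 ms window. The pooled figures are computed as the list above describes. The `MB` constant is also assumed.

```python
MB = 1024 * 1024  # assumed bytes-per-MB constant

# Hypothetical scenario: one logical KV cache transfer, TP=4, equal shards,
# and all ranks transferring in parallel over the same 2 ms window.
tp_size = 4
per_rank_bytes = [int(2.5 * MB)] * tp_size
per_rank_seconds = [0.002] * tp_size

# Pooled statistics, as reported in the log line.
num_transfers = len(per_rank_bytes)  # 4 rank-level transfers, not 1 KV transfer
total_mb = sum(per_rank_bytes) / MB  # 10 MB moved in total
avg_mb_per_transfer = total_mb / num_transfers  # 2.5 MB: one rank's shard
logged_throughput = total_mb / sum(per_rank_seconds)  # 10 MB / 8 ms of summed rank time

# What a user may expect instead: bytes per wall-clock second for the engine.
wall_clock_seconds = max(per_rank_seconds)  # ranks fully overlap (assumption)
system_throughput = total_mb / wall_clock_seconds  # 10 MB / 2 ms

print(f"Num successful transfers={num_transfers}")
print(f"Avg MB per transfer={avg_mb_per_transfer:.2f}")
print(f"Throughput (MB/s)={logged_throughput:.1f} vs. wall-clock {system_throughput:.1f}")
```

Under these assumptions, the logged throughput is 1/TP of the engine-level rate (about 1250 vs. 5000 MB/s). This matches the "average per-rank throughput" reading above.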

This is unintuitive because users may expect metrics to reflect per-engine totals or aggregate system throughput.
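Until these semantics are documented, a user reading the log line can still rescale the pooled numbers by hand. The sketch below parses the line format quoted above and derives rough per-engine estimates. The helper names are hypothetical, and the rescaling depends on assumptions that may not hold in a given deployment:

- every logical KV transfer is recorded once on each of `tp_size` ranks;
- shards are equal in size;
- each rank's own transfers do not overlap one another.

Under these assumptions, multiplying the logged throughput by `tp_size` gives an upper bound on the engine-level rate, not a measurement.

```python
from __future__ import annotations

import re

# Example line copied from the log output quoted in this issue.
LOG_LINE = (
    "KV Transfer metrics: Num successful transfers=4, Avg xfer time (ms)=1.381, "
    "P90 xfer time (ms)=2.601, Avg post time (ms)=0.672, P90 post time (ms)=0.801, "
    "Avg MB per transfer=2.25, Throughput (MB/s)=1629.549, Avg number of descriptors=72.0"
)


def parse_kv_metrics(line: str) -> dict[str, float]:
    """Hypothetical helper: turn 'Name=value, Name=value, ...' into a dict."""
    body = line.split("KV Transfer metrics:", 1)[1]
    return {name.strip(): float(value) for name, value in re.findall(r"([^,=]+)=([0-9.]+)", body)}


def estimate_per_engine(metrics: dict[str, float], tp_size: int) -> dict[str, float]:
    """Hypothetical helper: rough per-engine view under the stated assumptions."""
    return {
        # Each logical transfer is assumed to be counted once per rank.
        "logical_transfers": metrics["Num successful transfers"] / tp_size,
        # Equal shards assumed, so one logical transfer spans tp_size shards.
        "mb_per_logical_transfer": metrics["Avg MB per transfer"] * tp_size,
        # Upper bound, reached only if all ranks transfer fully in parallel.
        "engine_throughput_upper_bound_mb_s": metrics["Throughput (MB/s)"] * tp_size,
    }


metrics = parse_kv_metrics(LOG_LINE)
# tp_size=2 is an arbitrary choice for illustration; the issue does not state the TP size.
print(estimate_per_engine(metrics, tp_size=2))
```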

What needs to be documented

Docstrings in stats.py: Add clear documentation to NixlKVConnectorStats explaining that stats are aggregated across all TP ranks and what each metric represents in that context.
Inline comments in reduce(): Clarify the semantics of throughput and averages — that they are per-rank averages over the combined observation pool.
Docstrings in metrics.py: Document the observe() → aggregate() → reduce() → log() pipeline and the fact that stats arrive pre-aggregated across workers.
(Optional) Docs page: Add a section to the disaggregated serving documentation explaining how to interpret the KV Transfer metrics log line.

Relevant files

vllm/distributed/kv_transfer/kv_connector/v1/nixl/stats.py — NixlKVConnectorStats (recording, aggregation, reduction)
vllm/distributed/kv_transfer/kv_connector/v1/metrics.py — KVConnectorLogging (observe/log pipeline), KVConnectorStats (base class)

Context

See related discussion: metrics are aggregated across ranks rather than reported per-rank or per-engine. This is a deliberate design choice (fire-and-forget from workers), but it needs to be clearly documented so users can correctly interpret the numbers.

Contributor guide

Research direction: Read the relevant files (stats.py, metrics.py) to understand the aggregation pipeline, then add docstrings and inline comments explaining that metrics are aggregated across all TP ranks.
Tech stack: python
Domain: backend, documentation
Issue type: Documentation
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Git, Python
Newbie friendliness: 85