[Docs] Document NIXL KV connector metrics aggregation semantics · vllm-project/vllm#41230

good first issue

Description

Summary

The NIXL KV connector logs transfer metrics periodically:

KV Transfer metrics: Num successful transfers=4, Avg xfer time (ms)=1.381, P90 xfer time (ms)=2.601, Avg post time (ms)=0.672, P90 post time (ms)=0.801, Avg MB per transfer=2.25, Throughput (MB/s)=1629.549, Avg number of descriptors=72.0

Currently there is no documentation explaining what these metrics represent, especially in the context of multi-rank (TP > 1) deployments. This has already caused confusion among users.

Current behavior

All metrics are aggregated across all TP ranks before summary stats are computed:

Each TP rank independently records per-transfer telemetry (transfer_duration, post_duration, bytes_transferred, num_descriptors) via NixlKVConnectorStats.record_transfer() in stats.py.
Stats from all ranks are concatenated via aggregate() (list.extend()).
reduce() computes averages, percentiles, and throughput over the combined pool of observations from all ranks.
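The three steps above can be sketched in a few lines. This is a minimal illustration, not the vLLM implementation: `SketchTransferStats` and `percentile` are hypothetical names, and the sketch assumes durations are in seconds and that 1 MB means 2**20 bytes. Neither unit convention is confirmed by this issue.

```python
from dataclasses import dataclass, field
from statistics import mean


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) over a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


@dataclass
class SketchTransferStats:
    """Hypothetical stand-in for a per-rank stats container (not vLLM's class)."""

    transfer_duration: list = field(default_factory=list)  # seconds (assumed)
    bytes_transferred: list = field(default_factory=list)  # bytes

    def record_transfer(self, duration_s, num_bytes):
        # One observation per rank-level transfer.
        self.transfer_duration.append(duration_s)
        self.bytes_transferred.append(num_bytes)

    def aggregate(self, other):
        # Concatenate observations; rank identity is lost at this point.
        self.transfer_duration.extend(other.transfer_duration)
        self.bytes_transferred.extend(other.bytes_transferred)
        return self

    def reduce(self):
        # Summary stats over the combined pool of observations from all ranks.
        total_mb = sum(self.bytes_transferred) / 2**20
        total_time = sum(self.transfer_duration)
        return {
            "num_transfers": len(self.transfer_duration),
            "avg_xfer_ms": mean(self.transfer_duration) * 1e3,
            "p90_xfer_ms": percentile(self.transfer_duration, 90) * 1e3,
            "avg_mb_per_transfer": total_mb / len(self.bytes_transferred),
            "throughput_mb_s": total_mb / total_time,
        }


# Two TP ranks each record two transfers of 2 MiB.
rank0, rank1 = SketchTransferStats(), SketchTransferStats()
for d in (0.001, 0.002):
    rank0.record_transfer(d, 2 * 2**20)
for d in (0.001, 0.004):
    rank1.record_transfer(d, 2 * 2**20)

summary = rank0.aggregate(rank1).reduce()
print(summary)
```

Because `aggregate()` only concatenates lists, `reduce()` cannot tell which rank produced which observation, which is why every reported figure describes the pooled set.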

This means:

"Num successful transfers" is the total count across all ranks, not per-rank.
"Avg MB per transfer" is the average over all individual rank-level transfers, not the total bytes moved for a single KV cache transfer operation.
"Throughput (MB/s)" is total_MB_all_ranks / total_time_all_ranks — effectively an average per-rank throughput, not the aggregate system throughput.
Percentiles (P90) are computed over the combined distribution of all ranks' transfer times.
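A small worked example may make the throughput point concrete. The numbers are hypothetical, and the sketch assumes the ranks' transfers overlap fully in time, which real deployments may not satisfy.

```python
# Hypothetical numbers: 4 TP ranks each move one 2 MB shard of the same
# KV cache transfer, fully overlapped in time, each taking 2 ms.
tp_size = 4
mb_per_rank = 2.0
seconds_per_rank = 0.002

total_mb = tp_size * mb_per_rank
summed_time = tp_size * seconds_per_rank  # what the pooled formula divides by
wall_clock = seconds_per_rank             # assumes perfect overlap across ranks

pooled_throughput = total_mb / summed_time  # equals a single rank's rate here
system_throughput = total_mb / wall_clock   # what a user might expect to see

print(f"pooled (logged-style) throughput: {pooled_throughput:.1f} MB/s")
print(f"wall-clock aggregate throughput:  {system_throughput:.1f} MB/s")
print(f"ratio: {system_throughput / pooled_throughput:.1f}x")
```

Under these assumptions the two figures differ by a factor equal to the TP size, so reading the logged throughput as system-wide bandwidth would underestimate it.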

This is unintuitive because users may expect metrics to reflect per-engine totals or aggregate system throughput.

What needs to be documented

Docstrings in stats.py: Add clear documentation to NixlKVConnectorStats explaining that stats are aggregated across all TP ranks and what each metric represents in that context.
Inline comments in reduce(): Clarify the semantics of throughput and averages — that they are per-rank averages over the combined observation pool.
Docstrings in metrics.py: Document the observe() → aggregate() → reduce() → log() pipeline and the fact that stats arrive pre-aggregated across workers.
(Optional) Docs page: Add a section to the disaggregated serving documentation explaining how to interpret the KV Transfer metrics log line.
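As a starting point for the first item, a class docstring could read roughly as follows. The wording is only a suggestion drawn from the behavior described in this issue, and `NixlKVConnectorStatsDocSketch` is a placeholder class used for illustration, not the real one in stats.py.

```python
class NixlKVConnectorStatsDocSketch:
    """Illustrative docstring only; not the actual vLLM class.

    Transfer telemetry pooled across all TP ranks.

    Each rank records one observation per transfer. Observations from all
    ranks are concatenated before any summary statistic is computed, so:

    - "Num successful transfers" counts rank-level transfers across all ranks.
    - "Avg MB per transfer" averages over rank-level transfers, not over
      whole KV cache transfer operations.
    - "Throughput (MB/s)" is total MB over all ranks divided by the summed
      transfer time over all ranks, i.e. roughly a per-rank rate.
    - P90 values are taken over the combined distribution of all ranks.
    """


# Show the docstring as a reader would see it via help() or an IDE tooltip.
print(NixlKVConnectorStatsDocSketch.__doc__)
```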

Relevant files

vllm/distributed/kv_transfer/kv_connector/v1/nixl/stats.py — NixlKVConnectorStats (recording, aggregation, reduction)
vllm/distributed/kv_transfer/kv_connector/v1/metrics.py — KVConnectorLogging (observe/log pipeline), KVConnectorStats (base class)

Context

See related discussion: metrics are aggregated across ranks rather than reported per-rank or per-engine. This is a deliberate design choice (fire-and-forget from workers), but it needs to be clearly documented so users can correctly interpret the numbers.

Contributor guide

Tech stack: python
Area: documentation
Issue type: documentation
Difficulty: 2
Estimated time: 1-3 hours
Activity: fresh
Clarity: clear
Prerequisites: basic understanding of Python; familiarity with distributed metrics concepts
Beginner-friendliness: 80
Approach: Examine the files 'vllm/distributed/kv_transfer/kv_connector/v1/nixl/stats.py' and 'vllm/distributed/kv_transfer/kv_connector/v1/metrics.py' to understand the current implementation. Add docstrings to 'NixlKVConnectorStats' and inline comments in the 'reduce()' method explaining that stats are aggregated across all TP ranks and what each metric represents. Also document the 'observe()' → 'aggregate()' → 'reduce()' → 'log()' pipeline in 'metrics.py'. Optionally, add a section to the disaggregated serving documentation explaining how to interpret the KV Transfer metrics log line. The goal is to clarify the aggregation semantics without changing code behavior.