[Docs] Document NIXL KV connector metrics aggregation semantics


good first issue

Repository metrics

Stars: 80,034
PR merge metrics: average time to merge 3d 17h; 993 PRs merged in the last 30 days

Description

Summary

The NIXL KV connector logs transfer metrics periodically:

KV Transfer metrics: Num successful transfers=4, Avg xfer time (ms)=1.381, P90 xfer time (ms)=2.601, Avg post time (ms)=0.672, P90 post time (ms)=0.801, Avg MB per transfer=2.25, Throughput (MB/s)=1629.549, Avg number of descriptors=72.0

Currently there is no documentation explaining what these metrics represent, especially in the context of multi-rank (TP > 1) deployments. This has already caused confusion among users.

Current behavior

All metrics are aggregated across all TP ranks before summary stats are computed:

Each TP rank independently records per-transfer telemetry (transfer_duration, post_duration, bytes_transferred, num_descriptors) via NixlKVConnectorStats.record_transfer() in stats.py.
Stats from all ranks are concatenated via aggregate() (list.extend()).
reduce() computes averages, percentiles, and throughput over the combined pool of observations from all ranks.
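
The three steps above can be sketched with a small stand-in class. This is an illustrative sketch only: ToyTransferStats, the percentile helper, and the interpretation of "MB" as 2**20 bytes are assumptions made for the example, not the actual code in stats.py.

```python
from dataclasses import dataclass, field

MIB = 2**20  # assumption: "MB" in the log line means 2**20 bytes


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100], for a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


@dataclass
class ToyTransferStats:
    """Hypothetical stand-in for one rank's stats object (not the vLLM class)."""

    transfer_duration: list = field(default_factory=list)  # seconds
    post_duration: list = field(default_factory=list)      # seconds
    bytes_transferred: list = field(default_factory=list)  # bytes
    num_descriptors: list = field(default_factory=list)

    def record_transfer(self, xfer_s, post_s, nbytes, ndescs):
        # One observation per transfer performed by this rank.
        self.transfer_duration.append(xfer_s)
        self.post_duration.append(post_s)
        self.bytes_transferred.append(nbytes)
        self.num_descriptors.append(ndescs)

    def aggregate(self, other):
        # Concatenate raw observations; nothing is averaged at this stage.
        self.transfer_duration.extend(other.transfer_duration)
        self.post_duration.extend(other.post_duration)
        self.bytes_transferred.extend(other.bytes_transferred)
        self.num_descriptors.extend(other.num_descriptors)
        return self

    def reduce(self):
        # Summary statistics over the combined pool of all ranks' observations.
        n = len(self.transfer_duration)
        total_mb = sum(self.bytes_transferred) / MIB
        total_s = sum(self.transfer_duration)
        return {
            "Num successful transfers": n,
            "Avg xfer time (ms)": round(total_s / n * 1e3, 3),
            "P90 xfer time (ms)": round(percentile(self.transfer_duration, 90) * 1e3, 3),
            "Avg MB per transfer": round(total_mb / n, 3),
            "Throughput (MB/s)": round(total_mb / total_s, 3),
        }


# Two ranks, two transfers each (made-up numbers).
rank0 = ToyTransferStats()
rank0.record_transfer(0.001, 0.0005, 2 * MIB, 64)
rank0.record_transfer(0.002, 0.0005, 2 * MIB, 64)
rank1 = ToyTransferStats()
rank1.record_transfer(0.001, 0.0005, 2 * MIB, 64)
rank1.record_transfer(0.004, 0.0005, 2 * MIB, 64)

summary = rank0.aggregate(rank1).reduce()
print(summary)
```

With these made-up inputs the summary reports 4 transfers, even though each rank only performed 2, because the count is taken after the per-rank lists have been concatenated.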

This means:

"Num successful transfers" is the total count across all ranks, not per-rank.
"Avg MB per transfer" is the average over all individual rank-level transfers, not the total bytes moved for a single KV cache transfer operation.
"Throughput (MB/s)" is total_MB_all_ranks / total_time_all_ranks — effectively an average per-rank throughput, not the aggregate system throughput.
Percentiles (P90) are computed over the combined distribution of all ranks' transfer times.

This is unintuitive because users may expect metrics to reflect per-engine totals or aggregate system throughput.
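
To make the gap concrete, the following sketch compares the pooled throughput formula with a wall-clock view of the same transfers. The numbers are invented, both function names are hypothetical, and the wall-clock figure assumes all ranks start together and run fully in parallel, which real deployments may only approximate.

```python
def pooled_throughput(mb_per_rank, seconds_per_rank):
    """Summed MB divided by summed transfer time, as in the logged metric."""
    return sum(mb_per_rank) / sum(seconds_per_rank)


def wall_clock_throughput(mb_per_rank, seconds_per_rank):
    """Summed MB divided by the slowest rank's time.

    Assumption: every rank transfers concurrently, so the elapsed time
    is the duration of the slowest rank rather than the sum.
    """
    return sum(mb_per_rank) / max(seconds_per_rank)


# TP=4: each rank moves 2 MB in 1 ms as its share of one KV cache transfer.
mb = [2.0, 2.0, 2.0, 2.0]
seconds = [0.001, 0.001, 0.001, 0.001]

logged = pooled_throughput(mb, seconds)
system = wall_clock_throughput(mb, seconds)
avg_mb_per_transfer = sum(mb) / len(mb)  # per rank-level transfer
total_mb_for_operation = sum(mb)         # whole KV cache transfer

print(f"logged-style throughput: {logged:.1f} MB/s")
print(f"wall-clock throughput:   {system:.1f} MB/s")
print(f"avg MB per transfer:     {avg_mb_per_transfer:.2f}")
print(f"MB moved by operation:   {total_mb_for_operation:.2f}")
```

Under these assumptions the logged-style figure matches a single rank's throughput, while the wall-clock figure is TP times larger. The same factor separates "Avg MB per transfer" from the bytes moved by the whole operation.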

What needs to be documented

Docstrings in stats.py: Add clear documentation to NixlKVConnectorStats explaining that stats are aggregated across all TP ranks and what each metric represents in that context.
Inline comments in reduce(): Clarify the semantics of throughput and averages — that they are per-rank averages over the combined observation pool.
Docstrings in metrics.py: Document the observe() → aggregate() → reduce() → log() pipeline and the fact that stats arrive pre-aggregated across workers.
(Optional) Docs page: Add a section to the disaggregated serving documentation explaining how to interpret the KV Transfer metrics log line.
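
As a starting point for the docstring items above, one possible wording is sketched below. The class name is a placeholder and the text only restates the semantics described in this issue. It should be checked against the actual code in stats.py before being adopted.

```python
class DocumentedStatsSketch:
    """Placeholder class showing possible docstring wording (hypothetical).

    Aggregation semantics:

    * Each TP rank records one observation per transfer it performs.
    * aggregate() concatenates the observations from all TP ranks.
    * reduce() computes statistics over the combined pool, so:

      - "Num successful transfers" counts rank-level transfers across
        all ranks, not transfers per rank.
      - "Avg MB per transfer" is the mean size of a rank-level transfer,
        not the bytes moved by one KV cache transfer operation.
      - "Throughput (MB/s)" is summed MB divided by summed transfer
        time, which behaves like an average per-rank throughput.
      - Percentiles are taken over all ranks' transfer times together.
    """


print(DocumentedStatsSketch.__doc__)
```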

Relevant files

vllm/distributed/kv_transfer/kv_connector/v1/nixl/stats.py — NixlKVConnectorStats (recording, aggregation, reduction)
vllm/distributed/kv_transfer/kv_connector/v1/metrics.py — KVConnectorLogging (observe/log pipeline), KVConnectorStats (base class)

Context

See related discussion: metrics are aggregated across ranks rather than reported per-rank or per-engine. This is a deliberate design choice (fire-and-forget from workers), but it needs to be clearly documented so users can correctly interpret the numbers.

Contributor Guide

Research direction: Read the relevant files (stats.py, metrics.py) to understand the aggregation pipeline, then add docstrings and inline comments explaining that metrics are aggregated across all TP ranks.
Tech stack: Python
Domain: backend, documentation
Issue type: Documentation
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Git, Python
Beginner friendliness: 85