[flyte2] Instrument the actions service (watcher metrics + dropped-updates counter) · flyteorg/flyte#7450

(2 comments) (0 reactions) (1 assignee) Python (378 forks)

flyte2 · good first issue

Repository metrics

Stars: 3705
PR merge metrics: average time to merge 3d 8h; 114 PRs merged in the last 30 days

Description

Part of #7445. Depends on #7446 (the /metrics endpoint + initialized Scope must exist first).

Summary

Instrument the actions service with Prometheus metrics: implement the existing dropped-updates counter TODO, and add throughput / latency / queue-depth metrics for the TaskAction watcher.

Background

The actions service is already partly wired for metrics — it just has nothing to plug into yet:

actions/setup.go:39 already passes sc.Scope into NewActionsClient(...).
actions/k8s/client.go:91 already uses scope.NewSubScope("actions_filter") for the dedup bloom filter.
actions/k8s/client.go:65 has an explicit TODO: // TODO: add a prometheus counter for dropped updates when metrics are wired up.

Note on the metrics scope: When the service runs under the unified manager (manager/cmd/main.go:75), sc.Scope is already initialized with promutils.NewScope("flyte") before actions.Setup runs. The bloom-filter sub-scope at client.go:91 is therefore created without a panic. The dependency on #7446 exists because #7446 mounts the /metrics endpoint. Without it, the metrics added here are registered in the default registry but never exposed to a scraper. #7446 also initializes sc.Scope at the framework level, which makes the standalone actions/cmd/main.go binary safe as well. That path currently leaves sc.Scope nil, so the scope.NewSubScope(...) call at client.go:90-91 would panic there. The call is always reached because RecordFilterSize defaults to 1 << 23, which is greater than 0.
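The following minimal sketch makes the standalone-binary panic concrete. Scope, stubScope, and newFilterScope are hypothetical stand-ins, not the real promutils types or the code in client.go. The only assumption taken from the note is that a sub-scope is requested whenever RecordFilterSize > 0. In Go, calling a method on a nil interface value panics, and that is what a nil sc.Scope amounts to.

```go
package main

// Scope is a hypothetical, trimmed-down stand-in for promutils.Scope with
// only the method the constructor step needs.
type Scope interface {
	NewSubScope(name string) Scope
}

// stubScope is a hypothetical, already-initialized scope.
type stubScope struct{}

func (stubScope) NewSubScope(name string) Scope { return stubScope{} }

// defaultRecordFilterSize mirrors the default quoted in the note (1 << 23).
const defaultRecordFilterSize = 1 << 23

// newFilterScope mimics the described constructor step: a sub-scope is only
// requested when the bloom filter is enabled. It reports whether that panicked.
func newFilterScope(scope Scope, recordFilterSize int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	if recordFilterSize > 0 {
		// With a nil interface value this method call panics at runtime.
		_ = scope.NewSubScope("actions_filter")
	}
	return false
}
```

With the default filter size, a nil scope panics and an initialized one does not. Only a disabled filter (size 0) would skip the call entirely.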

What to do

Using the Scope available on ActionsClient (passed in via NewActionsClient), add metrics under a dedicated sub-scope (e.g. scope.NewSubScope("watcher")):

Dropped updates counter — implement the TODO at actions/k8s/client.go:65. Increment a counter whenever a watch update is dropped (e.g. buffer full / channel send would block).
Watcher throughput — counter of TaskAction events processed, labeled by result (success/error).
Processing latency — a timer/histogram around per-event handling in the watch worker loop.
Queue/buffer depth — a gauge for the watch buffer occupancy (config WatchBufferSize), updated as events are enqueued/dequeued (or sampled periodically).
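The four metrics above can be sketched together around a buffered channel and a worker loop. This sketch assumes the watcher hands events to a worker through a channel whose capacity comes from WatchBufferSize. The type and function names are hypothetical, and the atomics and latency slice are stdlib stand-ins. In the service they would presumably be Prometheus metrics created once from scope.NewSubScope("watcher") with the helpers listed under Pointers (MustNewCounter, MustNewGauge, MustNewStopWatch).

```go
package main

import (
	"errors"
	"sync"
	"sync/atomic"
	"time"
)

// watcherMetrics bundles the four metrics from the list. These are stdlib
// stand-ins for Prometheus counters, a gauge, and a timer (hypothetical).
type watcherMetrics struct {
	dropped      atomic.Uint64 // dropped updates
	processedOK  atomic.Uint64 // events processed, result="success"
	processedErr atomic.Uint64 // events processed, result="error"
	bufferDepth  atomic.Int64  // gauge: current buffer occupancy

	mu        sync.Mutex
	latencies []time.Duration // per-event handling time
}

// event is a hypothetical stand-in for a TaskAction watch event.
type event struct{ fail bool }

var errHandle = errors.New("handling failed")

// handleEvent is a hypothetical handler that can produce both results.
func handleEvent(e event) error {
	if e.fail {
		return errHandle
	}
	return nil
}

type watcher struct {
	buf     chan event // capacity would come from WatchBufferSize
	metrics *watcherMetrics
	handle  func(event) error
}

// enqueue never blocks the watch stream: if the buffer is full, the event
// is dropped and counted instead.
func (w *watcher) enqueue(e event) {
	select {
	case w.buf <- e:
		w.metrics.bufferDepth.Store(int64(len(w.buf)))
	default:
		w.metrics.dropped.Add(1)
	}
}

// run is the worker loop. It drains the buffer until the channel is closed,
// timing each event and counting it by result.
func (w *watcher) run() {
	for e := range w.buf {
		w.metrics.bufferDepth.Store(int64(len(w.buf)))
		start := time.Now()
		err := w.handle(e)
		elapsed := time.Since(start)

		w.metrics.mu.Lock()
		w.metrics.latencies = append(w.metrics.latencies, elapsed)
		w.metrics.mu.Unlock()

		if err != nil {
			w.metrics.processedErr.Add(1)
		} else {
			w.metrics.processedOK.Add(1)
		}
	}
}
```

The gauge stores a snapshot of len(buf) instead of incrementing and decrementing. A missed update therefore cannot make it drift. Sampling len(buf) on a ticker, the other option in the list, relies on the same property.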

Acceptance criteria

/metrics exposes a dropped-updates counter, watcher event throughput (by result), processing latency, and buffer depth for the actions service.
The TODO at actions/k8s/client.go:65 is implemented and removed.
Metrics are created once under a dedicated sub-scope (no Prometheus duplicate-registration panics).
A unit test verifies the dropped-updates counter increments when an update is dropped, and that the throughput counter increments on event processing.
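The "created once" criterion matters because Prometheus's Must-style registration (for example prometheus.MustRegister) panics when the same collector is registered twice. The toy sketch below assumes a registry that behaves that way. The registry type, newWatcherMetrics, and the metric names are hypothetical. The same concern presumably applies to unit tests that build more than one client in one process. Giving each test its own scope or registry avoids such collisions.

```go
package main

import "fmt"

// registry is a toy stand-in for a Prometheus registerer: like
// prometheus.MustRegister, it panics when a name is registered twice.
type registry struct{ names map[string]bool }

func newRegistry() *registry { return &registry{names: map[string]bool{}} }

func (r *registry) mustRegister(name string) {
	if r.names[name] {
		panic(fmt.Sprintf("duplicate registration: %s", name))
	}
	r.names[name] = true
}

// watcherMetrics records which (hypothetical) watcher metrics were registered.
type watcherMetrics struct{ names []string }

// newWatcherMetrics registers the watcher metrics. It is meant to be called
// once (for example from the client constructor) and the result shared.
// Calling it again against the same registry panics.
func newWatcherMetrics(r *registry) *watcherMetrics {
	names := []string{
		"watcher_dropped_updates",
		"watcher_events_processed",
		"watcher_event_latency",
		"watcher_buffer_depth",
	}
	for _, n := range names {
		r.mustRegister(n)
	}
	return &watcherMetrics{names: names}
}

// registersCleanly reports whether f completes without panicking.
func registersCleanly(f func()) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	f()
	return true
}
```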

Pointers

actions/k8s/client.go — the watcher, worker loop, buffer, and the dropped-updates TODO (line 65); constructor NewActionsClient (line 77) already receives a promutils.Scope.
actions/setup.go:31-40 — where NewActionsClient is constructed with sc.Scope.
flytestdlib/promutils/scope.go — Scope helpers (MustNewCounter, MustNewGauge, MustNewStopWatch, NewSubScope).

Notes for contributors

Keep label cardinality bounded — label by result/status, never by action/run IDs or other user input.
This is independent of the runs-service instrumentation issues (#7447, #7448, #7449); all consume the same Scope from #7446.
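The cardinality rule in these notes can be sketched as follows. The sketch assumes the result label is derived from the handler's error. The names resultLabel, labeledCounter, and actionFailure are hypothetical. In the service the counter would be a labeled counter vector created from the watcher sub-scope. However many distinct actions fail, only two label values ever exist.

```go
package main

import (
	"fmt"
	"sync"
)

// actionFailure is a hypothetical error carrying a per-action ID: exactly
// the kind of value that must never become a label value.
type actionFailure struct{ actionID int }

func (e actionFailure) Error() string {
	return fmt.Sprintf("action %d failed", e.actionID)
}

// resultLabel collapses any outcome onto a fixed label set, so the number
// of time series does not grow with the number of actions.
func resultLabel(err error) string {
	if err != nil {
		return "error"
	}
	return "success"
}

// labeledCounter is a stdlib stand-in for a counter vector keyed by one label.
type labeledCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func (c *labeledCounter) inc(label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = map[string]uint64{}
	}
	c.counts[label]++
}
```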

Contributor guide

Research direction: Explore the actions service code in actions/k8s/client.go and setup.go, and flytestdlib/promutils/scope.go, to understand how Scope is already used. Add Prometheus counters, gauges, and histograms under a new sub-scope, implement the TODO for dropped updates, and write unit tests that verify the metric increments.
Tech stack: Go
Domain: backend
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Go, Prometheus metrics
Beginner-friendly: 65