[flyte2] Instrument the runs service DB repository layer with Prometheus metrics
Opened on May 29, 2026
Description
Part of #7445. Depends on #7446 (the /metrics endpoint + Scope must exist first).
Summary
Add Prometheus metrics to the runs service database repository layer so we can observe DB call volume, error rates, and latency per operation.
Background
The repository implementations live in runs/repository/impl/ (e.g. task.go, trigger.go, and the run repo). These wrap GORM/DB calls and currently emit no metrics — there is no visibility into how often each query runs, how long it takes, or how often it fails.
What to do
- Thread the metrics Scope (from #7446, available via app.SetupContext.Scope) into the repository constructor(s). runs/setup.go calls repository.NewRepository(sc.DB, cfg.Database) and impl.NewProjectRepo(sc.DB); extend these (or wrap the repo) to accept a promutils.Scope.
- For each DB operation (create/get/list/update/delete), record:
  - call count (labeled by operation name)
  - error count (labeled by operation name)
  - latency (a Prometheus timer / stopwatch)

  A small helper that wraps a DB call with start := time.Now(); defer timer.Stop() plus counter increments keeps this DRY.
Acceptance criteria
- /metrics exposes per-operation DB call count, error count, and latency for the runs repository.
- Metrics are created once (no duplicate-registration panics) using a dedicated sub-scope, e.g. scope.NewSubScope("db").
- Unit tests assert that a repository operation increments the expected counter / records latency.
Pointers
- runs/repository/impl/: repository implementations to instrument.
- runs/repository/repository.go (the NewRepository constructor) and runs/setup.go:40 where it's called.
- flytestdlib/promutils/scope.go: Scope helpers (MustNewCounter, MustNewStopWatch, NewSubScope).
Notes for contributors
- Keep label cardinality low: label by operation name, never by row IDs / project / user values.
- This can be done independently of #7447 (RPC interceptors); both consume the same Scope.