flyteorg/flyte

[flyte2] Instrument the runs service DB repository layer with Prometheus metrics

Open

#7448 opened on May 29, 2026

Labels: flyte2, good first issue

Description

Part of #7445. Depends on #7446 (the /metrics endpoint + Scope must exist first).

Summary

Add Prometheus metrics to the runs service database repository layer so we can observe DB call volume, error rates, and latency per operation.

Background

The repository implementations live in runs/repository/impl/ (e.g. task.go, trigger.go, and the run repo). These wrap GORM/DB calls and currently emit no metrics, so there is no visibility into how often each query runs, how long it takes, or how often it fails.

What to do

  1. Thread the metrics Scope (from #7446, available via app.SetupContext.Scope) into the repository constructor(s). runs/setup.go calls repository.NewRepository(sc.DB, cfg.Database) and impl.NewProjectRepo(sc.DB); extend these (or wrap the repo) to accept a promutils.Scope.

  2. For each DB operation (create/get/list/update/delete), record:

    • call count (labeled by operation name)
    • error count (labeled by operation name)
    • latency (a Prometheus timer / stopwatch)

    A small helper that wraps each DB call with a latency measurement (start := time.Now(), or a stopwatch timer with defer timer.Stop()) plus the counter increments keeps this DRY.
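The two steps above can be sketched with the Go standard library alone. This is a minimal sketch, assuming a hypothetical trimmed-down `RunRepo` interface (the method names `CreateRun`/`GetRun` are illustrative, not the real repository API) and an in-memory `opMetrics` type standing in for the Prometheus call counter, error counter, and latency stopwatch that the real change would create from the `db` sub-scope. It takes the "wrap the repo" route: `instrumentedRunRepo` decorates any implementation without touching its GORM code.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Run is a hypothetical, trimmed-down model used only for this sketch.
type Run struct{ Name string }

// RunRepo is a hypothetical subset of the run repository interface.
type RunRepo interface {
	CreateRun(name string) error
	GetRun(name string) (Run, error)
}

// opMetrics is an in-memory stand-in for per-operation Prometheus metrics.
// Every map is keyed by operation name only (low cardinality).
type opMetrics struct {
	mu      sync.Mutex
	calls   map[string]int
	errs    map[string]int
	latency map[string][]time.Duration
}

func newOpMetrics() *opMetrics {
	return &opMetrics{
		calls:   map[string]int{},
		errs:    map[string]int{},
		latency: map[string][]time.Duration{},
	}
}

// observe is the DRY helper: it times fn, counts the call, and counts an
// error when fn fails. The error is returned unchanged to the caller.
func (m *opMetrics) observe(op string, fn func() error) error {
	start := time.Now()
	err := fn()
	elapsed := time.Since(start)

	m.mu.Lock()
	defer m.mu.Unlock()
	m.calls[op]++
	if err != nil {
		m.errs[op]++
	}
	m.latency[op] = append(m.latency[op], elapsed)
	return err
}

// instrumentedRunRepo decorates any RunRepo with metrics.
type instrumentedRunRepo struct {
	inner   RunRepo
	metrics *opMetrics
}

// Compile-time check that the decorator is itself a RunRepo.
var _ RunRepo = (*instrumentedRunRepo)(nil)

func (r *instrumentedRunRepo) CreateRun(name string) error {
	return r.metrics.observe("CreateRun", func() error { return r.inner.CreateRun(name) })
}

func (r *instrumentedRunRepo) GetRun(name string) (Run, error) {
	var run Run
	err := r.metrics.observe("GetRun", func() error {
		var err error
		run, err = r.inner.GetRun(name)
		return err
	})
	return run, err
}

// errNotFound and memRunRepo are a toy backend used only to exercise the decorator.
var errNotFound = errors.New("run not found")

type memRunRepo struct{ runs map[string]Run }

func (m *memRunRepo) CreateRun(name string) error {
	m.runs[name] = Run{Name: name}
	return nil
}

func (m *memRunRepo) GetRun(name string) (Run, error) {
	run, ok := m.runs[name]
	if !ok {
		return Run{}, errNotFound
	}
	return run, nil
}
```

Because the decorator satisfies the same interface, the constructor could return it in place of the raw implementation, and callers would not change.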

Acceptance criteria

  • /metrics exposes per-operation DB call count, error count, and latency for the runs repository.
  • Metrics are created once (no duplicate-registration panics) using a dedicated sub-scope, e.g. scope.NewSubScope("db").
  • Unit tests assert that a repository operation increments the expected counter / records latency.
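For the "created once" and unit-test criteria, the sketch below uses a toy registry that panics when the same metric name is registered twice, mimicking the duplicate-registration panic the criterion warns about. All types here (`registry`, `scope`, `counterVec`, `repoMetrics`) and the `:` name separator are hypothetical stdlib stand-ins, not promutils or Prometheus types. The shape is what matters: build one metrics struct under the `db` sub-scope, label counters by operation, and share it.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a toy stand-in for a Prometheus registry: it panics if the
// same fully qualified metric name is registered twice.
type registry struct {
	mu    sync.Mutex
	names map[string]bool
}

func (r *registry) mustRegister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names[name] {
		panic(fmt.Sprintf("duplicate metric registration: %s", name))
	}
	r.names[name] = true
}

// scope is a toy scope: a name prefix plus a shared registry.
type scope struct {
	prefix string
	reg    *registry
}

func (s scope) newSubScope(name string) scope {
	return scope{prefix: s.prefix + name + ":", reg: s.reg}
}

// counterVec is a toy counter labeled by operation name.
type counterVec struct {
	mu     sync.Mutex
	values map[string]int
}

func (s scope) mustNewCounterVec(name string) *counterVec {
	s.reg.mustRegister(s.prefix + name)
	return &counterVec{values: map[string]int{}}
}

func (c *counterVec) inc(op string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[op]++
}

func (c *counterVec) get(op string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[op]
}

// repoMetrics holds every DB metric; build it once (e.g. in the repository
// constructor) and hand the same pointer to each repo implementation.
type repoMetrics struct {
	calls  *counterVec
	errors *counterVec
}

func newRepoMetrics(parent scope) *repoMetrics {
	db := parent.newSubScope("db")
	return &repoMetrics{
		calls:  db.mustNewCounterVec("calls"),
		errors: db.mustNewCounterVec("errors"),
	}
}

// panics reports whether f panics; useful for asserting the duplicate case.
func panics(f func()) (didPanic bool) {
	defer func() { didPanic = recover() != nil }()
	f()
	return false
}
```

A unit test along these lines can assert both that an operation increments its labeled counter and that constructing the metrics a second time against the same registry fails loudly.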

Pointers

  • runs/repository/impl/: repository implementations to instrument.
  • runs/repository/repository.go (the NewRepository constructor) and runs/setup.go:40 where it's called.
  • flytestdlib/promutils/scope.go: Scope helpers (MustNewCounter, MustNewStopWatch, NewSubScope).

Notes for contributors

  • Keep label cardinality low: label by operation name, never by row IDs / project / user values.
  • This can be done independently of #7447 (RPC interceptors); both consume the same Scope.
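One way to enforce the cardinality rule is to make the operation label a defined Go type with a closed set of constants, so a run ID held in a plain `string` variable cannot be used as a label without an explicit conversion. The constant names below are hypothetical; the real set would mirror the repository methods being instrumented. As an extra guard, `labelFor` (also hypothetical) collapses any value outside the set into `"other"`.

```go
package main

// op is a hypothetical defined type for the operation-name label. A string
// variable must be converted explicitly before it can be passed as an op.
type op string

// Hypothetical operation names; the real list would match the repo methods.
const (
	opCreateRun op = "CreateRun"
	opGetRun    op = "GetRun"
	opListRuns  op = "ListRuns"
	opUpdateRun op = "UpdateRun"
	opDeleteRun op = "DeleteRun"
)

// knownOps is the closed set of label values allowed on DB metrics.
var knownOps = map[op]bool{
	opCreateRun: true,
	opGetRun:    true,
	opListRuns:  true,
	opUpdateRun: true,
	opDeleteRun: true,
}

// labelFor returns the metric label for o, mapping anything outside the
// closed set to "other" so the number of time series stays bounded.
func labelFor(o op) string {
	if knownOps[o] {
		return string(o)
	}
	return "other"
}
```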
