flyteorg/flyte

[flyte2] Emit Prometheus metrics from v2 services (runs service + app framework)

Open

Opened on May 29, 2026

Labels: flyte2, good first issue

Description

Summary

The v2 services built on the flytestdlib/app framework (starting with the runs service) currently expose no Prometheus metrics. This is an umbrella issue tracking the work to add observability, broken into small, independent tasks that are great for first-time contributors.

Background

When you run the runs service (runs/cmd/main.go → app.App.Run()), the only HTTP endpoints served are /healthz and /readyz (flytestdlib/app/app.go:94-109). There is no /metrics endpoint, and nothing emits Prometheus metrics:

  • app.SetupContext has a Scope promutils.Scope field (flytestdlib/app/context.go:62-63), but it is never assigned.
  • runs/cmd/main.go:43 passes promutils.NewTestScope() into the DataStore — a throwaway scope that registers nothing.
  • runs/setup.go registers no counters/gauges/timers and no RPC interceptors.
  • The old v1 pattern that served /metrics (flytestdlib/profutils/server.go:118, promhttp.Handler()) was not ported into the v2 app framework.

So today a Prometheus scrape of the runs service returns nothing useful.

Goal

Give every v2 service that uses the app framework a Prometheus /metrics endpoint, a real metrics Scope, and meaningful instrumentation on the runs service hot paths.
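As a rough illustration of the endpoint part of this goal, the sketch below mounts /metrics next to /healthz and /readyz on one mux. It is stdlib-only so it can run anywhere: the `counters` type is a hypothetical stand-in for a Prometheus registry, and the metric name `runs_requests_total` is made up for the example. The actual change would presumably reuse promhttp.Handler() and a real promutils.Scope, as the v1 code did, rather than hand-writing the text exposition format.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// counters is a hypothetical, stdlib-only stand-in for a Prometheus registry.
// A real implementation would use client_golang and promhttp.Handler() instead.
type counters struct {
	mu     sync.Mutex
	values map[string]float64
}

func newCounters() *counters { return &counters{values: map[string]float64{}} }

// Inc adds one to the named counter.
func (c *counters) Inc(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[name]++
}

// ServeHTTP writes every counter in the Prometheus text exposition format,
// sorted by name so that the output is deterministic.
func (c *counters) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	c.mu.Lock()
	defer c.mu.Unlock()
	names := make([]string, 0, len(c.values))
	for n := range c.values {
		names = append(names, n)
	}
	sort.Strings(names)
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	for _, n := range names {
		fmt.Fprintf(w, "# TYPE %s counter\n%s %g\n", n, n, c.values[n])
	}
}

// newMux mounts /metrics next to the two health endpoints.
func newMux(reg *counters) *http.ServeMux {
	ok := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", ok)
	mux.HandleFunc("/readyz", ok)
	mux.Handle("/metrics", reg)
	return mux
}

// scrape performs an in-process GET and returns the status code and body.
func scrape(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}
```

Calling `scrape(newMux(reg), "/metrics")` after a couple of `reg.Inc("runs_requests_total")` calls returns the counter in scrapeable form, which is the behavior the foundation task is meant to provide through the real Prometheus handler.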

Tasks

This work is split so multiple contributors can pick up pieces in parallel. The foundation task (#7446) must land first; the remaining tasks depend on it and can then proceed independently.

  • #7446 (foundation): Add a /metrics endpoint and initialize the metrics Scope in the app framework — blocker for the rest
  • #7447: Add gRPC/Connect RPC metrics interceptor to the runs service (depends on #7446)
  • #7448: Instrument the runs DB repository layer with Prometheus metrics (depends on #7446)
  • #7449: Instrument the runs reconcilers (abort-reconciler) with Prometheus metrics (depends on #7446)
  • #7450: Instrument the actions service (watcher metrics + dropped-updates counter) (depends on #7446)

Executor (controller-runtime based — already has a /metrics endpoint, but gaps remain):

  • #7453: Expose promutils (default-registry) metrics on the controller-runtime metrics endpoint — scoped metrics are collected but not scraped
  • #7455: Add custom domain metrics (TaskAction reconcile, cache, GC, plugins)
  • #7456: Dedupe the two promutils.NewScope("executor") constructions in setup.go

How to claim a task

Comment on the specific sub-issue you'd like to work on and a maintainer will assign it. Please land the foundation issue (#7446) first, since the others build on it. Each sub-issue is self-contained with file references and acceptance criteria.
