flyteorg/flyte

[flyte2] Emit Prometheus metrics from v2 services (runs service + app framework)

Open

#7445 opened on May 29, 2026

Labels: flyte2, good first issue

Description

Summary

The v2 services built on the flytestdlib/app framework (starting with the runs service) currently expose no Prometheus metrics. This is an umbrella issue tracking the work to add observability, broken into small, independent tasks that are great for first-time contributors.

Background

When you run the runs service (runs/cmd/main.go → app.App.Run()), the only HTTP endpoints served are /healthz and /readyz (flytestdlib/app/app.go:94-109). There is no /metrics endpoint, and nothing emits Prometheus metrics:

  • app.SetupContext has a Scope promutils.Scope field (flytestdlib/app/context.go:62-63), but it is never assigned.
  • runs/cmd/main.go:43 passes promutils.NewTestScope() into the DataStore — a throwaway scope that registers nothing.
  • runs/setup.go registers no counters/gauges/timers and no RPC interceptors.
  • The old v1 pattern that served /metrics (flytestdlib/profutils/server.go:118, promhttp.Handler()) was not ported into the v2 app framework.

So today a Prometheus scrape of the runs service returns nothing useful: there is no /metrics route to hit, and no service metrics are registered to expose.

Goal

Give every v2 service that uses the app framework a Prometheus /metrics endpoint, a real metrics Scope, and meaningful instrumentation on the runs service hot paths.

Tasks

This work is split so multiple contributors can pick up pieces in parallel. The foundation task (#7446) must land first; the other runs and actions service tasks (#7447–#7450) depend on it and can then proceed independently.

  • #7446 (foundation): Add a /metrics endpoint and initialize the metrics Scope in the app framework — blocker for the rest
  • #7447: Add gRPC/Connect RPC metrics interceptor to the runs service (depends on #7446)
  • #7448: Instrument the runs DB repository layer with Prometheus metrics (depends on #7446)
  • #7449: Instrument the runs reconcilers (abort-reconciler) with Prometheus metrics (depends on #7446)
  • #7450: Instrument the actions service (watcher metrics + dropped-updates counter) (depends on #7446)

Executor (controller-runtime based — already has a /metrics endpoint, but gaps remain):

  • #7453: Expose promutils (default-registry) metrics on the controller-runtime metrics endpoint — scoped metrics are collected but not scraped
  • #7455: Add custom domain metrics (TaskAction reconcile, cache, GC, plugins)
  • #7456: Dedupe the two promutils.NewScope("executor") constructions in setup.go

How to claim a task

Comment on the specific sub-issue you'd like to work on and a maintainer will assign it. Please land the foundation issue (#7446) first, since the others build on it. Each sub-issue is self-contained with file references and acceptance criteria.

Contributor guide