flyteorg/flyte

[flyte2] Add gRPC/Connect RPC metrics interceptor to the runs service

Open

Opened on 29 May 2026

Labels: flyte2, good first issue

Description

Part of #7445. Depends on #7446 (the /metrics endpoint + Scope must exist first).

Summary

Add RPC-level Prometheus metrics (request count, error count, latency) to the runs service by attaching a shared Connect interceptor to every service handler.

Background

The runs service is a Connect (connectrpc.com/connect) server. Handlers are mounted in runs/setup.go via calls like:

runsPath, runsHandler := workflowconnect.NewRunServiceHandler(runsSvc)
sc.Mux.Handle(runsPath, runsHandler)

There are currently no interceptors anywhere in the v2 tree, so no RPC metrics are emitted.

What to do

  1. Write a Connect interceptor (a connect.UnaryInterceptorFunc for unary RPCs, or a full connect.Interceptor if streaming RPCs should be covered too) that records, per RPC procedure:

    • request count (e.g. requests_total labeled by procedure)
    • error count (labeled by procedure, and ideally connect.CodeOf(err))
    • latency (a Prometheus histogram or a Scope.MustNewStopWatch-style timer)

    Use the sc.Scope provided by #7446 to create the metrics (e.g. a sub-scope sc.Scope.NewSubScope("grpc")).

  2. Pass the interceptor to every New*ServiceHandler(...) call in runs/setup.go via connect.WithInterceptors(...), e.g.:

    interceptors := connect.WithInterceptors(metricsInterceptor)
    runsPath, runsHandler := workflowconnect.NewRunServiceHandler(runsSvc, interceptors)
    

    Apply it to RunService, InternalRunService, TaskService, IdentityService, AuthMetadataService, TriggerService, ProjectService (and RunLogsService when mounted).
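
    The wrap-and-record pattern behind steps 1 and 2 can be sketched as follows. This is a minimal, stdlib-only illustration, not the implementation: Request, UnaryFunc, Recorder, runSample and the /example.v1.RunService/... procedure names are hypothetical stand-ins. In the real interceptor the wrapper would have the shape of connect.UnaryInterceptorFunc, the procedure would presumably come from req.Spec().Procedure, and the counters and timer would be created from sc.Scope rather than from in-memory maps.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Request is a hypothetical stand-in for connect.AnyRequest; only the
// procedure name matters for this sketch.
type Request struct{ Procedure string }

// UnaryFunc mirrors the shape of connect.UnaryFunc using the stand-in type.
type UnaryFunc func(ctx context.Context, req Request) (any, error)

// Recorder is a stand-in for metrics created from sc.Scope. It keeps request
// counts, error counts, and accumulated latency keyed by procedure.
type Recorder struct {
	mu       sync.Mutex
	requests map[string]int
	errs     map[string]int
	latency  map[string]time.Duration
}

func NewRecorder() *Recorder {
	return &Recorder{
		requests: map[string]int{},
		errs:     map[string]int{},
		latency:  map[string]time.Duration{},
	}
}

// Wrap decorates next so that every call is counted and timed, and every
// non-nil error is counted, before the result is returned unchanged.
func (r *Recorder) Wrap(next UnaryFunc) UnaryFunc {
	return func(ctx context.Context, req Request) (any, error) {
		start := time.Now()
		resp, err := next(ctx, req)
		elapsed := time.Since(start)

		r.mu.Lock()
		defer r.mu.Unlock()
		r.requests[req.Procedure]++
		r.latency[req.Procedure] += elapsed
		if err != nil {
			r.errs[req.Procedure]++
		}
		return resp, err
	}
}

// Requests returns the number of calls recorded for a procedure.
func (r *Recorder) Requests(procedure string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.requests[procedure]
}

// Errors returns the number of failed calls recorded for a procedure.
func (r *Recorder) Errors(procedure string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.errs[procedure]
}

// runSample drives a fake handler through one shared Recorder: two
// successful GetRun calls and one failing AbortRun call.
func runSample() *Recorder {
	rec := NewRecorder()
	handler := rec.Wrap(func(ctx context.Context, req Request) (any, error) {
		if req.Procedure == "/example.v1.RunService/AbortRun" {
			return nil, errors.New("simulated failure")
		}
		return "ok", nil
	})
	ctx := context.Background()
	handler(ctx, Request{Procedure: "/example.v1.RunService/GetRun"})
	handler(ctx, Request{Procedure: "/example.v1.RunService/GetRun"})
	handler(ctx, Request{Procedure: "/example.v1.RunService/AbortRun"})
	return rec
}

func main() {
	rec := runSample()
	for _, p := range []string{"/example.v1.RunService/GetRun", "/example.v1.RunService/AbortRun"} {
		fmt.Printf("%s requests=%d errors=%d\n", p, rec.Requests(p), rec.Errors(p))
	}
}
```

    Creating the recorder once and applying Wrap to each handler means all handlers share the same metrics, which is the same idea as building one interceptor and passing it to every New*ServiceHandler(...) call.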

Acceptance criteria

  • After making RPC calls, /metrics exposes per-procedure request count, error count, and latency metrics.
  • The interceptor is shared/created once and reused across all handlers.
  • A unit test verifies the interceptor increments the request counter (and error counter on error) for a sample procedure.

Pointers

  • runs/setup.go — all the sc.Mux.Handle(...) registrations (lines ~78-120+).
  • Connect interceptor docs: https://connectrpc.com/docs/go/interceptors/
  • flytestdlib/promutils/scope.go: Scope helpers (MustNewCounter, MustNewStopWatch, NewSubScope, etc.).

Notes for contributors

  • Keep label cardinality bounded — label by procedure name and status code, not by arbitrary user input.
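
    One way to keep a label bounded can be sketched as below. Procedure names taken from the Connect spec are already a fixed set, so this guard is only a defensive illustration for cases where a label value is derived from something less controlled. The allowedProcedures map, the procedureLabel helper, the "other" bucket and the /example.v1.RunService/... names are all hypothetical.

```go
package main

import "fmt"

// allowedProcedures is a hypothetical allowlist of procedure names that may
// appear as label values.
var allowedProcedures = map[string]bool{
	"/example.v1.RunService/GetRun":   true,
	"/example.v1.RunService/AbortRun": true,
}

// procedureLabel returns the procedure itself when it is on the allowlist and
// the constant "other" otherwise, so the label can take at most
// len(allowedProcedures)+1 distinct values.
func procedureLabel(procedure string) string {
	if allowedProcedures[procedure] {
		return procedure
	}
	return "other"
}

func main() {
	fmt.Println(procedureLabel("/example.v1.RunService/GetRun"))
	fmt.Println(procedureLabel("/example.v1.RunService/user-supplied-123"))
}
```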
