envoyproxy/envoy

Add jitter support for HTTP max_connection_duration

Open

#42410 opened on Dec 4, 2025

1 comment, 2 reactions, 0 assignees · C++ · 5,373 forks
Labels: area/http_connection_manager, enhancement, help wanted


Description

Title: Add jitter support for HTTP max_connection_duration

Description: Envoy currently supports jitter for TCP max_connection_duration (implemented in #40686), but not for HTTP connections. As a result, when many HTTP/2 connections that were opened at around the same time reach max_connection_duration together, they are drained in sync, causing a thundering-herd problem.
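A minimal sketch of what per-connection jitter could look like, assuming jitter is expressed as a percentage of max_connection_duration and only ever lengthens a connection's lifetime (so the configured value remains a lower bound). `jitteredDuration` is a hypothetical helper, not Envoy's API; the field names and exact semantics of the TCP implementation in #40686 may differ (for example, it could apply jitter symmetrically).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper (not part of Envoy's API): returns `base` plus a uniformly
// distributed extra delay in [0, base * jitter_percent / 100].
// Assumption: jitter is additive only, so `base` is never shortened.
std::chrono::milliseconds jitteredDuration(std::chrono::milliseconds base,
                                           double jitter_percent,
                                           std::mt19937_64& rng) {
  assert(jitter_percent >= 0.0 && jitter_percent <= 100.0);
  if (jitter_percent == 0.0 || base.count() <= 0) {
    return base;  // no jitter configured, or no duration limit at all
  }
  const auto max_extra_ms = static_cast<std::int64_t>(
      static_cast<double>(base.count()) * jitter_percent / 100.0);
  // Each connection draws its own extra delay, which is what de-synchronises them.
  std::uniform_int_distribution<std::int64_t> extra(0, max_extra_ms);
  return base + std::chrono::milliseconds(extra(rng));
}
```

Drawing the extra delay independently for each connection is what breaks the synchronisation: with a 7200s limit and 10% jitter, connections opened at the same instant would close anywhere within a 720s window instead of at a single moment.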

Use Case / Problem Statement

We are running a production Istio service mesh with approximately x HTTP/2 SIDECAR_INBOUND connections using max_connection_duration: 7200s (2 hours).

Observed Behavior

When all connections hit the 2-hour mark simultaneously:

  1. Synchronised draining: All connections shut down at the same time
  2. Service disruption: Incoming requests receive 503 errors during the drain window
  3. Response flags: Metrics show a large number of UC (upstream connection termination) response flags in Istio telemetry
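The effect can be illustrated with a small, hypothetical model (not Envoy code). It assumes the connections were opened at nearly the same time, which is what the behavior above implies, and counts how many connections close in the busiest one-second bucket with and without jitter. The jitter model is the same assumed additive, uniformly distributed extra delay; `peakClosesPerSecond` is an illustrative name only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Hypothetical model: `open_times_s` are connection start times in seconds.
// Each connection closes at open + duration + extra, where extra is uniform in
// [0, duration * jitter_percent / 100]. Returns the largest number of
// connections closed within any single one-second bucket.
int peakClosesPerSecond(const std::vector<std::int64_t>& open_times_s,
                        std::int64_t duration_s, double jitter_percent,
                        std::uint64_t seed) {
  assert(duration_s >= 0 && jitter_percent >= 0.0);
  std::mt19937_64 rng(seed);
  const auto max_extra_s = static_cast<std::int64_t>(
      static_cast<double>(duration_s) * jitter_percent / 100.0);
  // With zero jitter this distribution always yields 0.
  std::uniform_int_distribution<std::int64_t> extra(0, max_extra_s);
  std::map<std::int64_t, int> closes_per_second;
  int peak = 0;
  for (std::int64_t open : open_times_s) {
    const std::int64_t close = open + duration_s + extra(rng);
    peak = std::max(peak, ++closes_per_second[close]);
  }
  return peak;
}
```

For 1,000 connections opened at t = 0 with a 7200s limit, all 1,000 closes land in the same second without jitter; with 10% jitter they are spread over 721 one-second buckets, so the peak drops to a handful.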

Evidence

From Istio/Envoy metrics during drain cycles:

istio_requests_total{response_code="503",response_flags="UC"} [high counts]

This is a classic thundering herd problem caused by synchronised connection lifecycle management.

We need to enforce connection duration limits for compliance and security reasons, but cannot do so with the current behavior because the synchronised draining disrupts the service. We request that the existing TCP jitter implementation (from #40686) be extended to HTTP connection durations, so that connections are closed in a staggered manner.
