envoyproxy/envoy

Add jitter support for HTTP max_connection_duration

Open

#42410 opened on Dec 4, 2025

area/http_connection_manager, enhancement, help wanted

Description

Title: Add jitter support for HTTP max_connection_duration

Description: Currently, Envoy supports jitter for TCP max_connection_duration (implemented in #40686), but not for HTTP connections. This causes synchronised connection draining when many HTTP/2 connections reach the same max_connection_duration simultaneously, leading to thundering herd problems.

Use Case / Problem Statement

We are running a production Istio service mesh with approximately x number of HTTP/2 SIDECAR_INBOUND connections using max_connection_duration: 7200s (2 hours).

Observed Behavior

When all connections hit the 2-hour mark simultaneously:

  1. Synchronised draining: All connections shut down at the same time
  2. Service disruption: Incoming requests receive 503 errors during the drain window
  3. Response flags: Metrics show extensive UC (Upstream Connection Termination) flags in Istio telemetry

Evidence

From Istio/Envoy metrics during drain cycles:

istio_requests_total{response_code="503",response_flags="UC"} [high counts]
istio_requests_total{response_code="503",response_flags="UC"} 
istio_requests_total{response_code="503",response_flags="UC"} 

This is a classic thundering herd problem caused by synchronised connection lifecycle management.
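
To illustrate the effect, here is a minimal simulation sketch (not Envoy code, and not data from our mesh). It assumes a batch of connections opened within the same few seconds, for example after a rollout, and reports the largest number of closes that land in any one-second bucket. The function name, the 10% jitter fraction, and the workload of 1000 connections are all made up for illustration.

```python
import random
from collections import Counter


def peak_closes_per_second(open_times, max_duration_s, jitter_fraction, rng):
    """Return the largest number of connections closing in any one-second bucket.

    Each connection closes at open_time + max_duration_s + extra, where extra is
    drawn uniformly from [0, max_duration_s * jitter_fraction].
    """
    buckets = Counter()
    for opened in open_times:
        extra = rng.uniform(0.0, max_duration_s * jitter_fraction)
        buckets[int(opened + max_duration_s + extra)] += 1
    return max(buckets.values())


rng = random.Random(0)
# Hypothetical workload: 1000 connections opened within the same 5 seconds.
open_times = [rng.uniform(0.0, 5.0) for _ in range(1000)]

no_jitter = peak_closes_per_second(open_times, 7200, 0.0, rng)
with_jitter = peak_closes_per_second(open_times, 7200, 0.10, rng)
print("peak closes per second without jitter:", no_jitter)
print("peak closes per second with 10% jitter:", with_jitter)
```

Without jitter, the closes stay packed into the same few seconds in which the connections were opened. With jitter, they spread over a window of up to 720 seconds.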

We need to enforce connection duration limits for compliance and security reasons but cannot do so with the current behavior, because the synchronised draining disrupts service. Please consider extending the existing TCP jitter implementation (from #40686) to HTTP connection durations, so that connections are closed in a staggered manner.
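
As a rough sketch of the behaviour we have in mind (this is not the implementation from #40686, whose details are not reproduced here), each connection could get its own effective lifetime: the configured duration plus a random extra of up to a configured percentage. The function and parameter names below are hypothetical and do not correspond to any existing Envoy field.

```python
import random


def jittered_duration_s(base_s, jitter_percent, rng=random):
    """Return base_s plus a uniformly random extra of up to jitter_percent of base_s.

    Both the name and the percentage-based knob are assumptions for illustration.
    """
    if base_s <= 0:
        raise ValueError("base_s must be positive")
    if not 0.0 <= jitter_percent <= 100.0:
        raise ValueError("jitter_percent must be within [0, 100]")
    max_extra_s = base_s * jitter_percent / 100.0
    return base_s + rng.uniform(0.0, max_extra_s)


# With a 7200s limit and a hypothetical 10% jitter, lifetimes fall in [7200s, 7920s].
rng = random.Random(42)
lifetimes = [jittered_duration_s(7200.0, 10.0, rng) for _ in range(5)]
print([round(x) for x in lifetimes])
```

Whether jitter should extend or shorten the configured duration is a design choice. This sketch extends it, so a deployment that treats the limit as a hard upper bound would need to configure a correspondingly lower base value.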
