envoyproxy/envoy

Feature request: Add overload action to restart Envoy when it gets stuck

Open

#41492 opened on Oct 14, 2025

Labels: area/overload_manager, enhancement, help wanted, no stalebot

Description

Feature request

This feature request proposes adding a new overload manager action that automatically restarts Envoy when the overload condition remains above a configured threshold for a specified duration.
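To make the proposed trigger condition concrete, here is a minimal sketch of the "above a threshold for a specified duration" check. It is not Envoy code: the class name `SustainedOverloadDetector`, its parameters, and the idea of feeding it one pressure sample per refresh are all assumptions made for illustration, and the actual restart mechanism is left out.

```cpp
#include <chrono>
#include <optional>

// Hypothetical helper, not part of Envoy: reports when resource pressure has
// stayed at or above a threshold for at least a minimum duration.
class SustainedOverloadDetector {
public:
  using Clock = std::chrono::steady_clock;

  SustainedOverloadDetector(double threshold, Clock::duration min_duration)
      : threshold_(threshold), min_duration_(min_duration) {}

  // Feed one pressure sample (0.0 to 1.0) observed at time `now`.
  // Returns true once pressure has been continuously >= threshold
  // for at least min_duration.
  bool update(double pressure, Clock::time_point now) {
    if (pressure < threshold_) {
      // Pressure dropped below the threshold: forget the overload start time.
      overloaded_since_.reset();
      return false;
    }
    if (!overloaded_since_.has_value()) {
      // First sample at or above the threshold: start the timer.
      overloaded_since_ = now;
    }
    return now - *overloaded_since_ >= min_duration_;
  }

private:
  double threshold_;
  Clock::duration min_duration_;
  std::optional<Clock::time_point> overloaded_since_;
};
```

Under these assumptions, a caller would invoke `update()` on every resource monitor refresh and initiate the restart the first time it returns `true`. Any dip below the threshold resets the timer, so brief spikes would not trigger a restart.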

Reasoning

The overload manager is typically configured to stop accepting requests once a certain threshold is reached, as shown in the documentation example.

However, this can lead to situations where Envoy becomes permanently stuck and requires a manual restart. One such case is described here, where the Fixed Heap memory monitor caused the (perceived) memory usage to remain high, preventing Envoy from recovering from overload even after memory was freed. While this specific case may be considered a bug, it still highlights undesirable behavior in the overload manager.

Other possible scenarios could include:

  • A memory leak during request processing that causes memory pressure to exceed the threshold and never decrease.
  • Memory fragmentation that gradually reduces usable memory to the point where any new allocations trigger the overload threshold.
  • Similar issues could also occur with CPU utilization, for example if a bug causes a thread to busy-loop.

The common pattern in these cases is that the only recovery option is a restart, yet the overload manager's configured actions explicitly prevent a restart that would otherwise occur (e.g. through an OOM kill).
