envoyproxy/envoy

Feature request: Add overload action to restart Envoy when it gets stuck

Open

#41492 opened on Oct 14, 2025

 (10 comments) (0 reactions) (0 assignees)
area/overload_manager, enhancement, help wanted, no stalebot


Description

Feature request

This feature request proposes adding a new overload manager action that automatically restarts Envoy when the overload condition remains above a configured threshold for a specified duration.
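The trigger condition described above could be sketched roughly as follows. This is only an illustration of the "above a threshold for a given duration" semantics, not Envoy code: the class `SustainedOverloadDetector`, its fields, and the `observe` method are hypothetical names, and what "restart" means in practice (for example, exiting so that a supervisor relaunches the process) is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedOverloadDetector:
    """Hypothetical helper: reports when pressure has stayed at or above
    `threshold` continuously for at least `min_duration_s` seconds."""

    threshold: float       # resource pressure in [0.0, 1.0]
    min_duration_s: float  # how long pressure must stay high before firing
    _above_since: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, pressure: float, now_s: float) -> bool:
        """Record one pressure sample taken at time `now_s` (seconds).

        Returns True when a restart should be requested.
        """
        if pressure < self.threshold:
            # Pressure dropped below the threshold: reset the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            # First sample at or above the threshold: start the timer.
            self._above_since = now_s
        return now_s - self._above_since >= self.min_duration_s
```

A caller would feed it one sample per refresh of the resource monitor and act only when `observe` returns `True`. Resetting the timer on any sample below the threshold means that a brief recovery cancels the pending restart, which seems like the conservative choice for such a disruptive action.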

Reasoning

The overload manager is typically configured to stop accepting requests once a certain threshold is reached, as shown by the documentation example.

However, this can lead to situations where Envoy becomes permanently stuck and requires a manual restart. One such case is described here, where the Fixed Heap memory monitor caused the (perceived) memory usage to remain high, preventing Envoy from recovering from overload even after memory was freed. While this specific case may be considered a bug, it still highlights undesirable behavior in the overload manager.

Other possible scenarios could include:

  • A memory leak during request processing that causes memory pressure to exceed the threshold and never decrease.
  • Memory fragmentation that gradually reduces usable memory to the point where any new allocation triggers the overload threshold.
  • A bug that causes a thread to busy-loop, which could keep CPU utilization above the threshold in a similar way.

The common pattern in these cases is that a restart is the only recovery option, yet the overload manager's configured actions explicitly prevent one from happening (e.g. by keeping memory usage low enough that the process is never OOM-killed).
