Feature request: Add overload action to restart Envoy when it gets stuck
#41492 opened on Oct 14, 2025
Description
Feature request
This feature request proposes adding a new overload manager action that automatically restarts Envoy when the overload condition remains above a configured threshold for a specified duration.
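To make the proposed trigger condition concrete, here is a minimal sketch of the "above a threshold for a specified duration" check. It is written in Python for brevity and is not Envoy code: the class name `SustainedOverloadTrigger`, its fields, and the idea of feeding it timestamped pressure samples are all assumptions made for illustration. How the real action would be configured or wired into the overload manager is left open.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedOverloadTrigger:
    """Hypothetical helper: fires once pressure has stayed at or above
    `threshold` for at least `duration_s` seconds without dipping below it."""

    threshold: float   # resource pressure in [0.0, 1.0]
    duration_s: float  # how long pressure must stay high before firing
    _above_since: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, pressure: float, now: float) -> bool:
        """Record a sample taken at time `now` (seconds).

        Returns True when a restart would be due under this sketch.
        """
        if pressure < self.threshold:
            # Any recovery below the threshold resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            # First sample of a new high-pressure streak.
            self._above_since = now
        return now - self._above_since >= self.duration_s


# Example: 95% threshold that must hold for 60 seconds.
trigger = SustainedOverloadTrigger(threshold=0.95, duration_s=60.0)
samples = [(0.0, 0.97), (30.0, 0.99), (40.0, 0.90), (50.0, 0.96), (110.0, 0.96)]
results = [trigger.update(pressure, now) for now, pressure in samples]
```

In this sketch the dip at `t=40` resets the timer, so only the last sample (60 seconds after the streak restarted at `t=50`) reports that a restart is due. Resetting on any dip is a design choice; a real implementation might prefer some hysteresis instead.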
Reasoning
The overload manager is typically configured to stop accepting requests once a certain threshold is reached, as shown by the documentation example.
However, this can lead to situations where Envoy becomes permanently stuck and requires a manual restart. One such case is described here, where the Fixed Heap memory monitor caused the (perceived) memory usage to remain high, preventing Envoy from recovering from overload even after memory was freed. While this specific case may be considered a bug, it still highlights undesirable behavior in the overload manager.
Other possible scenarios could include:
- A memory leak during request processing that causes memory pressure to exceed the threshold and never decrease.
- Memory fragmentation that gradually reduces usable memory to the point where any new allocations trigger the overload threshold.
- Similar issues could also occur with CPU utilization, for example if a bug causes a thread to busy-loop.
The common pattern in these cases is that the only recovery option is a restart, yet the overload manager's configured actions explicitly prevent one from happening (e.g. through an OOM kill).