Feature request: Add overload action to restart Envoy when it gets stuck
#41492 opened on Oct 14, 2025
Description
Feature request
This feature request proposes adding a new overload manager action that automatically restarts Envoy when the overload condition remains above a configured threshold for a specified duration.
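To make the intended semantics concrete, here is a minimal sketch of the trigger logic: the action fires only once the resource pressure has stayed at or above the threshold for the whole configured duration, and any sample below the threshold resets the timer. The class name `SustainedOverloadTrigger`, its fields, and the sample values are hypothetical and only illustrate the idea; none of this is existing Envoy code or configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedOverloadTrigger:
    """Hypothetical model of the proposed action (not Envoy code).

    Requests a restart once pressure has stayed at or above `threshold`
    for at least `min_duration_s` seconds without dropping below it.
    """

    threshold: float        # resource pressure in [0.0, 1.0]
    min_duration_s: float   # how long pressure must stay high
    _above_since: Optional[float] = None  # time of first sample in the current high run

    def observe(self, pressure: float, now_s: float) -> bool:
        """Record one sample; return True if a restart should be requested."""
        if pressure < self.threshold:
            # Pressure recovered, so the sustained-overload timer starts over.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now_s
        return now_s - self._above_since >= self.min_duration_s


# Assumed example values: 98% threshold, 60 s duration.
trigger = SustainedOverloadTrigger(threshold=0.98, min_duration_s=60.0)
samples = [(0.0, 0.99), (30.0, 0.99), (45.0, 0.50), (50.0, 0.99), (110.0, 0.99)]
decisions = [trigger.observe(pressure, t) for t, pressure in samples]
print(decisions)  # [False, False, False, False, True]
```

The dip at 45 s resets the timer, so the restart is requested only at 110 s, after pressure has been high continuously since 50 s. A short spike therefore never causes a restart; only a condition that fails to clear does.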
Reasoning
The overload manager is typically configured to stop accepting requests once a certain threshold is reached, as shown by the documentation example.
However, this can lead to situations where Envoy becomes permanently stuck and requires a manual restart. One such case is described here, where the Fixed Heap memory monitor caused the (perceived) memory usage to remain high, preventing Envoy from recovering from overload even after memory was freed. While this specific case may be considered a bug, it still highlights undesirable behavior in the overload manager.
Other possible scenarios could include:
- A memory leak during request processing that causes memory pressure to exceed the threshold and never decrease.
- Memory fragmentation that gradually reduces usable memory to the point where any new allocations trigger the overload threshold.
- Similar issues could also occur with CPU utilization, for example if a bug causes a thread to busy-loop.
The common pattern in these cases is that the only recovery option is a restart, yet the overload manager's configured actions explicitly prevent the restart that would otherwise happen (e.g. through an OOM kill).