WAL Replay not using more memory than before the restart
#16,942 created on July 29, 2025
Description
This is to track some ideas from https://github.com/prometheus/prometheus/issues/6934
The current algorithm tries to replay the WAL as fast as possible after a restart, which can use more memory than Prometheus was using before the restart.
This can be a problem when Prometheus is under pressure (tons of metrics and a low memory limit) and some operation, such as an expensive query or API call, is OOM-ing it. Recovery is then impossible because startup uses even more memory, so the WAL has to be removed manually.
For other OOMs, where no specific query or API call is responsible and memory use is simply high because too many series are scraped, this feature (reducing startup memory use) will not help on its own. It might, however, unlock other options, such as compacting or truncating on start to move most of the load into TSDB blocks for further debugging and work.
Please use this issue if you have thoughts on replay memory consumption alone. To discuss OOM detection ideas and general OOM handling or safeguards, let's use https://github.com/prometheus/prometheus/issues/13939. For general unexpected OOMs, where Prometheus clearly uses an unexpected amount of memory given the scraped/ingested load you put through it, please open a separate issue.
Acceptance Criteria
- A mode (or the default behavior, if fast enough) where Prometheus startup does not use more memory than "normal" use.
Ideas
- Add a mode where Prometheus slows down and replays the WAL in segments, in a way that guarantees stable resource usage.
- Truncate before replay (https://github.com/prometheus/prometheus/issues/13939)
- General garbage optimizations if possible.
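
To make the first idea a bit more concrete, here is a rough sketch of what a throttled, batch-by-batch replay loop could look like. This is not the existing Prometheus replay code. `Segment`, `ReplayOptions`, `ReplayThrottled` and `DefaultReplayOptions` are hypothetical names made up for illustration, and the heap check is a simple `runtime.MemStats.HeapInuse` probe that runs only between batches, so it bounds memory growth loosely rather than guaranteeing it.

```go
package main

import (
	"errors"
	"runtime"
	"time"
)

// Segment is a hypothetical stand-in for one WAL segment.
// It is NOT the Prometheus WAL type.
type Segment struct {
	Records [][]byte
}

// ReplayOptions holds the (assumed) knobs for a memory-bounded replay.
type ReplayOptions struct {
	HeapBudget uint64        // replay pauses while in-use heap exceeds this many bytes
	BatchSize  int           // records applied between two heap checks
	MaxWaits   int           // relief attempts per batch before giving up
	Backoff    time.Duration // sleep after each relief attempt
	HeapInUse  func() uint64 // heap probe, injectable for tests
	Relieve    func()        // frees memory, e.g. runtime.GC
}

// ErrOverBudget is returned when the heap stays above the budget.
var ErrOverBudget = errors.New("replay: heap still over budget after backing off")

// runtimeHeapInUse reads the in-use heap from the Go runtime.
// ReadMemStats stops the world, so it is called per batch, not per record.
func runtimeHeapInUse() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse
}

// DefaultReplayOptions wires the probe and relief to the Go runtime.
// The batch size, wait count and backoff are arbitrary example values.
func DefaultReplayOptions(budget uint64) ReplayOptions {
	return ReplayOptions{
		HeapBudget: budget,
		BatchSize:  1000,
		MaxWaits:   5,
		Backoff:    100 * time.Millisecond,
		HeapInUse:  runtimeHeapInUse,
		Relieve:    runtime.GC,
	}
}

// waitForHeadroom blocks until the heap is within budget, or fails
// after MaxWaits relief attempts.
func waitForHeadroom(opts ReplayOptions) error {
	for attempt := 0; opts.HeapInUse() > opts.HeapBudget; attempt++ {
		if attempt >= opts.MaxWaits {
			return ErrOverBudget
		}
		opts.Relieve()
		time.Sleep(opts.Backoff)
	}
	return nil
}

// ReplayThrottled applies records segment by segment in batches and
// checks the heap budget before each batch. It returns the number of
// records applied so far, also on error.
func ReplayThrottled(segs []Segment, opts ReplayOptions, apply func(rec []byte) error) (int, error) {
	batch := opts.BatchSize
	if batch <= 0 {
		batch = 1 // avoid an endless loop on a zero or negative batch size
	}
	applied := 0
	for _, seg := range segs {
		for start := 0; start < len(seg.Records); start += batch {
			if err := waitForHeadroom(opts); err != nil {
				return applied, err
			}
			end := start + batch
			if end > len(seg.Records) {
				end = len(seg.Records)
			}
			for _, rec := range seg.Records[start:end] {
				if err := apply(rec); err != nil {
					return applied, err
				}
				applied++
			}
		}
	}
	return applied, nil
}
```

A caller would pass something like `DefaultReplayOptions(budget)`, where `budget` is derived from the memory limit or from the heap size observed before the restart. What should happen on `ErrOverBudget` is left open here: it could be the point where a truncate or compaction kicks in, as in the second idea above.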