Description
When running in Kubernetes, Telegraf is often subject to a pod memory limit. Telegraf should be configured so that it does not trigger an OOM kill. One of the biggest memory consumers is likely the metric buffer, which holds accumulated metrics between flushes.
The current tunable, metric_buffer_limit, sets the number of metrics that can be stored in the buffer. It is not obvious how to use it to keep Telegraf's memory below a desired limit and prevent OOM: each metric has a variable number of labels, which in turn consume a variable amount of memory. Currently, metric_buffer_limit has to be set empirically, and the final value depends heavily on the type of metrics collected, which can change over time.
If Telegraf could estimate the total memory usage of all metrics in the buffer, then introducing a metric_buffer_size_limit tunable would allow better control over memory utilization.
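
To make the idea concrete, here is a rough sketch in Go of what a byte-bounded buffer could look like. It is not Telegraf's actual code: the Metric struct, EstimateSize, and SizeLimitedBuffer are hypothetical names invented for illustration, the size estimate only counts string lengths plus an assumed 8 bytes per non-string field value (it ignores map and allocator overhead), and evicting the oldest metrics first is an assumed policy.

```go
package main

import "fmt"

// Metric is a simplified, hypothetical stand-in for a buffered metric.
type Metric struct {
	Name   string
	Tags   map[string]string
	Fields map[string]interface{}
}

// EstimateSize returns a rough byte estimate for one metric.
// Assumption: non-string field values cost 8 bytes; runtime overhead is ignored.
func EstimateSize(m Metric) int {
	size := len(m.Name)
	for k, v := range m.Tags {
		size += len(k) + len(v)
	}
	for k, v := range m.Fields {
		size += len(k)
		if s, ok := v.(string); ok {
			size += len(s)
		} else {
			size += 8
		}
	}
	return size
}

// SizeLimitedBuffer bounds the estimated bytes held, not the metric count.
type SizeLimitedBuffer struct {
	limit   int // byte budget, e.g. from a hypothetical metric_buffer_size_limit
	used    int // sum of estimated sizes currently buffered
	metrics []Metric
	sizes   []int
}

// NewSizeLimitedBuffer creates an empty buffer with the given byte budget.
func NewSizeLimitedBuffer(limit int) *SizeLimitedBuffer {
	return &SizeLimitedBuffer{limit: limit}
}

// Add stores m and returns how many metrics were dropped to respect the budget.
// Oldest metrics are evicted first; a metric larger than the whole budget
// is itself dropped.
func (b *SizeLimitedBuffer) Add(m Metric) int {
	size := EstimateSize(m)
	if size > b.limit {
		return 1
	}
	dropped := 0
	for b.used+size > b.limit && len(b.metrics) > 0 {
		b.used -= b.sizes[0]
		b.metrics = b.metrics[1:]
		b.sizes = b.sizes[1:]
		dropped++
	}
	b.metrics = append(b.metrics, m)
	b.sizes = append(b.sizes, size)
	b.used += size
	return dropped
}

// Len returns the number of buffered metrics.
func (b *SizeLimitedBuffer) Len() int { return len(b.metrics) }

// Used returns the estimated bytes currently buffered.
func (b *SizeLimitedBuffer) Used() int { return b.used }

func main() {
	buf := NewSizeLimitedBuffer(50)
	small := Metric{
		Name:   "cpu",
		Tags:   map[string]string{"host": "a"},
		Fields: map[string]interface{}{"usage": 1.5},
	}
	big := Metric{
		Name:   "cpu",
		Tags:   map[string]string{"host": "a", "region": "us-east-1"},
		Fields: map[string]interface{}{"usage": 1.5},
	}
	fmt.Println("dropped:", buf.Add(small))
	fmt.Println("dropped:", buf.Add(big))
	fmt.Println("buffered:", buf.Len(), "estimated bytes:", buf.Used())
}
```

With a limit expressed in bytes, a metric carrying more labels simply takes a larger share of the budget, so the limit would not need re-tuning when the mix of collected metrics changes. A real implementation would need a more faithful size estimate to track actual heap usage.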