Extend kube_inventory plugin to include a resourcequota measurement and extend the node and pod measurements with a few more metrics
#9621 opened on Aug 13, 2021
Description
Feature Request
The kubernetes and kube_inventory input plugins cover most of the metrics needed to monitor k8s infrastructure, but a few resources and metrics are still missing. The plugins can easily be extended to collect them, which would make k8s monitoring more complete.
Proposal
The kube_inventory plugin can be extended to report not only capacity and allocatable quantity metrics but also node health metrics such as node condition status, node count, and whether a node is schedulable. A new measurement for the ResourceQuota resource can also be added. All of these metrics can be collected easily with the "k8s.io/api/core/v1" library.
- kubernetes_node_condition_status
- kubernetes_node_count
- kubernetes_unschedulable
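Example:
```go
// n is a Node from k8s.io/api/core/v1; fields is the measurement's map[string]interface{}
for _, val := range n.Status.Conditions {
	// ... (elided in the original) ...
	fields["status_condition"] = string(val.Status) // "True", "False" or "Unknown"
}
fields["spec_unschedulable"] = n.Spec.Unschedulable // bool
```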
A new measurement type could also be added with the following metrics:
kubernetes_resourcequota
- tags:
  - resource
  - namespace
- fields:
  - hard_cpu_cores_limit
  - hard_memory_bytes_limit
  - hard_pods_limit
  - used_cpu_cores
  - used_memory_bytes
  - used_pods
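The proposed kubernetes_resourcequota fields could be derived from a quota's `status.hard` and `status.used` lists. The sketch below is a minimal illustration under several assumptions: it uses a hypothetical, simplified `ResourceQuota` struct whose quantities are already converted to int64 (CPU in millicores, memory in bytes, pods as a count); it reads the keys `cpu`, `memory` and `pods` (real quotas may instead use keys such as `limits.cpu` or `requests.memory`); it treats the `resource` tag as the quota object's name, which the proposal does not define; and it rounds CPU down to whole cores. A real implementation would read `v1.ResourceQuota` and parse its `resource.Quantity` values.

```go
package main

import "fmt"

// ResourceQuota is a hypothetical, simplified stand-in for v1.ResourceQuota.
// Quantities are assumed pre-converted to int64: "cpu" in millicores,
// "memory" in bytes, "pods" as a plain count.
type ResourceQuota struct {
	Name      string
	Namespace string
	Hard      map[string]int64 // stand-in for status.hard
	Used      map[string]int64 // stand-in for status.used
}

// quotaFieldNames maps a quota resource key to its (hard, used) field names.
var quotaFieldNames = map[string][2]string{
	"cpu":    {"hard_cpu_cores_limit", "used_cpu_cores"},
	"memory": {"hard_memory_bytes_limit", "used_memory_bytes"},
	"pods":   {"hard_pods_limit", "used_pods"},
}

// toUnit converts a stored value to the unit named in the field:
// millicores become whole cores (rounded down); other values pass through.
func toUnit(resource string, v int64) int64 {
	if resource == "cpu" {
		return v / 1000
	}
	return v
}

// quotaPoint builds the tags and fields for one kubernetes_resourcequota point.
// A field is emitted only when the quota actually defines that resource.
func quotaPoint(q ResourceQuota) (map[string]string, map[string]interface{}) {
	tags := map[string]string{"resource": q.Name, "namespace": q.Namespace}
	fields := map[string]interface{}{}
	for res, names := range quotaFieldNames {
		if v, ok := q.Hard[res]; ok {
			fields[names[0]] = toUnit(res, v)
		}
		if v, ok := q.Used[res]; ok {
			fields[names[1]] = toUnit(res, v)
		}
	}
	return tags, fields
}

func main() {
	q := ResourceQuota{
		Name:      "compute-quota", // hypothetical example quota
		Namespace: "tools",
		Hard:      map[string]int64{"cpu": 4000, "memory": 8 << 30, "pods": 20},
		Used:      map[string]int64{"cpu": 1500, "memory": 2 << 30, "pods": 7},
	}
	tags, fields := quotaPoint(q)
	fmt.Println(tags["namespace"], tags["resource"])                                      // tools compute-quota
	fmt.Println(fields["hard_cpu_cores_limit"], fields["used_cpu_cores"], fields["used_pods"]) // 4 1 7
}
```

Skipping resources that a quota does not define avoids reporting a misleading zero limit for, say, memory on a pods-only quota.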
Current behavior
Currently, none of the metrics above are collected by any input plugin.
Desired behavior
Once the feature is implemented, the kube_inventory plugin output should look something like this:
```
kubernetes_node,host=vjain count=8i 1628918652000000000
kubernetes_node,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True status_condition=1i 1629177980000000000
kubernetes_node,cluster_namespace=tools,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True allocatable_cpu_cores=4i,allocatable_memory_bytes=7186567168i,allocatable_millicpu_cores=4000i,allocatable_pods=110i,capacity_cpu_cores=4i,capacity_memory_bytes=7291424768i,capacity_millicpu_cores=4000i,capacity_pods=110i,spec_unschedulable=0i,status_condition=1i 1628918652000000000
```
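Each line of the sample output is an InfluxDB line-protocol point: the measurement, comma-separated tags, a space, comma-separated fields (the `i` suffix marks an integer), a space, and a nanosecond timestamp. The sketch below renders such a line with tag and field keys sorted, as they appear in the sample. `toLineProtocol` is a hypothetical helper for illustration only: it handles only integer fields, assumes at least one field, and does no escaping of spaces, commas, or equals signs, which Telegraf's real serializer does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toLineProtocol renders one point as
//   measurement,tagA=x,tagB=y fieldA=1i,fieldB=2i timestamp
// Keys are sorted; only int64 fields are supported; nothing is escaped.
// At least one field is assumed, since line protocol requires one.
func toLineProtocol(measurement string, tags map[string]string, fields map[string]int64, ts int64) string {
	var b strings.Builder
	b.WriteString(measurement)

	tagKeys := make([]string, 0, len(tags))
	for k := range tags {
		tagKeys = append(tagKeys, k)
	}
	sort.Strings(tagKeys)
	for _, k := range tagKeys {
		fmt.Fprintf(&b, ",%s=%s", k, tags[k])
	}

	fieldKeys := make([]string, 0, len(fields))
	for k := range fields {
		fieldKeys = append(fieldKeys, k)
	}
	sort.Strings(fieldKeys)
	for i, k := range fieldKeys {
		sep := "," // fields are comma-separated...
		if i == 0 {
			sep = " " // ...but the field set starts after a space
		}
		fmt.Fprintf(&b, "%s%s=%di", sep, k, fields[k]) // "i" marks an integer
	}

	fmt.Fprintf(&b, " %d", ts)
	return b.String()
}

func main() {
	fmt.Println(toLineProtocol("kubernetes_node",
		map[string]string{"host": "vjain"},
		map[string]int64{"count": 8},
		1628918652000000000))
	// kubernetes_node,host=vjain count=8i 1628918652000000000
}
```

Passing the tags and the `status_condition` field of the second sample line produces that line exactly, because its tag keys (`condition`, `host`, `node_name`, `status`) are already in sorted order.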
Use case
We are planning to migrate our monitoring infrastructure from Prometheus to Telegraf and are trying to fill the gaps in the metrics we need. This feature, combined with the already-raised https://github.com/influxdata/telegraf/issues/8546, would cover our needs.