Better output from kops rolling-update cluster command
#14,122 opened on Aug 12, 2022
Description
/kind feature
1. Describe IN DETAIL the feature/behavior/change you would like to see.
A k8s node can be in the NeedsUpdate state for several different reasons. When kops rolling-update cluster is run, I would like the output to explain which trigger put the nodes of each InstanceGroup into the NeedsUpdate state, possibly at a verbosity of around 4.
The reason for this request is that there are four possible triggers for a node being in the NeedsUpdate state, and the documentation doesn't clearly state how to check for each of them. I guess "The instance was created with a specification that is older" refers to Launch Template versions? Maybe "The instance was detached" refers to a cordon Taint?
This will speed up debugging and improve uptime. It will also expand the pool of SREs capable of debugging as not everyone has the same level of kOps/k8s expertise.
2. Feel free to provide a design supporting your feature request.
Preferred Output
$ kops rolling-update cluster cactus-1-23.k8s.sproutsocial.com --state s3://infra-kops-state -v4 ~/sandbox/sprout_development_env/NeedsUpdateChecker
I0812 11:52:07.404391 4005 factory.go:68] state store s3://infra-kops-state
...snip...
I0812 11:52:10.825012 4005 aws_cloud.go:1551] Querying EC2 for all valid zones in region "us-east-1"
I0812 11:52:10.826233 4005 request_logger.go:45] AWS request: ec2/DescribeAvailabilityZones
I0812 11:52:11.322863 4005 aws_cloud.go:629] Listing all Autoscaling groups matching cluster tags
I0812 11:52:11.324043 4005 request_logger.go:45] AWS request: autoscaling/DescribeTags
I0812 11:52:11.841028 4005 request_logger.go:45] AWS request: autoscaling/DescribeAutoScalingGroups
I0812 11:52:12.022521 4005 aws_cloud.go:743] Launch Template Version Specified By ASG: $Latest
I0812 11:52:12.023747 4005 request_logger.go:45] AWS request: ec2/DescribeLaunchTemplates
I0812 11:52:12.141730 4005 aws_cloud.go:762] Launch Template Version used for compare: "3"
I0812 11:52:12.141732 4005 aws_cloud.go:764] InstanceGroup nodes-us-east-1a nodes Launch Template are behind!
I0812 11:52:14.051511 4005 aws_cloud.go:743] Launch Template Version Specified By ASG: $Latest
I0812 11:52:14.051654 4005 request_logger.go:45] AWS request: ec2/DescribeLaunchTemplates
I0812 11:52:14.178106 4005 aws_cloud.go:762] Launch Template Version used for compare: "4"
I0812 11:52:14.178108 4005 aws_cloud.go:765] InstanceGroup nodes-us-east-1b nodes have a Cordon Taint!
I0812 11:52:14.532158 4005 aws_cloud.go:743] Launch Template Version Specified By ASG: $Latest
I0812 11:52:14.532365 4005 request_logger.go:45] AWS request: ec2/DescribeLaunchTemplates
I0812 11:52:14.647179 4005 aws_cloud.go:762] Launch Template Version used for compare: "4"
I0812 11:52:14.647181 4005 aws_cloud.go:766] InstanceGroup nodes-us-east-1d nodes have needs-update annotation
...snip...
--or even--
NAME               STATUS       NEEDUPDATE  READY  MIN  TARGET  MAX  NODES  REASON
master-us-east-1a  Ready        0           1      1    1       1    1
master-us-east-1b  Ready        0           1      1    1       1    1
master-us-east-1d  Ready        0           1      1    1       1    1
nodes-us-east-1a   NeedsUpdate  2           0      2    2       2    2      Launch Template version
nodes-us-east-1b   NeedsUpdate  2           0      2    2       2    2      Cordon Taint
nodes-us-east-1d   NeedsUpdate  2           0      2    2       2    2      kops.k8s.io/needs-update