AWS: Cluster requires designating subnets by both ID and containing AZ
#14,710 opened on Dec 2, 2022
Description
1. What kops version are you running?
1.25.3
2. What Kubernetes version are you running?
1.24.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
- Write a Cluster manifest that include several "spec.subnets" entries with both the "id" and "zone" fields populated.
- Run kops replace --filename=... to store that desired state.
- Observe that kOps accepts this designation of subnets via ID and containing AZ.
- Run kops edit cluster and remove one or more of the "spec.subnets[*].zone" fields.
The subnet ID should be enough designation, given that AWS already knows the containing AZ for each subnet. - Observe that kOps rejects the "zone" field omission, complaining as follows, per the
fi/cloudup/awsup.FindRegionfunction:
Error: invalid AWS zone: “” in subnet “utility-us-east-2a”
5. What happened after the commands executed?
kOps continues to mandate that the Cluster spec designate AWS EC2 subnets by both ID and containing AZ.
6. What did you expect to happen?
Given that cloudup is able to look up a subnet's containing AZ given its ID, why do we require that the Cluster designate subnets by both of these details? At best, the nominated AZ will match what AWS already knows.
7. Please provide your cluster manifest.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
name: my-cluster
spec:
additionalSans:
- something.example.com
api:
loadBalancer:
additionalSecurityGroups:
- sg-0e73440d5420efaca
class: Network
crossZoneLoadBalancing: true
type: Public
authorization:
rbac: {}
awsLoadBalancerController:
enableWAF: true
enableWAFv2: true
enabled: true
certManager:
enabled: true
managed: false
cloudConfig:
disableSecurityGroupIngress: true
manageStorageClasses: false
cloudProvider: aws
clusterAutoscaler:
balanceSimilarNodeGroups: true
enabled: true
configBase: s3://my-bucket/my-cluster
etcdClusters:
- etcdMembers:
- instanceGroup: master-us-east-2a
name: a
- instanceGroup: master-us-east-2b
name: b
- instanceGroup: master-us-east-2c
name: c
manager:
env:
- name: ETCD_LISTEN_METRICS_URLS
value: http://0.0.0.0:8081
- name: ETCD_METRICS
value: extensive
name: main
- etcdMembers:
- instanceGroup: master-us-east-2a
name: a
- instanceGroup: master-us-east-2b
name: b
- instanceGroup: master-us-east-2c
name: c
manager:
env:
- name: ETCD_LISTEN_METRICS_URLS
value: http://0.0.0.0:8082
- name: ETCD_METRICS
value: basic
name: events
iam:
allowContainerRegistry: true
useServiceAccountExternalPermissions: true
kubeAPIServer:
featureGates:
StatefulSetAutoDeletePVC: "true"
kubeControllerManager:
featureGates:
StatefulSetAutoDeletePVC: "true"
kubeProxy:
enabled: false
kubelet:
anonymousAuth: false
kubeReserved:
cpu: 750m
memory: .75Gi
kubernetesVersion: 1.24.7
metricsServer:
enabled: true
networkCIDR: 10.3.0.0/16
networkID: vpc-03e705c4cddfee42b
networking:
calico:
bpfEnabled: true
crossSubnet: true
encapsulationMode: vxlan
typhaReplicas: 3
nonMasqueradeCIDR: 100.64.0.0/10
podIdentityWebhook:
enabled: true
serviceAccountIssuerDiscovery:
discoveryStore: s3://some-oidc-bucket/my-cluster
enableAWSOIDCProvider: true
subnets:
- cidr: 10.3.100.0/22
id: subnet-090c2c40927d862dc
name: utility-us-east-2a
type: Utility
zone: us-east-2a
- cidr: 10.3.104.0/22
id: subnet-03a87fdc7ead5ac0c
name: utility-us-east-2b
type: Utility
zone: us-east-2b
- cidr: 10.3.108.0/22
id: subnet-0fc9d25fba3d58414
name: utility-us-east-2c
type: Utility
zone: us-east-2c
- cidr: 10.3.0.0/22
egress: nat-0cd5a5fe7dd0b849f
id: subnet-06eecb7efb2e834ae
name: us-east-2a
type: Private
zone: us-east-2a
- cidr: 10.3.4.0/22
egress: nat-0e00e598753c7b918
id: subnet-0439ad6bfde63be65
name: us-east-2b
type: Private
zone: us-east-2b
- cidr: 10.3.8.0/22
egress: nat-0f936269f80667e78
id: subnet-0dccfb6407fd18cb8
name: us-east-2c
type: Private
zone: us-east-2c
topology:
masters: private
nodes: private
8. Anything else do we need to know?
It's not clear whether this is a defect. I prefer to omit over-specifying details like these that we can at best only get wrong. If we want to note the containing AZ for a given subnet, we can write a comment to that effect in the Cluster manifest—given that we're using YAML and not JSON. If kOps needs to know the containing AZ, it should be able to consult the AWS API to learn of this detail from the related subnet.
See the preceding discussion in the "kops-users" channel of the "Kubernetes" Slack workspace.
/kind bug