kubernetes-sigs/kubespray

Inconsistent etcd certificate permissions on first control-plane / etcd node

Open

#13250 opened on May 11, 2026

Labels: Ubuntu 24, help wanted, kind/bug

Description

What happened?

Kubespray generates different ownership and permissions for etcd certificates on the first control-plane/etcd node compared to the remaining control-plane nodes.

The first control-plane/etcd node shows the following ownership and permissions for the PEM files in /etc/ssl/etcd/ssl/: -rw------- root root for the *key.pem files and -rw-r--r-- root root for the other .pem files.

The additional control-plane/etcd nodes show the following: -rw-r----- etcd root for all *.pem files.

This is reproducible on all of our clusters (4 clusters).

The likely cause is that the certificates are generated on the first node and only then copied, with the appropriate permissions and ownership, to the other nodes. If my understanding is right, they are generated in roles/etcd/tasks/gen_certs_script.yml by the task "Gen_certs | run cert generation script for etcd and kube control plane nodes" and then copied to the other nodes by the task "Gen_certs | Write etcd member/admin and kube_control_plane client certs to other etcd nodes": https://github.com/kubernetes-sigs/kubespray/blob/a93615ebdead888e36dadf0212ca2f2fe90c63e9/roles/etcd/tasks/gen_certs_script.yml

We found this only by accident, when our etcd backup job ran on cp1 and failed with a read permission error. On the other nodes it completed without problems.
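To check which certificate files a given account can actually read, something like the following minimal sketch could be run as the user the backup job uses. The function name unreadable_pems is made up for illustration and is not part of Kubespray; the default directory is the one mentioned above.

```python
import os
from pathlib import Path


def unreadable_pems(directory="/etc/ssl/etcd/ssl"):
    """Return names of *.pem files the current user cannot read.

    Hypothetical helper for illustration only, not part of Kubespray.
    """
    # Fail loudly if the directory itself cannot be listed; otherwise an
    # empty result could be mistaken for "everything is readable".
    if not os.access(directory, os.R_OK | os.X_OK):
        raise PermissionError(f"cannot list {directory}")
    return sorted(
        path.name
        for path in Path(directory).glob("*.pem")
        if not os.access(path, os.R_OK)
    )
```

Note that os.access checks against the real UID/GID of the calling process, so the result only reflects the backup job's view if the script runs under the same account.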

Versions used: Kubespray 2.31, Kubernetes 1.35.4, OS Ubuntu 24.04.

What did you expect to happen?

I expect consistent permissions and ownership for the etcd certificates across all control-plane/etcd nodes.

How can we reproduce it (as minimally and precisely as possible)?

Deploy the cluster, run ls -l /etc/ssl/etcd/ssl/ on the first control-plane/etcd node, and compare the file ownership and permissions with the output from the additional nodes.
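To make the comparison easier to diff than raw ls -l output (which also includes sizes and timestamps), a small script along these lines could be run on each node and its output compared. This is only a sketch: the function name pem_listing is hypothetical, and it assumes it is run with enough privileges (for example as root) to stat the files.

```python
import grp
import os
import pwd
import stat
import sys
from pathlib import Path


def pem_listing(directory):
    """Return sorted 'mode owner group name' lines for each *.pem file."""
    lines = []
    for path in sorted(Path(directory).glob("*.pem")):
        info = path.stat()
        mode = stat.filemode(info.st_mode)  # e.g. '-rw-r-----'
        # Fall back to numeric IDs if the user or group has no name entry.
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            owner = str(info.st_uid)
        try:
            group = grp.getgrgid(info.st_gid).gr_name
        except KeyError:
            group = str(info.st_gid)
        lines.append(f"{mode} {owner} {group} {path.name}")
    return lines


if __name__ == "__main__":
    # Directory from this report; pass another path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/etc/ssl/etcd/ssl"
    print("\n".join(pem_listing(target)))
```

Saving the output on each node and diffing the files should show the mismatch described above, assuming the file names are the same on the nodes being compared.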

OS

Ubuntu 24

Version of Ansible

ansible==11.13.0 ansible-core==2.18.12

Version of Python

Python 3.12.3

Version of Kubespray (commit)

1c9add48975060f45396b34d8e022c30d7f80dab

Network plugin used

calico

Full inventory with variables

n/a

Command used to invoke ansible

ansible-playbook cluster-update.yml --become

Output of ansible run

n/a

Anything else we need to know

No response
