cilium_operator_image_repo references wrong image name for offline registry sync (`cilium/operator` vs chart-required `cilium/operator-generic`)
#13252 opened on May 12, 2026
Description
What happened?
Environment
- Kubespray: master branch, commit `bdbfcaae8` (`v2.30.0-94`); bug also present on `origin/master` HEAD
- Cilium: 1.19.2 chart and image
- cilium-cli: v0.18.9
- K8s: 1.34.4 with containerd
- Deployment mode: offline registry (`dockerhub_image_repo` set)
Summary
Kubespray syncs the `cilium/operator` image to offline registries, but the Cilium Helm chart requires `cilium/operator-generic` for non-cloud deployments. This mismatch causes broken deployments for offline-registry users.
Root cause
Cilium chart image naming convention
In `cilium/templates/cilium-operator/_helpers.tpl`, the chart constructs the operator image name as:

```
{repository}-{cloud}{suffix}{tag}{digest}
```

where `{cloud}` is determined by the `cilium.operator.cloud` define:

- `aws` if `eni.enabled`
- `azure` if `azure.enabled`
- `alibabacloud` if `alibabacloud.enabled`
- `generic` otherwise (the default for non-cloud deployments, including bare metal)

So the rendered image for a non-cloud deployment with default values is:

```
quay.io/cilium/operator + "-" + "generic" + "" + ":v1.19.2"
= quay.io/cilium/operator-generic:v1.19.2
```
The same `{cloud}` value is also used in the deployment's `command`:

```yaml
command:
  - cilium-operator-{{ include "cilium.operator.cloud" . }}
```
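To make the naming rule concrete, here is a small Python model of the helper logic described above. It is a simplification written for this report, not the chart's template code: the function names are hypothetical, the `:` before the tag and `@` before the digest are assumptions about how the chart joins those parts, and the cloud precedence just follows the order listed above.

```python
from typing import Optional


def operator_cloud(eni: bool = False, azure: bool = False,
                   alibabacloud: bool = False) -> str:
    """Model of the cilium.operator.cloud selection described in this issue.

    Precedence follows the order listed above (assumed to match the chart's
    if/else chain): aws, azure, alibabacloud, then generic.
    """
    if eni:
        return "aws"
    if azure:
        return "azure"
    if alibabacloud:
        return "alibabacloud"
    return "generic"


def operator_image(repository: str, tag: str, cloud: str,
                   suffix: str = "", digest: Optional[str] = None) -> str:
    """Simplified model of {repository}-{cloud}{suffix}{tag}{digest}."""
    ref = f"{repository}-{cloud}{suffix}:{tag}"
    if digest:
        ref += f"@{digest}"
    return ref


def operator_command(cloud: str) -> str:
    """Binary name the chart puts into the deployment's command."""
    return f"cilium-operator-{cloud}"
```

Passing Kubespray's offline value (for example `<registry>/cilium/operator`) as `repository` yields `<registry>/cilium/operator-generic:vX.Y.Z`, and `operator_command` yields `cilium-operator-generic`: the image name and the binary are both derived from the same `{cloud}` value.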
Kubespray mismatch
`roles/kubespray_defaults/defaults/main/download.yml:237`:

```yaml
cilium_operator_image_repo: "{{ quay_image_repo }}/cilium/operator"
```

This value is used in two places:

- `download.yml:599` (image sync entry): Kubespray pulls `quay.io/cilium/operator:vX.Y.Z` and pushes it to the offline registry as `<registry>/cilium/operator:vX.Y.Z`.
- `roles/network_plugin/cilium/templates/values.yaml.j2:154-157`: rendered to chart values as:

  ```yaml
  operator:
    image:
      repository: <registry>/cilium/operator
      tag: vX.Y.Z
  ```
The chart then applies its helper logic and ends up requesting `<registry>/cilium/operator-generic:vX.Y.Z`, a name that was never synced to the offline registry.
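The mismatch reduces to comparing two strings derived from the same variable. The sketch below is a hypothetical check (the function names and `registry.example` are illustrative, not Kubespray code): it derives what the sync step pushes and what the chart requests from one `cilium_operator_image_repo` value, and reports which requested references the registry lacks.

```python
def synced_ref(image_repo: str, image_tag: str) -> str:
    """What the sync entry pushes: exactly repo:tag, no suffix added."""
    return f"{image_repo}:{image_tag}"


def chart_requested_ref(image_repo: str, image_tag: str,
                        cloud: str = "generic") -> str:
    """What the chart helper asks for when repo is passed as operator.image.repository."""
    return f"{image_repo}-{cloud}:{image_tag}"


def missing_refs(synced: set[str], requested: list[str]) -> list[str]:
    """Requested image refs the offline registry does not hold, in request order."""
    return [ref for ref in requested if ref not in synced]
```

For any repository value, `chart_requested_ref` appends `-generic` while `synced_ref` does not, so the requested reference stays missing unless something outside Kubespray adds it.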
Why online-registry users don't hit this
Online users get `quay.io/cilium/operator-generic:vX.Y.Z` directly (chart default repository plus chart helper). Kubespray's override of the `repository` field reuses the same `cilium/operator` base, so the chart helper still produces a valid name, `quay.io/cilium/operator-generic`, which exists upstream.
Why this hasn't been reported
Offline-registry users typically have an image sync workflow that inadvertently masks the bug:
- The sync sees `cilium/operator:vX.Y.Z` in Kubespray's list.
- The pull from upstream succeeds (this name exists upstream as the base for the cloud-variant builds).
- Some sync scripts auto-retag the pulled image to additional aliases, including `cilium/operator-generic`.
- The cluster pulls `cilium/operator-generic` successfully, but gets the wrong image content: it contains the `cilium-operator` binary, not `cilium-operator-generic`.
- For some Cilium/cilium-cli version combinations, the resulting deployment's `command` happens to match the wrong binary, and the pod runs, so the deployment appears to work.

This silent failure mode can persist until a chart upgrade brings the deployment's `command` field in line with the chart's expectation (`cilium-operator-generic`), at which point new pods go into CrashLoopBackOff with:

```
exec: "cilium-operator-generic": executable file not found in $PATH
```
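One way to spot this masking after the fact is to compare image IDs: if `cilium/operator-generic` and `cilium/operator` share an ID for the same tag, the "generic" name is just an alias of the base image. A minimal heuristic sketch (hypothetical helper; the input mirrors the `REPOSITORY TAG IMAGE-ID SIZE` listing included later in this report):

```python
def retagged_tags(images: dict[tuple[str, str], str],
                  base: str, alias: str) -> list[str]:
    """Tags where `alias` and `base` resolve to the same image ID.

    `images` maps (repository, tag) to the image ID column of an image
    listing. An identical ID means both names point at the same image,
    which suggests the alias was produced by retagging the base image.
    """
    matches = []
    for (repo, tag), image_id in images.items():
        if repo == alias and images.get((base, tag)) == image_id:
            matches.append(tag)
    return sorted(matches)
```

With the listing from this report, only `v1.19.2` is flagged: both names share ID `63ae62180908e`, while the two `v1.19.1` entries have different IDs. A differing ID alone does not prove the alias is the genuine upstream `operator-generic` image.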
Reproduction
- Configure Kubespray for an offline registry:

  ```yaml
  dockerhub_image_repo: "<registry>/kubespray"
  ```

- Use Kubespray's image sync (without any extra retagging) to populate the offline registry from `cilium_image_list`.
- Verify that only `cilium/operator` (not `cilium/operator-generic`) ends up in the offline registry:

  ```shell
  curl -s "https://<registry>/v2/kubespray/cilium/operator-generic/tags/list"
  # 404 or empty
  ```

- Deploy Cilium:

  ```shell
  ansible-playbook cluster.yml -i inventory/... --tags cilium
  ```

- Observe the operator pods failing to start (`ErrImagePull`/`ImagePullBackOff`) with:

  ```
  Failed to pull image "<registry>/kubespray/cilium/operator-generic:vX.Y.Z": manifest unknown
  ```
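The registry check in the steps above can also be scripted. This is a sketch using only the Python standard library, assuming the registry allows anonymous reads over HTTPS (no token auth or custom CA handling); `fetch_tags` and `tags_from_response` are hypothetical helpers, and the parser follows the Registry v2 `tags/list` response shapes quoted in this report.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def tags_from_response(body: str) -> Optional[list[str]]:
    """Parse a Registry v2 /tags/list body.

    Returns the tag list (empty if the repository has no tags), or None if
    the body is empty or reports NAME_UNKNOWN. Other errors are raised.
    """
    if not body.strip():
        return None
    data = json.loads(body)
    errors = data.get("errors") or []
    if errors:
        if any(err.get("code") == "NAME_UNKNOWN" for err in errors):
            return None
        raise RuntimeError(f"registry returned errors: {errors}")
    return data.get("tags") or []


def fetch_tags(registry: str, name: str, timeout: float = 10.0) -> Optional[list[str]]:
    """GET https://<registry>/v2/<name>/tags/list, assuming anonymous read access."""
    url = f"https://{registry}/v2/{name}/tags/list"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        if exc.code != 404:
            raise
        # A 404 may carry a JSON error body (or nothing at all).
        body = exc.read().decode("utf-8")
    return tags_from_response(body)
```

`fetch_tags("<registry>", "kubespray/cilium/operator-generic")` returning `None` corresponds to the 404-or-empty result in the verification step.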
Evidence
Available on request. Key data points:
- `helm template` rendering with default values produces `quay.io/cilium/operator-generic:v1.19.2`.
- `helm template` with `--set operator.image.repository=<custom>/cilium/operator` still produces `<custom>/cilium/operator-generic:v1.19.2` (the chart helper unconditionally adds the `-{cloud}` suffix).
- A real offline-registry deployment ended up with three different ReplicaSet specs over several upgrades. The only Ready one used image=`operator` + command=`cilium-operator` (matching by luck), while the deterministic chart output (image=`operator-generic` + command=`cilium-operator-generic`) consistently failed.
What did you expect to happen?
The `cilium-operator` deployment should be Running with `image` and `command` fields that match what the offline registry contains. Specifically, either:

- Kubespray should sync the upstream image under the same name the chart's helper will compute (`cilium/operator-generic` for non-cloud deployments), or
- Kubespray should pass `operator.image.override` to the chart, so the chart skips the helper's suffix logic and uses the explicit image name Kubespray has synced.

In either case, the final deployed image and command should be consistent and point to a valid binary inside the image.
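The expectation above can be phrased as a name-level check on the deployment spec. The sketch below is hypothetical and assumes the upstream convention that `cilium/operator` ships `cilium-operator` and `cilium/operator-<cloud>` ships `cilium-operator-<cloud>`. It catches the crashing pods in this report (image `operator`, command `cilium-operator-generic`), but not a retagged alias whose name matches while its content does not.

```python
def expected_binary(image_ref: str) -> str:
    """Binary assumed to ship in an operator image, derived from its repository name.

    Assumed convention: .../operator ships cilium-operator and
    .../operator-<cloud> ships cilium-operator-<cloud>.
    """
    without_digest = image_ref.split("@", 1)[0]
    # Last path segment holds name:tag; a registry port never lands here.
    last_segment = without_digest.rsplit("/", 1)[-1]
    repo_name = last_segment.split(":", 1)[0]
    if repo_name == "operator" or repo_name.startswith("operator-"):
        return f"cilium-{repo_name}"
    raise ValueError(f"not a cilium operator image: {image_ref}")


def is_consistent(image_ref: str, command: list[str]) -> bool:
    """True if the container's first command element names the image's binary."""
    return bool(command) and command[0] == expected_binary(image_ref)
```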
How can we reproduce it (as minimally and precisely as possible)?
- Configure Kubespray for an offline registry deployment:

  ```yaml
  # inventory/<cluster>/group_vars/all/offline.yml
  dockerhub_image_repo: "<your-registry>/kubespray"
  quay_image_repo: "{{ dockerhub_image_repo }}"
  ```

- Sync images to your offline registry using whatever workflow you have (typically pulling from `quay.io` and pushing to your private registry). Do not auto-retag: only push the exact image names in `cilium_image_list`.
- Verify that the offline registry only contains `cilium/operator` (the name Kubespray syncs):

  ```shell
  curl -s "https://<registry>/v2/kubespray/cilium/operator/tags/list"
  # {"name":"kubespray/cilium/operator","tags":["v1.19.2"]}
  curl -s "https://<registry>/v2/kubespray/cilium/operator-generic/tags/list"
  # {"errors":[{"code":"NAME_UNKNOWN", ...}]}
  ```

- Deploy with:

  ```yaml
  kube_network_plugin: cilium
  cilium_version: 1.19.2
  ```

  ```shell
  ansible-playbook -i inventory/<cluster>/hosts.yaml cluster.yml --tags cilium
  ```

- Observe `cilium-operator` pods failing with `ErrImagePull`, or in CrashLoopBackOff with `exec: "cilium-operator-generic": executable file not found`.
- Verify the root cause with `helm template`:

  ```shell
  helm template cilium <cilium-1.19.2-chart-path> \
    --namespace kube-system \
    --set operator.image.repository=<registry>/kubespray/cilium/operator \
    --set operator.image.tag=v1.19.2 \
    --set operator.image.useDigest=false \
    | grep -A2 "name: cilium-operator$"
  # Shows image: <registry>/kubespray/cilium/operator-generic:v1.19.2 (chart added -generic)
  ```
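The `grep -A2` in the last step only prints the image. To read the rendered `command` alongside it, a small scanner over the `helm template` output can help. This is a sketch: `operator_container` is a hypothetical helper that assumes the container appears as a `- name: cilium-operator` list item with `image:` and `command:` keys, and it is not a substitute for a real YAML parser.

```python
from typing import Optional


def operator_container(rendered: str) -> tuple[Optional[str], Optional[str]]:
    """Return (image, first command element) of the cilium-operator container.

    Line scanner over `helm template` output: starts at the
    `- name: cilium-operator` list item and reads the first `image:` value
    and the first item under `command:` that follow it.
    """
    image: Optional[str] = None
    command: Optional[str] = None
    in_container = False
    in_command = False
    for raw in rendered.splitlines():
        line = raw.strip()
        if line == "- name: cilium-operator":
            in_container, in_command = True, False
            image = command = None
            continue
        if not in_container:
            continue
        if image is None and line.startswith("image:"):
            # Drop the key and any surrounding quotes from the value.
            image = line[len("image:"):].strip().strip("\"'")
        elif line == "command:":
            in_command = True
        elif in_command and line.startswith("- "):
            command = line[2:].strip()
            in_command = False
        if image is not None and command is not None:
            break
    return image, command
```

For example, save the rendered output to a file and call `operator_container(open(path).read())`; per the behavior described in this report, it should return the `operator-generic` image together with `cilium-operator-generic`.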
OS
Ubuntu 24.04 (24.04.4 LTS)
Version of Ansible
ansible [core 2.18.12]
Version of Python
python version = 3.12.4 (main, Jul 5 2024, 11:37:28) [GCC 9.4.0] (/usr/local/python3.12/bin/python3.12)
Version of Kubespray (commit)
commit bdbfcaae847ab0f2adcb3b420b12b1c5b4baffce tag: v2.30.0-94-gbdbfcaae8 (master branch, 94 commits after v2.30.0)
Network plugin used
cilium
Full inventory with variables
https://gist.github.com/Feelings0220/e531c5a94af04ecbc279314086cdfd45
Command used to invoke ansible
```shell
ansible-playbook -i inventory//hosts.yaml \
  cluster.yml \
  -b \
  --become-user=root \
  -e kube_version=v1.34.4 \
  -e cilium_version=1.19.2
```
Output of ansible run
Welcome to Ubuntu 24.04.4 LTS (GNU/Linux 6.8.0-100-generic x86_64)
System information as of Tue May 12 02:57:30 PM CST 2026
```text
=== cilium status (after upgrade) ===
Cilium:             OK
Operator:           2 errors
Envoy DaemonSet:    OK
Hubble Relay:       1 errors, 2 warnings
ClusterMesh:        disabled

DaemonSet              cilium                   Desired: 8, Ready: 8/8, Available: 8/8
DaemonSet              cilium-envoy             Desired: 8, Ready: 8/8, Available: 8/8
Deployment             cilium-operator          Desired: 3, Ready: 1/3, Available: 1/3, Unavailable: 2/3
Deployment             hubble-relay             Desired: 1, Unavailable: 1/1
Containers:            cilium                   Running: 8
                       cilium-envoy             Running: 8
                       cilium-operator          Running: 3
                       clustermesh-apiserver
                       hubble-relay             Pending: 1
Cluster Pods:          45/45 managed by Cilium
Helm chart version:    1.19.2
Image versions         cilium                   dockerhub.kubekey.local/kubernetes-kubespray/cilium/cilium:v1.19.2: 8
                       cilium-envoy             dockerhub.kubekey.local/kubernetes-kubespray/cilium/cilium-envoy:v1.34.10-1762597008-ff7ae7d623be00078865cff1b0672cc5d9bfc6d5: 8
                       cilium-operator          dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator:v1.19.2: 3
                       hubble-relay             dockerhub.kubekey.local/kubernetes-kubespray/cilium/hubble-relay:v1.19.2@sha256:9987c73bad48c987fd065185535fd15a6717cbe8a8caf7fc7ef0413532cf490e: 1
Errors:                cilium-operator          cilium-operator          2 pods of Deployment cilium-operator are not ready
                       cilium-operator          cilium-operator          deployment cilium-operator is rolling out - 2 out of 3 pods updated
                       hubble-relay             hubble-relay             1 pods of Deployment hubble-relay are not ready
Warnings:              hubble-relay             hubble-relay-755c6b7747-xj4rw    pod is pending
                       hubble-relay             hubble-relay-755c6b7747-xj4rw    pod is pending

=== cilium-operator pods ===
NAME                               READY   STATUS             RESTARTS         AGE   IP           NODE          NOMINATED NODE   READINESS GATES
cilium-operator-55bb5d64cc-j2kz6   1/1     Running            1 (30h ago)      30h   10.8.9.95    worker-a-03
cilium-operator-5d5878f8fb-pxbhn   0/1     CrashLoopBackOff   329 (28s ago)    27h   10.8.9.169   master-03
cilium-operator-5d5878f8fb-s4qrn   0/1     CrashLoopBackOff   327 (115s ago)   27h   10.8.9.94    worker-a-02

=== Most recent crash pod logs (last 30 lines) ===
Crash pod: pod/cilium-operator-5d5878f8fb-pxbhn
pod/cilium-operator-5d5878f8fb-s4qrn
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h8m79 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  cilium-config-path:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cilium-config
    Optional:  false
  kube-api-access-h8m79:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.cilium.io/agent-not-ready op=Exists
Events:
  Type     Reason   Age                     From     Message
  Normal   Created  53m (x318 over 27h)     kubelet  Created container: cilium-operator
  Warning  Failed   53m (x318 over 27h)     kubelet  Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "cilium-operator-generic": executable file not found in $PATH
  Warning  BackOff  3m15s (x8043 over 27h)  kubelet  Back-off restarting failed container cilium-operator in pod cilium-operator-5d5878f8fb-s4qrn_kube-system(ffd8a1c5-6ede-4218-a84e-1edf44318473)
  Normal   Pulled   117s (x328 over 27h)    kubelet  Container image "dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator:v1.19.2" already present on machine

=== Image present in offline registry ===
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator-generic   v1.19.1   f1b5c176c6ee8   33.4MB
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator-generic   v1.19.2   63ae62180908e   45.7MB
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator           v1.19.2   63ae62180908e   45.7MB
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator           v1.19.1   e5091458a7e48   45.6MB
```
Anything else we need to know
Additional context: proposed fix

If maintainers confirm this is a real issue, I can submit a PR with the following change:

File 1: `roles/kubespray_defaults/defaults/main/download.yml:237`

```diff
- cilium_operator_image_repo: "{{ quay_image_repo }}/cilium/operator"
+ cilium_operator_image_repo: "{{ quay_image_repo }}/cilium/operator-generic"
```

File 2: `roles/network_plugin/cilium/templates/values.yaml.j2:154-157`

```diff
 operator:
   image:
-    repository: {{ cilium_operator_image_repo }}
+    override: "{{ cilium_operator_image_repo }}:{{ cilium_operator_image_tag }}"
     tag: {{ cilium_operator_image_tag }}
```

Using `operator.image.override` prevents the chart helper from adding another `-generic` suffix (since `cilium_operator_image_repo` already ends in `-generic`).

Cloud variant considerations: the current default targets only non-cloud (generic) deployments, matching the most common Kubespray scenario. For users deploying with `eni.enabled`, `azure.enabled`, or `alibabacloud.enabled`, the fix would need to be conditionalized. Happy to extend the PR if maintainers prefer.
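A quick model shows why the override matters once the repository variable already carries the suffix, and what a conditionalized version might look like. This is a hypothetical sketch, not Kubespray or chart code: `rendered_operator_image` treats `override` as used verbatim (as described above) and otherwise appends `-{cloud}`; `proposed_operator_repo` is an illustrative name for a per-cloud `cilium_operator_image_repo`.

```python
from typing import Optional


def rendered_operator_image(repository: str, tag: str, cloud: str = "generic",
                            override: Optional[str] = None) -> str:
    """Simplified chart model: override is used verbatim, else -{cloud} is appended."""
    if override:
        return override
    return f"{repository}-{cloud}:{tag}"


def proposed_operator_repo(base_repo: str, cloud: str = "generic") -> str:
    """Hypothetical per-cloud cilium_operator_image_repo: base + '-' + cloud."""
    return f"{base_repo}-{cloud}"
```

Without the override, the generic case renders `operator-generic-generic`, a name that exists nowhere; with it, the rendered reference equals the synced `repo:tag` for each cloud value.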