Constraint enforcer counts completed ReplicatedJob tasks against node memory · moby/moby#52652

(3 comments) (0 reactions) (0 assignees) Go (71,553 stars) (18,951 forks) batch import

area/swarm, help wanted, kind/bug, status/0-triage, version/29.4

Description

The Swarm mode constraint enforcer rejects running tasks because completed replicated-job tasks are included in the node's reservation sum. Over time, accumulated Completed job tasks push the enforcer's view of "reserved memory" past the node's capacity, and deployments fail because tasks are rejected with:

assigned node no longer meets constraints

The node has plenty of actual free memory; only the enforcer's sum is wrong.

Root cause

If I understand correctly, Docker Swarm mode is built on the moby/swarmkit project. In rejectNoncompliantTasks (https://github.com/moby/swarmkit/blob/12ce3490ef26cbca6ef9b243cb013fffbfe6a6cb/manager/orchestrator/constraintenforcer/constraint_enforcer.go#L117), the task loop filters only on DesiredState:

for _, t := range tasks {
    if t.DesiredState < api.TaskStateAssigned || t.DesiredState > api.TaskStateCompleted {
        continue
    }
    ...
    available.MemoryBytes -= t.Spec.Resources.Reservations.MemoryBytes
    available.NanoCPUs -= t.Spec.Resources.Reservations.NanoCPUs
}

For services with `mode: replicated-job`, finished tasks remain in the store with DesiredState = Completed (this is the normal terminal state for a job task, and such tasks are not pruned by TaskHistoryRetentionLimit). They pass the filter, so their reservations are subtracted from `available` even though nothing is actually running.
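To make the effect of that filter concrete, here is a minimal, self-contained Go model of the quoted loop. It does not import swarmkit: `TaskState`, `Task`, and the constants are hypothetical simplified stand-ins (only the assumed ordering Assigned < Running < Completed < Shutdown matters), and the per-task rejection branch of the real function is left out. It shows that 20 finished job tasks reserving 512 MiB each, matching the 20 tasks / 10 GB reported in the table below, remove 10 GiB from `available` although none of them is running.

```go
package main

import "fmt"

// TaskState is a hypothetical, simplified stand-in for swarmkit's api.TaskState.
// Only the relative order of the values matters for the filter below.
type TaskState int

const (
	TaskStateNew TaskState = iota
	TaskStateAssigned
	TaskStateRunning
	TaskStateCompleted
	TaskStateShutdown
)

// Task keeps only the fields the quoted loop reads (simplified stand-in).
type Task struct {
	DesiredState TaskState
	MemoryBytes  int64 // stand-in for t.Spec.Resources.Reservations.MemoryBytes
}

const (
	MiB int64 = 1 << 20
	GiB int64 = 1 << 30
)

// availableAfterLoop mirrors the quoted loop: tasks are skipped only on
// DesiredState, and every remaining task's reservation is subtracted.
func availableAfterLoop(capacity int64, tasks []Task) int64 {
	available := capacity
	for _, t := range tasks {
		if t.DesiredState < TaskStateAssigned || t.DesiredState > TaskStateCompleted {
			continue
		}
		available -= t.MemoryBytes
	}
	return available
}

func main() {
	// 20 finished job tasks, each reserving 512 MiB; nothing is running.
	var tasks []Task
	for i := 0; i < 20; i++ {
		tasks = append(tasks, Task{DesiredState: TaskStateCompleted, MemoryBytes: 512 * MiB})
	}
	avail := availableAfterLoop(32*GiB, tasks)
	fmt.Printf("available: %d GiB of 32 GiB\n", avail/GiB) // prints "available: 22 GiB of 32 GiB"
}
```

Because the filter never looks at whether the task is still running, the 10 GiB stays "reserved" for as long as the Completed tasks remain in the store.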

Our observation

We had a stack with a "post_deploy" job that runs data migrations etc.

Cluster: 4-node swarm, Docker 29.4.1. The worst-affected node had:

Source	Memory reserved (enforcer view)
Live running tasks	21 GB
Completed `*_post_deploy` job tasks (20 of them across 3 services)	10 GB
Total	31 GB
Node capacity	32 GB

Every stack deploy that included services with large memory reservations (over 1 GB) failed with "assigned node no longer meets constraints" whenever those services happened to be scheduled on a node with many completed post-deploy jobs. Another node in the same cluster is also close to capacity and sees only occasional rejections. The two remaining nodes, which have no accumulated job tasks, are fine (or at least we haven't noticed any issues). The rejections appear to correlate with accumulated Completed job tasks.
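A quick check of the figures in the table explains the "over 1 GB" threshold: with 10 GB of job history counted, the enforcer sees only about 1 GB of headroom on a 32 GB node. The sketch below assumes a task is rejected once its reservation exceeds that headroom; the 1.5 GB reservation is a hypothetical example of one of the heavier services.

```go
package main

import "fmt"

// headroomGB is what the enforcer believes is still free on the node:
// capacity minus everything it counts as reserved (all values in GB).
func headroomGB(capacity, running, completedJobs float64) float64 {
	return capacity - running - completedJobs
}

// fits assumes a task is rejected once its reservation exceeds the headroom.
func fits(reservationGB, headroom float64) bool {
	return reservationGB <= headroom
}

func main() {
	withHistory := headroomGB(32, 21, 10) // figures from the table above
	withoutHistory := headroomGB(32, 21, 0)
	fmt.Println("headroom with job history (GB):", withHistory)
	fmt.Println("headroom without job history (GB):", withoutHistory)
	fmt.Println("1.5 GB task fits with history:", fits(1.5, withHistory))       // false
	fmt.Println("1.5 GB task fits without history:", fits(1.5, withoutHistory)) // true
}
```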

Removing and re-creating the job-mode services (clearing the Completed task history) immediately stopped the rejections.

Versions

Docker 29.4.1 on all nodes (managers and workers).

Reproduce

1. Create a ReplicatedJob service with a non-trivial Resources.Reservations.MemoryBytes (e.g. 512 MB).
2. Run it to completion many times (e.g. as a post-deploy hook in CI) so the node accumulates Completed job tasks with DesiredState = Completed.
3. Run unrelated replicated services on the same node such that sum(reservations of running tasks) + sum(reservations of Completed job tasks) exceeds the node's total memory.
4. Trigger any node update (label change, heartbeat, restart). Live tasks get rejected with "assigned node no longer meets constraints", even though free -m shows plenty of free memory.
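The reproduction steps can be simulated without a cluster. This sketch is hypothetical throughout: the types are simplified stand-ins for swarmkit's, `web` and `worker` are made-up live services (20 GiB and 1 GiB, together the 21 GB from the table), each CI run adds one finished 512 MiB job task, and a task is assumed to be rejected when its reservation exceeds what is left. The issue does not say in which order the enforcer visits tasks; placing completed tasks first is an assumption, and with another order the over-commit may land on a different task.

```go
package main

import "fmt"

// TaskState is a hypothetical, simplified stand-in for swarmkit's api.TaskState.
type TaskState int

const (
	TaskStateNew TaskState = iota
	TaskStateAssigned
	TaskStateRunning
	TaskStateCompleted
	TaskStateShutdown
)

type Task struct {
	Name         string
	DesiredState TaskState
	MemoryBytes  int64
}

const (
	MiB int64 = 1 << 20
	GiB int64 = 1 << 30
)

// rejectedTasks models one enforcer pass: filter on DesiredState, reject a
// task whose reservation exceeds what is left, otherwise subtract it.
func rejectedTasks(capacity int64, tasks []Task) []string {
	available := capacity
	var rejected []string
	for _, t := range tasks {
		if t.DesiredState < TaskStateAssigned || t.DesiredState > TaskStateCompleted {
			continue
		}
		if t.MemoryBytes > available {
			rejected = append(rejected, t.Name)
			continue
		}
		available -= t.MemoryBytes
	}
	return rejected
}

// nodeTasks builds the node's task list after `runs` finished job runs.
// Completed tasks come first (assumed iteration order).
func nodeTasks(runs int) []Task {
	var tasks []Task
	for i := 0; i < runs; i++ {
		tasks = append(tasks, Task{fmt.Sprintf("post_deploy.%d", i), TaskStateCompleted, 512 * MiB})
	}
	return append(tasks,
		Task{"web", TaskStateRunning, 20 * GiB},
		Task{"worker", TaskStateRunning, 1 * GiB},
	)
}

// firstFailingRun returns the smallest number of finished runs after which
// some live task is rejected, or -1 if none is rejected up to maxRuns.
func firstFailingRun(capacity int64, maxRuns int) (int, []string) {
	for runs := 0; runs <= maxRuns; runs++ {
		if r := rejectedTasks(capacity, nodeTasks(runs)); len(r) > 0 {
			return runs, r
		}
	}
	return -1, nil
}

func main() {
	runs, rejected := firstFailingRun(32*GiB, 100)
	fmt.Printf("after %d job runs, rejected: %v\n", runs, rejected)
	// prints "after 23 job runs, rejected: [worker]"
}
```

Twenty-two runs (11 GiB of job history plus 21 GiB live) fit exactly into 32 GiB; the twenty-third pushes the sum past capacity, which is the condition described in step 3.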

Expected behavior

No response

docker version

Client: Docker Engine - Community
 Version:           29.4.1
 API version:       1.54
 Go version:        go1.26.2
 Git commit:        055a478
 Built:             Mon Apr 20 16:32:37 2026
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          29.4.1
  API version:      1.54 (minimum version 1.40)
  Go version:       go1.26.2
  Git commit:       6c91b92
  Built:            Mon Apr 20 16:32:37 2026
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v2.2.3
  GitCommit:        77c84241c7cbdd9b4eca2591793e3d4f4317c590
 runc:
  Version:          1.3.5
  GitCommit:        v1.3.5-0-g488fc13e
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    29.4.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.33.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v5.1.3
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 192
  Running: 52
  Paused: 0
  Stopped: 140
 Images: 198
 Server Version: 29.4.1
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: active
  NodeID: 5t209n1f9ddy46jt14fcoxs3m
  Is Manager: false
  Node Address: -
  Manager Addresses: -
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 77c84241c7cbdd9b4eca2591793e3d4f4317c590
 runc version: v1.3.5-0-g488fc13e
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-110-generic
 Operating System: Ubuntu 24.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.34GiB
 Name: swarm4
 ID: 89a641b4-1d45-47d6-b41c-f5f006557916
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
 Firewall Backend: iptables

Additional Info

No response

Contributor guide

Tech stack: go
Domain: backend, infrastructure
Issue type: bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: active
Clarity: clear
Prerequisites: Go, Docker Swarm
Beginner-friendly: 30
Research direction: Analyze the rejectNoncompliantTasks function in constraint_enforcer.go and change the filter to exclude tasks with DesiredState == Completed for ReplicatedJob services. Verify the fix with a test that simulates the accumulation of completed job tasks.
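The research direction above suggests narrowing the filter and adding a regression test. Below is a minimal sketch of one way to express that, under loud assumptions: the types are simplified stand-ins (not swarmkit's api package), `ObservedState` stands in for the task's reported `Status.State`, and `countsProposed` is a hypothetical filter, not an existing swarmkit change. It keys on observed state rather than on DesiredState alone because, if job tasks carry DesiredState = Completed while they are still running (worth confirming in swarmkit's job orchestrator), excluding on DesiredState alone could also stop counting running job tasks.

```go
package main

import "fmt"

// TaskState is a hypothetical, simplified stand-in for swarmkit's api.TaskState.
type TaskState int

const (
	TaskStateNew TaskState = iota
	TaskStateAssigned
	TaskStateRunning
	TaskStateCompleted
	TaskStateShutdown
)

type Task struct {
	DesiredState  TaskState
	ObservedState TaskState // stand-in for t.Status.State
	MemoryBytes   int64
}

const (
	MiB int64 = 1 << 20
	GiB int64 = 1 << 30
)

// countsCurrent is the DesiredState-only filter quoted in the issue.
func countsCurrent(t Task) bool {
	return t.DesiredState >= TaskStateAssigned && t.DesiredState <= TaskStateCompleted
}

// countsProposed (hypothetical) also skips tasks that have already finished:
// once the observed state is past Running, nothing holds the reservation.
func countsProposed(t Task) bool {
	return countsCurrent(t) && t.ObservedState <= TaskStateRunning
}

// reserved sums the reservations of the tasks the given filter counts.
func reserved(tasks []Task, counts func(Task) bool) int64 {
	var sum int64
	for _, t := range tasks {
		if counts(t) {
			sum += t.MemoryBytes
		}
	}
	return sum
}

// exampleTasks: live services, one job task still running, 20 finished job runs.
func exampleTasks() []Task {
	tasks := []Task{
		{TaskStateRunning, TaskStateRunning, 21 * GiB},
		{TaskStateCompleted, TaskStateRunning, 512 * MiB},
	}
	for i := 0; i < 20; i++ {
		tasks = append(tasks, Task{TaskStateCompleted, TaskStateCompleted, 512 * MiB})
	}
	return tasks
}

func main() {
	tasks := exampleTasks()
	fmt.Printf("current filter:  %d MiB reserved\n", reserved(tasks, countsCurrent)/MiB)
	fmt.Printf("proposed filter: %d MiB reserved\n", reserved(tasks, countsProposed)/MiB)
}
```

A regression test along these lines would assert that the finished job tasks no longer contribute, while the job task that is still running keeps its reservation.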