Overlay ARP stale entries still present on 29.5.0 — fix from #50236 incomplete · moby/moby#52661

4 comments · 0 reactions · 0 assignees · Go · 71,553 stars · 18,951 forks

Labels: area/networking, area/networking/d/overlay, area/swarm, help wanted, kind/bug, status/0-triage, version/29.5

Description

Stale PERMANENT ARP entries accumulate in the overlay network namespace and cause intermittent 502 Bad Gateway / 504 Gateway Timeout errors on Docker Swarm. PR #50236 was milestoned to 29.0.0 but the issue is still reproducible on 29.3.0 and 29.5.0.

This is a follow-up to #50232 with detailed diagnostic data proving the bug is not fixed.

Reproduce

1. Set up a Docker Swarm cluster with multiple nodes (tested: 7 nodes, 3 managers, 4 workers, all on Docker 29.5.0).
2. Deploy services across multiple nodes with Traefik as the reverse proxy on a shared overlay network. Pin some services to specific nodes via placement constraints (node.hostname==<worker>).
3. Restart or redeploy any service connected to the shared overlay network: docker service update --force <service_name>
4. Repeat step 3 several times over 2-3 days with different services, simulating normal operational activity (deployments, updates, OOM kills).
5. After 1-2 days, a service starts responding with 504 Gateway Timeout, although the container is healthy on localhost.
6. Confirm that stale entries have accumulated: sudo nsenter --net=/run/docker/netns/1- ip neigh show | wc -l
   The count will be much higher than the number of currently active containers on the network.
7. Cross-reference the ARP entries with the active containers: docker network inspect --format '{{range .Containers}}{{.IPv4Address}} {{.Name}}{{"\n"}}{{end}}'
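
The count and the cross-reference in the last two steps can be combined into one offline check. The following is a minimal sketch, assuming the `ip neigh show` output and the `docker network inspect` template output have been saved as text, and that the active list covers every node participating in the network. The function names are hypothetical and not part of any Docker tooling.

```python
import ipaddress


def parse_neigh(text):
    """Map each neighbour IP to (mac, state) from `ip neigh show` output.

    Expects lines such as
    `10.0.1.9 dev vxlan0 lladdr 02:42:0a:00:01:09 PERMANENT`;
    entries without a resolved MAC (e.g. FAILED) get mac=None.
    """
    entries = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        mac = None
        # Only look for `lladdr` where a following token is guaranteed.
        if "lladdr" in tokens[:-1]:
            mac = tokens[tokens.index("lladdr") + 1]
        entries[tokens[0]] = (mac, tokens[-1])
    return entries


def parse_active(text):
    """Map each active IP to its name from the inspect template output.

    Expects lines such as `10.0.1.3/24 repo_nexus.1`; the prefix length
    is dropped so the keys compare directly with neighbour IPs.
    """
    active = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue
        try:
            ip = str(ipaddress.ip_interface(parts[0]).ip)
        except ValueError:
            continue  # line does not start with a parsable address
        active[ip] = parts[1].strip() if len(parts) > 1 else ""
    return active


def ghost_entries(neigh, active):
    """Return neighbour IPs with no active container, in address order."""
    def sort_key(ip):
        addr = ipaddress.ip_address(ip)
        return (addr.version, int(addr))

    return sorted((ip for ip in neigh if ip not in active), key=sort_key)
```

Feeding it the two saved outputs yields the list of candidate ghost IPs, which is easier to compare across nodes than a bare `wc -l` count.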

Expected behavior

When a container is stopped, removed, or redeployed, all PERMANENT ARP entries associated with its IP address and MAC address should be removed from the overlay network namespace (1-<networkid>) on all Swarm nodes that participate in that overlay network.

The ARP table entry count in the overlay namespace should always match the number of currently active containers (plus lb endpoints) on that network — not accumulate ghost entries from containers that no longer exist.
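
The expected invariant can be written down as a simple count check. This is a sketch only, assuming each line of `ip neigh show` output ends with the neighbour state (for example PERMANENT or STALE); `state_counts` and `excess_entries` are hypothetical helper names.

```python
from collections import Counter


def state_counts(neigh_output):
    """Count neighbour entries per state (the last token of each line)."""
    counts = Counter()
    for line in neigh_output.splitlines():
        tokens = line.split()
        if tokens:
            counts[tokens[-1]] += 1
    return counts


def excess_entries(neigh_output, expected):
    """Return how many entries exceed the expected endpoint count.

    `expected` is the number of active containers plus lb endpoints on
    the network; a result of 0 means the invariant holds.
    """
    return sum(state_counts(neigh_output).values()) - expected
```

With the figures reported in the Evidence section (17 entries against 2 active endpoints), `excess_entries` would return 15.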

docker version

- Docker version: 29.5.0 (all 7 nodes)
- Cluster: Docker Swarm, 7 nodes (3 managers, 4 workers)
- OS: Debian 13 
- Proxy: Traefik 3.6.1 + tecnativa/docker-socket-proxy

docker info

Client: Docker Engine - Community
 Version:    29.5.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.34.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v5.1.3
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 6
  Running: 6
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 29.5.0
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: fluentd
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: active
  NodeID: adtgp8pmxczd1884pc5mjz7w7
  Is Manager: false
  Node Address: 10.3.0.100
  Manager Addresses:
   10.3.0.101:2377
   10.3.0.82:2377
   10.3.0.83:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 77c84241c7cbdd9b4eca2591793e3d4f4317c590
 runc version: v1.3.5-0-g488fc13e
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.12.88+deb13-amd64
 Operating System: Debian GNU/Linux 13 (trixie)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.83GiB
 Name: ovh-prod-nd-w0601
 ID: HHPL:HOR6:NHVS:BFZP:CMBG:FP2V:64TX:XYKK:KJ27:HNBN:Y6UA:PHY4
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
 Firewall Backend: iptables
  EnableUserlandProxy: true
  UserlandProxyPath: /usr/bin/docker-proxy

Additional Info

Evidence

Active containers on proxy overlay network at time of outage:

10.0.1.3/24 repo_nexus.1
10.0.1.4/24 proxy-endpoint (lb)

Only 2 active containers.

ARP table in overlay namespace — 17 entries:

$ sudo nsenter --net=/run/docker/netns/1-zmq0dp2jpu ip neigh show | wc -l
17

15 ghost entries for IPs with no corresponding running container.

lb namespace ARP — empty (STALE entries already expired):

$ sudo nsenter --net=/run/docker/netns/lb_zmq0dp2jp ip neigh show
(empty)

Traefik cannot reach service VIP — TCP times out:

$ sudo nsenter --net=/var/run/docker/netns/<traefik_ns> curl -v http://<service_vip>:/ --max-time 5

Connection timed out after 5001 milliseconds

Service container responds correctly on localhost:

$ sudo nsenter --net=/var/run/docker/netns/<container_ns> curl -v http://localhost/:/ --max-time 5
< HTTP/1.1 200

NetworkDB log showing entry count fluctuation on proxy network:

netID:zmq0dp2jpukn entries:31 <- normal
netID:zmq0dp2jpukn entries:33 <- after container event, stale entries added
netID:zmq0dp2jpukn entries:31 <- after service restart
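
To track this fluctuation over a longer period, the counts can be pulled out of the daemon log. The following is a rough sketch that relies only on the `netID:` and `entries:` tokens visible in the excerpt above; the rest of the log line format is not assumed, and the helper names are hypothetical.

```python
import re

# Matches the two tokens shown in the log excerpt, in that order.
_STATS = re.compile(r"netID:(?P<net>\S+).*?entries:(?P<count>\d+)")


def entry_counts(lines):
    """Collect the `entries:` values per network ID, in log order."""
    series = {}
    for line in lines:
        match = _STATS.search(line)
        if match:
            series.setdefault(match.group("net"), []).append(
                int(match.group("count"))
            )
    return series


def deltas(counts):
    """Return the change between consecutive samples."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]
```

Applied to the three lines above, this gives the series 31, 33, 31 for `zmq0dp2jpukn` and the deltas +2 and -2, which makes it easier to line the changes up with container events.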

Root Cause Analysis

After a container restart or redeployment, stale PERMANENT ARP entries remain in the overlay network namespace (1-<networkid>) pointing to IPs that no longer correspond to any running container. These entries are never removed.

When Traefik tries to reach a service VIP:

1. The lb namespace forwards the packet toward the destination container IP.
2. The overlay namespace tries to resolve the MAC address via the ARP table.
3. The ARP table contains a stale entry pointing to a now-dead VXLAN peer.
4. The packet is dropped silently at the VXLAN level.
5. ICMP works (stateless, small), but TCP connections hang indefinitely.
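
The failure path above can be illustrated with a toy model. This is a deliberately simplified sketch and not the kernel's or libnetwork's actual data path: it assumes two lookup tables (IP to MAC, then MAC to VXLAN peer), and every name and value in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OverlayTables:
    """Toy stand-in for the state held in one overlay namespace."""
    neigh: dict = field(default_factory=dict)  # container IP -> MAC
    fdb: dict = field(default_factory=dict)    # MAC -> node (VXLAN peer)


def forward(tables, dst_ip, running):
    """Trace one packet through the two lookups.

    `running` maps each node to the set of container IPs alive on it.
    """
    mac = tables.neigh.get(dst_ip)
    if mac is None:
        return "unresolved"
    node = tables.fdb.get(mac)
    if node is None:
        return "no-peer"
    if dst_ip in running.get(node, set()):
        return "delivered"
    # A stale entry resolved fine, but the peer no longer hosts this IP.
    return "dropped"
```

In this model a leftover entry for `10.0.1.7` that still points at the old node resolves successfully and is then dropped, while the same lookup is delivered once the entry is replaced with one pointing at the node that actually runs the container.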

docker service update --force recreates the container with a fresh network namespace and new IP assignment, which clears the broken state — but the ghost entries for all previous IPs remain in the ARP table and will cause the same problem for future services that get assigned those IPs.

Workaround

docker service update --force <affected_service> restores connectivity temporarily.

#50232 — original report, reportedly fixed by #50236 (milestone 29.0.0) — NOT fixed
#50129 — related DNS/swarm discovery issue

Contributor guide

Tech stack: go
Domain: backend, infrastructure
Issue type: bug
Difficulty: 5
Estimated time: over 1 week
Activity status: fresh
Clarity: clear
Prerequisites: Docker Swarm, Overlay networking, ARP, Moby codebase
Beginner friendliness: 15
Research direction: Investigate the ARP entry cleanup logic in the overlay network driver. Specifically, look at the code path where containers are removed or updated in the overlay network (likely in libnetwork/drivers/overlay/). Examine why the fix in PR #50236 did not fully remove PERMANENT ARP entries after container deletion. Check the NetworkDB log entries and the ARP table management functions in the overlay driver to ensure that when a container's IP changes or a container is removed, all associated ARP entries are purged from all nodes in the swarm.
