bottlerocket-os/bottlerocket

eliminate use of `systemctl try-restart`

Open

#1711 opened on Aug 14, 2021

Labels: area/core, help wanted, status/icebox, type/bug

Description

Today we use `systemctl try-restart` to restart services after applying settings. Partly this is because we also process settings early in boot, when the affected services haven't started yet and aren't meant to start; in that case `try-restart` correctly leaves them alone.

However, this causes trouble when settings change at runtime: if the service isn't running, `try-restart` does nothing, so the service is not restarted to pick up the new settings.

Services might not be running for a few reasons:

  • they failed to start after bad settings were previously applied
  • they are in the middle of starting when new settings are applied, and haven't finished starting yet
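To make the first case concrete, here is a minimal Rust sketch of a simplified, hypothetical unit model (not systemd's or Bottlerocket's code). `try_restart` follows the documented `systemctl try-restart` contract of acting only on a running unit, while `stop_then_start` is an unconditional stop followed by a start. The `Unit` type, the integer config `generation`, and the `config_ok` flag are illustrative assumptions.

```rust
// Hypothetical, simplified model of a systemd unit; not systemd's or
// Bottlerocket's actual code. A "generation" stands in for the version of
// the rendered config file that a process read when it started.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnitState {
    Inactive,
    Failed,
    Running,
}

#[derive(Debug)]
pub struct Unit {
    pub state: UnitState,
    /// Config generation loaded by the current process, if one is running.
    pub loaded_generation: Option<u64>,
}

impl Unit {
    /// Start the unit with whatever config generation is on disk.
    /// `config_ok` models whether those settings let the service come up.
    pub fn start(&mut self, on_disk: u64, config_ok: bool) {
        if config_ok {
            self.state = UnitState::Running;
            self.loaded_generation = Some(on_disk);
        } else {
            self.state = UnitState::Failed;
            self.loaded_generation = None;
        }
    }

    pub fn stop(&mut self) {
        self.state = UnitState::Inactive;
        self.loaded_generation = None;
    }

    /// Mirrors the documented `try-restart` contract: stop and start only
    /// if the unit is running, otherwise do nothing. Returns whether a
    /// restart happened.
    pub fn try_restart(&mut self, on_disk: u64, config_ok: bool) -> bool {
        if self.state == UnitState::Running {
            self.stop();
            self.start(on_disk, config_ok);
            true
        } else {
            false
        }
    }

    /// Unconditional stop followed by start, whatever the current state.
    pub fn stop_then_start(&mut self, on_disk: u64, config_ok: bool) {
        self.stop();
        self.start(on_disk, config_ok);
    }
}
```

Under this model, once bad settings leave the unit `Failed`, a later `try_restart` with corrected settings is skipped and the unit stays down, while `stop_then_start` brings it back up with the new generation.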

In a host container running at boot, @vignesh-goutham discovered the following race:

  • host container queries systemd for the status of kubelet
  • waits for it to finish activating (`ActiveState=active` and `SubState=running`)
  • issues `apiclient set` commands to reconfigure kubelet
  • apiserver executes restart commands
  • `systemctl try-restart` does nothing
  • systemd logs the first `Started Kubelet` message around 1 second later
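The race above can be replayed with a small event model, sketched here in Rust. It is hypothetical and encodes the outcome reported in this issue (`try-restart` skipped a unit that had not finished starting) rather than systemd's internal job handling; the `State` and `Event` names, the config `generation` counter, and treating a restart as instantaneous are all assumptions.

```rust
// Hypothetical replay of the reported race. It encodes the observed outcome
// (try-restart skipped a unit that had not finished starting); it does not
// model systemd's internal job handling.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Inactive,
    /// Process launched and has read config `generation`, but is not running yet.
    Activating { generation: u64 },
    Running { generation: u64 },
}

#[derive(Debug, Clone, Copy)]
pub enum Event {
    BeginStart,
    FinishStart,
    ApplySettings,
    TryRestart,
}

/// Replays `events` from an inactive unit with config generation 1 on disk.
/// Returns the final state and how many restarts actually happened.
pub fn replay(events: &[Event]) -> (State, u32) {
    let mut state = State::Inactive;
    let mut on_disk: u64 = 1;
    let mut restarts: u32 = 0;
    for event in events {
        state = match (*event, state) {
            (Event::BeginStart, State::Inactive) => State::Activating { generation: on_disk },
            (Event::FinishStart, State::Activating { generation }) => State::Running { generation },
            (Event::ApplySettings, s) => {
                // New settings are rendered; running processes keep the old ones.
                on_disk += 1;
                s
            }
            // try-restart acts only on a running unit; the restart is treated
            // as instantaneous and reads the current config.
            (Event::TryRestart, State::Running { .. }) => {
                restarts += 1;
                State::Running { generation: on_disk }
            }
            // Everything else, including TryRestart while activating, is a no-op.
            (_, s) => s,
        };
    }
    (state, restarts)
}
```

Replaying the reported order leaves the unit running with generation 1 even though a newer generation is on disk, and repeating the apply/try-restart pair before startup finishes still yields zero restarts, matching the inference about back-to-back `apiclient set` calls.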

From this we can infer that two calls to `apiclient set kubernetes.<blah>` in quick succession will not always result in two kubelet restarts, leaving that service in an undefined state.

For changing settings at runtime, we really need something more like force-stop and force-start to ensure that the restart commands are fully enacted for each transaction.
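One possible reading of that proposal, as a hypothetical Rust helper (not an existing Bottlerocket API): every transaction plans an unconditional `systemctl stop` followed by `systemctl start` for each affected unit, so the outcome no longer depends on the unit's state when the transaction lands. The function name, the argv-vector return type, and the `kubelet.service` example are assumptions; the sketch only builds commands, leaving execution and error handling out, and it does not show whether a plain stop/start pair fully closes the race with a start already in progress.

```rust
// Hypothetical helper, not Bottlerocket's API: for one settings transaction,
// plan an unconditional stop followed by a start for every affected unit,
// instead of `systemctl try-restart`. Executing the plan (for example with
// std::process::Command) and handling failures are left out.

/// Returns argv vectors, in execution order, for one transaction.
pub fn restart_plan(units: &[&str]) -> Vec<Vec<String>> {
    let mut plan = Vec::with_capacity(units.len() * 2);
    for unit in units {
        plan.push(vec!["systemctl".to_string(), "stop".to_string(), unit.to_string()]);
        plan.push(vec!["systemctl".to_string(), "start".to_string(), unit.to_string()]);
    }
    plan
}
```

Because the plan never inspects unit state, in this sketch two transactions in quick succession each produce their own stop/start pair.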
