
Restarting unhealthy Compose containers with a one-liner

Published on May 12, 2025 · Reading time: 3 minutes

Recently, I ran into an issue in one of my personal projects: after a few days of uptime, Django started responding to every incoming request with 400 (Bad Request). That’s right, no error messages and no exception tracebacks at all, just a simple HTML message. Docker Compose was not aware of the issue because the main process hadn’t died; if it had, the restart policy would have restarted the container.

Compose offers an optional healthcheck feature, but I hadn’t used it until now because, on its own, it doesn’t do much. You can configure the check command and other parameters, but Compose is not an orchestrator and won’t restart unhealthy containers by itself. There’s Swarm, but it wouldn’t work for this app, which needs to run as root and have access to the host’s /dev directory. Swarm has other drawbacks too: you can’t build images and restart the current deployment with a single command, and I haven’t found a way to show logs for all containers in the deployment at once. There were many more issues, but I’ve forgotten them by now.

I could use the popular willfarrell/autoheal image along with Compose, but I don’t feel comfortable giving access to the Docker socket to a container based on an outdated Alpine release[1], mostly because there are other apps running on this machine.

Solution

Can I have a simple systemd service that can restart faulty containers? It doesn’t look that difficult to implement. Let’s start with this one-liner:

docker compose ps --format '{{ .Service }} {{ .Status }}' | grep -i unhealthy | cut -d " " -f 1 | xargs -r docker compose restart

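Using standard Unix commands, this script will:

1. list the project’s running services along with their status (docker compose ps),
2. keep only the lines containing “unhealthy”, ignoring case (grep -i),
3. cut each remaining line down to its first field, the service name (cut),
4. restart those services with docker compose restart; thanks to the -r flag, xargs doesn’t run the command at all when the list is empty.
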
Of course, it has to be run from the directory containing the compose.yml file, and service names must not contain whitespace or the string “unhealthy” in any letter case.
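To see what each stage of the pipeline does without touching real containers, here is a small sketch that feeds it made-up output. The service names and status strings in sample_ps_output are hypothetical, and echo stands in for docker compose restart so nothing is actually restarted:

```shell
#!/usr/bin/env bash
# Hypothetical sample of what
#   docker compose ps --format '{{ .Service }} {{ .Status }}'
# might print; service names and uptimes are made up for illustration.
sample_ps_output() {
  printf '%s\n' \
    'backend Up 3 days (unhealthy)' \
    'db Up 3 days (healthy)' \
    'proxy Up 3 days'
}

# Keep only lines mentioning "unhealthy" (case-insensitive)
sample_ps_output | grep -i unhealthy
# prints: backend Up 3 days (unhealthy)

# Keep only the first space-separated field, the service name
sample_ps_output | grep -i unhealthy | cut -d " " -f 1
# prints: backend

# echo stands in for the real restart command
sample_ps_output | grep -i unhealthy | cut -d " " -f 1 | xargs -r echo docker compose restart
# prints: docker compose restart backend

# The naming caveat: a healthy service whose name contains "unhealthy"
# (hypothetical "unhealthy-checker") would be picked up as well
printf 'unhealthy-checker Up 3 days (healthy)\n' | grep -i unhealthy | cut -d " " -f 1
# prints: unhealthy-checker
```

Note that "healthy" alone doesn’t match, so the db line is filtered out, and when no line matches, xargs -r prints and runs nothing.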

This Bash script, healthcheck.sh, has to run the one-liner in a loop:

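#!/usr/bin/env bash

# Exit if a command fails (for a pipeline, only the last command counts);
# systemd's Restart=always will start the script again
set -e

while true
do
  # Restart every Compose service whose status contains "unhealthy"
  docker compose ps --format '{{ .Service }} {{ .Status }}' | grep -i unhealthy | cut -d " " -f 1 | xargs -r docker compose restart
  # Poll every 2 seconds
  sleep 2
done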
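One detail worth checking in the script is how set -e interacts with the pipeline. When nothing is unhealthy, grep exits with status 1, yet the loop keeps going. This sketch reproduces that with made-up input, with echo standing in for docker compose restart:

```shell
#!/usr/bin/env bash
set -e

# Made-up input with no unhealthy service: grep matches nothing and exits with 1.
# Without `set -o pipefail`, a pipeline's exit status is that of its last command
# (here xargs, which exits 0 after skipping the empty input thanks to -r),
# so `set -e` does not stop the script.
printf 'db Up 3 days (healthy)\n' | grep -i unhealthy | cut -d " " -f 1 | xargs -r echo docker compose restart
echo "still running"
```

Adding set -o pipefail would turn that grep status into a failure, so the script would exit on every quiet iteration and only keep going thanks to systemd’s Restart=always. As written, the script exits only when the last command fails, for example when docker compose restart fails and xargs returns a non-zero status.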

…have the executable bit set:

chmod +x healthcheck.sh

…and run as a systemd service:

[Unit]
Description=Restart unhealthy services
After=docker.service
Requires=docker.service

[Service]
Type=simple
User=my_user_name
Group=docker
WorkingDirectory=/home/my_user_name/my_app_name
ExecStart=/home/my_user_name/my_app_name/healthcheck.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
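If you run this setup for several apps, the unit file only differs in the user and app names. A minimal sketch that renders it from a heredoc, assuming the same /home/USER/APP layout as above; APP_USER, APP_NAME and OUT_DIR are hypothetical parameters introduced here, not part of the original setup:

```shell
#!/usr/bin/env bash
# Hypothetical helper: renders the unit file above for a given user and app.
# APP_USER, APP_NAME and OUT_DIR are illustrative parameters.
set -eu

APP_USER="${APP_USER:-my_user_name}"
APP_NAME="${APP_NAME:-my_app_name}"
OUT_DIR="${OUT_DIR:-.}"   # e.g. /etc/systemd/system when run as root

# Assumes the app lives in /home/<user>/<app>, as in the unit above
APP_DIR="/home/${APP_USER}/${APP_NAME}"

# Unquoted heredoc delimiter, so the ${...} variables are expanded
cat > "${OUT_DIR}/${APP_NAME}.service" <<EOF
[Unit]
Description=Restart unhealthy services
After=docker.service
Requires=docker.service

[Service]
Type=simple
User=${APP_USER}
Group=docker
WorkingDirectory=${APP_DIR}
ExecStart=${APP_DIR}/healthcheck.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

echo "Wrote ${OUT_DIR}/${APP_NAME}.service"
```

With the defaults, it writes my_app_name.service to the current directory; running it as root with OUT_DIR=/etc/systemd/system puts the unit where systemctl daemon-reload will pick it up.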

On Debian, this service file can be saved to /etc/systemd/system/my_app_name.service, then enabled and started (as root) like this:

systemctl daemon-reload
systemctl enable my_app_name.service
systemctl start  my_app_name.service

You can check the service’s status and recent log lines to make sure unhealthy containers are actually restarted:

systemctl status my_app_name.service
● my_app_name.service - Restart unhealthy services
     Loaded: loaded (/etc/systemd/system/my_app_name.service; enabled; preset: enabled)
     Active: active (running) since Sun 2025-05-11 10:32:12 CEST; 5h 34min ago
   Main PID: 2077 (bash)
      Tasks: 2 (limit: 4757)
        CPU: 20min 31.831s
     CGroup: /system.slice/my_app_name.service
             ├─  2077 bash /home/my_user_name/my_app_name/healthcheck.sh
             └─489754 sleep 2

May 11 10:32:12 localhost systemd[1]: Started my_app_name.service - Restart unhealthy services.
May 11 10:32:13 localhost healthcheck.sh[46057]:  Container my_app_name-backend-1  Restarting
May 11 10:32:13 localhost healthcheck.sh[46057]:  Container my_app_name-backend-1  Started
May 11 10:33:46 localhost healthcheck.sh[51393]:  Container my_app_name-backend-1  Restarting
May 11 10:33:47 localhost healthcheck.sh[51393]:  Container my_app_name-backend-1  Started

[1] Alpine 3.21 is the latest version as of May 2025, but the image uses 3.18.
