Hosting Operations
Build a Restart Loop Timeline
A service keeps restarting and you need to separate the first application failure from later supervisor retries.
Command
journalctl -u app-worker -b --no-pager -o short-iso | grep -E 'Started|Failed|Scheduled restart|Main process exited'
What changed
Nothing changes on the host. The pipeline only reads the journal for the current boot (-b) and prints a compact timeline of start, exit, failure, and restart-counter lines.
Danger
safe
When to use it
Use when a restart policy such as Restart=on-failure is burying the first useful failure under repeated retries.
When not to use it
Do not use this as the only diagnosis. The grep filter drops the application's own log lines, so read the unfiltered lines around the first failure as well.
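One way to pull up the lines just before the first failure is sketched below. It is a minimal example, not part of the original card: first_failure_context is a hypothetical helper name, the five-line context window is an arbitrary choice, and it assumes a grep that supports -m and -B (GNU and BSD grep both do).

```shell
# Hypothetical helper: reads journal text on stdin and prints the first
# "Failed with result" line plus up to five lines that precede it.
first_failure_context() {
  # -m1 stops after the first match, -B5 keeps up to five lines before it,
  # -F treats the pattern as a fixed string rather than a regex.
  grep -m1 -B5 -F "Failed with result"
}

# Assumed usage on a live host, with the unit name from this card:
#   journalctl -u app-worker -b --no-pager -o short-iso | first_failure_context
```

Run against the unfiltered journal, this keeps application lines such as a connection error that the lifecycle filter would otherwise hide.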
Undo or recovery
No undo needed because the command is read-only.
Expected output
Timestamped service lifecycle lines showing starts, main-process exits, failed results, and scheduled restarts.
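If the timeline is long, a short summary can help locate the first failure. The following is a rough sketch under assumptions: summarize_timeline is a hypothetical helper name, and it expects short-iso output on stdin where the timestamp is the first whitespace-separated field and the message wording matches the lines this card greps for.

```shell
# Hypothetical helper: counts lifecycle lines and reports the timestamp of
# the first "Failed with result" line and the last restart counter seen.
summarize_timeline() {
  awk '
    /Started /              { starts++ }
    /Failed with result/    { fails++; if (first == "") first = $1 }
    /restart counter is at/ { counter = $NF; sub(/\.$/, "", counter) }  # strip trailing period
    END {
      printf "starts=%d failures=%d first_failure=%s last_restart_counter=%s\n",
             starts, fails, first, counter
    }
  '
}

# Assumed usage on a live host, with the unit name from this card:
#   journalctl -u app-worker -b --no-pager -o short-iso | summarize_timeline
```

The first_failure timestamp is the point to revisit in the unfiltered journal; the counts are only a quick orientation and do not replace reading the lines themselves.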
demo script
Disposable terminal steps
journalctl -u app-worker -b --no-pager -o short-iso
journalctl -u app-worker -b --no-pager -o short-iso | grep -E 'Started|Failed|Scheduled restart|Main process exited'
simulated output
What it looks like
::fixture-ready::
$ journalctl -u app-worker -b --no-pager -o short-iso
2026-06-25T14:20:58-05:00 vps systemd[1]: Started app-worker.service - Background job worker.
2026-06-25T14:20:58-05:00 vps worker[2081]: loading /etc/app/worker.env
2026-06-25T14:20:58-05:00 vps worker[2081]: ERROR redis connection refused at 127.0.0.1:6379
2026-06-25T14:20:59-05:00 vps systemd[1]: app-worker.service: Failed with result 'exit-code'.
2026-06-25T14:21:04-05:00 vps systemd[1]: app-worker.service: Scheduled restart job, restart counter is at 4.
2026-06-25T14:22:17-05:00 vps systemd[1]: Started app-worker.service - Background job worker.
2026-06-25T14:22:17-05:00 vps systemd[2144]: app-worker.service: Failed to determine user credentials: No such process
2026-06-25T14:22:17-05:00 vps systemd[2144]: app-worker.service: Failed at step USER spawning /srv/app/bin/worker: No such process
2026-06-25T14:22:17-05:00 vps systemd[1]: app-worker.service: Main process exited, code=exited, status=217/USER
2026-06-25T14:22:17-05:00 vps systemd[1]: app-worker.service: Failed with result 'exit-code'.
::exit-code::0
$ journalctl -u app-worker -b --no-pager -o short-iso | grep -E 'Started|Failed|Scheduled restart|Main process exited'
2026-06-25T14:20:58-05:00 vps systemd[1]: Started app-worker.service - Background job worker.
2026-06-25T14:20:59-05:00 vps systemd[1]: app-worker.service: Failed with result 'exit-code'.
2026-06-25T14:21:04-05:00 vps systemd[1]: app-worker.service: Scheduled restart job, restart counter is at 4.
2026-06-25T14:22:17-05:00 vps systemd[1]: Started app-worker.service - Background job worker.
2026-06-25T14:22:17-05:00 vps systemd[2144]: app-worker.service: Failed to determine user credentials: No such process
2026-06-25T14:22:17-05:00 vps systemd[2144]: app-worker.service: Failed at step USER spawning /srv/app/bin/worker: No such process
2026-06-25T14:22:17-05:00 vps systemd[1]: app-worker.service: Main process exited, code=exited, status=217/USER
2026-06-25T14:22:17-05:00 vps systemd[1]: app-worker.service: Failed with result 'exit-code'.
::exit-code::0
YouTube Short
Make the restart loop visible.
When systemd retries a service, line up the starts, exits, failures, and restart counters before blaming the latest line.
LinkedIn hook
Restart loops make more sense when you line up starts, failures, and counters.
Question: When a service is flapping, how do you find the first useful failure?
experiments
A/B tests to run
Metric: save_rate
A: Line up starts and failures.
B: Restart loops hide the first clue.