Hosting Operations

Group Journal Errors by Unit

Recent journal errors mention several processes and you need to see which unit or source is producing most of them.

Command

journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | awk '{split($3,a,"["); unit=a[1]; count[unit]++} END {for (u in count) print count[u], u}' | sort -nr

What changed

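Nothing changes on the system. The command counts journal entries at priority err through alert from the last two hours, grouped by source: the third column of short-iso output with the [PID] suffix removed. That column holds the process name logged with each entry, which usually but not always matches the systemd unit name.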
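The grouping stage can also be wrapped in a small shell function so it is easier to read and reuse. This is only a sketch: the function name group_by_source is hypothetical, and it assumes short-iso lines on stdin. Unlike the one-liner, it strips a trailing colon so a source logged without a PID is counted as "kernel" rather than "kernel:", and it skips journalctl marker lines that start with "-- ".

```shell
# group_by_source: hypothetical helper, not part of journalctl.
# Reads `journalctl -o short-iso` lines on stdin and prints "count source",
# highest count first.
group_by_source() {
  awk '
    # Skip marker lines such as "-- No entries --".
    /^-- / { next }
    NF >= 3 {
      unit = $3
      sub(/\[.*/, "", unit)   # "api[1842]:" -> "api"
      sub(/:$/, "", unit)     # "kernel:"    -> "kernel"
      count[unit]++
    }
    END { for (u in count) print count[u], u }
  ' | sort -nr
}
```

Usage, assuming the function has been defined in the current shell: journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | group_by_source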

Danger

safe

When to use it

Use after a severity summary to decide which service log deserves attention first.

When not to use it

Do not assume the noisiest unit caused the incident; it may only be reporting downstream failure.
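One way to check for this is to look at which source logged a severe entry first, not only which logged the most. The sketch below uses a hypothetical function name, first_error_by_source, and assumes short-iso lines on stdin that all share the same UTC offset, so the timestamps sort correctly as plain text. An early error is still only a hint, not proof of cause.

```shell
# first_error_by_source: hypothetical helper, not part of journalctl.
# Reads `journalctl -o short-iso` lines on stdin and prints the timestamp of
# the first entry seen for each source, earliest first.
first_error_by_source() {
  awk '
    # Skip marker lines such as "-- No entries --".
    /^-- / { next }
    NF >= 3 {
      unit = $3
      sub(/\[.*/, "", unit)   # drop the "[PID]:" suffix
      sub(/:$/, "", unit)     # drop a bare trailing colon
      # Keep only the first timestamp seen per source.
      if (!(unit in first)) first[unit] = $1
    }
    END { for (u in first) print first[u], u }
  ' | sort
}
```

Usage, assuming the function has been defined in the current shell: journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | first_error_by_source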

Undo or recovery

No undo needed because the command is read-only.

Expected output

One line per source: a count followed by the unit or process name, highest count first.

demo script

Disposable terminal steps

  1. journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso
  2. journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | awk '{split($3,a,"["); unit=a[1]; count[unit]++} END {for (u in count) print count[u], u}' | sort -nr

simulated output

What it looks like

disposable vessel
$ journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso
2026-06-25T14:03:08+00:00 vps api[1842]: err request_id=req-103 ERROR database timeout after 30000ms
2026-06-25T14:03:12+00:00 vps api[1842]: err request_id=req-103 ERROR retry failed upstream=db
2026-06-25T14:05:10+00:00 vps worker[2201]: crit FATAL job runner exited code=137
2026-06-25T14:06:33+00:00 vps api[1842]: err request_id=req-107 ERROR payment provider returned 500
::exit-code::0
$ journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | awk '{split($3,a,"["); unit=a[1]; count[unit]++} END {for (u in count) print count[u], u}' | sort -nr
3 api
1 worker
::exit-code::0

YouTube Short

Find the noisiest unit.

Group severe journal lines by source to see quickly whether the incident is centered on the app, worker, kernel, or supervisor.

LinkedIn hook

A noisy incident usually has a noisy source.

Question: What is your quickest way to find the noisiest service during an incident?

experiments

A/B tests to run

Metric: short_click_through_rate

A: Find the noisiest service.

B: Group errors before reading details.