Hosting Operations
Group Journal Errors by Unit
Recent journal errors mention several processes and you need to see which unit or source is producing most of them.
Command
journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | awk '{split($3,a,"["); unit=a[1]; count[unit]++} END {for (u in count) print count[u], u}' | sort -nr
What changed
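Nothing changes on the system. The command only reads the journal and counts entries at priority err through alert by the source identifier in the third field of each line (the process name before the PID).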
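The grouping step can be sketched as a small shell function that reads short-iso lines on stdin, which makes it possible to try the logic without a live journal. This is a minimal sketch, not the card's official command: the function name count_by_source is hypothetical, the sample lines are copied from the simulated output in this card, and the kernel line is an invented example added only to show a source without a PID. Unlike the one-liner, the sketch also strips the trailing colon from PID-less sources and skips journalctl marker lines such as "-- No entries --".

```shell
#!/bin/sh
# Hypothetical helper: count journal lines per source identifier.
# Expects `journalctl -o short-iso` style lines on stdin:
#   <timestamp> <host> <source>[<pid>]: <message>
count_by_source() {
  awk '
    NF >= 3 && $1 !~ /^--/ {     # skip marker lines such as "-- No entries --"
      src = $3
      sub(/\[.*/, "", src)       # drop "[pid]:" when a PID is present
      sub(/:$/, "", src)         # drop the colon on PID-less sources ("kernel:")
      count[src]++
    }
    END { for (s in count) print count[s], s }
  ' | sort -k1,1nr -k2           # highest count first, then by name
}

# Sample input: four lines from the simulated output, plus one
# invented kernel line (assumption, for illustration only).
sample_lines() {
  cat <<'EOF'
2026-06-25T14:03:08+00:00 vps api[1842]: err request_id=req-103 ERROR database timeout after 30000ms
2026-06-25T14:03:12+00:00 vps api[1842]: err request_id=req-103 ERROR retry failed upstream=db
2026-06-25T14:05:10+00:00 vps worker[2201]: crit FATAL job runner exited code=137
2026-06-25T14:06:33+00:00 vps api[1842]: err request_id=req-107 ERROR payment provider returned 500
2026-06-25T14:07:01+00:00 vps kernel: example message without a PID
EOF
}

result=$(sample_lines | count_by_source)
printf '%s\n' "$result"
```

On a real host the same function would sit at the end of the pipeline: journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | count_by_source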
Danger
safe
When to use it
Use after a severity summary to decide which service log deserves attention first.
When not to use it
Do not assume the noisiest unit caused the incident; it may only be reporting downstream failure.
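One way to check whether the noisiest source is also the earliest is to list when each source first logged a severe entry. The sketch below is an assumption-laden illustration rather than part of the original command: first_seen_by_source is a hypothetical name, the input is the simulated output from this card, and it relies on journalctl printing entries in chronological order with a consistent timestamp offset. A source that appears first is still not proven to be the cause; the ordering only narrows where to look.

```shell
#!/bin/sh
# Hypothetical helper: print the first timestamp seen for each source.
# Expects `journalctl -o short-iso` style lines on stdin, oldest first.
first_seen_by_source() {
  awk '
    NF >= 3 && $1 !~ /^--/ {     # skip marker lines such as "-- No entries --"
      src = $3
      sub(/\[.*/, "", src)       # drop "[pid]:" when a PID is present
      sub(/:$/, "", src)         # drop the colon on PID-less sources
      if (!(src in first)) first[src] = $1   # keep only the first timestamp
    }
    END { for (s in first) print first[s], s }
  ' | LC_ALL=C sort              # ISO timestamps sort correctly as text
}

# Sample input copied from the simulated output in this card.
sample_lines() {
  cat <<'EOF'
2026-06-25T14:03:08+00:00 vps api[1842]: err request_id=req-103 ERROR database timeout after 30000ms
2026-06-25T14:03:12+00:00 vps api[1842]: err request_id=req-103 ERROR retry failed upstream=db
2026-06-25T14:05:10+00:00 vps worker[2201]: crit FATAL job runner exited code=137
2026-06-25T14:06:33+00:00 vps api[1842]: err request_id=req-107 ERROR payment provider returned 500
EOF
}

first_seen=$(sample_lines | first_seen_by_source)
printf '%s\n' "$first_seen"
```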
Undo or recovery
No undo needed because the command is read-only.
Expected output
One line per source: the entry count followed by the unit or process name, highest count first.
demo script
Disposable terminal steps
journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso
journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | awk '{split($3,a,"["); unit=a[1]; count[unit]++} END {for (u in count) print count[u], u}' | sort -nr
simulated output
What it looks like
::fixture-ready::
$ journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso
2026-06-25T14:03:08+00:00 vps api[1842]: err request_id=req-103 ERROR database timeout after 30000ms
2026-06-25T14:03:12+00:00 vps api[1842]: err request_id=req-103 ERROR retry failed upstream=db
2026-06-25T14:05:10+00:00 vps worker[2201]: crit FATAL job runner exited code=137
2026-06-25T14:06:33+00:00 vps api[1842]: err request_id=req-107 ERROR payment provider returned 500
::exit-code::0
$ journalctl -p err..alert --since "2 hours ago" --no-pager -o short-iso | awk '{split($3,a,"["); unit=a[1]; count[unit]++} END {for (u in count) print count[u], u}' | sort -nr
3 api
1 worker
::exit-code::0
YouTube Short
Find the noisiest unit.
Group severe journal lines by source. The counts quickly show whether the incident is centered on the app, worker, kernel, or supervisor.
LinkedIn hook
A noisy incident usually has a noisy source.
Question: What is your quickest way to find the noisiest service during an incident?
experiments
A/B tests to run
Metric: short_click_through_rate
A: Find the noisiest service.
B: Group errors before reading details.