Hosting Operations
Count App Errors by Minute
You need to see when severe application log lines clustered during an incident.
Command
awk 'tolower($0) ~ /(error|fatal|timeout|exception)/ {minute=substr($1,1,16); count[minute]++} END {for (m in count) print count[m], m}' fixtures/incidents/app.log | sort -nr
What changed
Nothing changes. The command only reads the log and prints the number of severe application lines per minute.
Danger
safe
When to use it
Use when you need to compare an error spike with deploys, restarts, or external alerts.
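To line the error minutes up against deploys and restarts, one option is to pull those events out of the same log at minute resolution. The sketch below is self-contained: it copies four lines from the fixture into a scratch file. The variable names log and events are illustrative, and the field positions ($3 for service, $5 for msg) are an assumption based on the fixture's layout.

```shell
# Scratch copy of a few fixture lines so the sketch runs anywhere.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-06-25T14:03:08Z level=ERROR service=api request_id=req-103 msg=database_timeout timeout_ms=30000
2026-06-25T14:04:44Z level=INFO service=deploy request_id=req-104 msg=release_switch release=2026.06.25.2
2026-06-25T14:05:10Z level=FATAL service=worker request_id=req-105 msg=job_runner_exit code=137
2026-06-25T14:05:12Z level=INFO service=system request_id=req-106 msg=worker_restarted
EOF

# Assumes field 3 is service=... and field 5 is msg=..., as in the fixture.
# Prints the minute, the service, and the message for deploy and restart events.
events=$(awk '$3 == "service=deploy" || $5 == "msg=worker_restarted" {print substr($1,1,16), $3, $5}' "$log")
printf '%s\n' "$events"

rm -f "$log"
```

Against these lines it prints the 14:04 release switch and the 14:05 worker restart, which you can read next to the per-minute error counts.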
When not to use it
Do not use it as a metrics replacement; it is a quick log-derived approximation.
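Part of why the count is only approximate: the pattern matches anywhere in the line, so a non-error line that merely mentions a timeout would be counted too. A stricter variant, sketched here under the assumption that the second field is always level=..., counts only ERROR and FATAL lines. The last sample line is hypothetical and is not in the fixture; it is there only to show the difference.

```shell
# Scratch file: two fixture lines plus one HYPOTHETICAL INFO line (not in the fixture).
log=$(mktemp)
cat > "$log" <<'EOF'
2026-06-25T14:03:08Z level=ERROR service=api request_id=req-103 msg=database_timeout timeout_ms=30000
2026-06-25T14:05:10Z level=FATAL service=worker request_id=req-105 msg=job_runner_exit code=137
2026-06-25T14:09:00Z level=INFO service=api request_id=req-999 msg=config_loaded timeout_ms=30000
EOF

# Assumes field 2 holds the level. The keyword pattern from the main command
# would also count the INFO line, because it contains "timeout".
strict=$(awk '$2 ~ /^level=(ERROR|FATAL)$/ {minute=substr($1,1,16); count[minute]++} END {for (m in count) print count[m], m}' "$log" | sort -nr)
printf '%s\n' "$strict"

rm -f "$log"
```

The trade-off is that this version ignores timeouts and exceptions logged at WARN or lower, so pick the filter that matches how your application assigns levels.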
Undo or recovery
No undo needed because the command is read-only.
Expected output
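One line for each minute that had matching lines: the count first, then the minute timestamp (for example, 2 2026-06-25T14:03), with the highest counts at the top. Minutes with no matches do not appear.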
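If you would rather read the result as a timeline than as a ranking, sorting on the timestamp column is one way to do it. The sketch below is self-contained: it copies five lines from the fixture into a scratch file, runs the same awk filter, and swaps sort -nr for sort -k2. The variable names log and by_minute are illustrative.

```shell
# Scratch copy of a few fixture lines so the sketch runs anywhere.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-06-25T14:03:08Z level=ERROR service=api request_id=req-103 msg=database_timeout timeout_ms=30000
2026-06-25T14:03:12Z level=ERROR service=api request_id=req-103 msg=retry_failed upstream=db
2026-06-25T14:04:44Z level=INFO service=deploy request_id=req-104 msg=release_switch release=2026.06.25.2
2026-06-25T14:05:10Z level=FATAL service=worker request_id=req-105 msg=job_runner_exit code=137
2026-06-25T14:06:33Z level=ERROR service=api request_id=req-107 msg=payment_provider_500 provider=demo-pay
EOF

# Same filter as the main command, ordered by the minute column (field 2)
# instead of by count.
by_minute=$(awk 'tolower($0) ~ /(error|fatal|timeout|exception)/ {minute=substr($1,1,16); count[minute]++} END {for (m in count) print count[m], m}' "$log" | sort -k2)
printf '%s\n' "$by_minute"

rm -f "$log"
```

Chronological order makes it easier to tell a single sharp spike from a failure that keeps recurring, because gaps between minutes stay visible.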
demo script
Disposable terminal steps
cat fixtures/incidents/app.log
awk 'tolower($0) ~ /(error|fatal|timeout|exception)/ {minute=substr($1,1,16); count[minute]++} END {for (m in count) print count[m], m}' fixtures/incidents/app.log | sort -nr
simulated output
What it looks like
::fixture-ready::
$ cat fixtures/incidents/app.log
2026-06-25T14:00:01Z level=INFO service=api request_id=req-100 msg=started release=2026.06.25.1
2026-06-25T14:01:14Z level=INFO service=worker request_id=req-101 msg=queue_depth value=18
2026-06-25T14:02:06Z level=WARN service=api request_id=req-102 msg=upstream_slow upstream=db latency_ms=2200
2026-06-25T14:03:08Z level=ERROR service=api request_id=req-103 msg=database_timeout timeout_ms=30000
2026-06-25T14:03:12Z level=ERROR service=api request_id=req-103 msg=retry_failed upstream=db
2026-06-25T14:04:44Z level=INFO service=deploy request_id=req-104 msg=release_switch release=2026.06.25.2
2026-06-25T14:05:10Z level=FATAL service=worker request_id=req-105 msg=job_runner_exit code=137
2026-06-25T14:05:12Z level=INFO service=system request_id=req-106 msg=worker_restarted
2026-06-25T14:06:33Z level=ERROR service=api request_id=req-107 msg=payment_provider_500 provider=demo-pay
2026-06-25T14:07:01Z level=WARN service=api request_id=req-108 msg=token=demoTOKEN123 should_be_redacted
::exit-code::0
$ awk 'tolower($0) ~ /(error|fatal|timeout|exception)/ {minute=substr($1,1,16); count[minute]++} END {for (m in count) print count[m], m}' fixtures/incidents/app.log | sort -nr
2 2026-06-25T14:03
1 2026-06-25T14:06
1 2026-06-25T14:05
::exit-code::0
YouTube Short
Bucket errors by minute.
Count severe app lines by minute. It tells you whether the incident was one sharp spike or a continuing failure.
LinkedIn hook
A minute-by-minute count shows whether an incident is a spike or a drip.
Question: When logs show errors, do you bucket them by time before reading each line?
experiments
A/B tests to run
Metric: shorts_3_second_hold_rate
A: Spike or drip?
B: Bucket errors by minute.