Find the Files Eating Your Disk
The disk was full, but guessing at folders was the slow part.
find /var -type f -printf '%s %p\n' | sort -nr | head -20
The commands to inspect a machine before guessing.
48 checked fixes
The app was failing now. Opening a giant log file was the wrong move.
tail -n 80 -f /var/log/nginx/error.log
The error was in the log. The problem was finding it without reading noise.
grep -iE 'error|failed|denied|timeout' /var/log/nginx/error.log | tail -40
The error was there. The useful part was knowing exactly where it was.
grep -inE 'error|failed|denied|timeout' /var/log/nginx/error.log
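Once grep -n has reported a line number, the next step is usually to read a few lines on either side of it. A minimal sketch, assuming a hypothetical helper named show_context that is not part of the original command list:

```shell
# show_context FILE LINE [RADIUS]
# Print the lines around LINE, each prefixed with its line number.
# show_context is a hypothetical helper name; RADIUS defaults to 3.
show_context() {
  file=$1
  line=$2
  radius=${3:-3}
  start=$((line - radius))
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  end=$((line + radius))
  # Print only the requested window and stop reading once it has passed.
  awk -v s="$start" -v e="$end" \
    'NR > e { exit } NR >= s { printf "%d:%s\n", NR, $0 }' "$file"
}
```

For example, show_context /var/log/nginx/error.log 1200 5 would print lines 1195 through 1205 without opening the whole file in a pager.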
The disk was full. The fastest clue was the folder, not the file.
du -sh /var/* 2>/dev/null | sort -h
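The folder-first approach can be repeated one level at a time until the culprit is obvious. A sketch, assuming GNU du and sort; biggest_dirs is a hypothetical helper name, not something from the original list:

```shell
# biggest_dirs [DIR] [COUNT]
# List DIR and its immediate children by disk usage, largest first.
# DIR's own total is included in the output, normally at the top.
# -x keeps du on one filesystem so other mounted volumes are not counted.
biggest_dirs() {
  dir=${1:-.}
  count=${2:-10}
  du -xh --max-depth=1 -- "$dir" 2>/dev/null | sort -hr | head -n "$count"
}
```

Run it on /var, then again on whichever child tops the list, and stop when the largest entry is a directory you recognize.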
The log had old failures too. I only cared about the newest ones.
grep -iE 'error|failed|denied|timeout' /var/log/nginx/error.log | tail -10
The file existed. The owner and mode explained why it still failed.
stat -c '%A %U:%G %n' /var/www/example/index.html
The server felt slow. Memory pressure was the first thing to rule out.
ps -eo pid,comm,%mem,%cpu --sort=-%mem | head
Byte counts are precise. Human units are faster under pressure.
find /var -type f -printf '%s %p\n' | sort -nr | head -10 | awk '{size=$1; sub(/^[0-9]+ /, ""); printf "%.1f MiB %s\n", size/1024/1024, $0}'
You can inspect an archive without extracting it.
tar -tf archives/site-backup.tar | sort | head
A quick extension count can show whether expected content made it into the source tree.
find source -type f -printf '%f\n' | sed -n 's/.*\.//p' | sort | uniq -c | sort -nr
Before package triage, prove what OS family and release you are actually on.
. /etc/os-release && printf '%s %s %s\n' "$ID" "$VERSION_ID" "$VERSION_CODENAME"
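Sourcing /etc/os-release directly leaves its variables in the current shell, and VERSION_ID and VERSION_CODENAME are optional fields that some distributions omit. A sketch that handles both, assuming a hypothetical helper named os_release_line:

```shell
# os_release_line [FILE]
# Source an os-release file inside a subshell so its variables do not leak
# into the calling shell, and print "unknown" for fields that are missing.
# os_release_line is a hypothetical helper; FILE defaults to /etc/os-release.
os_release_line() {
  (
    # Clear any inherited values so only the file's contents are reported.
    unset ID VERSION_ID VERSION_CODENAME
    . "${1:-/etc/os-release}" || exit 1
    printf '%s %s %s\n' \
      "${ID:-unknown}" "${VERSION_ID:-unknown}" "${VERSION_CODENAME:-unknown}"
  )
}
```

The optional FILE argument also makes it possible to read the os-release from a mounted image or a container root instead of the running host.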
The distro version and kernel version answer different questions.
printf 'kernel=%s arch=%s distro=%s\n' "$(uname -r)" "$(uname -m)" "$(lsb_release -ds)"
A package inventory beats memory when a server is drifting.
dpkg-query -W -f='${Package}\t${Version}\t${Architecture}\n' | sort
Before you upgrade anything, list what would move.
apt list --upgradable
apt policy explains where the next version would come from.
apt policy nginx
For one package, dpkg-query gives a clean status line.
dpkg-query -W -f='${Status} ${Version}\n' openssl
That binary came from somewhere. dpkg can tell you where.
dpkg-query -S /usr/sbin/nginx
Not every package row is cleanly installed.
dpkg-query -W -f='${db:Status-Abbrev}\t${Package}\n' | awk '$1 !~ /^ii$/'
Disk cleanup starts with evidence, not random package removal.
dpkg-query -W -f='${Installed-Size}\t${Package}\n' | sort -nr | head -20
One unexpected architecture can explain confusing dependency output.
dpkg-query -W -f='${Architecture}\t${Package}\n' | awk '$1 != "amd64" && $1 != "all"'
Huge logs often point to loops, noisy tests, or runaway debug output.
find logs/ -type f -printf '%s %p\n' | sort -nr | head -10
Turn noisy test logs into a ranked failure list.
grep -RhoE '[A-Za-z0-9_./-]+\.(test|spec)\.(js|ts|py|rb)' logs/ | sort | uniq -c | sort -nr | head
The first error often explains more than the last one.
awk '{buf[NR%5]=$0} tolower($0) ~ /(error|exception|fatal)/ {for (i=NR-4;i<=NR;i++) if (i>0) print buf[i%5]; exit}' fixtures/incidents/app.log
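The same "first match plus the lines leading up to it" idea can also be written with grep alone. A shorter sketch, assuming a grep that supports -m and -B (GNU grep does); first_error is a hypothetical helper name:

```shell
# first_error FILE [BEFORE]
# Print the first line matching error|exception|fatal, preceded by up to
# BEFORE lines of leading context (default 4), then stop reading.
first_error() {
  grep -m 1 -B "${2:-4}" -iE 'error|exception|fatal' -- "$1"
}
```

The awk version is the one to keep where grep lacks context flags. The grep version is easier to retype from memory during an incident.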
Exit code 137 often means the kernel has something to say.
journalctl -k --since "2 hours ago" --no-pager -o short-iso | grep -Ei 'out of memory|oom|killed process'
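The reason 137 points at the kernel: a shell reports 128 plus the signal number when a process is killed by a signal, and 137 is 128 + 9, SIGKILL, which is the signal the OOM killer sends. Other senders of SIGKILL produce the same code, so the journal check is still needed. A small sketch for decoding such codes, with exit_signal as a hypothetical helper name:

```shell
# exit_signal CODE
# Translate a shell exit status into the signal that ended the process.
# Prints the signal name without the SIG prefix, or "none" for codes <= 128.
exit_signal() {
  code=$1
  if [ "$code" -gt 128 ]; then
    # kill -l NUMBER prints the name of that signal number.
    kill -l "$((code - 128))"
  else
    echo none
  fi
}
```

exit_signal 137 prints KILL and exit_signal 143 prints TERM, which separates "something killed it hard" from "something asked it to stop".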
The file mode can look fine while a parent directory blocks the whole path.
namei -l fixtures/perm-audit/current/app/config/prod.token
Linux memory numbers look scary until you know which column matters.
free -h
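The column that usually matters in free -h is available, the kernel's estimate of memory that new work can use without swapping. A low free figure on its own mostly reflects page cache. The same estimate is exposed as MemAvailable in /proc/meminfo, so it can be turned into a single number. A sketch, assuming a hypothetical helper named mem_available_pct:

```shell
# mem_available_pct [MEMINFO]
# Print MemAvailable as a whole-number percentage of MemTotal.
# MEMINFO defaults to /proc/meminfo; the argument exists so the function can
# be pointed at a saved copy. mem_available_pct is a hypothetical helper.
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total > 0) printf "%d\n", avail * 100 / total
      else exit 1
    }
  ' "${1:-/proc/meminfo}"
}
```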
A high load number is a clue, not a diagnosis.
uptime
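A load number only means something relative to how many CPUs the machine has. A sketch that divides the 1-minute load by the core count, assuming Linux (/proc/loadavg) and GNU coreutils (nproc); load_per_core is a hypothetical helper name:

```shell
# load_per_core [LOAD] [CORES]
# Divide the 1-minute load average by the number of CPUs.
# With no arguments, LOAD comes from /proc/loadavg and CORES from nproc.
load_per_core() {
  load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
  cores=${2:-$(nproc)}
  awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}
```

A result well above 1.00 suggests more runnable work than CPUs. On Linux the load average also counts tasks blocked on I/O, so a high ratio can mean a slow disk instead of a busy processor.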
Cron problems often hide behind comments, blank lines, and copied folklore.
crontab -l | sed -n '/^[[:space:]]*#/d;/^[[:space:]]*$/d;p'
Cron is easier to debug when the schedule and command stop blending together.
crontab -l | awk 'NF && $1 !~ /^#/ {printf "%-16s %s\n", $1" "$2" "$3" "$4" "$5, substr($0,index($0,$6))}'
A timer is only half the scheduled job. The service is the payload.
systemctl list-timers --all --no-pager --plain | awk 'NR==1 || /\.timer/ {print $(NF-1), "->", $NF}'
Before querying a database file, see what tables are actually inside it.
sqlite3 app.db ".tables"
Before comparing sitemap coverage, print the URLs plainly.
grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'
One command tells you which services systemd already knows are broken.
systemctl --failed --no-pager
Make systemctl status safe for scripts, screenshots, and quick incident notes.
systemctl status nginx --no-pager --lines=30
Ignore stale logs and inspect only what happened since this boot.
journalctl -u nginx -b --no-pager -n 80
Before deleting random logs, ask journald how much disk it owns.
journalctl --disk-usage
Find which units made your VPS boot slowly.
systemd-analyze blame | head -20
Running now does not mean it will survive the next reboot.
systemctl is-enabled nginx
Get a clean yes-or-no service state without the full status page.
systemctl is-active nginx
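Both checks set an exit status, so they can be combined into one line that answers "running now" and "starts at boot" together. A sketch, with unit_state as a hypothetical helper name. Note that is-enabled also exits 0 for states such as static, so read enabled=yes as "not disabled" here:

```shell
# unit_state UNIT
# Report whether UNIT is running now and whether systemd considers it enabled.
# --quiet suppresses the printed state so only the exit status is used.
unit_state() {
  unit=$1
  if systemctl is-active --quiet "$unit"; then active=yes; else active=no; fi
  if systemctl is-enabled --quiet "$unit"; then enabled=yes; else enabled=no; fi
  printf '%s active=%s enabled=%s\n' "$unit" "$active" "$enabled"
}
```

A line such as nginx active=yes enabled=no is the case the caption warns about: the service works today and will be gone after the next reboot.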
Confirm whether the server actually rebooted and when.
last -x reboot | head -5
See whether memory is actually tight before restarting services.
free -h
Cron is not the only scheduler on modern Linux servers.
systemctl list-timers --all --no-pager
The status page often tells you the failed startup step before you open every log.
systemctl status app-worker --no-pager --lines=50
Turn a noisy service failure into four fields you can paste into an incident note.
systemctl show app-worker --property=Result,ExecMainCode,ExecMainStatus,NRestarts --no-pager
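systemctl show prints one property per line, which is awkward to paste into a ticket. A sketch that folds the four fields onto a single line, assuming a hypothetical helper named incident_line:

```shell
# incident_line UNIT
# Collapse the four properties into one space-separated line.
# systemctl prints them in its own order, which may differ from the request.
incident_line() {
  systemctl show "$1" \
    --property=Result,ExecMainCode,ExecMainStatus,NRestarts --no-pager |
    paste -sd ' ' -
}
```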
The bug may be in an override file, not the main unit.
systemctl cat app-worker
Clear the red failed state only after you have captured the evidence.
systemctl reset-failed app-worker
Put the failed step next to the unit config that created it.
systemctl status app-worker --no-pager --lines=50; systemctl cat app-worker