Your Site Is Not Down. DNS Might Be Lying.
The browser said the site was gone. The server was answering fine.
curl --resolve example.com:443:203.0.113.10 https://example.com/
Small checks that separate DNS, TLS, Nginx, and app failures.
21 checked fixes
The app was running. The port was not listening.
ss -tulpn '( sport = :80 or sport = :443 )'
One glance tells you which release directory production is pointing at.
readlink -f releases/current && ls -ld releases/current
A deploy is not done until the endpoint answers.
curl -fsS -o /dev/null -w '%{http_code} %{time_total}s\n' https://example.com/health
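To turn that check into a deploy gate, wrap the same curl call in a retry loop. This is a minimal sketch for a POSIX shell. `retry` is a hypothetical helper name, and 30 attempts at 2 second intervals are placeholder values, not a recommendation.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
# Returns 0 on the first success, 1 if every attempt failed.
retry() (
  attempts=$1 delay=$2
  shift 2
  n=1
  while :; do
    "$@" && exit 0                      # success: stop right away
    [ "$n" -ge "$attempts" ] && exit 1  # out of attempts
    n=$((n + 1))
    sleep "$delay"
  done
)

# Example (placeholder values): up to 30 tries, 2 seconds apart.
# retry 30 2 curl -fsS -o /dev/null https://example.com/health && echo "deploy healthy"
```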
Disk pressure during deploys often starts in old release directories.
du -sh releases/* 2>/dev/null | sort -h | tail -10
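Once `du` shows which releases are heavy, the next step is a cleanup list. This sketch only prints candidates; it deletes nothing. It assumes release directories sort by age when sorted by name (timestamps like 20240501120000) and that `releases/current` is the live symlink. `old_releases` is a hypothetical name.

```shell
# Hypothetical helper: print release dirs that are cleanup candidates.
# Keeps the newest KEEP directories (by name) and whatever "current" resolves to.
# Usage: old_releases RELEASES_DIR [KEEP]
old_releases() (
  dir=$1 keep=${2:-5}
  live=$(readlink -f "$dir/current" 2>/dev/null)
  # -type d skips the "current" symlink itself; sort by name = sort by age here.
  find "$dir" -mindepth 1 -maxdepth 1 -type d | sort |
    awk -v keep="$keep" '{ line[NR] = $0 } END { for (i = 1; i <= NR - keep; i++) print line[i] }' |
    while read -r d; do
      # Never list the release production is pointing at.
      [ "$(readlink -f "$d")" = "$live" ] || printf '%s\n' "$d"
    done
)

# old_releases releases 5
```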
One resolver can still have the old edge IP while another has the new one.
for r in 1.1.1.1 8.8.8.8 9.9.9.9; do printf '%s ' "$r"; dig @"$r" +short edge.test A; done
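That loop prints raw answers. Record order can differ between servers even when the set is identical, which makes eyeballing unreliable. Here is a sketch of a hypothetical `compare_answers` helper that sorts each answer set and returns non-zero when any server disagrees. It takes any list of servers, so the nameserver list from `dig +short NS` works too.

```shell
# Hypothetical helper: ask each server for the same record and compare.
# Usage: compare_answers NAME TYPE SERVER...
# Prints one line per server, then a verdict; returns 1 if any answer differs.
compare_answers() (
  name=$1 type=$2
  shift 2
  ref= seen=no status=0
  for s in "$@"; do
    # Sort and join the answers so record order does not count as a difference.
    ans=$(dig @"$s" +short "$name" "$type" | sort | tr '\n' ' ')
    printf '%s %s\n' "$s" "${ans:-(no answer)}"
    if [ "$seen" = no ]; then
      ref=$ans seen=yes
    elif [ "$ans" != "$ref" ]; then
      status=1
    fi
  done
  if [ "$status" -eq 0 ]; then echo "all servers agree"; else echo "servers disagree"; fi
  exit "$status"
)

# compare_answers edge.test A 1.1.1.1 8.8.8.8 9.9.9.9
# compare_answers edge.test A $(dig +short NS edge.test)
```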
The recursive resolver was not the problem. One nameserver disagreed.
for ns in $(dig +short NS edge.test); do printf '%s ' "$ns"; dig @"$ns" +short edge.test A; done
The fix was correct. The TTL explained why users still saw the old edge.
dig +noall +answer edge.test A
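In `dig +noall +answer` output, the TTL is the second column. From a recursive resolver it is the time left in that resolver's cache. From an authoritative nameserver it is the configured TTL. This is a sketch of a hypothetical `ttl_of` helper that extracts the first answer's TTL and optionally queries a specific server. If the name is an alias, the first answer is the CNAME record, so its TTL is what you get.

```shell
# Hypothetical helper: print the TTL of the first answer record.
# Usage: ttl_of NAME [TYPE] [SERVER]
# With no SERVER the system resolver answers, so the value is its remaining
# cache time; pass an authoritative nameserver to see the configured TTL.
ttl_of() {
  dig ${3:+"@$3"} +noall +answer "$1" "${2:-A}" | awk 'NR == 1 { print $2 }'
}

# ttl_of edge.test                                              # cache time left
# ttl_of edge.test A "$(dig +short NS edge.test | head -n 1)"   # configured TTL
```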
The apex was right. The www name pointed through a different path.
dig +short www.edge.test CNAME
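That query shows only the first hop. This is a sketch of a hypothetical `cname_chain` helper that follows the chain until it reaches a name with no CNAME. A hop limit (10 by default, an arbitrary choice) keeps a CNAME loop from spinning forever.

```shell
# Hypothetical helper: follow a CNAME chain and print every hop.
# Usage: cname_chain NAME [MAX_HOPS]
cname_chain() (
  name=$1 max=${2:-10} hops=0
  printf '%s\n' "$name"
  while [ "$hops" -lt "$max" ]; do
    next=$(dig +short "$name" CNAME | head -n 1)
    [ -n "$next" ] || break             # no CNAME: end of the chain
    printf '%s\n' "-> $next"
    name=$next
    hops=$((hops + 1))
  done
  # Addresses for the last name reached.
  dig +short "$name" A | sed 's/^/   A /'
)

# cname_chain www.edge.test
```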
IPv4 worked. IPv6 sent users to a different edge.
printf 'A '; dig +short edge.test A; printf 'AAAA '; dig +short edge.test AAAA
The certificate request failed because CAA only authorized a different issuer.
dig +short edge.test CAA
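CAA output is easy to misread once parameters are attached to the issuer. This is a rough sketch of a hypothetical `caa_allows` check. It climbs to the parent domain when a name has no CAA records and compares only the issuer domain in `issue` tags. It ignores `issuewild`, the critical flag, and CNAME handling, so treat it as a first look rather than a CA-grade evaluation.

```shell
# Hypothetical helper: is ISSUER named in an "issue" CAA record for NAME?
# Usage: caa_allows NAME ISSUER_DOMAIN   (returns 0 if allowed, 1 if not)
# Simplified: ignores issuewild, the critical flag, and CNAME handling.
caa_allows() (
  name=$1 issuer=$2 records=
  # Climb toward the parent domain until some name has CAA records.
  while [ -n "$name" ]; do
    records=$(dig +short "$name" CAA)
    [ -n "$records" ] && break
    case $name in *.*) name=${name#*.} ;; *) name= ;; esac
  done
  [ -n "$records" ] || exit 0          # no CAA anywhere: no restriction
  printf '%s\n' "$records" | awk -v want="$issuer" '
    $2 == "issue" {
      n++
      v = $0
      sub(/^[^"]*"/, "", v)            # drop everything up to the opening quote
      sub(/".*$/, "", v)               # drop the closing quote and after
      sub(/;.*$/, "", v)               # drop parameters after ";"
      gsub(/[ \t]/, "", v)
      if (tolower(v) == tolower(want)) found = 1
    }
    END { exit ((n == 0 || found) ? 0 : 1) }'
)

# caa_allows edge.test letsencrypt.org && echo "issuer allowed"
```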
The outage was not the web server. The edge certificate had expired.
openssl s_client -connect edge.test:443 -servername edge.test </dev/null | openssl x509 -noout -dates
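The `notAfter=` date is in a format that GNU `date -d` can parse, so days remaining is simple arithmetic. This sketch assumes GNU coreutils `date`. `days_left` is a hypothetical helper. For a plain pass or fail, `openssl x509 -noout -checkend SECONDS` is an alternative.

```shell
# Hypothetical helper: whole days from now until a notAfter date.
# Assumes GNU date (-d); BSD/macOS date parses dates differently.
# Usage: days_left "Jun  1 12:00:00 2030 GMT" [NOW_EPOCH_SECONDS]
days_left() (
  end=$(date -u -d "$1" +%s) || exit 1   # unparseable date: fail
  now=${2:-$(date -u +%s)}
  echo $(( (end - now) / 86400 ))        # a negative result means already expired
)

# end=$(openssl s_client -connect edge.test:443 -servername edge.test </dev/null 2>/dev/null \
#   | openssl x509 -noout -enddate | cut -d= -f2)
# days_left "$end"
```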
The cert was valid, but not for this hostname.
openssl s_client -connect edge.test:443 -servername edge.test </dev/null | openssl x509 -noout -subject -ext subjectAltName
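With a long SAN list, matching the hostname by eye is error-prone, especially with wildcards. This is a sketch of a hypothetical `san_matches` helper. It takes the hostname and the `-ext subjectAltName` text, and treats `*` as covering exactly one left-most label: `*.edge.test` matches `www.edge.test` but not `edge.test` or `a.b.edge.test`.

```shell
# Hypothetical helper: does HOST match a DNS entry in a SAN list?
# Usage: san_matches HOST "SAN text"   (returns 0 on a match, 1 otherwise)
# A "*" covers exactly one left-most label; comparison ignores case.
san_matches() (
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # One entry per line, keep DNS: entries only, drop the prefix, lowercase.
  printf '%s\n' "$2" | tr ',' '\n' | sed -n 's/^[[:space:]]*DNS://p' | tr '[:upper:]' '[:lower:]' | {
    while read -r san; do
      case $san in
        "$host") exit 0 ;;
        '*.'*)
          rest=${host#*.}
          # Wildcard: host needs a non-empty first label under the same parent.
          if [ "$rest" != "$host" ] && [ -n "${host%%.*}" ] && [ "*.$rest" = "$san" ]; then
            exit 0
          fi ;;
      esac
    done
    exit 1
  }
)

# san=$(openssl s_client -connect edge.test:443 -servername www.edge.test </dev/null 2>/dev/null \
#   | openssl x509 -noout -ext subjectAltName)
# san_matches www.edge.test "$san" && echo "covered"
```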
The IP was right. The SNI name selected the wrong certificate.
openssl s_client -connect 203.0.113.10:443 -servername wrong.edge.test </dev/null | openssl x509 -noout -subject -ext subjectAltName
The certificate was fine. The TLS negotiation told the rest of the story.
openssl s_client -connect edge.test:443 -servername edge.test </dev/null | awk '/Protocol|Cipher|Verify return code/ {print}'
Restart loops hide in plain sight unless you filter for them.
docker ps -a --filter status=restarting --format 'table {{.Names}}\t{{.Status}}\t{{.Image}}'
Skip the million-line log scroll and read only the recent failure window.
docker logs --since 10m --tail 100 api
When a service is unreachable, confirm Docker is publishing the port you think it is.
docker port web
A full disk can break logins, uploads, databases, and deploys.
df -h
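`df -h` is built for reading, but a threshold check is easier to script. This is a sketch of a hypothetical `full_filesystems` filter over POSIX `df -P` output. It reads the mount point from column 6, so mount points that contain spaces will be cut short. On GNU df, `df -Pi` should report inode usage in the same column layout. That matters because a filesystem can run out of inodes while it still has free space.

```shell
# Hypothetical helper: print mount points at or above a usage percentage.
# Reads POSIX `df -P` output on stdin (Capacity in column 5, mount in column 6).
# Usage: df -P | full_filesystems 90
full_filesystems() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)
    if (use + 0 >= limit) print $6 " " $5
  }'
}

# df -P  | full_filesystems 90   # block usage
# df -Pi | full_filesystems 90   # inode usage (GNU df)
```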
Once you know a filesystem is full, the next question is where.
du -xh --max-depth=1 /var 2>/dev/null | sort -h
A broken internal link is easiest to catch before it becomes a 404.
grep -Rho --include='*.html' 'href="/[^/"][^"]*"' public | sed -e 's|^href="||' -e 's|"$||' -e 's|[?#].*||' | sort -u | while read -r path; do test -e "public${path}" || echo "$path"; done