Difference between revisions of "Havarie-Plan"
(Created page with "= Havarie-Plan = == zuviele Zugriffe == <pre> var/run/openvpn .status auf undef prüfen, ob es zuviele sind cat *.status | grep UNDEF | wc -l (zählt die undef) wenn ja...") |
(→Havarie-Plan) |
||
| Line 1: | Line 1: | ||
= Havarie-Plan = | = Havarie-Plan = | ||
| + | <pre> | ||
| + | == Toolbox == | ||
| + | |||
| + | 1️⃣ System state & resources (CPU, RAM, load) | ||
| + | Basics | ||
| + | |||
| + | top / htop – running processes, CPU/RAM load | ||
| + | |||
| + | atop – very detailed, incl. I/O & network | ||
| + | |||
| + | uptime – load average | ||
| + | |||
| + | free -h – memory usage | ||
| + | |||
| + | vmstat 1 – CPU wait, I/O, memory | ||
| + | |||
| + | watch -n1 free -h | ||
| + | |||
| + | In depth | ||
| + | |||
| + | pidstat – per-process CPU/RAM/I/O statistics | ||
| + | |||
| + | mpstat – CPU utilization per core | ||
| + | |||
| + | numactl, numastat – NUMA analysis (servers!) | ||
| + | |||
| + | 2️⃣ Storage & I/O problems | ||
| + | Classics | ||
| + | |||
| + | df -hT – file systems & types | ||
| + | |||
| + | du -sh * – space hogs | ||
| + | |||
| + | lsblk -f – block devices, file systems & mount points | ||
| + | |||
| + | mount, findmnt | ||
| + | |||
| + | I/O analysis | ||
| + | |||
| + | iostat -xz 1 – latency & I/O wait (very important) | ||
| + | |||
| + | iotop – processes with high disk load | ||
| + | |||
| + | blktrace, blkparse – low level (expert mode) | ||
| + | |||
| + | File systems | ||
| + | |||
| + | fsck – consistency check (only on unmounted file systems) | ||
| + | |||
| + | tune2fs, dumpe2fs | ||
| + | |||
| + | xfs_repair, xfs_growfs | ||
| + | |||
| + | 3️⃣ Network analysis & connectivity | ||
| + | Basics | ||
| + | |||
| + | ip a, ip r, ip n | ||
| + | |||
| + | ss -tulpn – ports & services | ||
| + | |||
| + | ping, tracepath, traceroute | ||
| + | |||
| + | arp, ip neigh | ||
| + | |||
| + | Traffic & debugging | ||
| + | |||
| + | tcpdump – indispensable | ||
| + | |||
| + | termshark – TUI frontend for tshark (Wireshark) | ||
| + | |||
| + | iftop – live traffic per connection | ||
| + | |||
| + | nload – total traffic | ||
| + | |||
| + | ethtool, ethtool -k/-S | ||
| + | |||
| + | Advanced tools | ||
| + | |||
| + | conntrack, conntrack-tools | ||
| + | |||
| + | tc – traffic control | ||
| + | |||
| + | mtr – ping + traceroute combined | ||
| + | |||
| + | 4️⃣ Logs & events (often the key 🔑) | ||
| + | Standard | ||
| + | |||
| + | journalctl -xe | ||
| + | |||
| + | journalctl -u <service> | ||
| + | |||
| + | dmesg -T | ||
| + | |||
| + | /var/log/syslog, /var/log/messages | ||
| + | |||
| + | Analysis & search | ||
| + | |||
| + | grep, grep -E (egrep is deprecated), rg (ripgrep) | ||
| + | |||
| + | awk, sed | ||
| + | |||
| + | less +F (live follow) | ||
| + | |||
| + | Log problems | ||
| + | |||
| + | logrotate -d | ||
| + | |||
| + | ausearch, auditctl (auditd) | ||
| + | |||
| + | 5️⃣ Processes, services & crashes | ||
| + | systemd | ||
| + | |||
| + | systemctl status | ||
| + | |||
| + | systemctl list-units --failed | ||
| + | |||
| + | systemctl show <service> | ||
| + | |||
| + | systemd-analyze blame | ||
| + | |||
| + | systemd-analyze critical-chain | ||
| + | |||
| + | Debugging | ||
| + | |||
| + | strace -p <PID> | ||
| + | |||
| + | lsof -p <PID> / lsof -i | ||
| + | |||
| + | pstree -ap | ||
| + | |||
| + | coredumpctl | ||
| + | |||
| + | 6️⃣ Hardware & kernel | ||
| + | Hardware info | ||
| + | |||
| + | lscpu, lsmem | ||
| + | |||
| + | lsusb, lspci | ||
| + | |||
| + | dmidecode | ||
| + | |||
| + | free, numactl | ||
| + | |||
| + | Kernel & drivers | ||
| + | |||
| + | uname -a | ||
| + | |||
| + | modprobe, lsmod | ||
| + | |||
| + | sysctl -a | ||
| + | |||
| + | /proc, /sys | ||
| + | |||
| + | 7️⃣ Security & access | ||
| + | |||
| + | last, lastlog, who | ||
| + | |||
| + | faillog (faillock on systems that use pam_faillock) | ||
| + | |||
| + | getenforce, sestatus (SELinux) | ||
| + | |||
| + | ausearch, auditctl | ||
| + | |||
| + | iptables -L -nv / nft list ruleset | ||
| + | |||
| + | 8️⃣ Performance & special tools (optional but powerful) | ||
| + | |||
| + | perf – kernel/CPU profiling | ||
| + | |||
| + | bpftrace – modern live analysis | ||
| + | |||
| + | sysdig – events & containers | ||
| + | |||
| + | dstat – everything at once | ||
| + | |||
| + | sar / sysstat – historical performance | ||
| + | |||
| + | 9️⃣ Containers & virtualization (if relevant) | ||
| + | Docker | ||
| + | |||
| + | docker stats | ||
| + | |||
| + | docker inspect | ||
| + | |||
| + | docker logs | ||
| + | |||
| + | Kubernetes | ||
| + | |||
| + | kubectl describe | ||
| + | |||
| + | kubectl logs | ||
| + | |||
| + | kubectl top | ||
| + | |||
| + | Virtualization | ||
| + | |||
| + | virsh | ||
| + | |||
| + | virt-top | ||
| + | |||
| + | 🔟 Typical failure cases → recommended tools | ||
| + | Problem                 Tools | ||
| + | Server "sluggish"       top, vmstat, iostat, atop | ||
| + | Network acting up       ip, ss, tcpdump, mtr | ||
| + | Disk full               df, du, lsof +L1 | ||
| + | Service does not start  systemctl, journalctl | ||
| + | Sporadic hangs          sar, perf, bpftrace | ||
| + | Kernel errors           dmesg, journalctl -k | ||
| + | </pre> | ||
| + | |||
| + | |||
== Too many connections == | == Too many connections == | ||
Revision as of 12:59, 15 January 2026
Havarie-Plan
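For the "Disk full" case in the tool table, a first triage step could be sketched as follows. The function name `biggest_entries`, the default of five entries, and the `/var/log` example path are illustrative assumptions, not part of the original plan. Only `du`, `sort` and `head` are used.

```shell
#!/bin/sh
# Sketch only: list the largest entries directly below a directory.
# biggest_entries and its defaults are hypothetical, chosen for illustration.

biggest_entries() {
    # $1: directory to inspect (default: current directory)
    # $2: number of entries to show (default: 5)
    dir=${1:-.}
    count=${2:-5}
    # du -sk prints sizes in KiB so that sort -rn can order them numerically.
    # Like "du -sh *", the glob does not match hidden files.
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example call; the path is an assumption and can be passed as first argument.
biggest_entries "${1:-/var/log}" 5
```

If `df` reports a full file system but `du` finds nothing large, `lsof +L1` from the table lists files that are deleted but still held open by a process.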
Too many connections
In var/run/openvpn, check the .status files for UNDEF entries to see whether there are too many:
cat *.status | grep UNDEF | wc -l (counts the UNDEF entries)
If there are, lower the permitted number of new connections per time interval in
var/lcportal/persistent/dc/openvpn/server-tun0.conf:
connect-freq 10 2 (10 new connections per 2 seconds)
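The manual check above can be wrapped in a small script, sketched here under assumptions: the status directory is taken to be `/var/run/openvpn` (the source writes the path without a leading slash), and the threshold of 50 is an arbitrary placeholder, since the original does not say how many UNDEF entries count as too many. `count_undef`, `STATUS_DIR` and `THRESHOLD` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: count UNDEF entries across OpenVPN status files.
# STATUS_DIR and THRESHOLD are assumptions; adjust them to the real setup.

count_undef() {
    # $1: directory containing *.status files
    # Prints the number of lines containing UNDEF (0 if no file matches),
    # equivalent to: cat *.status | grep UNDEF | wc -l
    dir=$1
    total=0
    for f in "$dir"/*.status; do
        [ -f "$f" ] || continue
        n=$(grep -c 'UNDEF' "$f" || true)
        total=$((total + ${n:-0}))
    done
    echo "$total"
}

STATUS_DIR=${STATUS_DIR:-/var/run/openvpn}
THRESHOLD=${THRESHOLD:-50}

undef=$(count_undef "$STATUS_DIR")
if [ "$undef" -gt "$THRESHOLD" ]; then
    echo "WARN: $undef UNDEF entries (threshold $THRESHOLD), consider lowering connect-freq"
else
    echo "OK: $undef UNDEF entries"
fi
```

In OpenVPN, `connect-freq n sec` allows at most n new client connections per sec seconds, so a smaller first value or a larger second value throttles connection attempts more strictly.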
System load
htop
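htop is interactive. For a quick non-interactive snapshot, for example over a slow SSH session, the 1-minute load average can be put in relation to the number of CPUs. This is a minimal sketch for Linux. The helper `load_per_core` and the warning level of 1.0 are illustrative assumptions, not part of the original plan.

```shell
#!/bin/sh
# Sketch only: non-interactive load snapshot to complement htop.
# load_per_core and the 1.0 warning level are hypothetical choices.

load_per_core() {
    # $1: load average, $2: number of CPUs
    # Prints the ratio with two decimals.
    LC_ALL=C awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

if [ -r /proc/loadavg ]; then
    load1=$(cut -d' ' -f1 /proc/loadavg)                    # 1-minute load average
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)  # online CPUs
    ratio=$(load_per_core "$load1" "$cpus")
    echo "load1=$load1 cpus=$cpus load/core=$ratio"
    # On Linux the load average also counts tasks blocked in uninterruptible I/O,
    # so a high ratio can indicate CPU pressure or I/O wait.
    if awk -v r="$ratio" 'BEGIN { exit !(r > 1.0) }'; then
        echo "WARN: load per core above 1.0"
    fi
fi
```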