Lecture 08: When disaster strikes and all else fails
Hands-on Unix system administration DeCal
2012-10-22
Projects
- groups of four people
- submit one form per group with proposed project ideas and SSH public keys
- we’ll be provisioning VMs and sending out an announcement
Tools of the trade
What’s up?
- uptime: how long the system has been running continuously, and the load average
  ✦ 1-, 5-, and 15-minute averages of the number of processes running or waiting for CPU (on Linux, also those blocked on I/O)
- w, who: who’s logged in on the machine
  ✦ write: write to a logged-in user
  ✦ wall: write to all logged-in users
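A rough way to read the load average is to compare it with the number of CPUs. The sketch below assumes a Linux machine (it reads /proc/loadavg, the file uptime takes its numbers from); the function name classify_load and the "load greater than CPU count" rule of thumb are illustrative assumptions, not something from the slides.

```shell
# classify_load LOAD CPUS: compare a load average with the CPU count.
# awk does the comparison because sh has no floating-point arithmetic.
classify_load() {
  awk -v load="$1" -v cpus="$2" 'BEGIN {
    if (load > cpus) print "overloaded"; else print "ok"
  }'
}

# /proc/loadavg is Linux-specific; its first three fields are the
# 1-, 5-, and 15-minute load averages.
if [ -r /proc/loadavg ]; then
  read -r one five fifteen rest < /proc/loadavg
  cpus=$(nproc 2>/dev/null || echo 1)
  echo "load: $one $five $fifteen on $cpus CPU(s): $(classify_load "$one" "$cpus")"
fi
```

A load of 4 on a 4-CPU machine means the CPUs are roughly fully busy; the same number on a single-CPU machine means processes are queueing.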
What’s hosing?
- top, htop (Linux), ps (ps aux, ps -elf)
- similarly, iftop for network interface bandwidth, iotop (Linux) for disk I/O
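When top is not handy, ps output can be sorted to find the busiest processes. This is a minimal sketch using the POSIX ps column names pid, pcpu, and comm; the helper name top_cpu is hypothetical, and a stripped-down ps (for example in some containers) may not accept these options.

```shell
# top_cpu N: read "PID %CPU COMMAND" lines on stdin and keep the N busiest.
# sort -k2,2nr orders by the second column, numerically, largest first.
top_cpu() {
  sort -k2,2nr | head -n "${1:-5}"
}

# -e selects every process, -o chooses columns; "name=" suppresses the header.
if command -v ps >/dev/null 2>&1; then
  ps -e -o pid= -o pcpu= -o comm= | top_cpu 5
fi
```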
What’s in use?
“The action can’t be completed... in use” (Windows)
“The operation can’t be completed... in use” (Mac OS X)
- lsof for files
- lsof -i for network ports
- see also: netstat -pant, fuser
Too much traffic
- netcat: “pipe” over TCP/UDP
- wireshark, tshark, tcpdump: packet sniffers/analyzers
- nmap: network scanner
Too many files
- du, df: directory and filesystem disk space usage
- scp (secure copy): transfer files over SSH
- rsync (remote sync): intelligently transfer files (often over SSH)
- tar (tape archive): combine files into a tarball
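The local tools in this list (du, df, tar) can be tried safely in a scratch directory. The sketch below makes up a small tree (the names project, README, and notes.txt are placeholders), archives it, and checks that extracting the tarball gives back the same files. scp and rsync are left out because they need a second host.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
mkdir -p "$work/project/docs"
echo "hello" > "$work/project/README"
echo "notes" > "$work/project/docs/notes.txt"

# du: disk usage of a directory tree (-s summary only, -k in KiB).
du -sk "$work/project"

# df: free space on the filesystem holding that directory (-P portable format).
df -P "$work"

# tar: c = create, z = gzip, f = archive file; -C changes directory first
# so the archive stores relative paths.
tar -czf "$work/project.tar.gz" -C "$work" project

# t = list the contents without extracting.
listing=$(tar -tzf "$work/project.tar.gz")
echo "$listing"

# x = extract into a separate directory, then compare with the original.
mkdir "$work/restore"
tar -xzf "$work/project.tar.gz" -C "$work/restore"
if diff -r "$work/project" "$work/restore/project" >/dev/null; then
  roundtrip=ok
else
  roundtrip=differs
fi
echo "round trip: $roundtrip"

rm -rf "$work"
```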
Low-level “files”
- fdisk, parted (Linux): edit partition table
- fsck: check filesystem for errors
- dd: copy block devices
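dd copies raw bytes from if= to of=, and on a real block device it overwrites the target without asking. To show the mechanics safely, this sketch uses ordinary files as stand-ins for devices; the names disk.img and copy.img are made up for the example.

```shell
work=$(mktemp -d)

# Create a 64 KiB "disk image": 64 blocks of 1024 bytes of random data.
dd if=/dev/urandom of="$work/disk.img" bs=1024 count=64 2>/dev/null

# Copy it block by block. With real devices, if= and of= would be paths
# such as a disk or partition under /dev, and of= would be overwritten.
dd if="$work/disk.img" of="$work/copy.img" bs=4096 2>/dev/null

# Verify the copy: same size and identical bytes.
size=$(wc -c < "$work/copy.img" | tr -d ' ')
if cmp -s "$work/disk.img" "$work/copy.img"; then
  identical=yes
else
  identical=no
fi
echo "copied $size bytes, identical: $identical"

rm -rf "$work"
```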
Too many terminals
- screen, tmux
- “metaterminal”
  ✦ access multiple terminal sessions inside a single terminal session
- other features: persistence (after logging off), session sharing (between users)
sudo
- sudo: switch user do (usually used to give your command root powers)
[Image via xkcd.com]
Other tools
- ldd (shared library dependencies), truss or strace (trace system calls)
- md5sum: file checksum
- watch: execute command and repeatedly show output
- seq: print sequence of numbers
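A checksum is mostly useful for answering "is this copy still the same file?". The sketch below generates a file with seq, then uses md5sum to confirm that an exact copy matches and a modified copy does not. It assumes GNU md5sum (some BSD systems ship md5 instead), and the file names are placeholders.

```shell
work=$(mktemp -d)

# seq: print the numbers 1 through 5, one per line.
seq 1 5 > "$work/numbers.txt"
cp "$work/numbers.txt" "$work/copy.txt"

# md5sum prints "<hash>  <name>"; awk keeps only the hash.
sum_orig=$(md5sum < "$work/numbers.txt" | awk '{ print $1 }')
sum_copy=$(md5sum < "$work/copy.txt" | awk '{ print $1 }')

# Any change to the content changes the checksum.
echo "6" >> "$work/copy.txt"
sum_changed=$(md5sum < "$work/copy.txt" | awk '{ print $1 }')

echo "original: $sum_orig"
echo "copy:     $sum_copy"
echo "modified: $sum_changed"

# Checksums can also be saved and verified later with -c.
(cd "$work" && md5sum numbers.txt > numbers.md5 && md5sum -c numbers.md5)

# watch is interactive, so it is only shown here, not run:
#   watch -n 5 df -h     # rerun "df -h" every 5 seconds

rm -rf "$work"
```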
Disasters
Software meltdowns
- system load (uptime command) too damn high
- remote access (networking, firewall, SSH) broken
Hardware meltdowns
- failed hard drives
- failed fans, power supplies, CPU, RAM
Criminals on the loose
- crackers will do Bad Things
- compromised accounts
- looks can be deceiving; uncertain what to trust
Escalation of problems
- we like to build systems on top of each other
- if one thing fails, it may break other things, causing other things to fail
2003 Northeast blackout
[Image: August 13, 2003, 9:21pm EDT, the evening before the blackout (via en.wikipedia.org)]
[Image: August 14, 2003, 9:03pm EDT, the evening of the blackout (via en.wikipedia.org)]
Alleviating the pain
Be Prepared
- Boy Scout motto
- Murphy’s Law: “Anything that can go wrong, will go wrong.”
- s— happens
Power management
- Uninterruptible Power Supply (UPS)
- many UPSes can remotely power cycle servers
Out-of-band management
- separate hardware that can be remotely accessed
- independent from rest of hardware, dedicated NIC
- can access BIOS, power cycle, provide visual display
- e.g., IPMI, Dell DRAC, Sun LOM
Redundancy
- dual redundant power supplies typical
- RAID
- failover servers for high availability
- spare parts (hard drives!) for swapping
Monitoring
- large-scale operations (Google, Facebook) have many failed servers at any point in time; monitoring servers reroute traffic appropriately
- monitor syslog
- SNMP traps
- alarm notification by email, text message
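At its simplest, a monitor is a check run on a schedule that produces an alarm line when a threshold is crossed. The sketch below watches disk usage as one example; the function names usage_percent and check_disk and the 90% limit are assumptions chosen for illustration, and the alarm is only printed, where a real setup would hand it to mail or a paging service.

```shell
# usage_percent PATH: percentage in use on the filesystem holding PATH.
# df -P prints a header plus one portable-format line; column 5 is "NN%".
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PERCENT LIMIT: print an alert at or above LIMIT, "ok" below it.
check_disk() {
  if [ "$1" -ge "$2" ]; then
    echo "ALERT: disk ${1}% full (limit ${2}%)"
  else
    echo "ok: disk ${1}% full"
  fi
}

pct=$(usage_percent /)
# A real monitor would mail or text this line instead of printing it.
check_disk "${pct:-0}" 90
```

Keeping the threshold logic separate from the measurement makes it easy to test the alarm path without actually filling a disk.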
Security
- subscribe to OS security announcements
- Intrusion Detection Software (e.g., snort, bro)
- be wary of lax permissions
- limit root access
Backups
- user data, system configuration
- ideally daily, weekly, monthly rotations
- RAID is not a backup
- e.g., rsync, cron, rsnapshot
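The rotation idea can be sketched in a few lines: write a date-stamped archive each run and delete all but the newest few. This example uses tar rather than the rsync or rsnapshot named above, purely so that it runs with standard tools; the function names backup_once and prune_old, the paths, and the retention count are all hypothetical.

```shell
# backup_once SRC DEST STAMP: archive directory SRC as DEST/backup-STAMP.tar.gz.
backup_once() {
  mkdir -p "$2"
  tar -czf "$2/backup-$3.tar.gz" -C "$(dirname "$1")" "$(basename "$1")"
}

# prune_old DEST KEEP: delete all but the newest KEEP archives.
# YYYY-MM-DD stamps sort chronologically, so a reverse sort puts newest first.
prune_old() {
  printf '%s\n' "$1"/backup-*.tar.gz | sort -r | tail -n +"$(($2 + 1))" |
    while IFS= read -r old; do rm -f "$old"; done
}

# Demo in a scratch directory with three fake "days", keeping the newest two.
work=$(mktemp -d)
mkdir "$work/data"
echo "important" > "$work/data/file.txt"
for day in 2012-10-20 2012-10-21 2012-10-22; do
  backup_once "$work/data" "$work/backups" "$day"
done
prune_old "$work/backups" 2
remaining=$(ls "$work/backups" | tr '\n' ' ')
echo "kept: $remaining"
rm -rf "$work"

# In real use the stamp would come from: date +%Y-%m-%d
# and cron would run the script daily, e.g. (hypothetical path):
#   0 3 * * * /usr/local/bin/backup.sh
```

Separate daily, weekly, and monthly rotations would be the same two steps run on different schedules with different destination directories and retention counts.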