SLIDE 1

Lecture 08: When disaster strikes and all else fails

Hands-on Unix system administration DeCal

2012-10-22

SLIDE 2

Projects

  • groups of four people
  • submit one form per group with proposed project ideas and SSH public keys
  • we’ll be provisioning VMs and sending out an announcement

SLIDE 3

Tools of the trade

  • What’s up?
  • What’s hosing?
  • What’s in use?
  • Too much traffic
  • Too many files
  • Low-level “files”
  • Too many terminals
  • sudo
  • Other tools

SLIDE 4

What’s up?

  • uptime: how long continuously running, what’s the load average
      • 1, 5, 15 min average number of processes waiting for CPU (or IO)
  • w, who: who’s logged in on machine
      • write: write to a logged-in user
      • wall: write to all logged-in users
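
A load average only means something relative to the number of cores. The following is a minimal Python sketch of that comparison, not part of the original slides. The helper names and the threshold of 1.0 per core are assumptions chosen for illustration.

```python
import os


def load_per_core(load_average: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def describe_load(load_average: float, cpu_count: int, threshold: float = 1.0) -> str:
    """Return "overloaded" when the per-core load exceeds the assumed threshold."""
    if load_per_core(load_average, cpu_count) > threshold:
        return "overloaded"
    return "ok"


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5, and 15 minute load averages (Unix only).
    cores = os.cpu_count() or 1
    for label, value in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
        print(f"{label}: {value:.2f} on {cores} cores -> {describe_load(value, cores)}")
```

A load of 4.0 on a 4-core machine is fully busy but not queueing; the same load on a single core means processes are waiting.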

SLIDE 5

What’s hosing?

  • top, htop (Linux), ps (ps aux, ps elf)
  • similarly iftop for network interface bandwidth, iotop (Linux) for disk IO
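
To find what is hosing a machine from a script rather than from top, one option is to sort the output of ps aux by its %CPU column. The sketch below assumes the usual ps aux column layout; the function name and the sample output are made up for illustration.

```python
def top_cpu(ps_aux_output: str, count: int = 3) -> list:
    """Return (cpu_percent, pid, command) tuples for the busiest processes.

    Assumes the column layout printed by `ps aux`:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    rows = []
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND may itself contain spaces
        if len(fields) < 11:
            continue  # ignore blank or truncated lines
        rows.append((float(fields[2]), fields[1], fields[10]))
    return sorted(rows, key=lambda row: row[0], reverse=True)[:count]


# Invented sample output, not captured from a real machine.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  10648   832 ?        Ss   Oct21   0:01 init [2]
www-data  2211 97.3  4.2 412344 43520 ?        R    09:14  12:40 php-cgi
alice     3120  1.5  0.3  21304  3100 pts/0    Ss   09:30   0:00 -bash
"""

if __name__ == "__main__":
    for cpu, pid, command in top_cpu(SAMPLE):
        print(f"{cpu:5.1f}%  pid {pid}  {command}")
```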

SLIDE 6

What’s in use?

“The action can’t be completed... in use” (Windows)
“The operation can’t be completed... in use” (Mac OS X)

  • lsof for files
  • lsof -i for network ports
  • see also: netstat -pant, fuser
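
When lsof is not at hand, a rough check of whether a TCP port is in use is to try connecting to it. This is a sketch only: unlike lsof -i, it cannot say which process holds the port. The function name is an assumption.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (22, 80, 443):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```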

SLIDE 7

Too much traffic

  • netcat: “pipe” over TCP/UDP
  • wireshark, tshark, tcpdump: packet sniffer/analyzer
  • nmap: network scanner

SLIDE 8

Too many files

  • du, df: directory, filesystem disk space usage
  • scp (secure copy): transfer files over SSH
  • rsync (remote sync): intelligently transfer files (often over SSH)
  • tar (tape archiver): combine files into a tarball
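
The ideas behind du, df, and tar can be approximated with the Python standard library. The sketch below is illustrative, and all function names are assumptions. Note that directory_size adds up apparent file sizes, which can differ from the allocated disk blocks that du reports.

```python
import os
import shutil
import tarfile


def directory_size(path: str) -> int:
    """Sum the apparent sizes of regular files under path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # do not follow symlinks
                total += os.path.getsize(full)
    return total


def filesystem_percent_used(path: str = "/") -> float:
    """Percentage of the filesystem containing path that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def make_tarball(source_dir: str, tarball_path: str) -> list:
    """Pack source_dir into a gzip-compressed tarball; return its member names."""
    with tarfile.open(tarball_path, "w:gz") as archive:
        archive.add(source_dir, arcname=os.path.basename(source_dir))
    with tarfile.open(tarball_path, "r:gz") as archive:
        return archive.getnames()
```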

SLIDE 9

Low-level “files”

  • fdisk, parted (Linux): edit partition table
  • fsck: check filesystem for errors
  • dd: copy block devices
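
What dd does is simple at heart: read fixed-size blocks from one file and write them to another. The sketch below mimics the effect of dd's bs= and count= operands on ordinary files. The function name is an assumption, and this is not a replacement for dd on real block devices.

```python
def copy_blocks(source_path: str, dest_path: str, block_size: int = 4096, count=None) -> int:
    """Copy up to `count` blocks of `block_size` bytes; return bytes copied.

    With count=None, copy until end of file.
    """
    copied = 0
    blocks = 0
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        while count is None or blocks < count:
            block = src.read(block_size)
            if not block:  # end of input
                break
            dst.write(block)
            copied += len(block)
            blocks += 1
    return copied
```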

SLIDE 10

Too many terminals

  • screen, tmux
  • “metaterminal”
      • access multiple terminal sessions inside a single terminal session
  • other features: persistence (after logging off), session sharing (between users)

SLIDE 11

sudo

  • sudo: switch user do (usually used to give your command root powers)

(image via xkcd.com)

SLIDE 12

Other tools

  • ldd (shared library dependencies), truss or strace (trace system calls)
  • md5sum: file checksum
  • watch: execute command and repeatedly show output
  • seq: print sequence of numbers
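
A file checksum like the one md5sum prints can be computed in a few lines. The sketch below reads the file in chunks so large files do not need to fit in memory; the function name and chunk size are assumptions. MD5 catches accidental corruption, but it is not collision resistant, so it should not be relied on against deliberate tampering.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```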

SLIDE 13

Disasters

  • Software meltdowns
  • Hardware meltdowns
  • Criminals on the loose
  • Escalation of problems
  • 2003 Northeast blackout

SLIDE 14

Software meltdowns

  • system load (uptime command) too damn high
  • remote access (networking, firewall, SSH) broken

SLIDE 15

Hardware meltdowns

  • failed hard drives
  • failed fans, power supplies, CPU, RAM

SLIDE 16

Criminals on the loose

  • crackers will do Bad Things
  • compromised accounts
  • looks can be deceiving, uncertain what to trust

SLIDE 17

Escalation of problems

  • we like to build systems on top of each other
  • if one thing fails, it may break other things, causing other things to fail
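
The escalation described above can be modeled as a walk over a dependency graph: start at the failed component and collect everything that depends on it, directly or indirectly. The sketch below is one way to express that. The function name and the example stack are hypothetical.

```python
from collections import deque


def affected_by(failed: str, depends_on: dict) -> set:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the graph: for each component, record who depends on it.
    dependents = {}
    for service, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(service)

    affected = set()
    queue = deque([failed])
    while queue:  # breadth-first walk outward from the failure
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


# Hypothetical stack, invented for illustration.
STACK = {
    "website": ["web server", "database"],
    "web server": ["network"],
    "database": ["disk", "network"],
    "mail": ["network"],
}

if __name__ == "__main__":
    print(sorted(affected_by("disk", STACK)))
```

In this made-up stack, a disk failure takes down the database and then the website, while a network failure takes down everything.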

SLIDE 18

2003 Northeast blackout

August 13, 2003, 9:21pm EDT (via en.wikipedia.org)

SLIDE 19

2003 Northeast blackout

August 14, 2003, 9:03pm EDT (via en.wikipedia.org)

SLIDE 20

Alleviating the pain

  • Be Prepared
  • Power management
  • Out-of-band management
  • Redundancy
  • Monitoring
  • Security
  • Backups

SLIDE 21

Be Prepared

  • Boy Scout motto
  • Murphy’s Law: “Anything that can go wrong, will go wrong.”
  • s— happens

SLIDE 22

Power management

  • Uninterruptible Power Supply (UPS)
  • many UPSes can remotely power cycle servers

SLIDE 23

Out-of-band management

  • separate hardware that can be remotely accessed
  • independent from rest of hardware, dedicated NIC
  • can access BIOS, power cycle, provide visual display
  • e.g., IPMI, Dell DRAC, Sun LOM

SLIDE 24

Redundancy

  • dual redundant power supplies typical
  • RAID
  • failover servers for high availability
  • spare parts (hard drives!) for swapping
slide-25
SLIDE 25

Monitoring

❖ Projects Tools of the trade Disasters Alleviating the pain ❖ Be Prepared ❖ Power management ❖ Out-of-band management ❖ Redundancy ❖ Monitoring ❖ Security ❖ Backups 25 / 27

  • many large scale operations (Google,

Facebook) have many failed servers at any point in time, monitoring servers reroute traffic appropriately

  • monitor syslog
  • SNMP traps
  • alarm notification by email, text

message
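
A very small version of "monitor syslog, notify by email" is to scan log lines for worrying keywords and assemble a message. The sketch below builds the email but does not send it. The keyword list, function names, and addresses are assumptions; real monitoring systems are considerably more careful about what counts as an alarm.

```python
from email.message import EmailMessage

# Assumed keywords; real deployments tune these to their own logs.
ALERT_KEYWORDS = ("error", "failed", "panic")


def find_alerts(log_lines, keywords=ALERT_KEYWORDS) -> list:
    """Return log lines that mention any keyword, case-insensitively."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(keyword in line.lower() for keyword in keywords)
    ]


def build_alert_email(alerts: list, sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) a notification summarizing the alerts."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"{len(alerts)} alert(s) found in syslog"
    message.set_content("\n".join(alerts))
    return message
```

Sending the message would be a separate step, for example with smtplib against a mail server you control.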

SLIDE 26

Security

  • subscribe to OS security announcements
  • Intrusion Detection Software (e.g., snort, bro)
  • be wary of lax permissions
  • limit root access
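
One concrete form of "lax permissions" is a file that any user on the system may write to. The sketch below walks a directory tree and lists such files; the function name is an assumption, and this checks only one of many permission problems.

```python
import os
import stat


def world_writable(root: str) -> list:
    """Return regular files under root that any user may write to."""
    findings = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            # S_IWOTH is the "others may write" permission bit.
            if stat.S_ISREG(mode) and mode & stat.S_IWOTH:
                findings.append(path)
    return sorted(findings)
```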

SLIDE 27

Backups

  • user data, system configuration
  • ideally daily, weekly, monthly rotations
  • RAID is not a backup
  • e.g., rsync, cron, rsnapshot
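
A daily, weekly, monthly rotation boils down to deciding which dated backups to keep. The sketch below shows one possible policy, not the behavior of rsnapshot or any other tool. The function name and the retention counts (7 daily, 4 weekly, 6 monthly) are assumptions.

```python
from datetime import date


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=6) -> set:
    """Pick which dated backups survive a daily/weekly/monthly rotation.

    Assumed policy: keep every backup from the last `daily` days, the newest
    backup in each of the `weekly` most recent ISO weeks that have one, and
    the newest backup in each of the `monthly` most recent months that have one.
    """
    keep = set()
    newest_per_week = {}
    newest_per_month = {}
    for day in sorted(backup_dates):
        if day > today:
            continue  # ignore backups dated in the future
        if (today - day).days < daily:
            keep.add(day)
        iso = day.isocalendar()
        # Dates are visited in ascending order, so later ones overwrite earlier ones.
        newest_per_week[(iso[0], iso[1])] = day
        newest_per_month[(day.year, day.month)] = day
    keep.update(sorted(newest_per_week.values())[-weekly:])
    keep.update(sorted(newest_per_month.values())[-monthly:])
    return keep
```

Anything not in the returned set is a candidate for deletion. A cron job could run such a script after each nightly rsync.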