Intrusion Recovery using Selective Re-execution


slide-1
SLIDE 1

Intrusion Recovery using Selective Re-execution

Taesoo Kim, Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek MIT CSAIL

slide-2
SLIDE 2

Attackers routinely compromise system integrity

slide-6
SLIDE 6

Compromises inevitable

  • Difficult to write bug-free software
  • Administrators mis-configure policies
  • Users choose weak, guessable passwords
  • Need both “proactive” security and “reactive” recovery mechanisms

slide-8
SLIDE 8

Limited existing recovery tools

  • Anti-virus tools
  • Only repair for predictable attacks
  • Backup tools
  • Restoring from backup discards all changes
  • Administrators spend days or weeks manually tracking down all effects of the attack
  • No guarantee that they found everything
slide-9
SLIDE 9

Challenge: disentangle changes by attacker and legitimate user

  • Adversary could have modified many files directly
  • Legitimate processes may have been affected
  • Users ran trojaned pdflatex or ls
  • SSH server read a modified /etc/passwd
  • Those processes are now suspect as well
slide-10
SLIDE 10

Our approach: help users disentangle on one machine

  • Record history of all computations on machine
  • After intrusion found, roll back affected objects
  • Re-execute actions that were indirectly affected
  • Minimize user input required to disentangle
  • User edited attacker's file with emacs
  • External effects outside of our control
slide-12
SLIDE 12

Contributions

  • New approach to system-wide intrusion recovery
  • Action history graph tracks computations and repairs
  • Techniques: re-execution, predicates, and refinement
  • Retro: prototype recovery system for Linux
  • Recovers from 10 real-world and synthetic attacks
  • No user input required in most cases
  • Instead of spending days on manual recovery, an admin can use Retro to recover automatically and ensure that all effects of the attack are caught

slide-13
SLIDE 13

Example attack scenario

  • Attacker not targeting Alice, wants to run botnet
  • Attacker modifies /etc/passwd to add new account
  • Installs trojan pdflatex, ls to restart, hide botnet
  • Alice logs in via SSH
  • SSH server reads /etc/passwd
  • Alice runs trojaned pdflatex, ls
  • Admin modifies /etc/passwd to add account for Alice

slide-15
SLIDE 15

Strawman 1: Taint tracking

Graph nodes: …, Attacker process, passwd file, pdflatex binary, botnet.c, adduser alice, Alice's login, LaTeX process, Alice's shell, Admin's shell, Alice's paper, Alice's files, Alice's PDF file

  • Log all OS-level dependencies in system
slide-16
SLIDE 16

Strawman 1: Taint tracking

Graph nodes: …, Attacker process, passwd file, pdflatex binary, botnet.c, adduser alice, Alice's login, LaTeX process, Alice's shell, Admin's shell, Alice's paper, Alice's files, Alice's PDF file

  • Given attack, track down all affected files, and restore just those files from backup

Attack

slide-19
SLIDE 19

Problem with taint tracking: false positives

Graph nodes: …, Attacker process, passwd file, pdflatex binary, botnet.c, adduser alice, Alice's login, LaTeX process, Alice's shell, Admin's shell, Alice's paper, Alice's files, Alice's PDF file

  • Taint tracking conservatively propagates everywhere through shared files

Attack

Alice's account and files are lost!
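
The false-positive problem can be illustrated with a small sketch. It assumes a hypothetical set of "affects" edges between the nodes named on the slide (the slide shows the nodes, but the exact edges below are guesses) and propagates taint with a breadth-first search:

```python
from collections import deque

# Hypothetical "X affects Y" edges for the example scenario; the slide shows
# the nodes, but the exact edges used here are assumptions.
AFFECTS = {
    "Attacker process": ["passwd file", "pdflatex binary", "botnet.c"],
    "passwd file": ["adduser alice", "Alice's login"],
    "adduser alice": ["passwd file"],
    "Alice's login": ["Alice's shell"],
    "Alice's shell": ["LaTeX process", "Alice's files"],
    "pdflatex binary": ["LaTeX process"],
    "Alice's paper": ["LaTeX process"],
    "LaTeX process": ["Alice's PDF file"],
}

def tainted_from(source):
    """Return every node reachable from source (breadth-first search)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in AFFECTS.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

affected = tainted_from("Attacker process")
print(sorted(affected))
```

Under these assumed edges, Alice's files and PDF end up tainted only because her login read the shared passwd file, so restoring every tainted file from backup would discard her legitimate work.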

slide-20
SLIDE 20

Strawman 2: VM

Time

Virtual machine

slide-21
SLIDE 21

Strawman 2: VM

Time

Virtual machine Inputs Outputs

slide-22
SLIDE 22

Periodic VM checkpoints

Time

Virtual machine Inputs Outputs

slide-23
SLIDE 23

Step 1: identify attack input

Time

Virtual machine Inputs Outputs

Attack input

slide-24
SLIDE 24

Step 2: roll back to checkpoint

Time

Virtual machine Inputs Outputs

Attack input

slide-25
SLIDE 25

Step 3: replay non-attack inputs

Time

Virtual machine Inputs Outputs

Attack input

X

slide-26
SLIDE 26

Problem with VM strawman: re-execution is expensive, diverges

Time

Inputs Outputs

Attack input

X

  • May take one week to re-execute for a week-old attack
  • Original VM inputs may be meaningless for new system
  • Non-determinism: new SSH crypto keys, inode #s, app state, …
  • Can't do deterministic re-execution, since some inputs changed
slide-27
SLIDE 27

Retro's approach: selective re-execution

  • Record fine-grained action history graph
  • Includes system call arguments, function calls, …
  • Assume tamper-proof kernel, storage
  • Roll back objects directly affected by attack
  • Avoid the false positives of taint tracking
  • Re-execute actions indirectly affected by attack
  • Avoid expense, non-determinism of whole-VM re-exec.
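
A minimal sketch of what recording such a graph might look like. The class and field names are hypothetical and not taken from Retro; the sketch only shows objects, timestamped actions, and a query for actions that depend on a given object:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    time: int
    actor: str          # object performing the action, e.g. a process
    call: str           # e.g. "write(offset, data)"
    reads: tuple = ()   # objects the action takes as input
    writes: tuple = ()  # objects the action modifies

@dataclass
class ActionHistoryGraph:
    actions: list = field(default_factory=list)

    def record(self, action):
        self.actions.append(action)

    def dependents(self, obj, since):
        """Actions after time `since` that read obj (re-execution candidates)."""
        return [a for a in self.actions if a.time > since and obj in a.reads]

graph = ActionHistoryGraph()
graph.record(Action(1, "attacker's process", "write(offset, data)",
                    writes=("password file",)))
graph.record(Action(2, "adduser alice", "read(offset, data)",
                    reads=("password file",)))
graph.record(Action(3, "adduser alice", "write(offset, data)",
                    writes=("password file",)))

# Which later actions consumed the file the attacker wrote at time 1?
suspects = graph.dependents("password file", since=1)
print([a.actor for a in suspects])
```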
slide-28
SLIDE 28

Action history graph: Objects represent files, processes

Time

attacker's process password file adduser alice admin's shell

slide-29
SLIDE 29

Action history graph: Actions represent execution

Time

attacker's process password file adduser alice admin's shell

slide-30
SLIDE 30

Action history graph: Actions have dependencies

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: write(offset, data)

slide-31
SLIDE 31

Action history graph: Actions have dependencies

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: write(offset, data), exec(prog, args, ..)

slide-32
SLIDE 32

Action history graph: Actions have dependencies

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: write(offset, data), write(offset, data), exec(prog, args, ..), read(offset, data)

slide-33
SLIDE 33

Action history graph: Actions have dependencies

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: write(offset, data), exec(prog, args, ..), read(offset, data), exit(status), write(offset, data)

slide-34
SLIDE 34

Action history graph: Objects have checkpoints

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: write(offset, data), exec(prog, args, ..), read(offset, data), exit(status), write(offset, data)

slide-35
SLIDE 35

Step 1: find attack action

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: write(offset, data), exec(prog, args, ..), read(offset, data), exit(status), write(offset, data)

slide-36
SLIDE 36

Step 2: roll back affected objects

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: write(offset, data), exec(prog, args, ..), read(offset, data), exit(status), write(offset, data)

slide-37
SLIDE 37

Step 3: redo non-attack actions

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), exit(status), write(offset, data), write(offset, data)

X

slide-38
SLIDE 38

Repeat step 2: roll back objects

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), write(offset, data), exit(status), write(offset, data)

X

slide-39
SLIDE 39

Repeat step 3: redo actions

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), write(offset, data), exit(status), write(offset, data)

X

Key advantage over VM strawman: Re-run only adduser, not entire VM.

slide-40
SLIDE 40

Repeat step 3: redo actions

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), write(offset, data), exit(status), write(offset, data)

X

slide-41
SLIDE 41

Repeat step 3: redo actions

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), write(offset, data), exit(status), write(offset, data)

X

Better than either VM or taint tracking: Alice's account preserved, no re-run of entire VM
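
The roll-back-and-redo loop can be sketched in a few lines. This is a toy model under stated assumptions, not Retro's implementation: objects are Python values, every write is checkpointed, and the names `Action`, `record`, `value_before`, and `repair` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    time: int                    # must be > 0; time 0 is the initial state
    reads: tuple
    writes: tuple
    run: Callable[[dict], dict]  # maps input values to output values

def record(initial, log):
    """Run the log once, checkpointing every version of every object."""
    history = {obj: [(0, val)] for obj, val in initial.items()}
    for act in sorted(log, key=lambda a: a.time):
        inputs = {o: history[o][-1][1] for o in act.reads}
        for obj, val in act.run(inputs).items():
            history[obj].append((act.time, val))
    return history

def value_before(history, obj, time):
    """Checkpointed value of obj just before the given time."""
    return [val for t, val in history[obj] if t < time][-1]

def repair(history, log, attack):
    repaired = {}  # values of rolled-back or re-executed objects
    redone = []    # names of actions that were re-executed
    for act in sorted(log, key=lambda a: a.time):
        if act is attack:
            # Roll back what the attack wrote; the attack itself is not redone.
            for obj in act.writes:
                repaired[obj] = value_before(history, obj, act.time)
            continue
        if not any(o in repaired for o in act.reads + act.writes):
            continue  # untouched by the attack: keep the original result
        # Redo the action against the repaired state.
        inputs = {o: repaired.get(o, value_before(history, o, act.time))
                  for o in act.reads}
        repaired.update(act.run(inputs))
        redone.append(act.name)
    final = {obj: versions[-1][1] for obj, versions in history.items()}
    final.update(repaired)
    return final, redone

initial = {"passwd": ["root"], "notes": []}
attack = Action("attacker write", 1, ("passwd",), ("passwd",),
                lambda i: {"passwd": i["passwd"] + ["eve"]})
adduser = Action("adduser alice", 2, ("passwd",), ("passwd",),
                 lambda i: {"passwd": i["passwd"] + ["alice"]})
editor = Action("edit notes", 3, ("notes",), ("notes",),
                lambda i: {"notes": i["notes"] + ["todo"]})
log = [attack, adduser, editor]

history = record(initial, log)
final, redone = repair(history, log, attack)
print(final, redone)
```

In this toy run only adduser is re-executed: the attacker's entry disappears, Alice's account survives, and the unrelated edit is left alone.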

slide-42
SLIDE 42

Challenge: how to avoid re-executing everything?

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), write(offset, data), exit(status), write(offset, data)

X

Exit status affects shell, which affects sshd, and so on…

Naïve process-level re-execution still re-executes entire system!

slide-43
SLIDE 43

Observation: many suspect computations are not affected

  • Attacker adds 1 account to password file
  • Alice's sshd reads password file, but looks up Alice's account instead of attacker's
  • Attacker adds 1 line to pdflatex to restart botnet
  • Alice's pdflatex process may restart botnet, but otherwise does legitimate work

  • Significant changes → can detect attack earlier
slide-44
SLIDE 44

Approach: minimize re-execution

  • Predicates: Retro skips equivalent computations
  • Predicate checks whether inputs are the same
  • If so, assume original result OK, avoid re-execution
  • Refinement: Retro re-executes fine-grained actions
  • Avoid re-executing entire process or login session, when only a small part of it was affected
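
A predicate check can be sketched as follows. The helper names are hypothetical and the "shell step" is a toy stand-in; the point is only that an action is skipped when its inputs during repair equal its original inputs:

```python
def redo_with_predicate(run, old_inputs, old_outputs, new_inputs):
    """Return (outputs, reexecuted); skip the run when inputs are unchanged."""
    if new_inputs == old_inputs:
        return old_outputs, False
    return run(new_inputs), True

def shell_step(inputs):
    # Toy shell action whose behaviour depends on adduser's exit status.
    return {"terminal": "ok" if inputs["exit_status"] == 0 else "adduser failed"}

original_in = {"exit_status": 0}
original_out = shell_step(original_in)

# After repair, adduser exits with the same status, so the shell is not re-run.
out, reexecuted = redo_with_predicate(
    shell_step, original_in, original_out, {"exit_status": 0})

# A different exit status would force re-execution.
out2, reexecuted2 = redo_with_predicate(
    shell_step, original_in, original_out, {"exit_status": 1})
print(reexecuted, reexecuted2)
```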

slide-45
SLIDE 45

Example 1: exit status to shell unchanged

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), write(offset, data), exit(status), write(offset, data)

X

slide-46
SLIDE 46

Predicates: avoid equivalent re-execution

Time

Objects: attacker's process, password file, adduser alice, admin's shell

Actions: read(offset, data), exec(prog, args, ..), write(offset, data), exit(status), write(offset, data)

X

Same input (exit status) as before? No need to re-run shell action.

slide-47
SLIDE 47

Example 2: user's password unchanged

Time

Objects: attacker's process, password file, alice's sshd

Actions: read(offset, data), write(offset, data)

X

slide-48
SLIDE 48

Refinement: re-execute individual functions

Time

Objects: attacker's process, password file, getpwnam function, alice's sshd

Actions: read(offset, data), call getpwnam(“alice”), return(Alice's password), write(offset, data)

X

slide-49
SLIDE 49

Refinement: re-execute individual functions

Time

Objects: attacker's process, password file, getpwnam function, alice's sshd

Actions: return(Alice's password), call getpwnam(“alice”), read(offset, data), write(offset, data)

X

Same return value as before?
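
The refinement idea can be illustrated with a toy lookup. `lookup_user` below is a hypothetical stand-in for getpwnam that parses passwd-style text, and the file contents are invented for the example:

```python
def lookup_user(passwd_text, name):
    """Toy stand-in for getpwnam: return the fields of the matching entry."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if fields[0] == name:
            return tuple(fields)
    return None

# Hypothetical file contents: the attacked version has one extra account.
attacked = (
    "root:x:0:0::/root:/bin/sh\n"
    "alice:x:1000:1000::/home/alice:/bin/sh\n"
    "eve:x:0:0::/root:/bin/sh\n"
)
repaired = (
    "root:x:0:0::/root:/bin/sh\n"
    "alice:x:1000:1000::/home/alice:/bin/sh\n"
)

# The file as a whole changed, so a file-level predicate would re-run sshd.
same_file = attacked == repaired

# The function-level result is identical, so sshd need not be re-executed.
same_result = lookup_user(attacked, "alice") == lookup_user(repaired, "alice")
print(same_file, same_result)
```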

slide-50
SLIDE 50

Remaining challenge: external dependencies

  • What if the attack was externally-visible?
  • Attacker sent spam, or user saw wrong output from ls
  • Cannot solve general case (spam already sent)
  • Will need to pause repair and ask for user input
  • Can do compensating actions in some cases
slide-51
SLIDE 51

Compensating action for terminals: email diff to user

nickolai@karakum:~$ cd undosys/libundo
nickolai@karakum:~/undosys/libundo$ ls -l
  -rw-r--r-- 1 nickolai nickolai 493 2010-05-13 09:46 Makefile
- -rw-r--r-- 1 nickolai nickolai 2124 2010-05-13 10:22 attack.c
  drwxr-xr-x 2 nickolai nickolai 4096 2010-05-13 09:46 bdb
  -rwxr-xr-x 1 nickolai nickolai 973 2010-05-13 09:46 mailserver.py
  drwxr-xr-x 2 nickolai nickolai 4096 2010-05-13 09:46 php
  -rw-r--r-- 1 nickolai nickolai 5221 2010-05-13 09:46 pwd.c
  -rw-r--r-- 1 nickolai nickolai 1424 2010-05-13 09:46 undo.py
+ -rw-r--r-- 1 nickolai nickolai 662 2010-05-13 09:46 undocall.c
+ -rw-r--r-- 1 nickolai nickolai 1340 2010-05-13 09:46 undocall.h
+ -rw-r--r-- 1 nickolai nickolai 755 2010-05-13 09:46 undotest.c
+ -rwxr-xr-x 1 nickolai nickolai 360 2010-05-13 09:46 undotest.py
  -rw-r--r-- 1 nickolai nickolai 6603 2010-05-13 09:46 undowrap.c
nickolai@karakum:~/undosys/libundo$ du -ks .
- 84 .
+ 96 .
nickolai@karakum:~/undosys/libundo$ cd ..
nickolai@karakum:~/undosys$
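
A diff of this kind can be produced with Python's standard difflib module. This is only a sketch of the idea, assuming the original and re-executed terminal output are available as lists of lines; the session contents and labels below are invented, and this is not necessarily how Retro generates its diff:

```python
import difflib

# Hypothetical terminal transcripts: what the user saw, and what re-execution shows.
original_session = ["$ du -ks .", "84 ."]
repaired_session = ["$ du -ks .", "96 ."]

diff = list(difflib.unified_diff(
    original_session,
    repaired_session,
    fromfile="original session",
    tofile="repaired session",
    lineterm="",
))
print("\n".join(diff))
```

The resulting text could then be mailed to the user as the compensating action.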

slide-52
SLIDE 52

Retro implementation

Diagram components: Linux kernel, Retro module, Processes, libc wrappers, Action history graph, Snapshots, Log, File system, . . ., Repair managers, OS mgr, File system, Terminal, Network, Repair controller

slide-53
SLIDE 53

Retro implementation

Diagram components: Linux kernel, Retro module, Processes, libc wrappers, Action history graph, Snapshots, Log, File system, . . ., Repair managers, OS mgr, File system, Terminal, Network, Repair controller

Component size labels: 700 lines of C, 3,300 lines of C, 4,800 lines of Python, 200 lines of Python
slide-54
SLIDE 54

Retro implementation

Diagram components: Linux kernel, Retro module, Processes, libc wrappers, Action history graph, Snapshots, Log, File system, . . ., Repair managers, OS mgr, File system, Terminal, Network, Repair controller

Existing checkpointing file system (e.g., btrfs)

Preserve inode numbers by only reusing inodes that are free in every snapshot
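
The inode rule can be sketched with set arithmetic. The function name and data layout are assumptions for illustration: each snapshot is modelled as the set of inode numbers it still uses.

```python
def reusable_inodes(free_now, snapshots):
    """Inodes that are free now and not used in any snapshot."""
    used_in_some_snapshot = set().union(*snapshots)
    return set(free_now) - used_in_some_snapshot

# Inode 10 is free in the live file system but still referenced by the first
# snapshot, so handing it to a new file could make an old logged inode number
# ambiguous after a rollback.
free_now = {10, 11, 12}
snapshots = [{1, 2, 10}, {1, 2, 3}]
safe = reusable_inodes(free_now, snapshots)
print(sorted(safe))
```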

slide-55
SLIDE 55

Retro implementation

Diagram components: Linux kernel, Retro module, Processes, libc wrappers, Action history graph, Snapshots, Log, File system, . . ., Repair managers, OS mgr, File system, Terminal, Network, Repair controller

Shepherd re-execution using ptrace to detect and skip equivalent system calls (e.g., exec)

slide-56
SLIDE 56

Retro implementation

Diagram components: Linux kernel, Retro module, Processes, libc wrappers, Action history graph, Snapshots, Log, File system, . . ., Repair managers, OS mgr, File system, Terminal, Network, Repair controller

Well-defined API: rollback, redo, equiv, connect
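
One way to picture such an interface is as an abstract base class. Only the four operation names come from the slide; the parameter lists, docstrings, and the toy `DictFileManager` below are assumptions made for illustration and may not match Retro's actual API.

```python
from abc import ABC, abstractmethod

class RepairManager(ABC):
    """Hypothetical shape of a repair manager (signatures are guesses)."""

    @abstractmethod
    def rollback(self, obj, time):
        """Restore obj to its latest checkpoint at or before time."""

    @abstractmethod
    def redo(self, action):
        """Re-execute action against the repaired state."""

    @abstractmethod
    def equiv(self, action):
        """Return True if redoing action would see the same inputs as before."""

    @abstractmethod
    def connect(self, action, obj):
        """Record a dependency between action and obj found during repair."""

class DictFileManager(RepairManager):
    """Toy in-memory manager, used only to show the call pattern."""

    def __init__(self, checkpoints):
        self.checkpoints = checkpoints  # obj -> {time: contents}
        self.state = {}
        self.deps = []

    def rollback(self, obj, time):
        usable = [t for t in self.checkpoints[obj] if t <= time]
        self.state[obj] = self.checkpoints[obj][max(usable)]

    def redo(self, action):
        obj, func = action
        self.state[obj] = func(self.state[obj])

    def equiv(self, action):
        return False  # toy behaviour: always re-execute

    def connect(self, action, obj):
        self.deps.append((action, obj))

mgr = DictFileManager({"passwd": {0: ["root"], 1: ["root", "eve"]}})
mgr.rollback("passwd", 0)
mgr.redo(("passwd", lambda lines: lines + ["alice"]))
print(mgr.state["passwd"])
```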

slide-57
SLIDE 57

Evaluation questions

  • How much better is Retro than manual repair?
  • What is Retro's cost during normal execution?
slide-58
SLIDE 58

Evaluation setup

  • 2 real-world attacks from honeypot
  • Remove log entries, add accounts, run botnet
  • 2 synthetic challenge attacks
  • Running example (LaTeX trojan) and sshd trojan
  • 6 attacks from Taser recovery system [Goel'05]
  • File sharing, web servers, databases, desktop apps
  • Website backdoors, trojans in ls, new accounts
slide-59
SLIDE 59

Retro repairs from all attacks

Attack | User input required (Retro)
Root pw change | Skip attacker's login attempt
Log cleaning | –
LaTeX trojan | –
sshd trojan | Packet replay req'd – conflict!
Illegal storage | –
Content destruct. | – (generates terminal diff)
Unhappy student | – (generates terminal diff)
Compromised DB | –
Browser plugin | Skip re-execution of browser
Weak password | Skip attacker's login attempt

slide-61
SLIDE 61

6/10 cases: no user input needed, automatic re-execution suffices

Attack | User input required (Retro)
Root pw change | Skip attacker's login attempt
Log cleaning | –
LaTeX trojan | –
sshd trojan | Packet replay req'd – conflict!
Illegal storage | –
Content destruct. | – (generates terminal diff)
Unhappy student | – (generates terminal diff)
Compromised DB | –
Browser plugin | Skip re-execution of browser
Weak password | Skip attacker's login attempt

slide-62
SLIDE 62

2/10 cases: user input needed to skip attacker's SSH logins

Attack | User input required (Retro)
Root pw change | Skip attacker's login attempt
Log cleaning | –
LaTeX trojan | –
sshd trojan | Packet replay req'd – conflict!
Illegal storage | –
Content destruct. | – (generates terminal diff)
Unhappy student | – (generates terminal diff)
Compromised DB | –
Browser plugin | Skip re-execution of browser
Weak password | Skip attacker's login attempt

slide-63
SLIDE 63

2/10 cases: user input needed to handle legitimate network I/O

Attack | User input required (Retro)
Root pw change | Skip attacker's login attempt
Log cleaning | –
LaTeX trojan | –
sshd trojan | Packet replay req'd – conflict!
Illegal storage | –
Content destruct. | – (generates terminal diff)
Unhappy student | – (generates terminal diff)
Compromised DB | –
Browser plugin | Skip re-execution of browser
Weak password | Skip attacker's login attempt

slide-65
SLIDE 65

Repair cost: Retro repairs few objects

  • Repair cost proportional to extent of attack

Attack | Objects repaired by Retro
Root pw change | 7 (0.5%)
Log cleaning | 99 (8%)
LaTeX trojan | 190 (15%)
sshd trojan | 880 (70%)

slide-67
SLIDE 67

Repair time depends largely on # objects, not log size

  • 10,000X increase in workload leads to 10X increase in repair time
  • Much more efficient than whole-VM re-execution

Total size of Retro log (action history graph) | Repair time for 136 objects / 399 syscalls
399 system calls | 0.3 seconds
5,699,149 system calls | 4.7 seconds
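
As a quick check of the orders of magnitude, the ratios can be computed directly from the two table rows (the variable names are just for this example):

```python
small_log_calls, large_log_calls = 399, 5_699_149
small_repair_s, large_repair_s = 0.3, 4.7

log_growth = large_log_calls / small_log_calls  # roughly 14,000x more logged calls
time_growth = large_repair_s / small_repair_s   # roughly 16x longer repair

print(f"log grew {log_growth:,.0f}x, repair time grew {time_growth:.1f}x")
```

The slide's "10,000X" and "10X" are rounded versions of these ratios.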

slide-69
SLIDE 69

Runtime overheads

Workload | CPU cost | Storage overhead
HotCRP conference web site | 35% | 4GB / day
Apache, small static files | 127% | 100GB / day
Continuous kernel recompile | 89% | 150GB / day

  • Can store 2 weeks of logs on 2TB disk ($100) even for worst-case extreme workloads

slide-70
SLIDE 70

Runtime overheads

Workload | CPU cost | w/ 2nd core | Storage overhead
HotCRP conference web site | 35% | 2% | 4GB / day
Apache, small static files | 127% | 33% | 100GB / day
Continuous kernel recompile | 89% | 18% | 150GB / day

  • Can store 2 weeks of logs on 2TB disk ($100) even for worst-case extreme workloads

  • Can off-load CPU overhead to extra core
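
The retention claim can be checked with simple arithmetic from the storage column, assuming decimal units (1TB = 1000GB); the dictionary keys are shortened workload names:

```python
DISK_GB = 2000  # 2TB disk, assuming decimal units (1TB = 1000GB)

daily_log_gb = {
    "HotCRP": 4,
    "Apache": 100,
    "kernel recompile": 150,
}

# Days of logs that fit on the disk for each workload.
days = {name: DISK_GB / gb for name, gb in daily_log_gb.items()}

for name, d in days.items():
    print(f"{name}: about {d:.1f} days")
```

Under that assumption the heaviest workload fills the disk in a little over 13 days, so "2 weeks" is an approximation for the worst case.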
slide-71
SLIDE 71

Related work

  • Tracking down intrusions
  • BackTracker [King'03], IntroVirt [Joshi'05]
  • Taint tracking to find, revert affected files
  • Taser [Goel'05], Polygraph [Mahajan'09]
  • Selective undo and re-execution
  • Undoable mail store [Brown'03] (fixing configuration errors in a single app)

slide-72
SLIDE 72

Conclusion

  • Hard to recover from attacks and preserve legitimate user changes

  • Retro repairs attacks, keeps legitimate changes
  • Key idea: re-execution of legitimate actions
  • Predicates and refinement minimize re-execution
slide-73
SLIDE 73

Additional slides follow

slide-74
SLIDE 74

Non-deterministic re-execution

  • Goal: an acceptable execution
  • An execution that could have happened in the absence of the attack

  • What if program is non-deterministic?
  • Re-run may lead to another acceptable execution
  • Result will not be influenced by attack
  • If significant differences arise (e.g., new crypto keys), might need user input to re-execute