Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

SLIDE 1

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Hari Kannan, Michael Dalton, Christos Kozyrakis

Computer Systems Laboratory Stanford University

SLIDE 2

Motivation

Dynamic analysis helps us better understand software behavior

Security, debugging, full-system profiling

Hardware support for such analyses is very useful

Provides a speed advantage over software solutions

Such systems manage metadata for analysis in hardware

Implementation challenges

Storage overheads of metadata (Suh’05)

Processing of metadata

Need fast processing (low overheads)

Need a cost-effective implementation

Solution: a tightly coupled coprocessor for analysis

SLIDE 3

Case Study – DIFT (Dynamic Information Flow Tracking)

DIFT taints data from untrusted sources

An extra tag bit per word marks whether the word is untrusted

Propagate taint during program execution

Operations with tainted data produce tainted results

Check for suspicious uses of tainted data

Tainted code execution

Tainted pointer dereference (code & data)

Tainted SQL command

Can detect both low-level & high-level threats

SLIDE 4

DIFT Example: Memory Corruption

Vulnerable C code:

    int idx = tainted_input;
    buffer[idx] = x; // memory corruption

Compiled code:

    set   r1 ← &tainted_input
    load  r2 ← M[r1]
    add   r4 ← r2 + r3
    store M[r4] ← r5

Register state after the add: r1 = &input; r2 = idx = input (tainted); r3 = &buffer; r4 = &buffer + idx (tainted); r5 = x

The store dereferences the tainted pointer r4: tainted pointer dereference, security TRAP.

SLIDE 5

HW Option 1: In-core DIFT

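[Figure: in-core DIFT pipeline. Alongside the regular I-cache, decode, register file, ALU, D-cache, and writeback stages, DIFT adds policy decode, tag ALU, and tag check logic that can raise traps.]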

Integrated DIFT hardware [Dalton’07, Suh’04, Chen’05]

No performance overhead; minor power and area overheads

Invasive changes to the processor

High design and validation costs

Synchronizes metadata and data per instruction

SLIDE 6

HW Option 2: Offloading DIFT

[Figure: Core 1 (App), a general-purpose core, captures a trace into a log buffer in the L2 cache; Core 2 (DIFT), another general-purpose core, analyzes the trace.]

SW DIFT on a modified multi-core chip (e.g., CMU’s LBA)

Flexible support for various analyses

Large area & power overhead (2nd core, trace compression)

Large performance overhead (DBT, memory traffic)

Significant changes to processor & memory hierarchy

SLIDE 7

Our Proposal: DIFT Coprocessor

Off-core DIFT coprocessor (similar to watchdog processors)

Small performance, power, and area overheads

Minor changes to the processor

Reuse across processor designs

[Figure: the general-purpose main core (with its cache) sends instructions to the DIFT coprocessor (tag core and tag cache), which returns exceptions; both connect to the L2 cache.]

SLIDE 8

Outline

Motivation & Overview

Software interface of the coprocessor

Architecture of the coprocessor

Performance & Security Evaluation

Conclusion

SLIDE 9

Coprocessor Setup

A pair of policy registers

Accessible via coprocessor instructions

Could also be memory-mapped

Policy granularity: operation type

Select input operands to be checked (if tainted)

Select input operands that propagate taint to the output

Select the propagation mode (and, or, xor)

ISA instructions are decomposed into one or more basic operations

Types: ALU, logical, branch, memory, compare, FP, …

Makes policies independent of ISA packaging

Same HW policies for both RISC & CISC ISAs

SLIDE 10

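What happens without Proc/Coproc Synchronization?

Vulnerable C code:

    int idx = tainted_input;
    buffer[idx] = x; // memory corruption

Compiled code:

    set   r1 ← &tainted_input
    load  r2 ← M[r1]
    add   r4 ← r2 + r3
    store M[r4] ← r5
    …
    exec  (system call)

Register state after the add: r1 = &input; r2 = idx = input (tainted); r3 = &buffer; r4 = &buffer + idx (tainted); r5 = x

Without synchronization, the main core runs ahead of the coprocessor: the store through the tainted pointer r4 completes (EXPLOIT), and the attacker executes a system call before the check is performed, resulting in SYSTEM COMPROMISE.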

SLIDE 11

System Calls as Sync Points

Key idea: the main core and coprocessor synchronize at system calls

Security:

Prevents the attacker from executing a system call before pending checks complete

The application’s corrupted address space can be discarded

Does not weaken the DIFT model

DIFT detects an attack only at the time of exploit, not at the time of corruption

Performance:

Synchronization overhead is typically tens of cycles

It is a function of the decoupling queue size

It is lost in the noise of system call overheads (hundreds of cycles)

SLIDE 12

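System Call Synchronization

Vulnerable C code:

    int idx = tainted_input;
    buffer[idx] = x; // memory corruption

Compiled code:

    set   r1 ← &tainted_input
    load  r2 ← M[r1]
    add   r4 ← r2 + r3
    store M[r4] ← r5
    …
    exec  (system call)   ← STALL

Register state after the add: r1 = &input; r2 = idx = input (tainted); r3 = &buffer; r4 = &buffer + idx (tainted); r5 = x

The main core STALLs at the system call until the coprocessor has checked all earlier instructions. The coprocessor detects the tainted pointer dereference at the store and raises a security exception (TRAP) before the system call can execute.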

SLIDE 13

Coprocessor Design

DIFT functionality in a coprocessor

4 tag bits of metadata per word of data

Coprocessor Interface (via decoupling queue)

Passes committed instruction information

Instruction encoding could be at micro-op granularity (e.g., on x86)

Passing the physical address obviates the need for an MMU in the coprocessor

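[Figure: the processor core (I-cache, D-cache) passes the PC, instruction encoding, and physical address of each committed instruction through the decoupling queue to the DIFT coprocessor (policy decode, tag register file, tag ALU, tag check, tag cache, writeback). The coprocessor shares the L2 cache, raises security exceptions, and can stall the core.]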

SLIDE 14

Prototype

[Figure: prototype boards; labels: Leon-3 @ 40 MHz, Leon-3 @ 65 MHz, 512MB DRAM, Ethernet, AoE]

Hardware

Paired with a simple SPARC V8 core (Leon-3)

Mapped to an FPGA board

Software

Fully-featured Linux 2.6

Design statistics

Clock frequency: same as the original core

Logic: +7.5% overhead (over a simple in-order core with no speculation)

SLIDE 15

System Performance Overheads

Runtime overhead < 1% across SPEC benchmarks

Configuration: 512-byte tag cache, 6-entry decoupling queue

[Figure: runtime overhead (%) for gzip, gap, vpr, gcc, mcf, crafty, parser, vortex, bzip2, and twolf; y-axis from 0% to 1%]

SLIDE 16

Scaling the tag cache

Worst-case micro-benchmark

A 512-byte tag cache provides good performance

[Figure: runtime overhead (%, 0% to 25%) vs. tag cache size (16B, 32B, 64B, 128B, 256B, 512B, 1K), broken down into queue stalls and memory contention stalls]

SLIDE 17

Scaling the decoupling queue

Worst-case micro-benchmark

A 6-entry queue reduces performance overhead

[Figure: runtime overhead (%, 0% to 12%) vs. decoupling queue size (2, 4, 6 entries), broken down into queue fill stalls, memory contention stalls, and queue full stalls]

SLIDE 18

Coprocessors for complex cores

Modest overheads with higher-IPC cores

Because the main core rarely achieves its peak IPC (= 1)

And the coprocessor performs very simple operations

This implies the coprocessor can be paired with complex cores

[Figure: relative overhead (0.9 to 1.2) vs. ratio of the main core's clock to the coprocessor's clock (1, 1.5, 2) for gzip, gcc, and twolf]
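One way to read this result is a simple steady-state throughput model; it is an assumption for illustration, not the paper's analysis. Suppose the coprocessor checks at most one instruction per coprocessor cycle, the main core runs at clock f_main with average throughput IPC_main, and the coprocessor runs at f_cop. The coprocessor keeps up on average when

```latex
\mathrm{IPC}_{\mathrm{main}} \cdot f_{\mathrm{main}} \;\le\; 1 \cdot f_{\mathrm{cop}}
\quad\Longleftrightarrow\quad
\mathrm{IPC}_{\mathrm{main}} \;\le\; \frac{f_{\mathrm{cop}}}{f_{\mathrm{main}}}
```

For example, at a clock ratio f_main / f_cop = 2, the coprocessor keeps pace as long as the main core averages at most 0.5 instructions per cycle; short bursts above that rate are absorbed by the decoupling queue until it fills and stalls the core.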

SLIDE 19

Security Policies Overview

Tag bits per word: P, T, B, S

Policy | Description | Tag bits used
Buffer Overflow Policy | Identify all pointers and track data taint; check for illegal tainted pointer use | 2
Offset-based attacks (control pointers) | Track data taint and bounds-check to validate | 1
Format String Policy | Check tainted arguments to print commands | 2
SQL/XSS | Check tainted commands | 2
Red Zone Policy | Sandbox heap data | 1
Sandboxing Policy | Protect the security handler | 1

SLIDE 20

Security Experiments

Program | Lang. | Attack | Detected Vulnerability
tar | C | Directory Traversal | Open tainted dir
gzip | C | Directory Traversal | Open tainted dir
Wu-FTPD | C | Format String | Tainted ‘%n’ in vfprintf string
SUS | C | Format String | Tainted ‘%n’ in syslog
quotactl syscall | C | User/kernel pointer dereference | Tainted pointer to kernelspace
sendmail | C | Buffer (BSS) Overflow | Tainted code ptr
polymorph | C | Buffer Overflow | Tainted code ptr
htdig | C++ | Cross-site Scripting | Tainted <script> tag
Scry | PHP | Cross-site Scripting | Tainted <script> tag

Unmodified SPARC binaries from real-world programs

Basic/net utilities, servers, web apps, search engine

SLIDE 21

Security Experiments

Protection against low-level memory corruptions (format string, buffer overflow, user/kernel pointer dereference)

Both in userspace and kernelspace

SLIDE 22

Security Experiments

Protection against semantic vulnerabilities (directory traversal, cross-site scripting)

SLIDE 23

Security Experiments

Protection is independent of the programming language (C, C++, PHP)

Propagation and checks happen at the level of basic operations

SLIDE 24

Conclusions

Hardware dynamic analyses aid program understanding

Decoupling analyses from the main core is essential for practicality

Proposed a tightly coupled coprocessor for DIFT

Does not compromise the security model

Has low performance and area overheads

Full-system FPGA prototype

Reliably catches exploits in user & kernel-space
