 
Security Needs a Better Hardware-Software Contract
Gernot Heiser | gernot@unsw.edu.au | @GernotHeiser
DAC’19, Las Vegas, 5 June 2019
https://trustworthy.systems

Threats
• Speculation: an “unknown unknown” until recently
• Microarchitectural timing channel: a “known unknown” for decades

What Are Timing Channels?
Timing Channels
Information leakage through timing of events
• Typically by observing response latencies or own execution speed
Covert channel: Information flow that bypasses the security policy
• Trojan (High) encodes info, Spy (Low) observes
Side channel: Covert channel exploitable without insider help
• Victim (High) executes normally, Attacker (Low) observes

Cause: Competition for Shared HW Resources
[Figure: High and Low domains on shared hardware, affecting each other's execution speed]
• Inter-process interference
• Competing access to microarchitectural features
• Hidden by the HW-SW contract!

Security: A HW-SW Codesign Issue
Enforcing Security
[Figure: High and Low domains above the operating system, which sits on the hardware]
• Operating system: enforces policies
• HW-SW contract
• Hardware (CPU etc.): provides mechanisms

Why Hardware Cannot Do Security Alone
• Security policies are high-level
  • Coarse-grain: “applications” are sets of cooperating processes
  • Hardware mechanisms are fine-grain: instructions, pages, address spaces
  • Much semantics lost in mapping to hardware level
• Security policies are complex: “Can A talk to B?” is too simple
  • maybe one-way communication is allowed
  • maybe communication is allowed under certain conditions
  • maybe low-bandwidth leakage doesn’t matter
  • maybe secrets only matter for a short time
  • maybe only a subset of {confidentiality, integrity, availability} is important

Why the ISA is an Insufficient Contract
• The ISA is a purely operational contract
  • Sufficient for ensuring functional correctness
  • Insufficient for ensuring confidentiality or availability
• The ISA intentionally abstracts time away
[Figure: High and Low domains]
• Affect execution speed: availability violation
• Observe execution speed: confidentiality violation

What Is Needed?
Confidentiality Needs Time Protection
Traditionally OSes enforce security by memory protection, i.e. enforcing spatial isolation.
Time protection: A collection of OS mechanisms which collectively prevent interference between security domains that make execution speed in one domain dependent on the activities of another. [Ge et al. EuroSys’19]

Time Protection: Partition Hardware
• Temporally partition: flush the cache between Low and High
  • Flushing useless for concurrent access (HW threads, cores)
• Spatially partition: split the cache between Low and High
  • Cannot spatially partition on-core caches (L1, TLB, branch predictor, pre-fetchers): virtually-indexed, OS cannot control
• Need both!

Requirements for Time Protection
Timing channels can be closed iff the OS can, for all shared hardware,
• (spatially) partition it (off-core state & stateless HW) or
• reset it (on-core state)

Sharing 1: Stateless Interconnect
[Figure: High and Low access memory through a shared interconnect]
H/W is bandwidth-limited
• Interference during concurrent access
• Generally reveals no data or addresses
• Must encode info into access patterns
• Only usable as covert channel, not side channel
No effective defence with present hardware!

Sharing 2: Stateful Hardware
[Figure: High and Low share a cache]
HW is capacity-limited
• Interference during
  • concurrent access
  • time-shared access
• Collisions reveal addresses
• Usable as side channel
Any state-holding microarchitectural feature:
• cache, branch predictor, pre-fetcher state machine
Solvable problem: focus of this work

Implementing Time Protection on Stateful Hardware
Spatial Partitioning: Cache Colouring
[Figure: High and Low partitions, each with its own TCB and PT, mapped to disjoint colours of cache and RAM]
• Partitions get frames of disjoint colours
• seL4: userland supplies kernel memory ⇒ colouring userland colours dynamic kernel memory
• Per-partition kernel image to colour kernel
[Ge et al. EuroSys’19]

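A minimal sketch of how frames could be sorted into disjoint colours, assuming a physically-indexed, set-associative cache. The cache geometry and the helper names (`num_colours`, `frame_colour`, `split_frames`) are illustrative assumptions, not taken from seL4.

```python
def num_colours(cache_size: int, associativity: int, page_size: int = 4096) -> int:
    """Number of page colours: bytes covered by one cache way divided by the page size."""
    way_size = cache_size // associativity
    return max(1, way_size // page_size)


def frame_colour(phys_addr: int, n_colours: int, page_size: int = 4096) -> int:
    """Colour of the frame containing phys_addr: the low bits of the frame number."""
    return (phys_addr // page_size) % n_colours


def split_frames(frames, n_colours, high_colours):
    """Assign each frame to High or Low by colour, so the two sets never share a colour."""
    high, low = [], []
    for addr in frames:
        if frame_colour(addr, n_colours) in high_colours:
            high.append(addr)
        else:
            low.append(addr)
    return high, low


# Hypothetical 2 MiB, 16-way cache with 4 KiB pages: 32 colours.
N = num_colours(2 * 1024 * 1024, 16)
frames = [i * 4096 for i in range(64)]
high, low = split_frames(frames, N, high_colours=set(range(N // 2)))
```

Because frames of different colours map to different cache sets, High and Low cannot evict each other's lines in such a cache.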
Temporal Partitioning: Flush on Switch
Must remove any history dependence!
1. T0 = current_time()
2. Switch user context
3. Flush on-core state (latency depends on prior execution!)
4. Touch all shared data needed for return (ensure deterministic execution)
5. while (current_time() < T0 + WCET) ; (time padding to remove dependency)
6. Reprogram timer
7. return

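The padding step can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SimClock` stands in for the hardware cycle counter, and `flush_cost` is a made-up number modelling the history-dependent cost of the switch, flush and touch steps.

```python
class SimClock:
    """Simulated cycle counter, so the example is deterministic."""

    def __init__(self):
        self.now = 0

    def advance(self, cycles):
        self.now += cycles


def padded_switch(clock, flush_cost, wcet):
    """Simulate a domain switch whose total latency is always wcet cycles.

    flush_cost models the history-dependent cost of switching, flushing and
    touching shared data; it must not exceed wcet.
    """
    t0 = clock.now                    # T0 = current_time()
    clock.advance(flush_cost)         # switch context, flush, touch shared data
    if clock.now > t0 + wcet:
        raise ValueError("WCET bound violated; padding cannot hide the latency")
    while clock.now < t0 + wcet:      # pad until T0 + WCET has been reached
        clock.advance(1)
    return clock.now - t0             # latency visible to the next domain


clock = SimClock()
latencies = [padded_switch(clock, cost, wcet=1000) for cost in (120, 640, 999)]
```

Whatever the flush cost, the observed latency is the same, so the next domain learns nothing from the duration of the switch.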
Reality Check: Flushing On-Core State
Evaluating Intra-Core Channels
[Figure: Low and High time-share a cache, with a flush on each switch]
Mitigation on Intel and Arm processors:
• Disable data prefetcher (just to be sure)
• On context switch, perform all architected flush operations:
  • Intel: wbinvd + invpcid (no targeted L1-cache flush supported!)
  • Arm: DCCISW + ICIALLU + TLBIALL + BPIALL

Methodology: Prime and Probe
[Figure: Trojan (High) encodes, Spy (Low) observes]
1. Spy: fill cache with own data
2. Trojan: touch n cache lines (input signal)
3. Spy: traverse cache, measure execution time (output signal)

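The three steps can be sketched as a toy simulation. It assumes a direct-mapped cache with one line per set and made-up hit and miss latencies; none of these numbers come from the measurements in the talk.

```python
HIT_CYCLES, MISS_CYCLES = 4, 100   # assumed latencies, for illustration only


def prime_and_probe(n_sets: int, n_touched: int) -> int:
    """Return the Spy's simulated probe time after the Trojan touches n_touched sets."""
    cache = ["spy"] * n_sets              # 1. Spy fills the cache with its own data
    for s in range(n_touched):            # 2. Trojan touches n cache lines
        cache[s] = "trojan"
    time = 0
    for s in range(n_sets):               # 3. Spy traverses the cache, timing it
        if cache[s] == "spy":
            time += HIT_CYCLES
        else:
            time += MISS_CYCLES
            cache[s] = "spy"              # the miss reloads the Spy's line
    return time


probe_times = [prime_and_probe(64, n) for n in (0, 16, 64)]
```

The probe time grows with the number of lines the Trojan evicted, which is how the input signal n becomes visible in the output signal.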
Methodology: Channel Matrix
[Figure: Raw I-cache channel, Intel Sandy Bridge; x-axis: cache sets accessed, y-axis: probing time (cycles), colour: probability]
Channel matrix:
• Conditional probability of observing time, t, given input, n.
• Represented as heat map: bright = high probability
• Horizontal variation indicates channel

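A minimal sketch of how a channel matrix could be estimated from measurements, assuming the data arrive as (input n, observed time t) pairs. The function names, the binning of t, and the sample values are illustrative assumptions.

```python
from collections import Counter, defaultdict


def channel_matrix(samples, bin_width):
    """Estimate P(t | n) from (n, t) samples, with t grouped into bins of bin_width cycles."""
    counts = defaultdict(Counter)
    for n, t in samples:
        counts[n][t // bin_width] += 1
    return {
        n: {b: c / sum(row.values()) for b, c in row.items()}
        for n, row in counts.items()
    }


def has_horizontal_variation(matrix):
    """True if the output distribution differs between any two inputs."""
    rows = list(matrix.values())
    return any(row != rows[0] for row in rows[1:])


# Made-up samples: (input n, observed time in cycles).
leaky = [(0, 100), (0, 105), (1, 300), (1, 310)]
closed = [(0, 100), (0, 300), (1, 105), (1, 310)]
m_leaky = channel_matrix(leaky, bin_width=50)
m_closed = channel_matrix(closed, bin_width=50)
```

The exact comparison in `has_horizontal_variation` only suits this toy data; noisy real measurements would need a statistical test to decide whether the rows differ.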
I-Cache Channel With Full State Flush
[Figures: channel matrices per processor; x-axis: input (cache sets), y-axis: output time (cycles)]
• Intel Sandy Bridge: CHANNEL!
• Intel Haswell: CHANNEL!
• Intel Skylake: No evidence of channel
• HiSilicon A53: SMALL CHANNEL!

HiSilicon A53 Branch History Buffer
Branch history buffer (BHB)
• One-bit channel
• All reset operations applied
[Figure: spy execution time against Trojan signal (0 or 1)]
Channel!

Intel Haswell Branch Target Buffer
Branch target buffer
• All reset operations applied
[Figure: spy execution time (cycles) against Trojan cache footprint]
Channel!
Found residual channels in all recent Intel and ARM processors examined!

Intel Spectre Defences
Intel added indirect branch control (IBC) feature, which closes most channels, but…
• Intel Skylake branch history buffer: small channel!
https://ts.data61.csiro.au/projects/TS/timingchannels/arch-mitigation.pml

Requirements on Hardware
New HW/SW Contract: aISA
Augmented ISA supporting time protection
For all shared microarchitectural resources:
1. Resource must be spatially partitionable or flushable
2. Concurrently shared resources must be spatially partitioned
3. Resource accessed solely by virtual address must be flushed and not concurrently accessed
   • Implies cannot share HW threads across security domains!
4. Mechanisms must be sufficiently specified for OS to partition or reset
5. Mechanisms must be constant time, or of specified, bounded latency
6. Desirable: OS should know if resettable state is derived from data, instructions, data addresses or instruction addresses

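One way to read rules 1, 2, 3 and 5 is as a checklist over descriptions of hardware resources. The sketch below is a hypothetical encoding: the `Resource` fields and the example resources are assumptions made for illustration, and rules 4 and 6 are not modelled because they concern documentation rather than properties that can be expressed as flags.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    partitionable: bool          # OS can spatially partition it
    flushable: bool              # OS can reset it
    concurrently_shared: bool    # e.g. shared between cores or HW threads
    virtually_addressed: bool    # accessed solely by virtual address
    bounded_latency: bool        # partition/reset has bounded, specified latency


def violations(r: Resource) -> list:
    """Return the numbers of the rules (1, 2, 3, 5) that resource r violates."""
    bad = []
    if not (r.partitionable or r.flushable):
        bad.append(1)
    if r.concurrently_shared and not r.partitionable:
        bad.append(2)
    if r.virtually_addressed and (not r.flushable or r.concurrently_shared):
        bad.append(3)
    if not r.bounded_latency:
        bad.append(5)
    return bad


# Hypothetical resources, not measurements of any real processor.
l1 = Resource("time-shared L1", False, True, False, True, True)
llc = Resource("partitioned last-level cache", True, False, True, False, True)
smt_l1 = Resource("L1 shared by HW threads", False, True, True, True, True)
```

In this encoding, an L1 cache shared by hardware threads of different domains fails rules 2 and 3, matching the remark that HW threads cannot be shared across security domains.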
THANK YOU Gernot Heiser | gernot@unsw.edu.au | @GernotHeiser https://trustworthy.systems