

slide-1
SLIDE 1

PAGURUS: Low-Overhead Dynamic Information Flow Tracking on Loosely Coupled Accelerators

Luca Piccolboni, Giuseppe Di Guglielmo and Luca P. Carloni

Columbia University, NY, USA

ACM/IEEE CODES+ISSS 2018, Turin, Italy

slide-2
SLIDE 2

Systems-on-Chip (SoCs) Are Vulnerable to Software Attacks

[Figure: block diagram of the PULPino SoC: processor core (RI5CY), data RAM, instruction RAM, boot RAM, and UART and SPI M. peripherals, connected through AXI and APB.]

[M. Gautschi et al., IEEE VLSI ’17]

slide-3
SLIDE 3

Attacking PULPino: Buffer-Overflow Attack

int buff[10], k;
int (*fun)(int) = foo;
int num = atoi(argv[1]);
int val = atoi(argv[2]);
/* this is a bad idea */
for (k = 0; k < num; ++k)
    buff[k] = sw(val);
fun(1); // call foo?

main memory, with num = 10 and val = 7:
  ...
  buff[0] = sw(7)
  buff[1] = sw(7)
  ...
  buff[9] = sw(7)
  fun = 0xAA   (memory location: 0xAA)

slide-4
SLIDE 4

Attacking PULPino: Buffer-Overflow Attack

int buff[10], k;
int (*fun)(int) = foo;
int num = atoi(argv[1]);
int val = atoi(argv[2]);
/* this is a bad idea */
for (k = 0; k < num; ++k)
    buff[k] = sw(val);
fun(1); // call foo?

main memory, with num = 11 and val = 7:
  ...
  buff[0] = sw(7)
  buff[1] = sw(7)
  ...
  buff[9] = sw(7)
  fun = sw(7)   (memory location: 0xAA; can be used to call a malicious function)

slide-5
SLIDE 5

Attacking PULPino: Dynamic Information Flow Tracking (DIFT)

[G. E. Suh et al., ACM ASPLOS ’04]

int buff[10], k;
int (*fun)(int) = foo;
int num = atoi(argv[1]);
int val = atoi(argv[2]);
/* this is a bad idea */
for (k = 0; k < num; ++k)
    buff[k] = sw(val);
fun(1); // call fun

main memory, with num = 7 and val = 11:
  ...
  buff[0] = sw(7)
  buff[1] = sw(7)
  ...
  buff[9] = sw(7)
  fun = sw(7)   (memory location: 0xAA)

tags: 1 1 1 1 1
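The tag mechanism on this slide can be mimicked in software. The following is only a minimal sketch, not the hardware scheme of the cited work: the `tagged_word` type, the helper names, and the 11-word memory image (ten `buff` slots followed by `fun`) are all assumptions made for illustration. Values derived from `argv` carry tag 1, `sw()` propagates the tag of its operand, and an indirect call through a tagged word is refused.

```c
#include <stdbool.h>
#include <stddef.h>

/* One memory word plus a 1-bit tag (hypothetical software model). */
typedef struct {
    int  value;
    bool tag;   /* true = derived from untrusted input */
} tagged_word;

static tagged_word untrusted(int v) { return (tagged_word){ v, true  }; }
static tagged_word trusted(int v)   { return (tagged_word){ v, false }; }

/* Stand-in for sw(): the result inherits the tag of its operand. */
static tagged_word sw(tagged_word x) { return (tagged_word){ x.value, x.tag }; }

/* An indirect call is allowed only if the target word is untagged. */
static bool call_allowed(tagged_word target) { return !target.tag; }

/* Replays the loop from the slide on a model memory: slots 0..9 are buff,
 * slot 10 is fun. Returns true if fun(1) would be allowed to execute. */
bool run_example(int num_raw, int val_raw)
{
    tagged_word mem[11];
    for (size_t i = 0; i < 10; ++i)
        mem[i] = trusted(0);
    mem[10] = trusted(0xAA);            /* fun = address of foo */

    tagged_word num = untrusted(num_raw);
    tagged_word val = untrusted(val_raw);

    /* The loop bound is clamped to the model memory so the sketch itself
     * stays memory-safe; the original code has no such clamp. */
    for (int k = 0; k < num.value && k < 11; ++k)
        mem[k] = sw(val);

    return call_allowed(mem[10]);
}
```

With `num = 10` the function pointer keeps its clean tag and the call goes through; with `num = 11` the overflow writes a tagged word over `fun`, and the check rejects the call.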

slide-6
SLIDE 6

Homogeneous SoCs: Now Secured with DIFT

[Figure: block diagram of PULPino (processor core RI5CY, data RAM, instruction RAM, boot RAM, UART and SPI M. peripherals, AXI, APB) with DIFT extensions.]

[M. Gautschi et al., IEEE VLSI ’17]
[C. Palmiero et al., IEEE HPEC ’18]

slide-7
SLIDE 7

Heterogeneous SoCs: No Longer Secured with DIFT

[Figure: block diagram of PULPino (processor core RI5CY, data RAM, instruction RAM, boot RAM, UART and SPI M. peripherals, AXI, APB) with DIFT extensions, now also including Loosely Coupled Accelerator #1 and Loosely Coupled Accelerator #2.]

[M. Gautschi et al., IEEE VLSI ’17]
[C. Palmiero et al., IEEE HPEC ’18]

slide-8
SLIDE 8

Attacking PULPino (Again): Buffer-Overflow Attack

int buff[10] = {0};
int (*f)(int) = foo;
int num = atoi(argv[1]);
int val = atoi(argv[2]);
/* this is a bad idea */
hw(num, val, buff);

main memory (with a tags column), with num = 11 and val = 7:
  ...
  buff[0] = 0
  buff[1] = 0
  ...
  buff[9] = 0
  fun = 0xAA

slide-9
SLIDE 9

Attacking PULPino (Again): Buffer-Overflow Attack

int buff[10] = {0};
int (*f)(int) = foo;
int num = atoi(argv[1]);
int val = atoi(argv[2]);
/* this is a bad idea */
hw(num, val, buff);

main memory (with a tags column), with num = 11 and val = 7:
  ...
  buff[0] = hw(7)
  buff[1] = hw(7)
  ...
  buff[9] = hw(7)
  fun = hw(7)   (can be used to call a malicious function)

The accelerator is not able to propagate the tags.

slide-10
SLIDE 10

Contributions

  • 1. We propose PAGURUS, a methodology to design a circuit shell that adds DIFT support to accelerators

slide-11
SLIDE 11

Contributions

[Figure: PULPino System-on-Chip (processor core RI5CY, data RAM, instruction RAM, boot RAM, UART and SPI M. peripherals, AXI, APB) in which Loosely Coupled Accelerator #1 and Loosely Coupled Accelerator #2 are each wrapped by a DIFT Shell.]

slide-12
SLIDE 12

Contributions

  • 1. We propose PAGURUS, a methodology to design a circuit shell that adds DIFT support to accelerators

a) The shell design is independent from the design of the accelerators and vice versa
b) The shell has low overheads on both the performance and cost of accelerators

  • 2. We propose a metric to quantitatively measure the security guarantees provided by the shell

slide-13
SLIDE 13

Preliminaries: Assumptions and Attack Model

  • 1. The hardware is safe: no hardware Trojans
  • 2. The software is not safe: it contains bugs and vulnerabilities useful for the attackers

The attackers exploit these vulnerabilities through common I/O interfaces with the goal of affecting the integrity and/or the confidentiality of the hardware-accelerated software applications

slide-14
SLIDE 14

Preliminaries: Tagging Scheme

[J. Porquet et al., ACM/IEEE CODES’13]

  • 1. Coupled Scheme

[Figure: main memory holding value #1, value #2, value #3, with a tags column holding tag #1, tag #2, tag #3.]

slide-15
SLIDE 15

Preliminaries: Tagging Scheme

[J. Porquet et al., ACM/IEEE CODES’13]

  • 1. Coupled Scheme
  • 2. Decoupled Scheme

[Figure: main memory holding value #1, value #2, value #3, with tag #1, tag #2, tag #3 stored in a protected region in memory.]

slide-16
SLIDE 16

Preliminaries: Tagging Scheme

[J. Porquet et al., ACM/IEEE CODES’13]

  • 1. Coupled Scheme
  • 2. Decoupled Scheme

2.1. Interleaved Scheme

tag offset = # words in memory between two consecutive values

[Figure: main memory with values and tags interleaved (tag offset = 1): value #1, tag #1, value #2, tag #2, value #3, tag #3.]
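To make the interleaved layout concrete, here is a small sketch of the index arithmetic. It assumes one particular convention that the slides do not spell out: a single tag word is stored after every `tag_offset` value words. The function names are hypothetical.

```c
#include <stddef.h>

/* Physical word index of the i-th value, assuming one tag word is stored
 * after every tag_offset value words (illustrative convention). */
size_t value_word_index(size_t i, size_t tag_offset)
{
    return i + i / tag_offset;
}

/* Physical word index of the tag that covers the i-th value. */
size_t tag_word_index(size_t i, size_t tag_offset)
{
    size_t group = i / tag_offset;              /* which tagged group */
    return group * (tag_offset + 1) + tag_offset;
}
```

With a tag offset of 1 this reproduces the layout in the figure: values sit at words 0, 2, 4 and their tags at words 1, 3, 5. A larger offset stores fewer tags, so less memory and bandwidth go to tags, but more value words are read between two tag checks.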

slide-17
SLIDE 17

Contributions

  • 1. We propose PAGURUS, a methodology to design a circuit shell that adds DIFT support to accelerators

a) The shell design is independent from the design of the accelerators and vice versa
b) The shell has low overheads on both the performance and cost of accelerators

slide-18
SLIDE 18

Accelerators: Architecture

[Figure: a loosely coupled accelerator with configuration registers (reg #1 to reg #K) and a private local memory / scratchpad organized in banks; main memory holds register #1, register #2, ..., register #K.]

slide-19
SLIDE 19

Accelerators: Architecture

[Figure: a loosely coupled accelerator with configuration registers (reg #1 to reg #K) and a private local memory / scratchpad organized in banks. Phases shown: configuration, load input, compute. The input is loaded from main memory in bursts of values (burst length).]

slide-20
SLIDE 20

Accelerators: Architecture

[Figure: a loosely coupled accelerator with configuration registers (reg #1 to reg #K) and a private local memory / scratchpad organized in banks. Phases shown: configuration, load input, compute, load input. Further bursts of input values are loaded from main memory.]

slide-21
SLIDE 21
Accelerators: Architecture

[Figure: a loosely coupled accelerator with configuration registers (reg #1 to reg #K) and a private local memory / scratchpad organized in banks. Phases shown: load input, compute, store output. Input values are loaded from main memory and the output is stored back to main memory in bursts (burst length).]
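The load, compute, and store phases of a loosely coupled accelerator can be modeled in plain C. This is a rough software analogy only: `BURST_LEN`, `accel_run`, and the doubling kernel are made up for illustration, and a real accelerator would move data with DMA bursts and typically overlap the phases.

```c
#include <stddef.h>
#include <string.h>

#define BURST_LEN 4   /* words per burst (illustrative value) */

/* Software model of one accelerator invocation: data is moved between
 * "main memory" and a private scratchpad one burst at a time. */
void accel_run(const int *in, int *out, size_t n)
{
    int scratch[BURST_LEN];   /* private local memory / scratchpad */

    for (size_t base = 0; base < n; base += BURST_LEN) {
        size_t len = (n - base < BURST_LEN) ? n - base : BURST_LEN;

        /* load: one input burst from main memory into the scratchpad */
        memcpy(scratch, in + base, len * sizeof scratch[0]);

        /* compute: placeholder kernel that doubles every value */
        for (size_t i = 0; i < len; ++i)
            scratch[i] *= 2;

        /* store: one output burst from the scratchpad to main memory */
        memcpy(out + base, scratch, len * sizeof scratch[0]);
    }
}
```

In this model every load burst produces one store burst. An accelerator that needs several load bursts before it can emit a store burst would accumulate results in the scratchpad across loop iterations instead.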

slide-22
SLIDE 22

DIFT Shell: Architecture

[Figure: a loosely coupled accelerator wrapped by the DIFT shell.]

slide-23
SLIDE 23

DIFT Shell: Architecture

shell configuration

  • reg. #K+1: src_tag
  • reg. #K+2: dst_tag

[Figure: main memory holds register #1, register #2, ..., register #K; the DIFT shell around the loosely coupled accelerator holds src_tag and dst_tag.]

slide-24
SLIDE 24

DIFT Shell: Architecture

shell load logic

if tag != src_tag
    DIFT_exception!

[Figure: after the shell configuration, the input is loaded from main memory in bursts (burst length) of values and tags (val val val tag val tag). The shell, which holds src_tag and dst_tag, compares the tags of the input with src_tag.]

slide-25
SLIDE 25

DIFT Shell: Architecture

shell store logic

[Figure: after the shell configuration and the shell load logic, the shell store logic writes the output to main memory in bursts (burst length) of values and tags (val tag val tag), using dst_tag as the tag of the output.]
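The load and store logic of the shell can be sketched in software as follows. This is an illustrative model rather than the shell's actual hardware design: it assumes an interleaved layout with tag offset 1 (each value word immediately followed by its tag word), and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* The two extra configuration registers of the shell. */
typedef struct {
    unsigned src_tag;   /* tag expected on every input value  */
    unsigned dst_tag;   /* tag attached to every output value */
} shell_config;

/* Load logic: strip the tags from the input and hand only the values to
 * the accelerator. Returns false (a DIFT exception) on the first tag that
 * differs from src_tag. */
bool shell_load(const shell_config *cfg, const unsigned *mem,
                size_t n_values, unsigned *vals_out)
{
    for (size_t i = 0; i < n_values; ++i) {
        unsigned val = mem[2 * i];
        unsigned tag = mem[2 * i + 1];
        if (tag != cfg->src_tag)
            return false;               /* DIFT_exception! */
        vals_out[i] = val;
    }
    return true;
}

/* Store logic: write each output value followed by dst_tag. */
void shell_store(const shell_config *cfg, const unsigned *vals,
                 size_t n_values, unsigned *mem)
{
    for (size_t i = 0; i < n_values; ++i) {
        mem[2 * i]     = vals[i];
        mem[2 * i + 1] = cfg->dst_tag;
    }
}
```

Because the accelerator only ever sees untagged values, it needs no modification, which matches the claim that the shell design is independent from the accelerator design.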

slide-26
SLIDE 26

Contributions

  • 1. We propose PAGURUS, a methodology to design a circuit shell that adds DIFT support to accelerators

a) The shell design is independent from the design of the accelerators and vice versa
b) The shell has low overheads on both the performance and cost of accelerators

  • 2. We propose a metric to quantitatively measure the security guarantees provided by the shell

slide-27
SLIDE 27

A Security Metric: Definition

[Figure: main memory holds the input: value #1, src_tag, value #2, value #3.]

slide-28
SLIDE 28

A Security Metric: Definition

[Figure: the input in main memory has been overwritten: value #1 [overwritten], src_tag [overwritten], value #2 [overwritten], value #3 [overwritten]. The loosely coupled accelerator, wrapped by the DIFT shell, loads val, tag, val, val and writes value #1 to the output.]
slide-29
SLIDE 29

A Security Metric: Definition

  • Quantitative metric for security: Information Leakage

[Figure: the input in main memory has been overwritten: value #1 [overwritten], src_tag [overwritten], value #2 [overwritten], value #3 [overwritten]. The loosely coupled accelerator, wrapped by the DIFT shell, loads val, tag, val, val; value #1 has already been written to the output when the shell raises DIFT_exception!]
slide-30
SLIDE 30

A Security Metric: Analysis

  • Information Leakage: amount of data that can be produced as output by an accelerator before its shell realizes that the input has been corrupted

The leakage depends on:

  • 1. Tag offset
  • 2. Algorithm: I/O ratio (the number of load bursts necessary to produce a store burst)

slide-31
SLIDE 31

A Security Metric: Analysis

  • Information Leakage: amount of data that can be produced as output by an accelerator before its shell realizes that the input has been corrupted

The leakage depends on:

  • 1. Tag offset
  • 2. Algorithm: I/O ratio
  • 3. Implementation: burst length
  • 4. Workload: workload size
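A toy model can illustrate how these four parameters interact. It is not the analysis from the paper; it rests on assumptions chosen only to keep the arithmetic simple: the first tag word sits after `tag_offset` value words, the shell notices the corruption in the load burst that contains that tag, and the accelerator emits one store burst of `burst_len` words for every `io_ratio` completed load bursts. All names are hypothetical.

```c
#include <stddef.h>

/* Output words written before the shell sees the first (corrupted) tag,
 * under the simplified model described above. Assumes burst_len > 0 and
 * io_ratio > 0. */
size_t leaked_words(size_t tag_offset, size_t burst_len,
                    size_t io_ratio, size_t total_out_words)
{
    /* load bursts that finish before the burst holding the first tag */
    size_t clean_loads = tag_offset / burst_len;

    /* store bursts the accelerator could emit from those loads */
    size_t stores = clean_loads / io_ratio;

    /* words leaked, never more than the whole output of the workload */
    size_t leaked = stores * burst_len;
    return leaked < total_out_words ? leaked : total_out_words;
}
```

Dividing the result by `total_out_words` gives a leakage percentage. In this model, leakage grows with the tag offset, shrinks as the I/O ratio grows, and is capped by the workload size.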

slide-32
SLIDE 32

Experimental Results: Experimental Setup (1/2)

  • We designed three loosely coupled accelerators:
      • GRAY: converts an RGB image into a grayscale image
      • MEAN: calculates the mean of a 2D matrix (columns)
      • MULTS: multiplies a 2D matrix by its transpose
slide-33
SLIDE 33

Experimental Results: Experimental Setup (1/2)

[Figure: sequence of load bursts and store bursts for GRAY.]

slide-34
SLIDE 34

Experimental Results: Experimental Setup (1/2)

[Figure: sequences of load bursts and store bursts for GRAY and MEAN.]

slide-35
SLIDE 35

Experimental Results: Experimental Setup (1/2)

[Figure: sequences of load bursts and store bursts for GRAY, MEAN, and MULTS.]

slide-36
SLIDE 36

Experimental Results: Experimental Setup (1/2)

  • We designed three loosely coupled accelerators: GRAY, MEAN, and MULTS
  • We used Cadence Stratus HLS for high-level synthesis and Xilinx Vivado for logic synthesis, targeting a Virtex-7 FPGA
  • We designed the accelerators and the shell in SystemC
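For reference, the three kernels can be written as plain C functions. These are software sketches of what each accelerator computes, not the SystemC designs themselves. The grayscale formula (an unweighted average of the three channels), the row-major layout, and the function names are assumptions, since the slides do not specify them.

```c
#include <stddef.h>
#include <stdint.h>

/* GRAY: packed RGB pixels to one gray byte per pixel.
 * An unweighted channel average is assumed here. */
void gray(const uint8_t *rgb, uint8_t *out, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; ++i)
        out[i] = (uint8_t)((rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2]) / 3);
}

/* MEAN: mean of each column of a rows x cols row-major matrix
 * (rows must be nonzero). */
void mean_columns(const float *m, size_t rows, size_t cols, float *out)
{
    for (size_t c = 0; c < cols; ++c) {
        float sum = 0.0f;
        for (size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
        out[c] = sum / (float)rows;
    }
}

/* MULTS: out = M * M^T, where M is rows x cols and out is rows x rows. */
void mults(const float *m, size_t rows, size_t cols, float *out)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < rows; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < cols; ++k)
                acc += m[i * cols + k] * m[j * cols + k];
            out[i * rows + j] = acc;
        }
}
```

The kernels differ in how much input they consume per unit of output: GRAY reads three bytes for every byte it writes, MEAN needs a whole column before it can emit one result, and every entry of MULTS depends on two full rows. This is the kind of difference the I/O ratio captures.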
slide-37
SLIDE 37

Experimental Results: Experimental Setup (2/2)

We explored different alternatives by varying:

  • accelerator
  • tag offset
  • burst size
  • workload:
      • 128 x 128 - small
      • 512 x 512 - medium
      • 2048 x 2048 - large

[Figure: Embedded Scalable Platforms instance: processor core (Leon3), loosely coupled accelerator + shell, memory controller, and I/O channels and peripherals, connected by a network-on-chip.]

[P. Mantovani et al., ACM/IEEE DAC ’16]
[L. P. Carloni, ACM/IEEE DAC ’16]
slide-38
SLIDE 38

Experimental Results: Quantitative Security Analysis - MEAN

[Figure: three panels (small, medium, large) plotting information leakage (%), from 0% to 100%, against burst size (bytes), from 2^6 to 2^13. Legend values: 2^20, 2^13; 2^24, 2^15; 2^16, 2^11.]

slide-39
SLIDE 39

Experimental Results: Quantitative Security Analysis - MEAN

max information leakage => the highest tag offset
min information leakage => the lowest tag offset

[Figure: three panels (small, medium, large) plotting information leakage (%), from 0% to 100%, against burst size (bytes), from 2^6 to 2^13. Legend values: 2^20, 2^13; 2^24, 2^15; 2^16, 2^11.]

slide-40
SLIDE 40

Experimental Results: Quantitative Security Analysis - MEAN

[Figure: three panels (small, medium, large) plotting information leakage (%), from 0% to 100%, against burst size (bytes), from 2^6 to 2^13. Legend values: 2^20, 2^13; 2^24, 2^15; 2^16, 2^11.]

slide-41
SLIDE 41

Experimental Results: Quantitative Security Analysis - GRAY

[Figure: three panels (small, medium, large) plotting information leakage (%), from 0% to 100%, against burst size (bytes), from 2^6 to 2^13. Legend values: 2^20, 2^13; 2^24, 2^15; 2^16, 2^11.]

slide-42
SLIDE 42

Experimental Results: Quantitative Security Analysis - GRAY

[Figure: three panels (small, medium, large) plotting information leakage (%), from 0% to 100%, against burst size (bytes), from 2^6 to 2^13. Legend values: 2^19, 2^5; 2^23, 2^5; 2^15, 2^5.]

slide-43
SLIDE 43

Experimental Results: Quantitative Security Analysis - MULTS

[Figure: three panels (small, medium, large) plotting information leakage (%), from 0% to 100%, against burst size (bytes), from 2^6 to 2^13, each with a zoomed leakage axis (0.00% to 0.14%, 0.0% to 0.5%, and 0.0% to 1.6%). Legend values: 2^20, 2^14; 2^24, 2^16; 2^16, 2^12.]

slide-44
SLIDE 44

Experimental Results: Quantitative Security Analysis - MULTS

[Figure: three panels (small, medium, large) plotting information leakage (%), from 0% to 100%, against burst size (bytes), from 2^6 to 2^13, each with a zoomed leakage axis (0.00% to 0.14%, 0.0% to 0.5%, and 0.0% to 1.6%). Legend values: 2^20, 2^14; 2^24, 2^16; 2^16, 2^12.]

slide-45
SLIDE 45

Experimental Results: Performance Analysis - GRAY

[Figure: three panels (gray - small, gray - medium, gray - large) plotting normalized execution time (1.0 to 2.4 for small and medium, 1.0 to 2.6 for large) against burst size (bytes), from 2^6 to 2^13. Legend: 2^0, 2^6, 2^12, no tags.]

slide-46
SLIDE 46

Experimental Results: Performance Analysis - GRAY

[Figure: three panels (gray - small, gray - medium, gray - large) plotting normalized execution time (1.0 to 2.4 for small and medium, 1.0 to 2.6 for large) against burst size (bytes), from 2^6 to 2^13. Legend: 2^0, 2^6, 2^12, no tags.]

slide-47
SLIDE 47

Experimental Results: Performance Analysis - GRAY

[Figure: three panels (gray - small, gray - medium, gray - large) plotting normalized execution time (1.0 to 2.4 for small and medium, 1.0 to 2.6 for large) against burst size (bytes), from 2^6 to 2^13. Legend: 2^0, 2^6, 2^12, no tags.]

slide-48
SLIDE 48

Experimental Results: Performance Analysis - GRAY

[Figure: three panels (gray - small, gray - medium, gray - large) plotting normalized execution time against burst size (bytes), from 2^6 to 2^13. Legend: 2^0, 2^6, 2^12, no tags.]

slide-49
SLIDE 49

Conclusions

  • We propose PAGURUS, a flexible methodology to design a shell that extends DIFT to accelerators
      • 1. The shell design is independent from the accelerator design and vice versa
      • 2. The shell has negligible cost overhead and reasonable performance overhead
  • We define the metric of information leakage for accelerators to quantitatively measure security

slide-50
SLIDE 50

Speaker: Luca Piccolboni Columbia University, NY

Questions?

ACM/IEEE CODES + ISSS 2018, Turin, Italy

PAGURUS: Low-Overhead Dynamic Information Flow Tracking on Loosely Coupled Accelerators