A Run-Time Memory Protection Methodology

SLIDE 1

Udaya Seshua, Nagaraju Bussa*, Bart Vermeulen
NXP Semiconductors, *Philips Research
12th Asian and South Pacific Design Automation Conference 2007
January 25, 2007, Yokohama, Japan

A Run-Time Memory Protection Methodology

SLIDE 2

Udaya Seshua, “A Run-Time Memory Protection Methodology”, 12th Asian and South Pacific Design Automation Conference 2007, January 25th, 2007

Agenda

  • Introduction
  • Motivation
  • Debugging Run-Time Memory Corruption
  • Prior Work
  • Proposed Debug Methodology
      – Hardware Design
      – Software Design
  • Experimental Results
  • Conclusion

SLIDE 3

Introduction

  • System chips are becoming more and more complex
      – More transistors per mm², customer requirements, embedded processors & SW, mixed processes...

[Figure: Number of transistors per die (1k to 10B), 1970 to 2010; series: CPU, Memory, Frequency. Source: Intel, ITRS roadmap]

[Figure: Code size evolution of high-end TV software (TV ROM size), 1978 to 2009. Source: Rob van Ommering, PRLE Informatica Colloquium, October 2005]

SLIDE 4

Introduction

  • Extensive pre-silicon verification
      – Formal Verification
      – Simulation
      – Timing Verification
      – Emulation
      – DRC, LVS …

[Figure: Effort as % of project time: verification 47%, design 53%. Source: Collet International Research Inc.]

  • No guarantee that all HW and SW errors are removed before silicon
      – Too many use cases
      – Mandatory trade-off between amount of detail and speed
  • Debugging embedded software on prototype silicon is a necessity
      – Find remaining SW and HW errors

[Figure: Industry silicon spins: share of ASSP and ASIC designs (0% to 100%) by number of spins (1, 2, 3, 4, 5 or above). Source: Numetrics Management Systems, Inc.]

SLIDE 5

Motivation

  • In any application, nearly 70% of the code deals with memory transfers
  • Memory-related bugs are among the most prevalent and difficult to catch, particularly in applications written in an unsafe language such as C or C++
  • In an embedded system, a single memory access error can cause an application to behave unpredictably or even to crash after a delay
  • A good debug infrastructure capable of locating memory-related bugs quickly is key to reducing the effort spent on software debug

SLIDE 6

Debugging Run-Time Memory Corruption

  • A single incorrect memory access can crash an application and/or threaten its security

[Figure: Processor and memory. 1. The processor fetches the pointer value (0x1234). 2. The processor accesses the data referenced by the pointer, stored at 0x1234.]

SLIDE 7

Debugging Run-Time Memory Corruption

  • A single incorrect memory access can crash an application and/or threaten its security

[Figure: Processor and memory. 1. The processor fetches the pointer value, which a bug or security breach has corrupted from 0x1234 to 0x1340. 2. The processor accesses the unintended data at 0x1340 instead of the data at 0x1234.]

  • How do we detect these errors efficiently at run-time?
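The corrupted-pointer scenario on this slide can be imitated safely in a small C simulation. The sketch below is illustrative only: the word-addressed `mem` array, the layout constants and the `sim_*` function names are hypothetical, chosen so that the overflow stays inside one array and does not rely on undefined behaviour.

```c
#include <stdint.h>

/* Simulated word-addressed memory (hypothetical layout, for illustration):
   words 0..3 hold a 4-word buffer, word 4 holds a pointer (an index into
   mem), word 8 holds the intended data, word 12 holds unrelated data. */
enum { BUF = 0, BUF_LEN = 4, PTR_SLOT = 4, INTENDED = 8, UNINTENDED = 12,
       MEM_WORDS = 16 };

static uint32_t mem[MEM_WORDS];

/* Set up memory so that the pointer refers to the intended data. */
void sim_reset(void) {
    for (int i = 0; i < MEM_WORDS; i++) mem[i] = 0;
    mem[PTR_SLOT] = INTENDED;
    mem[INTENDED] = 0xDA7A;   /* stands for "Data" */
    mem[UNINTENDED] = 0xBAD;  /* stands for "Unintended Data" */
}

/* Buggy fill: writes n words into the buffer without checking n against
   BUF_LEN, so n > BUF_LEN overwrites the pointer stored right after it. */
void sim_fill_buffer(int n, uint32_t value) {
    for (int i = 0; i < n && BUF + i < MEM_WORDS; i++) mem[BUF + i] = value;
}

/* Step 1: fetch the pointer value. Step 2: access the data it references. */
uint32_t sim_load_via_pointer(void) {
    uint32_t p = mem[PTR_SLOT] % MEM_WORDS;  /* keep the simulation in range */
    return mem[p];
}
```

Filling the buffer with four words leaves the pointer intact; filling it with five words of the value `UNINTENDED` redirects the later load to the unrelated word, and nothing in the program itself reports the mistake.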

SLIDE 8

Prior Work

  • Mostly software-only methods (“Purify, xGCC and the like”)
      – High performance penalty (5-10x not uncommon)
      – Not acceptable in real-time, embedded systems
  • Available HW support often used on ad-hoc basis
      – a Memory Management Unit
      – a Processor data breakpoint
  • “Whatever is available can and will be used!”
      – Even if it wasn’t designed for this purpose
  • Results in long and unpredictable debug times
      – Slipping deadlines, market and possibly customer loss

SLIDE 9

Proposed Debug Methodology

  • Structured Integrated Hardware/Software Approach
      – Monitor memory accesses of an application
          • Flag invalid accesses for QoS, security or debug
      – Perform frequently recurring tasks in hardware
          • Compare memory addresses with valid regions
      – Keep configurability in software for flexibility
          • Configure valid regions
  • Make optimal trade-off between
      – Hardware cost, i.e. silicon area
      – Software cost, i.e. performance drop

SLIDE 10

Proposed Debug Methodology

Run-Time Memory Protection Architecture

[Figure: A bus (e.g. AXI) connects the Processor, Peripheral 1 to Peripheral N, the Memory Interface and the Region Protection Module. The example application (main, func1, func2) is shown in its original form and in its instrumented form with rpus_* calls. Annotations: memory access violation detected; signal debugger SW.]

SLIDE 11

Proposed Debug Methodology

RPM Hardware Architecture

[Figure: Blocks: Bus Adapter, RPU controller, heap RPU 1 to heap RPU N, stack RPU 1 to stack RPU M, and an OR gate. Signals: data_in, address, data_out, read, write, stack_in, stack_fallback_n, heap_in, rpu_mode, rpu_data.]

SLIDE 12

Proposed Debug Methodology

Heap RPU Hardware Block Diagram

[Figure: Blocks: control, base register, size register, subtractor (sub), comparator (A<B?), and an OR gate. Signals: cascade_in, cascade_out, mode, data, clk.]
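One plausible reading of this block diagram, sketched in C: the RPU subtracts its base register from the incoming address, compares the result against its size register, and ORs the hit into a cascade signal shared with the other RPUs. The struct, its fields and the function names are assumptions for illustration, and the wiring is inferred from the diagram labels (base, size, sub, A<B?, or, cascade_in, cascade_out) rather than taken from the authors' design.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of one heap RPU (hypothetical field names). */
typedef struct {
    uint32_t base;    /* first address of the valid region */
    uint32_t size;    /* region length in bytes */
    bool     enabled; /* assumed to be set through the mode/data inputs */
} heap_rpu;

/* sub + "A<B?": with unsigned arithmetic, address - base wraps to a large
   value when address < base, so one compare covers both region bounds. */
bool heap_rpu_hit(const heap_rpu *r, uint32_t address) {
    return r->enabled && (uint32_t)(address - r->base) < r->size;
}

/* or: the hit is merged with the result of the previous RPU in the chain. */
bool heap_rpu_cascade_out(const heap_rpu *r, uint32_t address, bool cascade_in) {
    return cascade_in || heap_rpu_hit(r, address);
}

/* Chain n RPUs; the access is valid if any enabled region contains it. */
bool rpu_chain_valid(const heap_rpu *rpus, int n, uint32_t address) {
    bool cascade = false;
    for (int i = 0; i < n; i++)
        cascade = heap_rpu_cascade_out(&rpus[i], address, cascade);
    return cascade;
}
```

The single subtract-and-compare is what makes the check cheap per region: no separate lower-bound and upper-bound comparators are needed in this model.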

SLIDE 13

Proposed Debug Methodology

RPM Hardware Design Flow

[Figure: Design flow with the stages Benchmark applications, Memory Usage Analysis, Memory statistics, RPU Design Algorithm, XML description of required RPU components, IP generation & instantiation, and RPU module.]

XML description of required RPU components:

    <?xml version="1.0"?>
    <rpu_slave>
      <name>rdt_rpu32_slave</name>
      <size>32</size>
      <rpu><type>heap</type><max_size>8</max_size><bits>256</bits></rpu>
      <rpu><type>heap</type><max_size>6</max_size><bits>64</bits></rpu>
      <rpu><type>heap</type><max_size>9</max_size><bits>400</bits></rpu>
      <rpu><type>heap</type><max_size>5</max_size><bits>32</bits></rpu>
      <rpu><type>heap</type><max_size>7</max_size><bits>128</bits></rpu>
      <rpu><type>stack</type><max_size>8</max_size><bits>200</bits></rpu>
      <rpu><type>stack</type><max_size>7</max_size><bits>128</bits></rpu>
      <rpu><type>heap</type><max_size>10</max_size><bits>786</bits></rpu>
      <rpu><type>heap</type><max_size>5</max_size><bits>32</bits></rpu>
      <rpu><type>heap</type><max_size>10</max_size><bits>600</bits></rpu>
      <rpu><type>heap</type><max_size>10</max_size><bits>800</bits></rpu>
    </rpu_slave>

[Figure: Memory statistics: two histograms of the number of RPUs (axes 0 to 7000 and 0 to 10) against RPU size in bits (1 to 14).]
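In the XML description on this slide, every max_size value equals the smallest number of bits n with 2^n >= bits (for example 8 for 256 and 9 for 400). Assuming that relation is intentional, which the slides do not state, the field could be derived as in the sketch below. The function name `rpu_size_width` is hypothetical.

```c
#include <stdint.h>

/* Smallest n such that 2^n >= value, i.e. ceil(log2(value)) for value >= 1.
   Assumed relation between <bits> and <max_size>, inferred from the listed
   XML entries rather than stated by the authors. */
unsigned rpu_size_width(uint32_t value) {
    unsigned n = 0;
    uint64_t capacity = 1;  /* 2^n, kept in 64 bits so the shift cannot overflow */
    while (capacity < value) {
        capacity <<= 1;
        n++;
    }
    return n;
}
```

Under this assumption, a narrower size register is enough for RPUs that only ever guard small regions, which is one way the memory statistics could feed into the hardware cost.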

SLIDE 14

Proposed Debug Methodology

Hardware Features

  • Features
      – Adds fine-grain memory protection
          • Complementary to MMU’s page-based protection
      – Reconfigurable at run-time
      – Area-efficient
      – Scalable
      – Fits any (industry-)standard bus interface
          • AXI, OCP, DTL, MTL …
  • Options
      – Direct bus snoop ⇔ Address sent by SW
      – Generate interrupt ⇔ Valid query in SW
      – Complementary IEEE 1149.1 (JTAG) access

SLIDE 15

Proposed Debug Methodology

Software Design Flow

  • Application compile time
      – Identify regions to protect per thread using the compiler
      – Instrument application
  • Application run-time
      – Memory region violations detected by RPU hardware
      – Handling is done by
          • CPU software, and/or
          • Debugger software

[Figure: Application instrumentation by compiler. The example application (main, func1, func2) is shown before and after instrumentation with rpus_* calls. Other labels: New SoC, Application, Processor, RPU module.]

SLIDE 16

Proposed Debug Methodology

Software API Example

Original application (input to the compiler):

    main() {
        func1();
        func2();
    }

    func1() {
        p = malloc(127);
        int a[10];
        int b[10];
        free(p);
        a[10]=0;                        /* out-of-bounds write: a has elements 0..9 */
    }

    func2() {
        int a[10];
        int b[10];
    }

Instrumented application (output of the compiler):

    static int rpu_id=1;

    main() {
        rpus_initialize();
        func1();
        func2();
    }

    func1() {
        id = rpu_id++;
        p = malloc(127);
        rpus_heap_enable(127,p);        /* protect the heap block */
        int a[10];
        rpus_stack_enable(10,a,id);     /* protect each local array */
        int b[10];
        rpus_stack_enable(10,b,id);
        free(p);
        rpus_heap_disable(p);
        rpus_check_access(a+10);        /* check inserted before the write */
        a[10]=0;
        rpus_stack_disable(id);         /* release this function's stack regions */
    }

    func2() {
        id = rpu_id++;
        int a[10];
        rpus_stack_enable(10,a,id);
        int b[10];
        rpus_stack_enable(10,b,id);
        rpus_stack_disable(id);
    }
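To make the effect of the inserted calls concrete, here is a minimal software stand-in for the RPU hardware, assuming the calls simply maintain a table of valid regions. The function names follow the slide, but the signatures, return values and table size are assumptions, not the authors' API. In particular, this model takes sizes in bytes, whereas the slide passes 10 for `int a[10]` without stating the unit.

```c
#include <stddef.h>
#include <stdint.h>

#define RPUS_MAX_REGIONS 16  /* hypothetical table size */

typedef struct {
    uintptr_t base;   /* first valid byte address */
    size_t    size;   /* region length in bytes */
    int       id;     /* owner id for stack regions (>= 1), 0 for heap regions */
    int       in_use; /* nonzero while the region is enabled */
} rpus_region;

static rpus_region rpus_table[RPUS_MAX_REGIONS];

/* Disable every region. */
void rpus_initialize(void) {
    for (int i = 0; i < RPUS_MAX_REGIONS; i++) rpus_table[i].in_use = 0;
}

/* Claim a free slot; returns 0 on success, -1 when the table is full. */
static int rpus_enable(size_t size, const void *base, int id) {
    for (int i = 0; i < RPUS_MAX_REGIONS; i++) {
        if (!rpus_table[i].in_use) {
            rpus_table[i] = (rpus_region){ (uintptr_t)base, size, id, 1 };
            return 0;
        }
    }
    return -1;
}

int rpus_heap_enable(size_t size, const void *p) {
    return rpus_enable(size, p, 0);
}

int rpus_stack_enable(size_t size, const void *p, int id) {
    return rpus_enable(size, p, id);
}

/* Release the heap region that starts at p. */
void rpus_heap_disable(const void *p) {
    for (int i = 0; i < RPUS_MAX_REGIONS; i++)
        if (rpus_table[i].in_use && rpus_table[i].id == 0 &&
            rpus_table[i].base == (uintptr_t)p)
            rpus_table[i].in_use = 0;
}

/* Release all stack regions registered under id, e.g. at function exit. */
void rpus_stack_disable(int id) {
    for (int i = 0; i < RPUS_MAX_REGIONS; i++)
        if (rpus_table[i].in_use && rpus_table[i].id == id)
            rpus_table[i].in_use = 0;
}

/* Returns 1 if addr lies inside an enabled region, 0 otherwise. */
int rpus_check_access(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    for (int i = 0; i < RPUS_MAX_REGIONS; i++)
        if (rpus_table[i].in_use && a - rpus_table[i].base < rpus_table[i].size)
            return 1;
    return 0;
}
```

With this model, a check on `a + 10` reports an invalid access as long as no other enabled region happens to start right after `a`. In the proposed methodology the lookup is done by the RPU hardware, so the software model only illustrates the bookkeeping, not the performance.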

SLIDE 17

Experimental Results

  • Modified open-source GCC compiler on Linux
  • ARM cross-compiler
  • MiBench (http://www.eecs.umich.edu/mibench/)
      – Commercially representative embedded benchmarks
      – Automotive, Consumer, Network, Office, Security, and Telecommunication
  • Measured:
      – Software performance drop
      – Minimum number of required RPUs

SLIDE 18

Experimental Results

Application Speed per Benchmark

[Figure: Bar chart of application speed (original = 100%, axis 0% to 100%) per benchmark for Mudflap, RPM (no snooping) and RPM (snooping). Benchmarks: basicmath, bitcnts, qsort, susan_corners, susan_edges, susan_smoothing, sha, crc32, fft, ifft, adpcm_c, adpcm_d, ispell, search.]

SLIDE 19

Experimental Results

RPU Hardware Cost

[Figure: Bar chart of hardware area (CPU = 100%, axis 0.0% to 2.0%) per benchmark; legend: Mudflap, RPM (no snooping), RPM (snooping). Benchmarks: basicmath, bitcnts, qsort, susan_corners, susan_edges, susan_smoothing, sha, crc32, fft, ifft, adpcm_c, adpcm_d, ispell, search.]

SLIDE 20

Conclusions

  • Run-Time Memory Protection Architecture
      – Effective against memory corruption
      – Efficient through
          • Re-use of existing RPU hardware
          • Optimal trade-off between HW and SW cost
  • We developed tool support for
      – Memory allocation & access analysis
      – Hardware and software trade-off
      – RPU hardware design
      – Application instrumentation

SLIDE 21

Thank You