A Run-Time Memory Protection Methodology
Udaya Seshua, Nagaraju Bussa*, Bart Vermeulen
NXP Semiconductors, *Philips Research
12th Asian and South Pacific Design Automation Conference 2007
January 25, 2007, Yokohama, Japan
Udaya Seshua, “A Run-Time Memory Protection Methodology”, 12th Asian and South Pacific Design Automation Conference 2007, January 25th, 2007
Agenda
- Introduction
- Motivation
- Debugging Run-Time Memory Corruption
- Prior Work
- Proposed Debug Methodology
– Hardware Design
– Software Design
- Experimental Results
- Conclusion
Introduction
- System chips are becoming more and more complex
– More transistors per mm², customer requirements, embedded processors & SW, mixed processes...
[Figure: number of transistors per die (1k to 10B) for CPU and memory, with frequency, 1970 to 2010. Source: Intel, ITRS roadmap]
[Figure: code size evolution of high-end TV software (TV ROM size), 1978 to 2009. Source: Rob van Ommering, PRLE Informatica Colloquium, October 2005]
Introduction
- Extensive pre-silicon verification
– Formal Verification
– Simulation
– Timing Verification
– Emulation
– DRC, LVS …
[Figure: effort as % of project time: verification 47%, design 53%. Source: Collet International Research Inc.]
- No guarantee that all HW and SW errors are removed before silicon
– Too many use cases
– Mandatory trade-off between amount of detail and speed
- Debugging embedded software on prototype silicon is a necessity
– Find remaining SW and HW errors
[Figure: industry silicon spins (1, 2, 3, 4, 5 or above) as a percentage of ASSP and ASIC designs. Source: Numetrics Management Systems, Inc.]
Motivation
- In any application, nearly 70% of the code deals with memory transfers
- Memory-related bugs are among the most prevalent and the most difficult to catch
– particularly in applications written in an unsafe language such as C or C++
- In an embedded system, a single memory access error can cause an application to behave unpredictably or to crash much later
- A good debug infrastructure that locates memory-related bugs quickly is key to reducing the effort spent on software debug
Debugging Run-Time Memory Corruption
- A single incorrect memory access can crash an application and/or threaten its security
[Figure: a processor accesses memory in two steps]
- 1. Fetch pointer value (0x1234)
- 2. Access data referenced by pointer (Data at address 0x1234)
Debugging Run-Time Memory Corruption
[Figure: the same two-step access with a corrupted pointer, caused by a bug or security breach]
- 1. Fetch pointer value (corrupted from 0x1234 to 0x1340)
- 2. Access unintended data referenced by corrupted pointer (Unintended Data at address 0x1340 instead of Data at address 0x1234)
- How do we detect these errors efficiently at run-time?
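The two-step access on these slides can be mimicked on a host machine without relying on undefined behaviour. The sketch below is an illustration only and is not taken from the presentation: it models memory as a small table of addressed words, stores a pointer value in one of them, and follows it. The names `sim_mem`, `sim_find`, `sim_deref`, `sim_corrupt_pointer`, the slot address 0x1000 and the data values are all hypothetical; only the addresses 0x1234 and 0x1340 come from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated memory: a few words at fixed addresses. */
typedef struct {
    uint32_t addr;
    uint32_t value;
} sim_word;

enum { PTR_SLOT = 0x1000, INTENDED = 0x1234, UNINTENDED = 0x1340 };

sim_word sim_mem[] = {
    { PTR_SLOT,   INTENDED },   /* the pointer variable, holding 0x1234 */
    { INTENDED,   0xAAAA   },   /* the data the program means to use */
    { UNINTENDED, 0xBBBB   },   /* unrelated data */
};

/* Return the simulated word at addr, or NULL if nothing is mapped there. */
sim_word *sim_find(uint32_t addr) {
    for (size_t i = 0; i < sizeof sim_mem / sizeof sim_mem[0]; i++) {
        if (sim_mem[i].addr == addr) return &sim_mem[i];
    }
    return NULL;
}

/* Step 1: fetch the pointer value. Step 2: access the data it references.
 * Returns 0 when either address is unmapped. */
uint32_t sim_deref(uint32_t ptr_slot) {
    sim_word *p = sim_find(ptr_slot);
    sim_word *d = p ? sim_find(p->value) : NULL;
    return d ? d->value : 0;
}

/* Model a bug or attack that overwrites the stored pointer value. */
void sim_corrupt_pointer(uint32_t ptr_slot, uint32_t new_target) {
    sim_word *p = sim_find(ptr_slot);
    if (p) p->value = new_target;
}
```

Before corruption, `sim_deref(PTR_SLOT)` yields the intended word. After `sim_corrupt_pointer(PTR_SLOT, UNINTENDED)`, the same call yields the unrelated word without any error being raised, which is the kind of access a run-time check has to catch.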
Prior Work
- Mostly software-only methods (“Purify, xGCC and the like”)
– High performance penalty (5-10x not uncommon)
– Not acceptable in real-time, embedded systems
- Available HW support often used on ad-hoc basis
– a Memory Management Unit
– a Processor data breakpoint
- “Whatever is available can and will be used!”
– Even if it wasn’t designed for this purpose
- Results in long and unpredictable debug times
– Slipping deadlines, market and possibly customer loss
Proposed Debug Methodology
- Structured, integrated hardware/software approach
– Monitor memory accesses of an application
    - Flag invalid accesses for QoS, security or debug
– Perform frequently recurring tasks in hardware
    - Compare memory addresses with valid regions
– Keep configurability in software for flexibility
    - Configure valid regions
- Make optimal trade-off between
– Hardware cost, i.e. silicon area
– Software cost, i.e. performance drop
Proposed Debug Methodology
Run-Time Memory Protection Architecture
[Figure: a processor, peripherals 1 to N, a memory interface and a Region Protection Module attached to a bus (e.g. AXI). The figure also shows the original application source next to its instrumented version with rpus_* calls, and the message "memory access violation detected" leading to "signal debugger SW".]
Proposed Debug Methodology
RPM Hardware Architecture
[Figure: RPM block diagram. Blocks: Bus Adapter; RPU controller; heap RPU 1 to heap RPU N; stack RPU 1 to stack RPU M; or. Signals: data_in, address, data_out, read, write; rpu_mode, rpu_data; heap_in, stack_in, stack_fallback_n.]
Proposed Debug Methodology
Heap RPU Hardware Block Diagram
[Figure: heap RPU block diagram. Blocks: control, base, size, sub, A<B?, or. Signals: cascade_in, cascade_out, mode, data, clk.]
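The block names on this slide (base, size, sub, A<B?) suggest a range check done with one subtraction and one comparison. The following is a minimal software model of that idea, written as a sketch under stated assumptions rather than a description of the actual RTL: the type `heap_rpu`, the `enabled` flag, the byte unit for `size`, and the way `cascade_in` is OR-ed into `cascade_out` are all assumptions, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of one heap RPU; base and size mirror the diagram. */
typedef struct {
    uint32_t base;     /* first address of the protected region */
    uint32_t size;     /* region length in bytes (assumed unit) */
    bool     enabled;  /* assumption: set and cleared via the control block */
} heap_rpu;

/* "sub" followed by "A<B?": unsigned subtraction wraps around when
 * addr < base, so a single comparison rejects addresses on both sides
 * of the region. */
bool rpu_hit(const heap_rpu *r, uint32_t addr) {
    return r->enabled && (uint32_t)(addr - r->base) < r->size;
}

/* Assumed cascade: each RPU ORs its own hit with the incoming result. */
bool rpu_cascade(const heap_rpu *r, uint32_t addr, bool cascade_in) {
    return cascade_in || rpu_hit(r, addr);
}

/* Chain n RPUs; the last cascade output tells whether any region matched. */
bool rpu_chain_valid(const heap_rpu *rpus, int n, uint32_t addr) {
    bool cascade = false;
    for (int i = 0; i < n; i++) {
        cascade = rpu_cascade(&rpus[i], addr, cascade);
    }
    return cascade;
}
```

With a region of 127 bytes at base 0x1000, the model accepts 0x1000 through 0x107E and rejects both 0x0FFF and 0x107F. The wrap-around trick is a common way to avoid a second comparator, which would fit the slide's emphasis on area efficiency, but the source does not confirm that this is how the hardware works.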
Proposed Debug Methodology
RPM Hardware Design Flow
[Figure: design flow. Elements: Benchmark applications; Memory Usage Analysis; Memory statistics; RPU Design Algorithm; XML description of required RPU components; IP generation & instantiation; RPU module]
XML description of required RPU components:

    <?xml version="1.0"?>
    <rpu_slave>
      <name>rdt_rpu32_slave</name>
      <size>32</size>
      <rpu><type>heap</type><max_size>8</max_size><bits>256</bits></rpu>
      <rpu><type>heap</type><max_size>6</max_size><bits>64</bits></rpu>
      <rpu><type>heap</type><max_size>9</max_size><bits>400</bits></rpu>
      <rpu><type>heap</type><max_size>5</max_size><bits>32</bits></rpu>
      <rpu><type>heap</type><max_size>7</max_size><bits>128</bits></rpu>
      <rpu><type>stack</type><max_size>8</max_size><bits>200</bits></rpu>
      <rpu><type>stack</type><max_size>7</max_size><bits>128</bits></rpu>
      <rpu><type>heap</type><max_size>10</max_size><bits>786</bits></rpu>
      <rpu><type>heap</type><max_size>5</max_size><bits>32</bits></rpu>
      <rpu><type>heap</type><max_size>10</max_size><bits>600</bits></rpu>
      <rpu><type>heap</type><max_size>10</max_size><bits>800</bits></rpu>
    </rpu_slave>

[Figure: memory statistics, two histograms of the number of RPUs against RPU size (in bits), 1 to 14]
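In the XML description on this slide, every `<max_size>` value equals the base-2 logarithm of the matching `<bits>` value, rounded up (for example 256 gives 8, 400 gives 9 and 786 gives 10). One possible reading, offered here as an assumption and not stated in the source, is that `max_size` is the width of the size field an RPU needs for a region of that extent. The sketch below computes that width; `size_field_width` and `max_field_width` are hypothetical names.

```c
#include <stdint.h>

/* Smallest w such that 2^w >= n, i.e. ceil(log2(n)); returns 0 for n <= 1. */
unsigned size_field_width(uint32_t n) {
    unsigned w = 0;
    while (w < 32 && ((uint64_t)1 << w) < n) {
        w++;
    }
    return w;
}

/* Widest field needed across a set of observed region sizes. */
unsigned max_field_width(const uint32_t *sizes, int count) {
    unsigned widest = 0;
    for (int i = 0; i < count; i++) {
        unsigned w = size_field_width(sizes[i]);
        if (w > widest) widest = w;
    }
    return widest;
}
```

Sizing each RPU's field to the regions it has to cover, instead of giving every RPU a full-width comparator, is one way a design algorithm could trade silicon area against coverage.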
Proposed Debug Methodology
Hardware Features
- Features
– Adds fine-grain memory protection
    - Complementary to MMU’s page-based protection
– Reconfigurable at run-time
– Area-efficient
– Scalable
– Fits any (industry-)standard bus interface
    - AXI, OCP, DTL, MTL …
- Options
– Direct bus snoop ⇔ Address sent by SW
– Generate interrupt ⇔ Valid query in SW
– Complementary IEEE 1149.1 (JTAG) access
Proposed Debug Methodology
Software Design Flow
- Application compile time
– Identify regions to protect per thread using the compiler
– Instrument application
[Figure: application instrumentation by compiler. The original application source is instrumented with rpus_* calls to form the "New SoC Application", shown with the processor and the RPU module.]
- Application run-time
– Memory region violations detected by RPU hardware
– Handling is done by
    - CPU software, and/or
    - Debugger software
Proposed Debug Methodology
Software API Example
Original application (input to the compiler):

    void func1(void) {
        void *p = malloc(127);
        int a[10];
        int b[10];
        free(p);
        a[10] = 0;    /* out-of-bounds write: a has elements 0..9 */
    }

    void func2(void) {
        int a[10];
        int b[10];
    }

    int main(void) {
        func1();
        func2();
    }

Instrumented application (output of the compiler):

    static int rpu_id = 1;

    void func1(void) {
        int id = rpu_id++;               /* id shared by this call's stack regions */
        void *p = malloc(127);
        rpus_heap_enable(127, p);
        int a[10];
        rpus_stack_enable(10, a, id);
        int b[10];
        rpus_stack_enable(10, b, id);
        free(p);
        rpus_heap_disable(p);
        rpus_check_access(a + 10);       /* check the address written on the next line */
        a[10] = 0;
        rpus_stack_disable(id);
    }

    void func2(void) {
        int id = rpu_id++;
        int a[10];
        rpus_stack_enable(10, a, id);
        int b[10];
        rpus_stack_enable(10, b, id);
        rpus_stack_disable(id);
    }

    int main(void) {
        rpus_initialize();
        func1();
        func2();
    }
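To make the instrumentation calls concrete, here is a host-side mock of the `rpus_*` functions that keeps the region table in software. It is a sketch under several assumptions and not the authors' library: the argument types, the boolean return values, sizes expressed in bytes, the reserved id 0 for heap regions, the limit of 16 regions, and the helper `rpus_violations` are all hypothetical. Only the function names and argument order follow the slide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 16            /* hypothetical number of modelled RPUs */

typedef struct {
    uintptr_t base;
    size_t    size;               /* in bytes (assumed unit) */
    int       id;                 /* stack frame id, or 0 for heap regions */
    bool      used;
} region;

static region regions[MAX_REGIONS];
static unsigned violations;       /* stands in for the interrupt or debugger signal */

void rpus_initialize(void) {
    for (int i = 0; i < MAX_REGIONS; i++) regions[i].used = false;
    violations = 0;
}

/* Claim a free slot; returns false when every modelled RPU is in use. */
static bool region_add(const void *p, size_t size, int id) {
    for (int i = 0; i < MAX_REGIONS; i++) {
        if (!regions[i].used) {
            regions[i] = (region){ (uintptr_t)p, size, id, true };
            return true;
        }
    }
    return false;
}

bool rpus_heap_enable(size_t size, const void *p) {
    return region_add(p, size, 0);
}

bool rpus_stack_enable(size_t size, const void *p, int id) {
    return region_add(p, size, id);
}

/* Release the heap region that starts at p. */
void rpus_heap_disable(const void *p) {
    for (int i = 0; i < MAX_REGIONS; i++)
        if (regions[i].used && regions[i].id == 0 && regions[i].base == (uintptr_t)p)
            regions[i].used = false;
}

/* Release every stack region registered under this frame id. */
void rpus_stack_disable(int id) {
    for (int i = 0; i < MAX_REGIONS; i++)
        if (regions[i].used && regions[i].id == id)
            regions[i].used = false;
}

/* Valid if addr lies inside any enabled region; otherwise count a violation. */
bool rpus_check_access(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    for (int i = 0; i < MAX_REGIONS; i++)
        if (regions[i].used && a - regions[i].base < regions[i].size)
            return true;
    violations++;
    return false;
}

unsigned rpus_violations(void) { return violations; }
```

In this mock, a check on `a + 10` fails only when no other enabled region happens to start right after `a`. If the neighbouring array `b` is enabled and adjacent, a plain "inside any valid region" test would accept the address, so a real implementation may need guard gaps or per-object checks. The source does not say how this case is handled.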
Experimental Results
- Modified open-source GCC compiler on Linux
- ARM Cross-compiler
- MiBench (http://www.eecs.umich.edu/mibench/)
– Commercially representative embedded benchmarks
– Automotive, Consumer, Network, Office, Security, and Telecommunication
- Measured:
– Software performance drop
– Minimum number of required RPUs
Experimental Results
Application Speed per Benchmark
[Figure: bar chart of application speed (original = 100%) per benchmark, comparing Mudflap, RPM (no snooping) and RPM (snooping). Benchmarks: basicmath, bitcnts, qsort, susan_corners, susan_edges, susan_smoothing, sha, crc32, fft, ifft, adpcm_c, adpcm_d, ispell, search]
Experimental Results
RPU Hardware Cost
[Figure: bar chart of hardware area (CPU = 100%) per benchmark on a 0.0% to 2.0% scale, with legend entries Mudflap, RPM (no snooping) and RPM (snooping). Benchmarks: basicmath, bitcnts, qsort, susan_corners, susan_edges, susan_smoothing, sha, crc32, fft, ifft, adpcm_c, adpcm_d, ispell, search]
Conclusions
- Run-Time Memory Protection Architecture
– Effective against memory corruption
– Efficient through
    - Re-use of existing RPU hardware
    - Optimal trade-off between HW and SW cost
- We developed tool support for
– Memory allocation & access analysis
– Hardware and software trade-off
– RPU hardware design
– Application instrumentation