A Case for Clumsy Packet Processors
Arindam Mallik and Gokhan Memik
Electrical and Computer Engineering Dept.
Northwestern University
12/15/2004 International Symposium on Microarchitecture - MICRO 37 2
Overview
Faults
Correctness is overrated
What if the higher levels take care of it?
Processor can be even more aggressive/speculative
Application-specific correctness
Networking applications
How do we measure?
Tools for architects
Relation between overclocking and faults
Treat correctness as an objective, not a requirement
Outline
Introduction
Application description and error metrics
Error models for overclocking a cache
Processor configuration
Measurement definitions
Simulations
Motivation
Performance, energy requirements
Reliability / Probabilistic Circuits
Circuit designers have to be conservative
Worst-case design
Introduction
Inherent possibility of fault occurrence
Adverse environmental conditions
Aggressive scaling of supply voltage
Smaller manufacturing technologies
Need for analysis
More transistors
Higher fault probability
Effect on system integrity
Transient faults
Permanent faults
Application Errors
For desktop processor or server
Capture and eliminate all faults
Networking – Communication
A certain level of error is acceptable
Nevertheless, the integrity of the system behavior must be maintained
System impact
Excessive “resubmission”
Program output
Overview of Approach
Overclocking vs. Fault Modeling
Application Error Metrics
Simulator
- Performance
- Application Errors
Configuration
Comparison Metric
Error Classification
Fault vs. Error
Effect or duration
Volatile Error
Occurs mostly while processing a packet
Affects a single data element
Error in a single packet
Non-volatile Error
Occurs in the static data structures
Effects seen in many elements
Error in routing table
Error Metrics for Applications
Categorization of NetBench Applications
Low or micro-level
- Routines related to lowest layers of network stack
Routing-level
- Applications similar to traditional IP routing (Layer 3-4 of the network stack)
Application-level
- Traditional as well as emerging applications
Common property of all applications
Control level tasks
Data level tasks
Error Measurement Procedure
Mark data structures in NetBench apps
Important Data Structures
- Routing Table Entries, TTL Value, …
Outputs of Key Function Units
- Checksum Value, NAT Address
Perform simulation
Introduce hardware faults
Mark the change
Data values change
Application behavior changes
Define the application error rate
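The procedure on this slide can be sketched as a small fault-injection harness. This is only an illustration, not the instrumentation used with NetBench: `flip_bit`, `run_with_faults`, and `error_rate` are hypothetical names, the marked values are modeled as a dictionary of 32-bit integers, and faults are assumed to be independent single-bit flips.

```python
import random


def flip_bit(value, bit, width=32):
    """Return value with one bit inverted (models a single-bit hardware fault)."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def run_with_faults(marked, p_fault, rng, width=32):
    """Copy the marked values, flipping each bit independently with probability p_fault."""
    faulty = {}
    for key, value in marked.items():
        for bit in range(width):
            if rng.random() < p_fault:
                value = flip_bit(value, bit, width)
        faulty[key] = value
    return faulty


def error_rate(golden, observed):
    """Fraction of marked values that differ from the fault-free (golden) run."""
    changed = sum(1 for key in golden if golden[key] != observed[key])
    return changed / len(golden)


# Hypothetical marked values for one packet; the names are illustrative only.
golden = {"ttl": 63, "checksum": 0xB1E6, "next_hop": 0x0A000001}
observed = run_with_faults(golden, p_fault=0.01, rng=random.Random(0))
print(error_rate(golden, observed))
```

Comparing against a fault-free run keeps the error rate tied to the marked values rather than to the raw number of injected faults, since a fault that never reaches a marked value does not count as an application error.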
A Sample Application - Route
Route – one of the most common networking applications
Implements IPv4 routing
Receives each packet – table lookup – processes it to decide the next network hop
Error Keys
Routing Table Initialization (IMPORTANT !!)
Checksum value
TTL Value
Path traversed in Routing Table for each packet
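The lookup step and two of the error keys (the routing table and the TTL value) can be illustrated with a short sketch. It is not the NetBench `route` code: the table here is a plain list scanned for the longest matching prefix rather than a radix tree, and `TABLE`, `lookup`, `forward`, and the sample prefixes are assumptions made for the example.

```python
import ipaddress

# Hypothetical routing table: (prefix, next-hop interface).
TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "if0"),
    (ipaddress.ip_network("10.1.0.0/16"), "if1"),
    (ipaddress.ip_network("0.0.0.0/0"), "if2"),
]


def lookup(table, dst):
    """Return the next hop of the longest prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0])[1]


def forward(table, dst, ttl):
    """Decrement the TTL and pick the next hop; return None when the TTL expires."""
    if ttl <= 1:
        return None
    return lookup(table, dst), ttl - 1
```

In this model a corrupted table entry can misroute every later packet that matches it, while a corrupted TTL affects only the packet that carries it, which is why routing table initialization is singled out among the error keys.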
Fault Models for Overclocking
Overclock a component
Increased performance
Reduced energy
Increase in fault probability
Goal
Find fault probability as a function of overclocking aggressiveness
Particular circuit design
Parameters
Voltage swing, noise
Opportunity for overclocking
Voltage swing
Rapid increase at first
Slow increase later
[Figure: voltage swing vs. time]
Not so fast, my friend!
Noise (inductive and/or capacitive)
Signal deviation
Overclocking
Reduced immunity
Approach
Analyze each component separately
6-transistor SRAM cell
Input, clock, feedback loop
Finding fault probability
Analyze the impact of noise on the feedback loop
Noise immunity curves
Different noise amplitude probabilities
Check all switching combinations
[Figure: noise immunity curves for voltage swings of Vfs, 0.89Vfs, 0.78Vfs, 0.67Vfs, 0.61Vfs, 0.56Vfs, 0.50Vfs, and 0.39Vfs]
Noise amplitude for switching combinations:
$P(A_r) = 28.8 \, e^{-28.8 A_r}$
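If the fitted expression is read as a probability density over the relative noise amplitude, P(A_r) = 28.8 e^(-28.8 A_r), then the chance that noise exceeds a given immunity threshold is the tail of an exponential distribution. The slide does not state this interpretation, so treat it as an assumption; `noise_pdf` and `noise_exceeds` are hypothetical helper names.

```python
import math

RATE = 28.8  # rate constant of the fitted noise amplitude distribution


def noise_pdf(a_r):
    """Density of the relative noise amplitude A_r (assumed exponential)."""
    return RATE * math.exp(-RATE * a_r)


def noise_exceeds(threshold):
    """P(A_r > threshold): the density integrated from threshold to infinity."""
    return math.exp(-RATE * threshold)


# Tail probability for a few example thresholds (fractions of full swing).
for threshold in (0.1, 0.25, 0.5):
    print(threshold, noise_exceeds(threshold))
```

Under this reading, lowering the immunity threshold (as overclocking does by reducing the voltage swing reached in a cycle) raises the fault probability exponentially.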
Estimation model
Fit distribution into immunity
Combine it with voltage swing vs. time
[Figure: fault probability versus relative voltage swing (Vrs), measured data and fitted formula]
[Figure: fault probability versus relative clock frequency, plotted on a logarithmic scale against relative cycle time (Cr)]
[Equation: fitted fault probability, P_E = 2.59 × 10^-7 × e^(…), with the exponent given in terms of the relative cycle time C_r or, equivalently, the relative frequency F_r]
Outline
Introduction
Application description and error metrics
Error models for overclocking a cache
Processor configuration
Measurement definitions
Simulations
Processor Configuration
Fault detection
No detection
Parity
- One-strike, two-strikes, three-strikes
Overclocking
Static
- 75%, 50%, and 25% of the original clock cycle
Dynamic
- Processor adapts according to the faults observed
- Frequency is adjusted at the end of each epoch
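A minimal sketch of the dynamic scheme follows. The slides do not give the adaptation policy, so the thresholds, the step size, and the name `next_cycle_time` are all assumptions chosen for illustration; only the idea of adjusting once per epoch based on observed faults comes from the slide.

```python
def next_cycle_time(cycle_time, faults_in_epoch, high=4, low=1,
                    step=0.05, min_cycle=0.25, max_cycle=1.0):
    """Choose the relative cycle time for the next epoch (1.0 = original clock).

    All thresholds and the step size are hypothetical values.
    """
    if faults_in_epoch > high:
        # Too many detected faults: lengthen the cycle (slow down).
        cycle_time += step
    elif faults_in_epoch < low:
        # Fault-free epoch: shorten the cycle (overclock further).
        cycle_time -= step
    # Stay between the most aggressive static setting and the original clock.
    return min(max_cycle, max(min_cycle, cycle_time))


# Example: three epochs with 0, 0, and 6 detected faults.
cycle = 0.5
for faults in (0, 0, 6):
    cycle = next_cycle_time(cycle, faults)
    print(round(cycle, 2))
```

Adjusting only at epoch boundaries keeps the controller simple: it needs one fault counter, presumably fed by whatever detection scheme is in use, and a single comparison per epoch.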
Measurement definitions
Comparison between ideal and erroneous execution
Traditional parameters – unfair competition
Consider both performance and reliability
Energy-Delay-Fallibility product
$\text{Energy}^k \times \text{Delay}^m \times \text{Fallibility}^n$
Fallibility = unit error occurrence probability
Can adjust the importance of faults by changing n
In present work, k = 1; m = 2; n = 2
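The metric is a single product, so it can be written as a short function. The exponents k = 1, m = 2, n = 2 are the ones given on the slide; the name `edf_product` and the sample inputs are hypothetical, and the inputs are assumed to be normalized to a baseline configuration.

```python
def edf_product(energy, delay, fallibility, k=1, m=2, n=2):
    """Energy-Delay-Fallibility product: energy^k * delay^m * fallibility^n."""
    return (energy ** k) * (delay ** m) * (fallibility ** n)


# Made-up configuration relative to a baseline of (1.0, 1.0, 1.0):
# 10% less energy, 20% less delay, 10% higher fallibility.
baseline = edf_product(1.0, 1.0, 1.0)
overclocked = edf_product(0.9, 0.8, 1.1)
print(baseline, overclocked)
```

Lower is better. Raising n makes the same increase in fallibility cost more, which is how the importance of faults can be tuned for a given application.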
Simulations
SimpleScalar Simulator for StrongARM 110
Roughly an execution core of a Network Processor
Separate 4 KB direct mapped L1 data and instruction caches
128 KB 4-way set-associative unified L2 cache
Error Probability
At normal clock frequency
- Error probability = $2.59 \times 10^{-7}$ per bit
Increased error probability at higher clock rate according to the fault model
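A per-bit probability turns into a per-access probability once an access width is fixed. The sketch below assumes that bit faults are independent, which the slides do not state, and `access_fault_probability` is a hypothetical helper name.

```python
import math

P_BIT = 2.59e-7  # per-bit error probability at the normal clock frequency


def access_fault_probability(p_bit, bits):
    """P(at least one faulty bit in an access) = 1 - (1 - p_bit)**bits.

    Valid for 0 <= p_bit < 1. Uses log1p/expm1 to stay accurate for tiny p_bit.
    """
    return -math.expm1(bits * math.log1p(-p_bit))


# Probability that a 32-bit word read sees at least one faulty bit.
print(access_fault_probability(P_BIT, 32))
```

For probabilities this small the result is very close to `bits * p_bit`, so the per-access fault rate grows almost linearly with the access width.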
Application Error Behavior
[Figure: error probability vs. relative clock cycle (100%, 75%, 50%, 25%) in three panels: control plane, data plane, and errors introduced in both control and data plane. Error categories: Initialization Error, Interface Value, Destn Add, Radix Tree Entry, Translated IP Address, Fatal Error]
Fatal Error Probability
Curse on the system
Destroys integrity – unacceptable
Increases with higher clock frequency
Observed on system with no error detection
[Figure: fatal error probability per application (route, drr, nat, tl, url, md5, crc, average) at relative clock cycles of 100%, 75%, 50%, and 25%]
Energy-Delay-Fallibility Values
[Figure: Energy-Delay^2-Fallibility^2 by recovery scheme (no detection, one-strike, two-strikes, three-strikes) for clock settings 1, 0.75, 0.5, 0.25, and dynamic]
High Energy-Delay-Fallibility
Higher fallibility rate
Increased execution cycle
Extra instructions due to errors
Erroneous load cache miss
Conclusions
Release correctness constraint
Application-Specific Processors
Utilizing released correctness
Application-Specific error metrics
Overclocking
Fault modeling for overclocking a data cache
Error weighting – metrics
Thanks!
Anonymous reviewers
- Dr. Masud Chowdhury, Dr. Yehea Ismail,