A Case for Clumsy Packet Processors, Arindam Mallik and Gokhan Memik (PowerPoint presentation)

SLIDE 1

A Case for Clumsy Packet Processors

Arindam Mallik and Gokhan Memik Electrical and Computer Engineering Dept. Northwestern University

SLIDE 2

12/15/2004 International Symposium on Microarchitecture - MICRO 37 2

Overview

Faults

  • Correctness is overrated
  • What if the higher levels take care of it?
  • Processor can be even more aggressive/speculative

Application-specific correctness

  • Networking applications
  • How do we measure?

Tools for architects

  • Relation between overclocking and faults

Treat correctness as an objective, not a requirement

SLIDE 3

Outline

  • Introduction
  • Application description and error metrics
  • Error models for overclocking a cache
  • Processor configuration
  • Measurement definitions
  • Simulations

SLIDE 4

Motivation

Performance and energy requirements

Reliability / probabilistic circuits

Circuit designers have to be conservative

  • Worst-case design

SLIDE 5

Introduction

Inherent possibility of fault occurrence

  • Adverse environmental conditions
  • Aggressive scaling of supply voltage
  • Smaller manufacturing technologies

Need for analysis

  • More transistors mean a higher fault probability
  • Effect on system integrity

Transient faults and permanent faults

SLIDE 6

Application Errors

For desktop processors or servers

  • Capture and eliminate all faults

Networking – communication

  • A certain level of error is acceptable
  • Nevertheless, the integrity of the system behavior must be maintained

System impact

  • Excessive “resubmission”

Program output

SLIDE 7

Overview of Approach

Overclocking vs. Fault Modeling

Application Error Metrics

Simulator

  • Performance
  • Application Errors

Configuration Comparison Metric

SLIDE 8

Error Classification

Fault vs. error: errors are classified by their effect or duration

Volatile Error

  • Occurs mostly while processing a packet
  • Affects a single data element
  • Example: an error in a single packet

Non-volatile Error

  • Occurs in static data structures
  • Effects are seen in many elements
  • Example: an error in the routing table
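To make the volatile/non-volatile distinction concrete, here is a toy sketch. It assumes a hypothetical exact-match routing table (a plain dictionary standing in for the real lookup structure) and a made-up packet trace, and counts how many forwarding decisions differ from a fault-free run when one packet is corrupted versus when one table entry is corrupted.

```python
def forward(table, destinations, default="drop"):
    """Return the output interface chosen for each packet's destination."""
    return [table.get(dst, default) for dst in destinations]


def count_mismatches(golden, observed):
    """Count packets whose forwarding decision differs from the fault-free run."""
    return sum(1 for g, o in zip(golden, observed) if g != o)


# Hypothetical routing table and packet trace, for illustration only.
table = {"10.0.0.0": "eth0", "192.168.1.0": "eth1"}
packets = ["10.0.0.0"] * 6 + ["192.168.1.0"] * 4
golden = forward(table, packets)

# Volatile error: one packet's destination is corrupted while it is processed.
volatile_trace = list(packets)
volatile_trace[0] = "10.0.0.1"  # no matching entry, so this packet is dropped
volatile_errors = count_mismatches(golden, forward(table, volatile_trace))

# Non-volatile error: one routing-table entry is corrupted in memory.
bad_table = dict(table)
bad_table["10.0.0.0"] = "eth1"  # every packet to this destination is now misrouted
nonvolatile_errors = count_mismatches(golden, forward(bad_table, packets))

print(volatile_errors, nonvolatile_errors)  # 1 6
```

The single corrupted packet produces one wrong decision, while the single corrupted table entry produces a wrong decision for every packet that uses it.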

SLIDE 9

Error Metrics for Applications

Categorization of NetBench Applications

Low or micro-level

  • Routines related to lowest layers of network stack

Routing-level

  • Applications similar to traditional IP routing (Layers 3-4 of the network stack)

Application-level

  • Traditional as well as emerging applications

Common property of all applications

  • Control-level tasks
  • Data-level tasks

SLIDE 10

Error Measurement Procedure

Mark data structures in NetBench apps

Important Data Structures

  • Routing Table Entries, TTL Value, …

Outputs of Key Function Units

  • Checksum Value, NAT Address

Perform simulation

Introduce hardware faults

Mark the change

  • Data values change
  • Application behavior changes

Define the application error rate
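A minimal sketch of the comparison step, assuming the application error rate is the fraction of packets for which at least one marked value differs between a fault-free (golden) run and a fault-injected run. The slides do not spell out the exact definition; the `PacketResult` record and its fields are hypothetical stand-ins for the marked data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketResult:
    """Marked values recorded for one packet (hypothetical field set)."""
    checksum: int
    ttl: int
    next_hop: str


FIELDS = ("checksum", "ttl", "next_hop")


def application_error_rate(golden, faulty):
    """Fraction of packets with at least one marked value that differs from the golden run."""
    if len(golden) != len(faulty):
        raise ValueError("both runs must process the same packets")
    if not golden:
        return 0.0
    return sum(g != f for g, f in zip(golden, faulty)) / len(golden)


def per_field_error_rates(golden, faulty):
    """Error rate for each marked field separately."""
    if len(golden) != len(faulty):
        raise ValueError("both runs must process the same packets")
    n = len(golden)
    if n == 0:
        return {name: 0.0 for name in FIELDS}
    return {
        name: sum(getattr(g, name) != getattr(f, name) for g, f in zip(golden, faulty)) / n
        for name in FIELDS
    }


# Usage with made-up values: one of four packets leaves with a corrupted TTL.
golden = [PacketResult(0xB861, 63, "eth0")] * 4
faulty = golden[:3] + [PacketResult(0xB861, 62, "eth0")]
print(application_error_rate(golden, faulty))  # 0.25
print(per_field_error_rates(golden, faulty))
```

Keeping a per-field breakdown alongside the overall rate makes it possible to weight errors differently by field, for example treating a corrupted next hop as more severe than a corrupted checksum.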

SLIDE 11

A Sample Application - Route

Route – one of the most common networking applications

  • Implements IPv4 routing
  • Receives each packet, performs a table lookup, and processes it to decide the next network hop

Error Keys

  • Routing table initialization (IMPORTANT!)
  • Checksum value
  • TTL value
  • Path traversed in the routing table for each packet
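Two of Route's error keys, the checksum value and the TTL, come from standard IPv4 forwarding. The sketch below uses the RFC 1071 one's-complement checksum over a 20-byte IPv4 header (no options) and the usual decrement-TTL-then-recompute step. It illustrates the standard algorithm and is not the NetBench `route` source; `forward_step` is a hypothetical name.

```python
def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_step(header: bytearray) -> bool:
    """Verify the checksum, decrement TTL, and rewrite the checksum in place.

    Assumes a 20-byte IPv4 header without options: TTL at byte 8,
    checksum at bytes 10-11. Returns False if the packet should be dropped.
    """
    if ipv4_checksum(bytes(header)) != 0:  # a valid header sums to 0xFFFF
        return False
    if header[8] <= 1:
        return False  # TTL would expire at this hop
    header[8] -= 1
    header[10:12] = b"\x00\x00"
    header[10:12] = ipv4_checksum(bytes(header)).to_bytes(2, "big")
    return True


# Usage: a sample header whose checksum field is 0xB861.
hdr = bytearray.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(hex(ipv4_checksum(bytes(hdr[:10] + b"\x00\x00" + hdr[12:]))))  # 0xb861
print(forward_step(hdr), hdr[8], hdr[10:12].hex())  # True 63 b961
```

A fault that corrupts the TTL before the checksum is recomputed is not caught downstream, because the new checksum is computed over the corrupted value. That is one reason the TTL is worth tracking as a separate error key.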

SLIDE 12

Fault Models for Overclocking

Overclock a component

  • Increased performance
  • Reduced energy
  • Increase in fault probability

Goal

  • Find fault probability vs. overclocking aggressiveness
  • For a particular circuit design

Parameters

  • Voltage swing, noise

SLIDE 13

Opportunity for overclocking

Voltage swing

  • Rapid increase at first, slow increase later

[Figure: Voltage swing vs. time]
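The slide gives only the qualitative shape of the curve. One common way to model "rapid increase at first, slow increase later" is RC-style charging, V(t)/V_full = 1 − e^(−t/τ). The sketch below assumes that model and a hypothetical nominal cycle of 5τ to show why shortening the cycle costs little swing at first; neither the model nor the 5τ figure comes from the slides.

```python
import math


def relative_swing(cycle_fraction: float, nominal_cycle_in_tau: float = 5.0) -> float:
    """Relative voltage swing reached within a shortened clock cycle.

    Assumes RC-style charging, V(t) / V_full = 1 - exp(-t / tau), and a
    hypothetical nominal cycle of `nominal_cycle_in_tau` time constants.
    `cycle_fraction` is the cycle time relative to nominal (1.0 = nominal).
    """
    t_over_tau = cycle_fraction * nominal_cycle_in_tau
    return 1.0 - math.exp(-t_over_tau)


for frac in (1.0, 0.75, 0.5, 0.25):
    print(f"cycle {frac:.0%} of nominal -> relative swing {relative_swing(frac):.3f}")
```

Under these assumptions, halving the cycle still reaches about 92% of the full swing, while quartering it reaches only about 71%. The reduced swing is what leaves the cell more exposed to noise.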

SLIDE 14

Not so fast, my friend!

Noise (inductive and/or capacitive)

  • Signal deviation

Overclocking

  • Reduced noise immunity

SLIDE 15

Approach

6-transistor SRAM cell

  • Analyze each component separately: input, clock, feedback loop

SLIDE 16

Finding fault probability

Analyze the impact of noise on the feedback loop

  • Noise immunity curves
  • Different noise amplitude probabilities
  • Check all switching combinations

[Figure: Noise immunity curves for relative voltage swings from 0.39Vfs to Vfs, and the noise amplitude distribution for a switching combination]

$$P(A_r) = 28.8\,e^{-28.8 A_r}$$
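Reading A_r as the relative noise amplitude, this density is exponential with rate 28.8. The probability that noise exceeds an immunity margin a is therefore the tail P(A_r > a) = e^(−28.8a). The sketch computes this tail in closed form and cross-checks it by sampling. The margin value is hypothetical.

```python
import math
import random

RATE = 28.8  # from the slide: P(A_r) = 28.8 * exp(-28.8 * A_r)


def noise_density(a: float) -> float:
    """Exponential density of the relative noise amplitude A_r."""
    return RATE * math.exp(-RATE * a) if a >= 0 else 0.0


def prob_noise_exceeds(margin: float) -> float:
    """P(A_r > margin): the tail of the exponential, exp(-RATE * margin)."""
    return math.exp(-RATE * max(margin, 0.0))


def sampled_prob_noise_exceeds(margin: float, samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same tail, as a sanity check."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(RATE) > margin for _ in range(samples))
    return hits / samples


margin = 0.05  # hypothetical immunity margin, in the same relative units as A_r
print(prob_noise_exceeds(margin), sampled_prob_noise_exceeds(margin))
```

Because the tail falls off exponentially, any reduction in the immunity margin, such as one caused by overclocking, raises the fault probability multiplicatively rather than linearly.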

SLIDE 17

Estimation model

  • Fit the noise amplitude distribution to the immunity curves
  • Combine it with voltage swing vs. time

[Figure: Fault probability versus relative voltage swing (Vrs), and fault probability versus relative clock frequency (x-axis: relative cycle time Cr, log-scale y-axis); each panel compares data points with a fitted exponential formula scaled by 2.59×10⁻⁷]

SLIDE 18

Outline

  • Introduction
  • Application description and error metrics
  • Error models for overclocking a cache
  • Processor configuration
  • Measurement definitions
  • Simulations

SLIDE 19

Processor Configuration

Fault detection

  • No detection
  • Parity: one-strike, two-strikes, three-strikes

Overclocking

Static

  • 75%, 50%, and 25% of the original clock cycle

Dynamic

  • Processor adapts according to the faults observed
  • Frequency is adjusted at the end of each epoch
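The slides do not describe the dynamic policy in detail. The sketch below is a hypothetical epoch-based controller: it counts parity mismatches on reads during an epoch and, at the end of the epoch, moves one step between the static settings (100%, 75%, 50%, 25% of the original cycle). The thresholds are illustrative, and the one/two/three-strike recovery variants are not modeled.

```python
from dataclasses import dataclass

LEVELS = (1.0, 0.75, 0.5, 0.25)  # relative clock cycle, matching the static settings


def parity(word: int) -> int:
    """Even-parity bit of a non-negative integer word."""
    return bin(word).count("1") & 1


def parity_error(word: int, stored_parity: int) -> bool:
    """A single parity bit flags any odd number of flipped bits in `word`."""
    return parity(word) != stored_parity


@dataclass
class EpochController:
    """Hypothetical epoch-based clock controller; thresholds are illustrative."""
    raise_threshold: int = 0  # at or below this many faults per epoch, shorten the cycle
    lower_threshold: int = 3  # at or above this many faults per epoch, lengthen the cycle
    level: int = 0            # index into LEVELS; 0 is the original clock cycle
    faults: int = 0           # parity mismatches seen in the current epoch

    def record_read(self, word: int, stored_parity: int) -> None:
        if parity_error(word, stored_parity):
            self.faults += 1

    def end_epoch(self) -> float:
        """Adjust the clock one step and return the new relative clock cycle."""
        if self.faults >= self.lower_threshold and self.level > 0:
            self.level -= 1  # back off toward the original cycle
        elif self.faults <= self.raise_threshold and self.level < len(LEVELS) - 1:
            self.level += 1  # overclock one step further
        self.faults = 0
        return LEVELS[self.level]


ctrl = EpochController()
print(ctrl.end_epoch())  # no faults observed: 0.75
for _ in range(3):
    ctrl.record_read(0b1011, 0)  # stored parity disagrees with the data: a detected fault
print(ctrl.end_epoch())  # three faults: back to 1.0
```

Moving one step per epoch keeps the controller from oscillating wildly. The price is that it reacts slowly to a sudden burst of faults.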

SLIDE 20

Measurement definitions

Comparison between ideal and erroneous execution

  • Traditional parameters alone make the comparison unfair
  • Consider both performance and reliability

Energy-Delay-Fallibility product

$$\text{Energy}^{k} \times \text{Delay}^{m} \times \text{Fallibility}^{n}$$

  • Fallibility = unit error occurrence probability
  • The importance of faults can be adjusted by changing n
  • In the present work, k = 1, m = 2, n = 2
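The product is simple to compute. The sketch adds one assumption: each configuration is normalized to a baseline, so values below 1 indicate an improvement. The energy, delay, and fallibility numbers in the usage lines are made up.

```python
def edf_product(energy: float, delay: float, fallibility: float,
                k: float = 1, m: float = 2, n: float = 2) -> float:
    """Energy^k * Delay^m * Fallibility^n; defaults match k=1, m=2, n=2."""
    return energy ** k * delay ** m * fallibility ** n


def relative_edf(config: dict, baseline: dict, **exponents: float) -> float:
    """EDF product of `config` divided by that of `baseline`; below 1 is better."""
    return edf_product(**config, **exponents) / edf_product(**baseline, **exponents)


# Made-up numbers: 20% less delay at the cost of 20% higher fallibility.
baseline = {"energy": 1.0, "delay": 1.0, "fallibility": 1.0e-3}
overclocked = {"energy": 1.0, "delay": 0.8, "fallibility": 1.2e-3}
print(relative_edf(overclocked, baseline))       # about 0.92 with k=1, m=2, n=2
print(relative_edf(overclocked, baseline, n=4))  # larger n penalizes faults more
```

With n = 2, the overclocked configuration wins. Raising n to 4 makes the same configuration worse than the baseline, which shows how the exponent sets the weight given to faults.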

SLIDE 21

Simulations

SimpleScalar Simulator for StrongARM 110

  • Roughly the execution core of a network processor
  • Separate 4 KB direct-mapped L1 data and instruction caches
  • 128 KB 4-way set-associative unified L2 cache

Error Probability

At normal clock frequency

  • Error probability = 2.59×10⁻⁷ per bit

Error probability increases at higher clock rates according to the fault model
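To relate the per-bit figure to larger structures, the sketch assumes that bit faults are independent (an assumption the slides do not state). Under that assumption, the chance that an n-bit word holds at least one faulty bit is 1 − (1 − p)^n, and the expected number of faulty bits is n·p. The 4 KB size matches the L1 data cache above.

```python
import math

P_BIT = 2.59e-7  # per-bit error probability at the normal clock frequency


def p_any_fault(bits: int, p_bit: float = P_BIT) -> float:
    """P(at least one faulty bit among `bits`) = 1 - (1 - p)^bits, assuming independence."""
    # log1p/expm1 keep precision when p_bit is tiny.
    return -math.expm1(bits * math.log1p(-p_bit))


def expected_faulty_bits(bits: int, p_bit: float = P_BIT) -> float:
    """Expected number of faulty bits (linearity of expectation; no independence needed)."""
    return bits * p_bit


L1_DATA_BITS = 4 * 1024 * 8  # 4 KB direct-mapped L1 data cache
print(p_any_fault(32))                     # one 32-bit word, roughly 8.3e-6
print(expected_faulty_bits(L1_DATA_BITS))  # roughly 8.5e-3 for the whole L1 data cache
```

At these small probabilities, 1 − (1 − p)^n is almost exactly n·p. The `log1p`/`expm1` form matters only once overclocking pushes p high enough for the two to diverge.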

SLIDE 22

Application Error Behavior

[Figure: Error probability versus relative clock cycle (100%, 75%, 50%, 25%) in three panels: error introduced in the control plane, in the data plane, and in both the control and data planes. Series: initialization error, interface value, destination address, radix tree entry, translated IP address, fatal error]

SLIDE 23

Fatal Error Probability

Curse on the system

  • Destroys integrity: unacceptable
  • Increases with higher clock frequency
  • Observed on a system with no error detection

[Figure: Fatal error probability per application (route, drr, nat, tl, url, md5, crc, and average) at 100%, 75%, 50%, and 25% relative clock cycle]

SLIDE 24

Energy-Delay-Fallibility Values

[Figure: Energy-Delay^2-Fallibility^2 for each recovery scheme (no detection, one-strike, two-strikes, three-strikes) at static relative clock cycles of 1, 0.75, 0.5, and 0.25, and with dynamic overclocking]

High Energy-Delay-Fallibility

  • Higher fallibility rate
  • Increased execution cycles
  • Extra instructions due to errors
  • Cache misses caused by erroneous loads

SLIDE 25

Conclusions

Release the correctness constraint

  • Application-specific processors

Utilizing released correctness

  • Application-specific error metrics

Overclocking

Fault modeling for overclocking a data cache

Error weighting – metrics

SLIDE 26

Thanks!

Anonymous reviewers

  • Dr. Masud Chowdhury, Dr. Yehea Ismail, Sasha Jevtic, Dr. Bill Mangione-Smith, Dr. Seda O. Memik, Matthew Wildrick