SLIDE 1

ASPDAC 2003

DESIGNING ROBUST SYSTEMS with UNCERTAIN INFORMATION

Giovanni De Micheli
CSL - Stanford University

SLIDE 2

De Micheli 2 ASPDAC 2003

The philosophical paradigm

– Laplacian determinism

  • The future state of the universe can be determined from its present state

– Quantum theory and uncertainty

  • We can neither observe nor control microscopic features with accuracy

  • Science at the onset of the XX century

SLIDE 3

The philosophical paradigm

– Design determinism

  • The complete behavior and features of a microelectronic circuit can be derived from a hardware model

  • Synthesis technology

– Design uncertainty with nanoscale technologies

  • Need for high-level abstractions
  • Inaccuracy of low-level models
  • Design technology at the onset of the XXI century

[Figure text, hardware model fragment: Ir << fetch(pc); case ir is when => and acc=rega and regb]

SLIDE 4

The economic perspective

  • System on Chip (SoC) design:

– Increasingly more complex:

  • Many detailed electrical problems
  • Integration of different technologies

– Increasingly more expensive and risky

  • A mask set may cost over a million dollars
  • A single functional error can kill a product

– Fewer design starts

  • Large volume needed to recapture hw costs

– Software solutions are more desirable

SLIDE 5

The SoC market

  • SoCs find application in many embedded systems

  • Concerns:

– Performance
– Energy consumption
– Cost
– Correctness
– Reliability and safety
– Robustness

SLIDE 6

Robust design

  • SoCs must preserve correct operation and performance:

– Under varying environmental conditions
– Under changes of design assumptions

  • Designing correct, well-performing circuits becomes increasingly hard

– Too many factors to take into account

  • Paradigm shift needed

– Design error-tolerant and adaptive circuits

SLIDE 7

Issues

  • Extremely small size

– Coping with deep submicron (DSM) technologies

  • Spreading of parameters
  • Extremely large scale

– System complexity

  • Changing environmental conditions
  • New fabrication materials

– Novel technologies

  • How to make the leap

SLIDE 8

Extremely small size

Intel’s 50nm transistor [Source: IEEE Spectrum]

SLIDE 9

Silicon technology roadmap

Year   Gate length (nm)   Transistor density (million/cm²)   Clock rate (GHz)   Supply voltage (V)
2002   75                 48                                 2.3                1.1
2007   35                 154                                6.7                0.7
2013   13                 617                                19.3               0.5

SLIDE 10

Qualitative trends

  • Continued gate downscaling
  • Increased transistor density and frequency

– Power and thermal management

  • Lower supply voltage

– Reduced noise immunity

  • Increased spread of physical parameters

– Inaccurate modeling of physical behavior

SLIDE 11

Critical design issue

  • Achieve desired performance levels with limited energy consumption

  • Dynamic power management (DPM)

– Component shut off
– Frequency and voltage downscaling

  • Explore (at run time) the voltage/delay trade-off curve
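
The voltage/delay trade-off named on this slide can be explored numerically. The sketch below is only an illustration: it assumes an alpha-power-law delay model and a switched-capacitance energy model, neither of which appears on the slide, and every constant (VT, ALPHA, K, C_EFF, the voltage levels) is hypothetical.

```python
# Sketch of the run-time voltage/delay trade-off that DPM explores.
# Assumed models (not taken from the slides):
#   delay(V)  ~ K * V / (V - VT) ** ALPHA   (alpha-power law)
#   energy(V) ~ C_EFF * V ** 2              (dynamic energy per operation)
VT = 0.3      # threshold voltage in volts (hypothetical)
ALPHA = 1.3   # velocity-saturation exponent (hypothetical)
K = 1.0       # delay scale factor, arbitrary units
C_EFF = 1.0   # effective switched capacitance, arbitrary units


def delay(vdd: float) -> float:
    """Gate delay at supply voltage vdd under the assumed model."""
    if vdd <= VT:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return K * vdd / (vdd - VT) ** ALPHA


def energy(vdd: float) -> float:
    """Dynamic energy per operation at supply voltage vdd."""
    return C_EFF * vdd ** 2


def lowest_voltage_meeting(deadline: float, candidates: list[float]) -> float:
    """Pick the lowest candidate voltage whose delay still meets the deadline."""
    feasible = [v for v in sorted(candidates) if delay(v) <= deadline]
    if not feasible:
        raise ValueError("no candidate voltage meets the deadline")
    return feasible[0]


if __name__ == "__main__":
    levels = [0.5, 0.7, 0.9, 1.1]
    for v in levels:
        print(f"Vdd={v:.1f} V  delay={delay(v):.2f}  energy={energy(v):.2f}")
    print("chosen:", lowest_voltage_meeting(2.0, levels))
```

Under these assumptions, lowering the supply voltage saves energy quadratically while delay grows, so the lowest voltage that still meets the deadline is the preferred operating point.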

SLIDE 12

Design space exploration

[Figure: worst-case analysis plot with axes Voltage and Delay and curves labeled max, typ, and min; Pareto points on the w.c. curve]

SLIDE 13

Adaptive design space

[Figure: worst-case analysis plot with axes Voltage and Delay and curves labeled min, typ, and max]

As parameters spread, w.c. design is too pessimistic

SLIDE 14

Self-calibrating circuits

  • The operating points of a circuit should be determined on-line

– Variation from chip to chip
– Operation at the edge of failure

  • Analogy

– Sailing boat tacking against the wind
– Max gain when sailing close to wind

  • When angle is too close, large loss of speed

SLIDE 15

How to calibrate?

  • General paradigm

– A circuit may be in correct or faulty operational state, depending on a parameter (e.g., voltage)
– Computed/transmitted data need checks

  • If data is faulty, data is recomputed and/or retransmitted

– Error rate is monitored on line
– Feedback loop to control operational state parameter based on error rate

  • Circuits can generate errors:

– Errors must be detected and corrected
– Correction rate is used for calibration
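
The feedback loop on this slide (monitor the error rate, then adjust the operating parameter) can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the scheme used in the talk: the logistic error model, the voltage range, the step size, and the target rate are all hypothetical.

```python
import math
import random

V_MIN, V_MAX = 0.6, 1.2   # allowed supply range in volts (hypothetical)
STEP = 0.02               # voltage change per calibration step (hypothetical)
TARGET_RATE = 0.01        # tolerated fraction of transfers needing correction


def error_probability(vdd: float) -> float:
    """Hypothetical error model: errors rise sharply as vdd falls toward 0.8 V."""
    return 1.0 / (1.0 + math.exp(20.0 * (vdd - 0.8)))


def next_voltage(vdd: float, error_rate: float) -> float:
    """Feedback rule: raise vdd when errors exceed the target, otherwise lower it."""
    if error_rate > TARGET_RATE:
        return min(V_MAX, vdd + STEP)
    return max(V_MIN, vdd - STEP)


def calibrate(steps: int = 60, window: int = 200, seed: int = 0) -> list[float]:
    """Simulate on-line calibration and return the voltage after each step."""
    rng = random.Random(seed)
    vdd = V_MAX
    history = []
    for _ in range(steps):
        p = error_probability(vdd)
        # Every faulty transfer is assumed to be detected and redone; count them.
        errors = sum(rng.random() < p for _ in range(window))
        vdd = next_voltage(vdd, errors / window)
        history.append(vdd)
    return history


if __name__ == "__main__":
    trace = calibrate()
    print("final supply voltage:", round(trace[-1], 2))
```

The controller starts at the safe maximum and creeps downward until corrections become frequent, which mirrors the "operation at the edge of failure" idea.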

SLIDE 16

Example:

  • On-chip transmission scheme
  • Globally asynchronous, locally synchronous (GALS)
  • FIFO for decoupling
  • Variable transmission frequency

[Figure: blocks 1 and 2 connected through a FIFO; each block is labeled with a supply voltage v_dd]

SLIDE 17

Adaptive low-power transmission scheme

[Figure: adaptive transmission scheme between blocks 1 and 2. Labels: FIFO (two), Encoder, Decoder, Controller, Ack, errors, n, v_dd, v_ch, F_ch]

SLIDE 18

Self-calibration

  • Self-calibration makes circuits robust against:

– Design process variations
– External disturbances

  • E.g., soft errors, EM interference, environment
  • Self-calibration may take different embodiments

– May be applied during normal operation

  • To compensate for environmental changes

– May be used at circuit boot time

  • To compensate for manufacturing variations
  • General paradigm to cope with DSM problems

SLIDE 19

Extremely large scale

  • Engineers will always attempt to design chips at the edge of human capacity

  • Challenges:

– Large scale: billion transistor chips
– Heterogeneity: digital, analog, RF, optical, MEMS, sensors, micro-fluidics

  • Many desiderata: high performance, low power, low cost, fast design, small team, …

SLIDE 20

Component-based design

  • SoCs are designed (re)-using large macrocells

– Processors, controllers, memories…
– Plug and play methodology is very desirable
– Components are qualified before use

  • Design goal:

– Provide functionally correct, reliable operation of the interconnected components

  • Critical issues:

– Properties of the physical interconnect
– Achieving robust system-level assembly

SLIDE 21

Physical interconnection

  • Electrical-level information transfer is unreliable

– Timing errors

  • Delay on global wires and delay uncertainty
  • Synchronization failure across different islands
  • Crosstalk-induced timing errors

– Data errors:

  • Data upsets due to EM interference and soft errors
  • Noise is the abstraction of the error sources
  • The problem will get more and more acute as geometries and voltages scale down

SLIDE 22

Systems on chips: a communication-centric view

  • Design component interconnection under:

– Uncertain knowledge of physical medium
– Incomplete knowledge of environment

  • Workload, data traffic, …
  • Design interconnection as a micro-network

– Leverage network design technology
– Manage information flow

  • To provide for performance

– Power-manage components based on activity

  • To reduce energy consumption

SLIDE 23

Micro-network characteristics

  • Micro-networks require:

– Low communication latency
– Low communication energy consumption
– Limited adherence to standards

  • SoCs have some physical parameters that:

– Can be predicted accurately
– Can be described by stochastic distributions

SLIDE 24

Micro-network stack

Design choices at each stack level affect:

– Communication speed
– Reliability
– Energy

Control Protocols:

– Layered
– Implemented in Hw or Sw
– Providing error correction

Stack layers (top to bottom):

  • Software: application, system
  • Architecture and control: transport, network, data link
  • Physical: wiring

SLIDE 25

Achieving robustness in micro-networks

  • Error detection and correction is applied at various layers in micro-networks

  • Paradigm shift:

– Present design methods reduce noise

  • Physical design (e.g., sizing, routing)

– Future methods must cope with noise

  • Push solution to higher abstraction levels

SLIDE 26

Data-link protocol example: error-resilient coding

  • Compare original AMBA bus to extended bus with error detection and correction or retransmission

– SEC (single-error-correcting) coding
– SEC-DED (single-error-correcting, double-error-detecting) coding
– ED (error-detecting) coding

  • Explore energy efficiency

[Figure: AMBA bus carrying HRDATA from external memory. Labels: ICACHE, MEM. CTRL., AMBA BUS INTERFACE, H ENCODER, H DECODER, MTTF]
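
To make the SEC-DED option concrete, here is a minimal sketch of an extended Hamming (8,4) code. The slide does not say which code or word width the extended AMBA bus uses, so the 4-bit data width, the bit layout, and the function names are assumptions chosen for brevity.

```python
from typing import Optional


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SEC-DED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                          # index 0: overall parity; 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of ones even
    return code


def decode(code: list[int]) -> tuple[Optional[int], str]:
    """Return (data, status); data is None when a double error is detected."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions holding a 1
    parity_odd = sum(code) % 2 == 1
    word = list(code)
    if parity_odd:
        # Single error: the syndrome is its position (0 = the overall parity bit).
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        return None, "double error detected"
    else:
        status = "ok"
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return nibble, status


if __name__ == "__main__":
    sent = encode(0b1011)
    sent[5] ^= 1                            # inject a single bit flip
    print(decode(sent))
```

In a retransmission-based variant, the "double error detected" outcome would trigger a resend rather than a correction. Three or more bit flips can still be miscorrected by this code.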

SLIDE 27

Advanced bus techniques: CDMA on bus

  • Motivation: many data sources

– Support multiple concurrent writes on bus
– Discriminate against background noise

  • Spread spectrum of information

– Driver/receiver multiply data by a pseudo-random sequence generated by a linear-feedback shift register (LFSR)

  • LFSR signature is key for de-spreading

[Figure: three bus nodes, each pairing a data signal with an LFSR]
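
The spreading and de-spreading steps can be sketched in a few lines. This is an illustration under assumptions not stated on the slide: a 4-bit maximal-length LFSR (period 15), every sender using the same LFSR with a distinct non-zero seed as its signature, and a bus modeled as the arithmetic sum of +1/-1 chip values.

```python
def lfsr_chips(seed: int, n: int = 15) -> list[int]:
    """Return n chips (+1/-1) from a 4-bit maximal-length LFSR (period 15)."""
    state = seed & 0xF
    if state == 0:
        raise ValueError("seed must be a non-zero 4-bit value")
    chips = []
    for _ in range(n):
        chips.append(1 if state & 1 else -1)
        feedback = (state ^ (state >> 1)) & 1      # XOR of the two lowest bits
        state = (state >> 1) | (feedback << 3)
    return chips


def spread(bit: int, seed: int) -> list[int]:
    """Multiply one data bit (as +1/-1) by the sender's chip sequence."""
    symbol = 1 if bit else -1
    return [symbol * c for c in lfsr_chips(seed)]


def bus_sum(*signals: list[int]) -> list[int]:
    """Model concurrent writes as chip-wise addition on the shared bus."""
    return [sum(chips) for chips in zip(*signals)]


def despread(bus: list[int], seed: int) -> int:
    """Correlate the bus signal with one signature to recover that sender's bit."""
    corr = sum(b * c for b, c in zip(bus, lfsr_chips(seed)))
    return 1 if corr > 0 else 0


if __name__ == "__main__":
    seeds = (1, 9, 5)              # hypothetical signatures, one per sender
    bits = (1, 0, 1)
    bus = bus_sum(*(spread(b, s) for b, s in zip(bits, seeds)))
    print([despread(bus, s) for s in seeds])
```

Because different seeds yield shifted copies of one maximal-length sequence, the correlation of two distinct signatures over a full period is -1, so each sender's own contribution of +15 or -15 dominates the sum.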

SLIDE 28

Going beyond buses

  • Buses:

– Pro: simple, existing standards
– Contra: performance, energy-efficiency, arbitration

  • Other network topologies:

– Pro: higher performance, experience with MP
– Contra: physical routing, need for network and transport layers

  • Challenge: exploit appropriate network architecture and corresponding protocols

SLIDE 29

Network and transport layers

  • Information is in packets
  • Network issues:

– Network switching

  • Circuit, packet, cut-through, wormhole

– Network routing

  • Deterministic and adaptive routing
  • Transport issues:

– Decompose and reconstruct information
– Packet granularity
– Admission/congestion control

SLIDE 30

SPIN micro-network

  • Applied to SoCs
  • 36-bit packets

– Header: destination
– Trailer: checksum

  • Fat-tree network architecture
  • Cut-through switching
  • Deterministic tree routing

[Figure: packet format with Address, variable-size payload, and EOP fields]
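
A header-plus-trailer packet of this kind can be sketched as follows. The slide gives only the field roles (destination in the header, checksum in the trailer), so the XOR checksum, the one-word header, and the helper names are assumptions made for illustration.

```python
from typing import Optional

WORD_MASK = (1 << 36) - 1   # 36-bit words, following the width quoted on the slide


def checksum(words: list[int]) -> int:
    """Hypothetical checksum: XOR of all words (the slide does not specify one)."""
    acc = 0
    for w in words:
        acc ^= w
    return acc & WORD_MASK


def make_packet(dest: int, payload: list[int]) -> list[int]:
    """Build [header, payload..., trailer] with the destination in the header."""
    body = [dest & WORD_MASK] + [w & WORD_MASK for w in payload]
    return body + [checksum(body)]


def check_packet(packet: list[int]) -> Optional[tuple[int, list[int]]]:
    """Return (dest, payload) if the trailer matches, else None (ask for a resend)."""
    if len(packet) < 2:
        return None
    *body, trailer = packet
    if checksum(body) != trailer:
        return None
    return body[0], body[1:]


if __name__ == "__main__":
    pkt = make_packet(dest=5, payload=[0x1, 0x2, 0x3])
    print(check_packet(pkt))
    pkt[2] ^= 1                 # corrupt one payload bit in transit
    print(check_packet(pkt))
```

A failed check is the event a receiver would count when retransmission rate is used as a calibration signal.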

SLIDE 31

SPIN micro-network

[Figure: SPIN network in which routers connect RAM, CPU, FIR, and other components; ports are labeled Address and Stream]
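
Deterministic routing in a tree can be illustrated with a short sketch: a packet climbs until it reaches a router that is a common ancestor of source and destination, then descends along the digits of the destination address. The tree arity of 4 and the leaf numbering are assumptions for this example, not details given on the slides.

```python
def tree_route(src: int, dst: int, arity: int = 4) -> list:
    """Return the hop list from leaf src to leaf dst in a complete tree.

    Leaves are numbered 0, 1, 2, ... from left to right (hypothetical scheme).
    Each hop is either "up" or ("down", child_index).
    """
    levels = 0
    s, d = src, dst
    while s != d:                  # climb until both leaves share an ancestor
        s //= arity
        d //= arity
        levels += 1
    hops: list = ["up"] * levels
    for level in range(levels - 1, -1, -1):
        # Descend using the destination's base-`arity` digit at each level.
        hops.append(("down", (dst // arity ** level) % arity))
    return hops


if __name__ == "__main__":
    print(tree_route(0, 5))
```

In a fat tree several parent links may be available on the way up; this sketch ignores that choice and only shows the deterministic descent.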

SLIDE 32

Benefits of packets

  • Reliable error-control mechanism

– With small overhead

  • Exploit different routing paths

– Spread information to avoid congestion

  • Several user-controllable parameters

– Size, retransmission schemes, …

  • Use retransmission rate for calibrating parameters

SLIDE 33

System assembly around micro-network

  • Network architecture provides backbone
  • Component plug and play:

– Programmable network interface
– Reconfigurable protocols
– Recognize network and self configure

  • Self-assembly of SoCs addresses the issue of component reuse and heterogeneity

SLIDE 34

Extremely large scale design

  • Heterogeneous components with malleable interfaces
  • Macroscopic self-assembly

– Exploit degrees of freedom in component/interface specifications
– Self-configuration realizes interfacing details abstracted by designers
– Self-configuration, together with redundancy, addresses self-correction of some possible design errors

  • Self-healing

– Correcting for run-time failures
– Method to increase availability and robustness

SLIDE 35

Example: Biowall

  • Embryonics project at EPFL, Switzerland
  • Cellular design with redundancy

– Each cell programmed by a string (gene)
– FPGA technology

  • Self-healing property:

– Upon cell failure, neighbors reconfigure to take over function

SLIDE 36

Cellular self-repair

[Figure: row of cells with coordinates X=1, 2, 3, 4 followed by a spare cell; one cell contains a faulty molecule. Other label: RG+OG]

SLIDE 37

Cellular self-repair

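[Figure: the same row after repair; one cell is marked KILL=1 and coordinates 3 and 4 appear shifted toward the spare cell. Other label: RG+OG]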
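
The coordinate shift suggested by the two figures can be sketched as a simple reassignment: a killed cell becomes transparent, and the cells after it, including the spare, take over the remaining coordinates. This is a simplified model with hypothetical names, not the Embryonics implementation.

```python
from typing import Optional


def assign_coordinates(alive: list[bool], needed: int) -> list[Optional[int]]:
    """Give coordinates 1..needed to working cells, left to right.

    Killed cells and unused spares get None. Raises RuntimeError when too
    few working cells remain to cover every coordinate.
    """
    coords: list[Optional[int]] = []
    x = 1
    for cell_ok in alive:
        if cell_ok and x <= needed:
            coords.append(x)
            x += 1
        else:
            coords.append(None)
    if x <= needed:
        raise RuntimeError("not enough spare cells to repair the row")
    return coords


if __name__ == "__main__":
    print(assign_coordinates([True, True, True, True, True], needed=4))
    print(assign_coordinates([True, True, False, True, True], needed=4))
```

With four functional cells and one spare, a single failure is absorbed; a second failure in the same row exhausts the spares and is reported as an error.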

SLIDE 38

Autonomic computing

  • Broad R&D project launched by IBM
  • Self-healing

– Design computers and software that perform self-diagnostic functions and can fix themselves without human intervention
– Strong analogies to biological systems

  • Reduced cost of design and maintenance

SLIDE 39

Autonomics principles

  • An autonomic system:

– must know itself
– reconfigures itself under varying conditions
– optimizes its operations at run time
– must support self-healing
– must defend itself against attacks
– must know the environment
– manages and optimizes internal resources without human intervention

SLIDE 40

Evolving computing materials

  • When will current semiconductor technologies run out of steam?
  • What factor will provide a radical change in technology?

– Performance, power density, cost?

  • Several emerging technologies:

– Carbon nanotubes, nanowires, quantum devices, molecular electronics, biological computing, …

  • Are these technologies compatible with silicon?

– What is the transition path?

  • What are the common characteristics, from a design technology standpoint?

SLIDE 41

Rosette nanotubes

[Source: Purdue University]

SLIDE 42

Common characteristics of nano-devices

  • Self-assembly used to create structures

– Manufacturing paradigm is bottom-up

  • Significant presence of physical defects

– Design style must be massively fault-tolerant

  • Competitive advantage stems from extremely high density of computing elements

– 10^11-10^12 dev/cm² vs. 3×10^9 dev/cm² for CMOS in 2016

  • Some nano-array technologies are compatible with silicon technology and can be embedded in CMOS

SLIDE 43

Robust nano-design

  • Key ingredients:

– Massive parallelism and redundancy
– Exploit properties of crosspoint architectures

  • E.g., Programmable Logic Arrays (PLAs)

– Local and global reconfiguration

  • Some design technologies for robust DSM CMOS design can be applied to nanotechnology
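
One way to see why redundancy helps with defect-prone devices is to compute the reliability of a majority-voted group of replicas. The slide names redundancy only in general terms; majority voting, independent defects, and a perfect voter are assumptions introduced here for illustration.

```python
from math import comb


def majority_correct(p_defect: float, n: int) -> float:
    """Probability that a majority of n independent replicas work.

    p_defect is the per-replica failure probability; n must be odd so that
    a strict majority always exists. The voter itself is assumed perfect.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    q = 1.0 - p_defect
    need = n // 2 + 1
    return sum(comb(n, k) * q ** k * p_defect ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, round(majority_correct(0.1, n), 4))
```

With a 10% defect rate, three replicas already work with probability 0.972 versus 0.9 for a single device, and the benefit grows with more replicas as long as each device works more often than not.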

SLIDE 44

Summary: problem analysis

  • The electronic market is driven by embedded applications where reliability and robustness are key figures of merit

  • System design has to cope with uncertainty

– Lack of knowledge of details, due to abstraction
– Physical properties of the material

  • As design size scales up, the design challenge is related to interconnecting high-level components

  • As technology scales down, and as nanotechnologies are introduced, electrical-level information becomes unreliable

SLIDE 45

Summary: design strategies

  • Robust and reliable design is achieved by:

– Self-calibrating system components
– Networking components on chip with adaptive interfaces

  • Encoding, packet switching and routing provide a new view of logic and interconnect design

– Self-healing components that can diagnose failures and reconfigure themselves

  • New emerging technologies will require massive use of error correction and redundancy