Reliability, Thermal, and Power Modeling and Optimization Robert P. - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Reliability, Thermal, and Power Modeling and Optimization

Robert P. Dick

http://robertdick.org/
Department of Electrical Engineering and Computer Science
University of Michigan

slide-2
SLIDE 2

Intended audience for tutorial

Researchers and designers who are interested in, but new to, temperature-dependent integrated circuit and embedded system reliability modeling and optimization.

slide-3
SLIDE 3

Goals

Suggest sources of new reliability research problems.
Explain relationships among power consumption, temperature, and reliability.
Indicate the difficulty of generalized reliability modeling and optimization.
Request reliability modeling anecdotes for public repository.

slide-4
SLIDE 4

My background and perspective

Integrated circuit power, thermal, and reliability modeling and optimization.

Embedded system reliability modeling during design and synthesis.

slide-5
SLIDE 5

Indicate state and trends in research field Power, temperature, and reliability Sophistication vs. overhead Reasons for difficulty of developing general models State of reliability research field Sources of new research problems Reliability problem taxonomy

Tutorial sections

  1. Indicate state and trends in research field
  2. Power, temperature, and reliability
  3. Trade-off between sophistication and complexity in reliability modeling and optimization
  4. Reasons for difficulty of developing general models

slide-6
SLIDE 6

Tutorial subsections

  1. Indicate state and trends in research field
       State of reliability research field
       Sources of new research problems
       Reliability problem taxonomy
  2. Power, temperature, and reliability
  3. Trade-off between sophistication and complexity in reliability modeling and optimization
  4. Reasons for difficulty of developing general models

slide-7
SLIDE 7

Historical development of research fields

Case studies. Modeling and optimization. Generalized and automated modeling and optimization.

slide-8
SLIDE 8

State of reliability research

Embedded systems reliability
  Case studies.
  Variation in environmental conditions and applications makes generalization difficult.
  Caveat: Some areas within embedded system design are better understood than others.

Integrated circuit reliability
  Empirical models of device-level fault processes.
  Well-developed theory for system-level reliability estimation, as long as component fault rates are known.
  Ongoing work on (automated) system-level reliability modeling, monitoring, and optimization.
  Complicated by impact of on-line adaptation on fault/wear rates.

slide-9
SLIDE 9

Recent academic IC and system reliability research

Reliable nanoscale logic (DeHon, Jha, Orailoglu, et al.) and system (Atienza, Benini, De Micheli, et al.) design.
Reliability-aware IC operating parameter and power consumption state optimization: Eles, Pop, et al.
Soft error protection and modeling: Dutt, Narayanan, Xie, et al.
Architectural techniques for improved reliability: Adve, Alameldeen, Austin, Bertacco, Falsafi, Mahlke, Mudge, Skadron, et al.
Trading off correctness for improvements in other quality metrics: Palem, Memik, et al.
Circuit failure prediction and self-tuning: Cao, Mitra, Wei, et al.
Reliability-aware (networked) embedded system design and synthesis: Coskun, Shang, Teich, Thomas, Dick, et al.

slide-11
SLIDE 11

Sources of new research problems

Changes in applications. Changes in implementation technologies.

slide-12
SLIDE 12

Application trends influencing reliability

Inexpensive computers in harsh environments.
  Figure from http://wsn-security.info.
Battery-motivated energy constraints.
Use in safety-critical applications, e.g., transportation and medical devices.
Networked systems.
  Figure from Huafeng Xie.

slide-13
SLIDE 13

Technology trends influencing integrated circuit reliability

Use of nanoscale devices. More variation. Better sensors. Power density, variation increase. More cores. More devices.

slide-15
SLIDE 15

Specification and design

Responsible for vast majority of in-system faults [Rahman’06].
An error per ∼100 lines of code is considered a very good rate.

What is being done?
  Language design.
  Formal verification.
  Software engineering.
  Operating system and middleware design.
  Hardware synthesis.

See written tutorial summary for citations.

slide-16
SLIDE 16

Permanent faults

Many permanent faults are related to lifetime wear processes.
  Temperature dependent.
  Wear state can be estimated or tracked.
  However, on-line monitoring and testing impact cost.

slide-17
SLIDE 17

Intermittent and transient faults I

Influenced by both controlled and uncontrolled (environmental) conditions.

Examples
  Temperature-dependent timing violations.
  IR drop.
  dI/dt.
  C or L crosstalk.

slide-18
SLIDE 18

Intermittent and transient faults II

Single-event upsets
  Cosmic rays interact with atoms in atmosphere, producing shower of high-energy neutrons.
  In general, danger increases with process scaling – decreased node capacitance.
  Single particle can trigger multiple upsets.

slide-19
SLIDE 19

Influence of parameter variation

Fabrication time
  Cuts into safety margins.
  Changes sensitivity to dynamically varying environmental parameters.
  E.g., reduced threshold voltage increases power density and temperature.

On-line
  Operating parameters influence, and are influenced by, wear processes.
  E.g., Vt is influenced by wear processes.

slide-21
SLIDE 21

Tutorial subsections

  1. Indicate state and trends in research field
  2. Power, temperature, and reliability
       Wear mechanisms
       Relationships among power, temperature, and reliability
       Thermal analysis
  3. Trade-off between sophistication and complexity in reliability modeling and optimization
  4. Reasons for difficulty of developing general models

slide-22
SLIDE 22

Wear mechanisms I

Electromigration
  Dislocation of metal atoms caused by momentum imparted by electrical current in wires and vias.

Figure from Taychatanapat, Bolotin, and Kuemmeth.

slide-23
SLIDE 23

Wear mechanisms II

Time-dependent dielectric breakdown
  Deterioration of the gate oxide layer: formation of conductive path.

Stress migration
  Directionally biased motion of atoms in wires due to mechanical stress.

Negative bias temperature instability
  Electric field dependent dissociation of Si–H bonds at Si–SiO2 interface.
  Increases threshold voltage.
  Significant for PMOSFETs under negative bias.
  Partially recovers when negative bias is removed.

slide-24
SLIDE 24

Wear mechanisms III

Thermal cycling
  Mechanical stress resulting from mismatched coefficients of thermal expansion for adjacent material layers.
  Special class of memory: depends on recent temperature history, not just wear state and environment.

slide-25
SLIDE 25

Lifetime estimation of the failure mechanisms

Most mechanisms
  Arrhenius equation: $\mathrm{MTTF} = j_1 e^{j_2 / T}$
  $j_1$ and $j_2$: wear process dependent constants.
  $T$: temperature.

Thermal cycling
  Generalized Coffin-Manson equation: $N = k_1 (\delta T - T_{th})^{k_2} e^{k_3 / T_{max}}$
  $k_1$, $k_2$, and $k_3$: constants.
  $N$: cycles to failure.
  $\delta T$: thermal cycle amplitude.
  $T_{th}$: temperature change threshold.
  $T_{max}$: maximum temperature during the cycle.
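
The two lifetime models on this slide can be evaluated directly. The following is a minimal sketch, not the tutorial's own code: the function names and all constant values are hypothetical, temperatures are assumed to be in kelvin, and j2 is interpreted as an activation energy divided by Boltzmann's constant (a common reading of the Arrhenius form, assumed here).

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_mttf(temp_k, j1, j2):
    """MTTF = j1 * exp(j2 / T), with T in kelvin."""
    return j1 * math.exp(j2 / temp_k)


def coffin_manson_cycles(delta_t, t_max_k, k1, k2, k3, t_th=0.0):
    """N = k1 * (dT - Tth)^k2 * exp(k3 / Tmax)."""
    effective = delta_t - t_th
    if effective <= 0:
        # Swing at or below the threshold: the model predicts no cycling damage.
        return math.inf
    return k1 * effective ** k2 * math.exp(k3 / t_max_k)


# Hypothetical constants, chosen only for illustration.
J2 = 0.7 / BOLTZMANN_EV_PER_K  # assumed activation energy of 0.7 eV

# Ratio of MTTF at 350 K to MTTF at 360 K; j1 cancels out of the ratio.
ratio = arrhenius_mttf(350.0, 1.0, J2) / arrhenius_mttf(360.0, 1.0, J2)
print(f"MTTF(350 K) / MTTF(360 K) = {ratio:.2f}")

# A negative k2 makes larger temperature swings cost more cycles to failure.
small = coffin_manson_cycles(20.0, 360.0, k1=1e6, k2=-2.35, k3=0.0)
large = coffin_manson_cycles(40.0, 360.0, k1=1e6, k2=-2.35, k3=0.0)
print(f"cycles to failure: dT=20 K -> {small:.0f}, dT=40 K -> {large:.0f}")
```

With these assumed constants, a 10 K increase roughly halves the Arrhenius MTTF, which illustrates why the thermal profile matters so much for lifetime estimation.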

slide-27
SLIDE 27

Relationships among power, temperature, and reliability

slide-28
SLIDE 28

Required information for reliability estimation

Need
  Thermal profile, which requires power profile.
  May need temporal distribution or time series, depending on dominant fault processes.
  May need current densities.
  May need process variation characteristics.

slide-30
SLIDE 30

Macroscopic thermal analysis

Partition into 3-D elements (diagram 2-D for simplicity).

  Temperature ↔ Voltage
  Thermal resistance ↔ Resistance
  Heat flow ↔ Current
  Heat capacity ↔ Capacitance
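
The thermal-electrical analogy above means a thermal model can be integrated like an RC circuit. The following is a minimal sketch under strong assumptions: only two lumped elements (a die and a heat sink) rather than a full 3-D partition, forward Euler time stepping, and hypothetical parameter values and names throughout.

```python
def simulate_rc(power_w, r_die_sink, r_sink_amb, c_die, c_sink, t_amb, dt, steps):
    """Integrate a two-node thermal RC network with forward Euler.

    Resistances are in K/W, heat capacities in J/K, temperatures in K.
    Returns the final (die, sink) temperatures.
    """
    t_die = t_sink = t_amb
    for _ in range(steps):
        # Heat flow through a thermal resistance, analogous to I = V / R.
        q_die_sink = (t_die - t_sink) / r_die_sink
        q_sink_amb = (t_sink - t_amb) / r_sink_amb
        # Heat capacity integrates net heat flow, analogous to C dV/dt = I.
        t_die += dt * (power_w - q_die_sink) / c_die
        t_sink += dt * (q_die_sink - q_sink_amb) / c_sink
    return t_die, t_sink


# Hypothetical values: 10 W dissipated in the die, 300 K ambient.
t_die, t_sink = simulate_rc(
    power_w=10.0, r_die_sink=0.5, r_sink_amb=1.0,
    c_die=0.1, c_sink=10.0, t_amb=300.0, dt=0.01, steps=20000,
)
print(f"die: {t_die:.2f} K, sink: {t_sink:.2f} K")
```

After the transient dies out, the result approaches the steady state predicted by the resistive divider alone: the sink sits at ambient plus power times the sink-to-ambient resistance, and the die sits above the sink by power times the die-to-sink resistance.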

slide-31
SLIDE 31

Device-level thermal analysis

Architectural thermal analysis tools
  Functional-unit level spatial discretization.
  10–100 µm element sizes.
  100 µs–1 ms time step sizes.
  Fourier heat flow model.

When device length scales are shorter than phonon mean free path (∼100 nm), conventional diffusion-based thermal analysis breaks down.
  I.e., when “heat particles” interact with the material lattice after traveling short distances.

slide-32
SLIDE 32

Need non-Fourier models and analysis for nanoscale devices

[Figure: Thermal profile of 65 nm FinFET (Fourier heat flow). Axes: x (nm), y (nm).]

[Figure: Thermal profile of 65 nm FinFET (Boltzmann Transport Equation). Axes: x (nm), y (nm).]

Figures from Zyad Hassan.

slide-33
SLIDE 33

Wear-dependent fault processes without other state I

Parameter distribution suffices. E.g., temperature distribution.
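
For a fault process of this kind, a histogram of time spent at each temperature is enough input. The following is a minimal sketch, assuming an Arrhenius-form MTTF and assuming that wear accumulates linearly, so that failure rates (1/MTTF) can be time-averaged. The function name, the constants, and the example histogram are all hypothetical.

```python
import math


def effective_mttf(temp_histogram, j1, j2):
    """Combine per-temperature Arrhenius MTTFs into one effective MTTF.

    temp_histogram maps temperature in kelvin to the fraction of time spent
    there; fractions are assumed to sum to 1. The order in which the
    temperatures occurred is deliberately not part of the input.
    """
    average_rate = sum(
        fraction / (j1 * math.exp(j2 / temp_k))
        for temp_k, fraction in temp_histogram.items()
    )
    return 1.0 / average_rate


# Hypothetical constants and workload: half the time at 340 K, half at 360 K.
J1, J2 = 1.0e-9, 8000.0
mixed = effective_mttf({340.0: 0.5, 360.0: 0.5}, J1, J2)
cool = effective_mttf({340.0: 1.0}, J1, J2)
hot = effective_mttf({360.0: 1.0}, J1, J2)
print(f"cool: {cool:.3g}, mixed: {mixed:.3g}, hot: {hot:.3g}")
```

The mixed result lies between the two single-temperature cases and is pulled toward the hot one, because the hotter interval contributes the larger failure rate.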

slide-34
SLIDE 34

Wear-dependent fault processes with other state

[Figure: temperature T versus time t.]

E.g., thermal cycling.
Aggregate parameter distributions insufficient.
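
A small experiment shows why an aggregate distribution is not enough for thermal cycling. The following is a minimal sketch using a simplified peak-to-valley count, which is an assumption made for brevity and not a full cycle-counting method; the function name and both traces are hypothetical. The two traces have identical temperature histograms but very different numbers of thermal swings.

```python
from collections import Counter


def half_cycle_amplitudes(trace):
    """Return amplitudes between successive local extrema of a temperature trace."""
    # Drop repeated samples so plateaus do not hide peaks and valleys.
    pts = [trace[0]] + [b for a, b in zip(trace, trace[1:]) if b != a]
    if len(pts) < 2:
        return []
    extrema = [pts[0]]
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # slope changes sign: peak or valley
            extrema.append(cur)
    extrema.append(pts[-1])
    return [abs(b - a) for a, b in zip(extrema, extrema[1:])]


# Same samples, different order: one slow ramp versus rapid cycling.
slow = [300, 300, 300, 300, 340, 340, 340, 340]
fast = [300, 340, 300, 340, 300, 340, 300, 340]

print(Counter(slow) == Counter(fast))    # identical temperature histograms
print(len(half_cycle_amplitudes(slow)))  # number of swings in the slow trace
print(len(half_cycle_amplitudes(fast)))  # number of swings in the fast trace
```

A model that only sees the histogram cannot tell these two traces apart, although the second one subjects the package to seven 40 K swings instead of one.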

slide-37
SLIDE 37

Sketch of power, thermal, and reliability modeling flow

[Figure: flow diagram with three stages: device-level modeling, component-level modeling, and system-level modeling. Labels in the diagram: thermal profiles, device specification, component specification, selected distribution, EM, SM, TDDB, TC (RFC), parameter fitting, component distribution, initial age, system structure, survival lattice, Monte-Carlo simulation, expected lifetime.]
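
The system-level stage of such a flow can be illustrated with a small Monte Carlo simulation. The following is a minimal sketch, not the flow used in the tutorial: it assumes Weibull-distributed component lifetimes and a hypothetical system structure in which one processor is in series with a redundant pair of memories. All names and parameter values are assumptions.

```python
import random


def system_lifetime(rng):
    """Sample one system lifetime (arbitrary time units).

    Assumed structure: the system fails when the processor fails or when
    both redundant memories have failed.
    """
    cpu = rng.weibullvariate(10.0, 2.0)    # scale, shape (hypothetical)
    mem_a = rng.weibullvariate(8.0, 2.0)
    mem_b = rng.weibullvariate(8.0, 2.0)
    return min(cpu, max(mem_a, mem_b))


def expected_lifetime(samples=20000, seed=0):
    """Estimate the expected system lifetime by averaging sampled lifetimes."""
    rng = random.Random(seed)
    return sum(system_lifetime(rng) for _ in range(samples)) / samples


estimate = expected_lifetime()
print(f"estimated expected lifetime: {estimate:.2f}")
```

Changing `system_lifetime` is enough to evaluate a different system structure, which is one reason simulation is attractive when closed-form analysis becomes awkward.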

slide-39
SLIDE 39

Tutorial subsections

  1. Indicate state and trends in research field
  2. Power, temperature, and reliability
  3. Trade-off between sophistication and complexity in reliability modeling and optimization
       Hierarchy of modeling and optimization techniques
       Complexity–quality metric trade-off
  4. Reasons for difficulty of developing general models

slide-40
SLIDE 40

Example hierarchy of modeling and optimization techniques

Ignorance.
Static fault rates, no redundancy.
Variable fault rates (wear modeling), redundancy.
Variable-rate environmental parameter dependent wear modeling, redundancy.
Sensor-based estimation of environmental parameters, on-line fault detection and adaptation.

Is more sophisticated better? Not always.

slide-42
SLIDE 42

Complexity–quality metric trade-off: What is optimal?

Common proposal
  Use more sophisticated modeling and adaptation techniques to allow tighter guardbands.
  Supports improved performance, reliability, or cost (pick any two).

Costs of sophistication
  Do designers need to spend time learning and remembering complex new concepts?
  Is the design process made more difficult or changed?
  Is debugging or analysis made more difficult?
  Does the technique impose overhead (performance, power, etc.)?

slide-43
SLIDE 43

Simplicity does not precede complexity, but follows it.
Alan Perlis

slide-44
SLIDE 44

Promising directions

Studies that indicate the operating conditions necessary for a particular type of fault to matter.
  Lets designers ignore things that should be ignored.
Generalized or automated modeling techniques.
  Opportunities for design automation researchers.
Low-level, low-complexity techniques that reduce a particular reliability problem.
  Multiple unidirectional current vias for electromigration.
  Change in packaging for α particles.
System-level performance enhancement infrastructure that happens to also support adaptation for reliability.
  Motivate use before it is too late.

slide-46
SLIDE 46

Tutorial subsections

  1. Indicate state and trends in research field
  2. Power, temperature, and reliability
  3. Trade-off between sophistication and complexity in reliability modeling and optimization
  4. Reasons for difficulty of developing general models
       Example technology-specific reliability problems
       A plea for anecdotes

slide-47
SLIDE 47

Random background offset charge effects

Defects near gate trap charge carriers.
Two close defects? Charge carrier tunnels between them.
Result: Randomly changing I–V curve phase.
Wide range of timescales, ns–many hours.

Can general approach solve this?
  Maybe, but less efficient than one based on understanding of cause of faults.

slide-48
SLIDE 48

Rain

Inexpensive sensors deployed in harsh environments. Many possible reliability problems. Difficult to predict and prepare for each.

slide-50
SLIDE 50

A plea for anecdotes

Hypothesis
  System designers often encounter reliability problems that are difficult to predict, but obvious in hindsight, and therefore rarely published.
  As a result, these problems remain difficult to predict.

Offer
  I will maintain an (anonymized if requested) list of reliability problem anecdotes to help designers determine the most likely problems.
  Email dickrp@eecs.umich.edu.

Case studies are a foundation.

slide-51
SLIDE 51

Our relevant tools and work

Reliability modeling and optimization in distributed sensing systems: Bai et al., DATE’11.
Nano- to system-scale integrated circuit thermal analysis software: N. Allec, et al., “ISAC2: Incremental self-adaptive chip-package thermal analysis software, version 2,” ISAC2 link at http://ziyang.eecs.umich.edu/projects/isac and http://eces.colorado.edu/∼hassanz/ThermalScope.
Thermal and reliability modeling survey: Brooks et al., IEEE Micro’07.
Thermal and reliability modeling and optimization techniques and software: Multiple. See http://robertdick.org/publications/.

slide-52
SLIDE 52

Thanks!

Thanks to Lan Bai, David Bild, Fred Brooks, Pai Chou, Peter Dinda, Paul Ferno, Zyad Hassan, Lei Jiang, Russ Joseph, Brett Meyer, Seth Prejean, Li Shang, Don Thomas, Vishak Venkatraman, Yun Xiang, Lide Zhang, and Lin Zhang for prior discussions that influenced this tutorial.

Thanks to SRC and NSF for supporting projects that gave me the opportunity to think about integrated circuit and embedded system reliability.

Thank you for attending!

More information at http://robertdick.org/.