Fast On-Die Statistical Thermal Hotspot Analysis: Considering Local Statistical Variations



SLIDE 1

Fast On-Die Statistical Thermal Hotspot Analysis: Considering Local Statistical Variations

Palkesh Jain (1), Manoj Mehrotra (1)

(1) Qualcomm Technologies Inc., India

email: palkesh@qti.qualcomm.com

SLIDE 2

5/17/2016

Statistical Variations: Premise

Core-to-core leakage variations are on the rise. They are expected to grow with increasing core counts (4 → 8 → 10 and so on) and shrinking feature sizes.

[Figure: Core-to-core leakage variations for a multi-core chip. Source: S. Dighe et al., IEEE JSSC, 2011]

Calls for:

− Re-assessment of the leakage power computation methodology
− Impact assessment on SoC thermal hotspots → the impact varies with the statistical leakage distribution of the chip (unique per chip)

SLIDE 3

Types of Leakage Variations

Die-to-die:

− Modeled well through global variations

Within-die (the primary concern in this work):

− Systematic: across-shot, litho-induced variations
  − Exhibit a high degree of correlation (cells adjacent to each other tend to move similarly)
− Random: primarily through random dopant fluctuations
  − Independent, uncorrelated random variations
  − Also called 'local variations'; the focus of this work

SLIDE 4

Random Variations: Sensitivity and Large Distributions

Impact of Random Local Variations

ON/Drive Current:

− Linear; swings -20% to 30%

OFF/Leakage Current:

− Exponential; swings -100% to 500%

Good news: leakage spread reduces with an increasing number of uncorrelated distributions (Central Limit Theorem)

$$\mu_{sum} = \sum_{i=1}^{n} \mu_i ; \qquad \sigma_{sum} = \sqrt{\sum_{i=1}^{n} \sigma_i^{2}}$$

For SoC leakage → billions of uncorrelated transistor-leakage distributions: in practice we can ignore the σ of the individual transistor distributions and consider only µ. µ then works as a multiplier for incorporating on-die variations into leakage calculations.
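The averaging argument can be made explicit. As a sketch, assume the n uncorrelated transistor-leakage distributions all share the same mean µ and standard deviation σ (an idealization that the slide does not require). The relative spread of the summed leakage then falls as 1/√n:

```latex
\mu_{sum} = \sum_{i=1}^{n} \mu_i = n\,\mu, \qquad
\sigma_{sum} = \sqrt{\sum_{i=1}^{n} \sigma_i^{2}} = \sqrt{n}\,\sigma
\quad\Longrightarrow\quad
\frac{\sigma_{sum}}{\mu_{sum}} = \frac{1}{\sqrt{n}}\cdot\frac{\sigma}{\mu}
```

For n = 10^9 the prefactor 1/√n is about 3.2 × 10^-5, so only µ matters at SoC level. For n = 100 it is only 0.1, so a noticeable share of the per-device spread survives.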

SLIDE 5

However.. (bad news)

Thermal runaway can be triggered even by the leakage-temperature dependency of very small chip areas (1 grid ~ 10um2 x 10um2)

− Local variations do not completely cancel out for small areas (fewer distributions)

Grid local variations are also a function of the grid composition. Smaller-sized/lower-drive cells

− could potentially see a higher statistical spread of leakage

[Figure: 1x drive cell vs. high drive cell]

[Figure: Frequency (PDF) vs. normalized log-leakage, for a collection of 100 inverters and for a single inverter]
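The single-inverter versus 100-inverter comparison can be reproduced qualitatively with a small Monte Carlo experiment. This is a minimal sketch, assuming each inverter's leakage is an independent lognormal sample; the log-domain sigma of 1.0 and the sample count are placeholder values, not taken from the slides.

```python
import math
import random
import statistics


def log_leakage_spread(cells_per_sample, sigma_ln=1.0, samples=2000, seed=0):
    """Standard deviation of ln(total leakage) over Monte Carlo samples.

    Each cell's leakage is drawn from a lognormal with median 1 and
    log-domain sigma `sigma_ln` (hypothetical value, not from the slides).
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(samples):
        # Total leakage of one collection of uncorrelated cells.
        total = sum(rng.lognormvariate(0.0, sigma_ln)
                    for _ in range(cells_per_sample))
        logs.append(math.log(total))
    return statistics.stdev(logs)


single = log_leakage_spread(1)
collection = log_leakage_spread(100)
print(f"single inverter:         sigma(ln I) = {single:.3f}")
print(f"100-inverter collection: sigma(ln I) = {collection:.3f}")
```

The collection's log-leakage spread comes out several times narrower than the single inverter's, but it does not vanish, which is the point the slide makes about small grids.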

SLIDE 6

Assessing Grid Composition Impact

Localized composition (types and count of the cells) within a small grid alters the thermal sensitivity!

[Figure: two grid compositions with sensitivities a and b (> a)]

Overall, the chip leakage distribution may still remain narrow → 3 design methodologies were chosen to study the impact of design style (grid composition) on thermals and power

Metrics Monitored:

As the leakage variations are statistical, for every configuration:

− 100 Monte Carlo runs are performed.
− For every run, we monitor the block's total leakage and the individual grid temperatures
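The monitoring loop can be sketched as follows. This is a toy stand-in, not the authors' thermal solver: each grid is a lumped thermal resistance with no lateral heat spreading, leakage depends exponentially on the grid's own temperature, and every parameter (`sigma_ln`, `r_th`, `p_dyn`, `p_leak_ref`, `t_slope`, `T_MAX`) is a hypothetical value chosen only for illustration.

```python
import math
import random
import statistics

T_MAX = 150.0  # clamp so a runaway grid saturates instead of overflowing


def run_once(rng, n_grids=16, sigma_ln=0.3, t_amb=45.0, r_th=20.0,
             p_dyn=0.5, p_leak_ref=0.1, t_slope=40.0, iters=60):
    """One Monte Carlo run; returns (total_leakage, grid_temperatures)."""
    # Local variation: an independent lognormal leakage multiplier per grid.
    mult = [rng.lognormvariate(0.0, sigma_ln) for _ in range(n_grids)]
    temps = [t_amb] * n_grids
    leak = [0.0] * n_grids
    for _ in range(iters):
        # Leakage grows exponentially with the grid's own temperature...
        leak = [p_leak_ref * m * math.exp((t - t_amb) / t_slope)
                for m, t in zip(mult, temps)]
        # ...and temperature follows the grid's total power.
        temps = [min(t_amb + r_th * (p_dyn + p), T_MAX) for p in leak]
    return sum(leak), temps


rng = random.Random(2016)
results = [run_once(rng) for _ in range(100)]  # 100 Monte Carlo runs
totals = [total for total, _ in results]
hotspots = [max(temps) for _, temps in results]

print(f"total leakage: max/avg = {max(totals) / statistics.mean(totals):.3f}")
print(f"hotspot temperature range = {max(hotspots) - min(hotspots):.1f} C")
```

Iterating leakage and temperature to a fixed point is one simple way to capture the leakage-temperature loop; a real flow would replace the per-grid resistance with a full thermal solve.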

SLIDE 7

Assessing Grid Composition Impact … (2)

Single grid structure: defines the sensitivity of the grid to leakage variations → thermal

Standard:

− No alterations to the original grid compositions.
− All grids in the design have similar sensitivity to leakage variations

Modified Sensitivity with Tolerant Inner Core:

− Altered grid composition
− Inner core has grids that are tolerant to variations
− Inner σ = a
− Outer σ = b (> a)

Modified Sensitivity with Tolerant Outer Core:

− Inner σ = b
− Outer σ = a (< b)
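The three design styles can be written down as per-grid sensitivity maps. This is a sketch under stated assumptions: the 6 x 6 floorplan, the 2 x 2 inner core, the values of `a` and `b`, and the choice of `b` as the uniform sensitivity for the standard style are all hypothetical, since the slides give no numbers.

```python
def sigma_map(style, size=6, core=2, a=0.2, b=0.4):
    """Return a size x size list of per-grid sensitivities for one style.

    `a` and `b` (> a) are placeholder sensitivities. The central
    `core` x `core` grids form the inner core (hypothetical geometry).
    """
    if style == "standard":
        inner = outer = b  # assumption: uniform, unmodified sensitivity
    elif style == "tolerant_inner":
        inner, outer = a, b
    elif style == "tolerant_outer":
        inner, outer = b, a
    else:
        raise ValueError(f"unknown design style: {style}")

    lo = (size - core) // 2
    hi = lo + core
    return [[inner if lo <= r < hi and lo <= c < hi else outer
             for c in range(size)]
            for r in range(size)]


for style in ("standard", "tolerant_inner", "tolerant_outer"):
    print(style)
    for row in sigma_map(style):
        print("  ", row)
```

Each map can then feed a per-grid Monte Carlo draw, with the sensitivity acting as the spread of that grid's leakage distribution.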

SLIDE 8

Statistical Impact of Grid Composition: Total Leakage

100 Monte Carlo runs @ TT

Design Style      Max. Power   Average
Standard          1.17         1.086
Tolerant Inner    1.169        1.078
Tolerant Outer    1.184        1.062

For the full block:

− Across the 100 MC runs for each design configuration, the local random variations result in only about 8% variation from the average current

The effect of local variations gets averaged out for block-level leakage
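The quoted variation can be checked directly from the table values. This is a small worked calculation, assuming the two columns are the maximum and the average over the 100 runs in the same normalized units, and that "variation from the average" means maximum over average; the slide does not spell out either point.

```python
# Values transcribed from the slide's table: (max, average), normalized units.
table = {
    "Standard": (1.17, 1.086),
    "Tolerant Inner": (1.169, 1.078),
    "Tolerant Outer": (1.184, 1.062),
}


def max_over_average_pct(max_power, average):
    """Percentage by which the maximum exceeds the average."""
    return (max_power / average - 1.0) * 100.0


spread = {style: max_over_average_pct(*values)
          for style, values in table.items()}
for style, pct in spread.items():
    print(f"{style:15s} max is {pct:.1f}% above average")
```

Under this reading, the Standard and Tolerant Inner styles land close to the quoted 8%, and Tolerant Outer comes out a few points higher.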

SLIDE 9

Statistical Impact of Grid Composition: Thermal Impact

From a thermal perspective, there could be as much as 10 °C variation in hotspot temperature

[Figure: Thermal maps for the most probable hotspot case: (1) Design Type 1, (2) Design Type 2, (3) Design Type 3]

SLIDE 10

Design Flow/Methodology

Library Characterization: using MM models → …
→ For every cell in the design: ∆ …
→ For every grid: compute the … → propose grid changes if sigma is high
→ Compute the hotspot and the max. junction temperature
→ Grid Optimization Required?
   − Yes: check for high-drive, low-finger cells in the grid; space them apart
   − No
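The per-grid check in this flow can be sketched as below. It assumes that library characterization yields a leakage mean and standard deviation per cell and that cells within a grid are uncorrelated. The `Cell` record, the cell values, and the `rel_sigma_limit` threshold are hypothetical names and numbers used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    mu: float     # mean leakage (hypothetical units)
    sigma: float  # leakage standard deviation from local variation


def grid_stats(cells):
    """Sum means directly and standard deviations in quadrature."""
    mu = sum(c.mu for c in cells)
    sigma = math.sqrt(sum(c.sigma ** 2 for c in cells))
    return mu, sigma


def needs_optimization(cells, rel_sigma_limit=0.25):
    """Flag a grid whose relative leakage spread exceeds the limit."""
    mu, sigma = grid_stats(cells)
    return sigma / mu > rel_sigma_limit


inv = Cell("inv_x1", mu=1.0, sigma=0.8)
grids = {
    "g0": [inv] * 4,   # few cells: spread barely averages out
    "g1": [inv] * 64,  # many cells: spread averages out
}
flagged = [name for name, cells in grids.items() if needs_optimization(cells)]
print("grids needing optimization:", flagged)
```

With these placeholder numbers only the sparsely populated grid is flagged. A real flow would follow the flag with the placement change the slide describes, spacing apart the high-drive, low-finger cells.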

SLIDE 11

Conclusions

Accurate leakage calculation is integral to accurate thermal and hotspot predictions

Random/statistical variations significantly alter leakage:

− Low chip-wide impact due to averaging out (Central Limit applicability; ~5-6% variation from the estimated global worst)
− Very high impact at small scale (>40% leakage variations)
− Small-scale (~ 10um2 x 10um2) variations in leakage alter temperature evolution (due to the strong leakage-temperature loop)

In this work, we shared:

− A fast methodology to incorporate local variations into thermal estimations
− A physical design methodology to reduce the SoC's final variability and thermal impact