

slide-1
SLIDE 1

Jihye Kwon*, Matthew M. Ziegler†, Luca P. Carloni*

*Department of Computer Science, Columbia University, New York, NY, USA

†IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

2019 Design Automation Conference

slide-2
SLIDE 2
  • Diverse application areas

✓Movies, music, SNS posts, online shopping items, personalized tips

  • Two main paradigms

✓Content filtering: matches a user profile / preferences against item content / information
✓Collaborative filtering: predicts a user's unknown item scores ("?") from the observed scores of other (user, item) pairs

slide-3
SLIDE 3
  • VLSI design with CAD tools for Logic Synthesis and Physical Design (LSPD)

✓Hierarchy of a high-performance processor: Chip → Processor core → Unit → Macro (10,000 – 100,000+ logic gates)

✓LSPD Phase-1: Logic synthesis → Physical placement → Clock tree synthesis → Post-placement optimization
✓LSPD Phase-2: Routing → Post-route optimization
✓Input: Macro specification and LSPD parameter configuration
✓Output: Layout for fabrication and estimated Quality-of-Result (QoR), e.g., timing, power, routability

Design-Space Exploration: exploring LSPD parameter configurations (scenarios)

slide-4
SLIDE 4

Proposed Recommender System

  • Data collection: Parallel LSPD Flow Runs and Iterative LSPD Parameter Tuning Runs, guided by QoR Cost Analysis (a cost function over the QoR statistics) as feedback, process macro data (RTL, constraints, linked libraries) under various LSPD parameter configurations

  • LSPD Results Archive: stores the resulting (configuration, QoR) records across multiple: ✓ Macros per chip ✓ Tapeouts per chip ✓ Chips per technology ✓ Technology nodes

  • Offline Learning: data filtering & normalization and hyper-parameter selection produce the QoR Model

  • Online Recommendation: given a macro — either legacy (in the Archive) or new (not observed) — and a cost function, the QoR Model generates Recommended Scenarios

slide-5
SLIDE 5

LSPD Results Archive

  Input               Output (normalized QoR)
  Macro  Scenario     Slack 1  Slack 2  Slack 3  Power  Congestion
  𝑛1     1000 ⋯ 0     0.42     0.56     0.34     0.88   0.76
         0110 ⋯ 0     0.89     0.87     0.68     0.75   0.60
         1010 ⋯ 1     0.92     0.84     0.56     0.65   0.54
         0101 ⋯ 1     0.27     0.30     0.40     0.45   0.63
         ⋮            ⋮        ⋮        ⋮        ⋮      ⋮
  𝑛2     1000 ⋯ 0     0.34     0.22     0.50     0.56   0.83
         1011 ⋯ 0     0.51     0.63     0.74     0.66   0.77
  ⋮      ⋮            ⋮        ⋮        ⋮        ⋮      ⋮

  • The Archive contains sparse records of

(Input: Macro, Scenario; Output: QoR)

✓ Macro: RTL description, timing and physical constraints, linked libraries
✓ Scenario: configuration of binary meta-parameters for tuning LSPD flows
✓ QoR: normalized QoR scores for each of the 𝑒 metrics (e.g., 𝑒 = 5) for each macro

  • Goal: to build a QoR prediction model 𝐺

𝐺(Macro, Scenario) = (QoR_1, ⋯ , QoR_𝑒), where the inputs Macro and Scenario are NOT easily available or quantifiable as feature vectors

→ A collaborative filtering approach


slide-6
SLIDE 6
  • Goal: to build a QoR prediction model 𝐺

✓ A collaborative filtering approach, e.g., matrix factorization for a movie recommender system

(User, movie) scores ≈ User matrix × Movie matrix

  User matrix        Movie matrix
    1     −1           0.7  0.2  0.9  0.4
  −0.2   0.8           0.3  0.8  0.1  0.6
   −1     1

E.g., for the third user:
  (user, movie_2) score = (−1, 1) · (0.2, 0.8) = 0.6
  (user, movie_3) score = (−1, 1) · (0.9, 0.1) = −0.8
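The factorization above can be reproduced in a few lines of NumPy; the exact matrix layout (3 users × 2 latent features, 2 latent features × 4 movies) is an assumption pieced together from the slide's numbers:

```python
import numpy as np

# User matrix: 3 users x 2 latent features (values from the slide).
user = np.array([[ 1.0, -1.0],
                 [-0.2,  0.8],
                 [-1.0,  1.0]])
# Movie matrix: 2 latent features x 4 movies.
movie = np.array([[0.7, 0.2, 0.9, 0.4],
                  [0.3, 0.8, 0.1, 0.6]])

# Low-rank reconstruction of all (user, movie) scores.
scores = user @ movie

# Third user's scores for movies 2 and 3 match the slide's dot products:
# (-1, 1) . (0.2, 0.8) = 0.6 and (-1, 1) . (0.9, 0.1) = -0.8.
print(scores[2, 1])
print(scores[2, 2])
```

Every missing score is filled in by the same product, which is the essence of collaborative filtering: users with similar latent vectors receive similar predicted scores.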


slide-7
SLIDE 7
  • Goal: to build a QoR prediction model 𝐺

✓ (Macro, Scenario) scores: one (Macro × Scenario) score matrix per QoR metric, QoR_1, ⋯ , QoR_𝑒
  ➢ 300,000 observations over 1,000 macros and 150,000 scenarios (out of 2^250 possible): extremely sparse, for highly tuned scenarios
✓ (Macro, Parameter, QoR metric) latent scores: a tensor 𝑼 (1,000 × 250 × 5) over macros, LSPD parameters, and QoR metrics
  ➢ Defined by CP tensor decomposition with 50 latent features:
     Macro matrix 𝑵 (1,000 × 50), Parameter matrix 𝑸 (250 × 50), QoR metric matrix 𝑹 (5 × 50)
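A minimal sketch of how the CP tensor decomposition assembles the latent tensor from the three factor matrices, using the dimensions quoted on the slide (the factor values here are random placeholders, not learned ones):

```python
import numpy as np

# Dimensions from the slide; values are random placeholders.
n_macros, n_params, n_metrics, n_feat = 1000, 250, 5, 50
rng = np.random.default_rng(0)

N = rng.normal(size=(n_macros, n_feat))   # Macro matrix (1000 x 50)
Q = rng.normal(size=(n_params, n_feat))   # Parameter matrix (250 x 50)
R = rng.normal(size=(n_metrics, n_feat))  # QoR metric matrix (5 x 50)

# CP decomposition: U[j, b, k] = sum_f N[j, f] * Q[b, f] * R[k, f]
U = np.einsum('jf,bf,kf->jbk', N, Q, R)

print(U.shape)  # (1000, 250, 5)
```

Storing the three factor matrices takes about 65,000 numbers instead of the 1.25 million entries of the full tensor, which is what makes the decomposition tractable for sparse data.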

slide-8
SLIDE 8
  • Goal: to build a QoR prediction model 𝐺

✓ Macro matrix 𝑵, Parameter matrix 𝑸, QoR metric matrix 𝑹 → latent tensor 𝑼 (by CP tensor decomposition)
✓ A single-layer perceptron network 𝑯 for QoR prediction (regression):
  𝐺(Macro 𝒏_𝒋, Scenario 𝒒_𝒃 ⋅ 𝒒_𝒄 ⋅ 𝒒_𝒅; 𝑼) = 𝑯(𝑼_𝒋𝒃:, 𝑼_𝒋𝒄:, 𝑼_𝒋𝒅:)
  ➢ The latent rows for the scenario's active parameters (𝒃, 𝒄, 𝒅) are projected and normalized (divided by the number of parameters) to form the latent layer; the output layer produces the QoR predictions
✓ Learn (𝑵, 𝑸, 𝑹, 𝑯) by a stochastic gradient descent (SGD) method
  ➢ Objective: to minimize the prediction error (RMSE)
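One plausible reading of the predictor, sketched in NumPy: the latent vectors of the macro and of each active parameter are combined, averaged over the number of active parameters (the "project and normalize" step), and fed to a single output layer. The exact combination and the perceptron weights `w`, `b` are illustrative assumptions; the slides specify the model only at the level of the formula above.

```python
import numpy as np

# Random stand-ins for the learned factor matrices.
n_macros, n_params, n_metrics, n_feat = 1000, 250, 5, 50
rng = np.random.default_rng(1)
N = rng.normal(size=(n_macros, n_feat), scale=0.1)   # Macro matrix
Q = rng.normal(size=(n_params, n_feat), scale=0.1)   # Parameter matrix
R = rng.normal(size=(n_metrics, n_feat), scale=0.1)  # QoR metric matrix

# Single-layer perceptron H (hypothetical weights and bias).
w = np.ones(n_feat)
b = np.zeros(n_metrics)

def predict_qor(j, active_params):
    """Predict the e QoR metrics for macro j under a scenario given by
    the indices of its active binary parameters."""
    # Latent layer: combine macro and parameter factors, normalized by
    # the number of active parameters.
    z = np.mean(N[j] * Q[active_params], axis=0)      # (n_feat,)
    # Output layer: one regression output per QoR metric.
    return (R * z) @ w + b                            # (n_metrics,)

qor = predict_qor(0, [3, 17, 42])
print(qor.shape)  # (5,)
```

With these particular weights the prediction reduces to averaging the latent tensor slices 𝑼[j, p, :] over the active parameters p; training 𝑯 by SGD would let the output layer reweight the latent features instead.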

slide-9
SLIDE 9
  • Legacy Macros

✓ Target macro 𝑛_𝑗 is in the Archive
✓ Use 𝐺 = (𝑵[𝒋], 𝑸, 𝑹, 𝑯) for making an inference (in minutes) instead of applying an LSPD flow (taking hours)

  • New Macros

✓ Sample LSPD results for the new macro
✓ Train 𝐺* = (𝒏*, 𝑸, 𝑹, 𝑯) with 𝑸, 𝑹, 𝑯 fixed (to learn 𝒏*) → use 𝐺* = (𝒏*, 𝑸, 𝑹, 𝑯) for inference

  • Online Recommendation

✓ Given a macro and a QoR cost function (e.g., metric weights), the QoR Model 𝐺 = (𝑵, 𝑸, 𝑹, 𝑯) generates Recommended Scenarios, to be combined with the designer's parameter settings
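The new-macro case can be sketched as a small gradient-descent loop that learns only the macro's latent vector 𝒏* while 𝑸 and 𝑹 stay fixed. The predictor shape, squared-error loss, and learning rate below are illustrative assumptions; the slide states only that 𝐺* is trained with 𝑸, 𝑹, 𝑯 fixed:

```python
import numpy as np

n_params, n_metrics, n_feat = 250, 5, 50
rng = np.random.default_rng(2)
Q = rng.normal(size=(n_params, n_feat), scale=0.1)   # fixed Parameter matrix
R = rng.normal(size=(n_metrics, n_feat), scale=0.1)  # fixed QoR metric matrix

def predict(n_vec, active_params):
    # CP-style predictor: average n ⊙ Q_p over the scenario's active
    # parameters, then project onto the QoR-metric factors.
    z = np.mean(Q[active_params] * n_vec, axis=0)    # (n_feat,)
    return R @ z                                     # (n_metrics,)

# Sampled LSPD results for the new macro (synthetic here: generated
# from a hidden ground-truth latent vector so the fit is checkable).
n_true = rng.normal(size=n_feat)
scenarios = [rng.choice(n_params, size=3, replace=False) for _ in range(20)]
samples = [(s, predict(n_true, s)) for s in scenarios]

# Learn n_star by SGD on the squared prediction error; Q, R untouched.
n_star = np.zeros(n_feat)
lr = 0.5
for epoch in range(500):
    for s, qor in samples:
        err = predict(n_star, s) - qor               # (n_metrics,)
        grad = np.mean(Q[s], axis=0) * (R.T @ err)   # d(0.5*||err||^2)/dn
        n_star -= lr * grad
```

Because only the 50 entries of 𝒏* are trained, a handful of sampled LSPD runs suffices, after which inference for the new macro costs minutes rather than the hours of a full LSPD flow.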

slide-10
SLIDE 10

LSPD Results Archive

✓ 1,000 macros in 14 nm chip designs and tapeouts
✓ 250 binary meta-parameters
✓ 300,000 LSPD flow results
✓ 150,000 distinct scenarios
✓ 80% train set, 20% validation set

slide-11
SLIDE 11

5 Macros from Industrial 14nm Processors

  Macro name  Logic function                     Logic gates  Runtime (hours)
  FP          Floating-point pipeline            75 K         8.0
  ECDT        Execution control & data transfer  45 K         6.2
  IDEC        Instruction decode                 210 K        21.6
  ISC         Instruction sequencing control     77 K         13.1
  LSC         L2 cache control & FSM             195 K        12.3

[Chart: QoR comparison — Default Flow, Designer's Setting, Recommended + Designer's (Ours)]

slide-12
SLIDE 12

5 Macros from Industrial 14nm Processors (same table as Slide 11)

[Chart: QoR comparison — Default Flow, Iterative Tuning, 50 Sample Parameters, Recommended (Ours)]

slide-13
SLIDE 13
  • Collaborative recommendation for VLSI design
  • Data from LSPD flow runs of industrial high-performance processors
  • Reduced computational (LSPD) cost for design-space exploration
  • Many unique and unobserved scenarios recommended
  • The model learned from 14nm designs is being applied to a 7nm design in progress
