
slide-1
SLIDE 1

A High-Performance Triple Patterning Layout Decomposer with Balanced Density

Bei Yu (1), Yen-Hung Lin (2), Gerard Luk-Pat (3), Duo Ding (4), Kevin Lucas (3), David Z. Pan (1)

(1) ECE Dept., University of Texas at Austin
(2) CS Dept., National Chiao Tung University
(3) Synopsys Inc.
(4) Oracle Corp., Austin

1 / 25

slide-2
SLIDE 2

Triple Patterning Lithography (TPL)

ITRS roadmap

◮ 28nm: single-patterning
◮ 20nm: double-patterning
◮ 14nm: triple-patterning / EUV
◮ 10nm: quadruple-patterning / EUV

[Figure labels: dmin, stitch]

2 / 25

slide-3
SLIDE 3

TPL Decomposition Works

– ILP or SAT [Cork+,SPIE’08][Yu+,ICCAD’11][Cork+,SPIE’13]
– Graph Search for Row based Layout [Tian+,ICCAD’12][Tian+,SPIE’13]
– Heuristic [Ghaida+,SPIE’11][Fang+,DAC’12][Chen,ISQED’13][Kuang+,DAC’13][Tang+,Patent’13]
– Semidefinite Programming (SDP) (trade-off) [Yu+,ICCAD’11][Yu+,ICCAD’13]

3 / 25

slide-5
SLIDE 5

◮ Global Balanced Density?

[Figure: layout treated as a single bin b1]

◮ Local Balanced Density!

[Figure: layout partitioned into bins b1, b2, b3, b4]

4 / 25

slide-6
SLIDE 6

Overall Flow

Inputs: Input Layout, Local Bins Info

Graphs Construction and Simplification → Density Balanced Color Assignment → Density Balanced Recovery → Output Masks

5 / 25

slide-10
SLIDE 10

Overall Flow

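Inputs: Input Layout, Local Bins Info

Graphs Construction and Simplification → Density Balanced Color Assignment → Density Balanced Recovery → Output Masks

[Figure: features a and d are split at stitch candidates into a1, a2 and d1, d2; b and c are unchanged] (a)

(a) Stitch candidate generation [Kuang+,DAC’13]

5 / 25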

slide-15
SLIDE 15

Problem Formulation

Local density uniformity in bin $b_k$:

$$DU_k = d_{k1} \cdot d_{k2} + d_{k1} \cdot d_{k3} + d_{k2} \cdot d_{k3}$$

Lemma: Maximizing $DU_k$ can achieve better density balance.
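As a quick numeric check of the lemma, the sketch below evaluates the product-sum for a few ways of splitting the same total density across three masks. It assumes that d_k1, d_k2, d_k3 are the per-mask densities inside bin b_k and that their total is fixed; the function name and the sample splits are illustrative only.

```python
def density_uniformity(d1, d2, d3):
    """DU_k = d1*d2 + d1*d3 + d2*d3 for the three mask densities of one bin."""
    return d1 * d2 + d1 * d3 + d2 * d3


# Three hypothetical splits of the same total density (1.0) across the masks.
balanced = density_uniformity(1 / 3, 1 / 3, 1 / 3)
skewed = density_uniformity(0.6, 0.3, 0.1)
one_mask = density_uniformity(1.0, 0.0, 0.0)

# The more even the split, the larger the value.
print(balanced, skewed, one_mask)
print(balanced > skewed > one_mask)
```

For a fixed total, the even split gives the largest value and putting everything on one mask gives zero, which is the behaviour the lemma relies on.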

Density Balanced Color Assignment

Input: Graph model
Output: Color Assignment to the graph
Objective: min conflict#, stitch#, and max DU_k

7 / 25

slide-16
SLIDE 16

Color representation

◮ Three unit vectors [Yu+,ICCAD’11]: $(1, 0)$, $(-\frac{1}{2}, \frac{\sqrt{3}}{2})$, $(-\frac{1}{2}, -\frac{\sqrt{3}}{2})$
◮ same color: $\vec{v}_i \cdot \vec{v}_j = 1$
◮ different color: $\vec{v}_i \cdot \vec{v}_j = -1/2$
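The two inner-product values can be checked directly. This is a minimal sketch; the names `COLOR_VECTORS` and `dot` are illustrative and do not come from the talk.

```python
import math

# The three unit vectors, 120 degrees apart, one per mask (color).
COLOR_VECTORS = [
    (1.0, 0.0),
    (-0.5, math.sqrt(3) / 2),
    (-0.5, -math.sqrt(3) / 2),
]


def dot(u, v):
    """Inner product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


# Same color gives 1, different colors give -1/2 (up to rounding).
for a in range(3):
    for b in range(3):
        print(a, b, round(dot(COLOR_VECTORS[a], COLOR_VECTORS[b]), 6))
```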

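Our Vector Programming

$$
\begin{aligned}
\min \quad & \sum_{e_{ij} \in CE} (\vec{v}_i \cdot \vec{v}_j) - \alpha \sum_{e_{ij} \in SE} (\vec{v}_i \cdot \vec{v}_j) - \beta \cdot \sum_{b_k \in B} DU_k \\
\text{s.t.} \quad & \vec{v}_i \in \left\{ (1, 0),\ \left(-\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\right),\ \left(-\tfrac{1}{2}, -\tfrac{\sqrt{3}}{2}\right) \right\} \\
& DU_k = -\sum_{i,j \in V} den_{ki} \cdot den_{kj} \cdot (\vec{v}_i \cdot \vec{v}_j), \quad \forall b_k \in B
\end{aligned}
$$

$den_{ki}$: density of feature $r_i$ in bin $b_k$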
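To make the objective concrete, the sketch below evaluates it for a given discrete color assignment. Several things are assumptions rather than statements from the talk: CE and SE are read as the conflict-edge and stitch-edge sets, the sum over i, j in V is taken over all ordered pairs including i = j, and the tiny instance and all function names are hypothetical.

```python
import math

VECTORS = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]


def vdot(ci, cj):
    """Inner product of the unit vectors for colors ci and cj."""
    (ax, ay), (bx, by) = VECTORS[ci], VECTORS[cj]
    return ax * bx + ay * by


def du_vector_form(colors, den_k):
    """DU_k = -sum_{i,j} den_ki * den_kj * (v_i . v_j), over all ordered pairs."""
    n = len(colors)
    return -sum(den_k[i] * den_k[j] * vdot(colors[i], colors[j])
                for i in range(n) for j in range(n))


def objective(colors, conflict_edges, stitch_edges, density, alpha, beta):
    """Value of the vector program for one discrete color assignment."""
    conflict_term = sum(vdot(colors[i], colors[j]) for i, j in conflict_edges)
    stitch_term = sum(vdot(colors[i], colors[j]) for i, j in stitch_edges)
    balance_term = sum(du_vector_form(colors, den_k) for den_k in density)
    return conflict_term - alpha * stitch_term - beta * balance_term


# Hypothetical instance: a conflict triangle 0-1-2, one stitch edge 2-3, one bin.
conflict_edges = [(0, 1), (1, 2), (0, 2)]
stitch_edges = [(2, 3)]
density = [[0.25, 0.25, 0.25, 0.25]]  # density[k][i] = den_ki
good = objective([0, 1, 2, 2], conflict_edges, stitch_edges, density, 0.1, 0.1)
bad = objective([0, 0, 2, 1], conflict_edges, stitch_edges, density, 0.1, 0.1)
print(good, bad, good < bad)
```

Under the all-ordered-pairs convention, the vector form of DU_k equals 3(d_k1 d_k2 + d_k1 d_k3 + d_k2 d_k3) minus the square of the bin's total density, where d_kc sums den_ki over the features on mask c. For a fixed total it therefore ranks assignments the same way as the product form in the problem formulation.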

8 / 25

slide-17
SLIDE 17

Relax to Semidefinite Programming (SDP)

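SDP:

$$
\begin{aligned}
\min \quad & A \bullet X \\
\text{s.t.} \quad & X_{ii} = 1, \quad \forall i \in V \\
& X_{ij} \geq -\tfrac{1}{2}, \quad \forall e_{ij} \in CE \\
& X \succeq 0
\end{aligned}
$$

$$
A_{ij} =
\begin{cases}
1 + \beta \cdot \sum_{b_k \in B} den_{ki} \cdot den_{kj}, & e_{ij} \in CE \\
-\alpha + \beta \cdot \sum_{b_k \in B} den_{ki} \cdot den_{kj}, & e_{ij} \in SE \\
\beta \cdot \sum_{b_k \in B} den_{ki} \cdot den_{kj}, & \text{otherwise}
\end{cases}
$$

Output matrix X:

◮ If $X_{ij}$ is close to 1, i and j get the same color
◮ If $X_{ij}$ is close to -0.5, i and j get different colors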
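The coefficient matrix can be assembled without any solver. The sketch below follows the three cases for A_ij; it assumes A is kept symmetric by writing each edge term into both A[i][j] and A[j][i] and treats the diagonal as the "otherwise" case, neither of which the slide states. The function names and the small instance are hypothetical, and no SDP solver is called.

```python
def build_cost_matrix(n, conflict_edges, stitch_edges, density, alpha, beta):
    """A[i][j] per the three cases on the slide; density[k][i] = den_ki."""
    # Every entry starts with the density term beta * sum_k den_ki * den_kj.
    A = [[beta * sum(den_k[i] * den_k[j] for den_k in density)
          for j in range(n)] for i in range(n)]
    for i, j in conflict_edges:  # conflict edges add 1
        A[i][j] += 1.0
        A[j][i] += 1.0
    for i, j in stitch_edges:  # stitch edges subtract alpha
        A[i][j] -= alpha
        A[j][i] -= alpha
    return A


def frobenius_inner(A, X):
    """A • X = sum over all i, j of A[i][j] * X[i][j]."""
    return sum(a * x for row_a, row_x in zip(A, X) for a, x in zip(row_a, row_x))


# Hypothetical 3-vertex instance with one bin.
A = build_cost_matrix(3, [(0, 1)], [(1, 2)], [[0.5, 0.5, 0.0]], alpha=0.1, beta=0.1)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(A[0][1], A[1][2], frobenius_inner(A, identity))
```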

9 / 25

slide-19
SLIDE 19

Mapping: From SDP to Color Assignment

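$$
X = \begin{pmatrix}
1.0 & 0.43 & -0.5 & 0.21 & -0.5 & 0.15 \\
 & 1.0 & -0.5 & -0.5 & 0.15 & 0.95 \\
 & & 1.0 & -0.5 & -0.5 & -0.5 \\
 & & & 1.0 & 0.21 & -0.5 \\
 & \cdots & & & 1.0 & 0.43 \\
 & & & & & 1.0
\end{pmatrix}
$$

(X is symmetric; the lower triangle is omitted.)

◮ Greedy may lose optimality
◮ 0.95? 0.43?

[Figures: two drawings of the example graph with vertices 1 to 6]

10 / 25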

slide-20
SLIDE 20

Mapping: 3-Way Max-Cut

[Matrix X as on the previous slide, with the example graph on vertices 1 to 6]

◮ Smaller merged graph
◮ 3-Way Max-Cut
◮ FM Heuristic vs. Search

[Figure: merged graph with vertices 1, 3, 4, 5 and 2+6; edge weights 1 and 2, plus two edges labeled 0.1 whose leading symbol was lost in extraction]

11 / 25
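The merge-then-cut idea can be sketched as follows, using the slide's example values of X as best they can be read. The rest is assumption: vertices are merged when X_ij is at least a hypothetical cutoff of 0.9, every -0.5 entry is read as a conflict edge of weight 1, stitch edges are left out, and exhaustive search stands in for both the FM heuristic and the search mentioned on the slide. It is only practical for small merged graphs.

```python
from itertools import product


def merge_groups(X, threshold):
    """Union vertices i, j whenever X[i][j] >= threshold; return a group id per vertex."""
    n = len(X)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if X[i][j] >= threshold:
                parent[find(i)] = find(j)
    roots = sorted({find(v) for v in range(n)})
    group_id = {r: g for g, r in enumerate(roots)}
    return [group_id[find(v)] for v in range(n)]


def three_way_max_cut(group_of, weighted_edges):
    """Exhaustively search 3-colorings of the merged graph for the largest cut weight."""
    merged = {}
    for u, v, w in weighted_edges:
        a, b = sorted((group_of[u], group_of[v]))
        if a != b:  # an edge inside one merged vertex can never be cut
            merged[(a, b)] = merged.get((a, b), 0.0) + w
    best_cut, best = float("-inf"), None
    for colors in product(range(3), repeat=max(group_of) + 1):
        cut = sum(w for (a, b), w in merged.items() if colors[a] != colors[b])
        if cut > best_cut:
            best_cut, best = cut, colors
    return best_cut, [best[g] for g in group_of]


# Upper triangle of the example X (vertices 1..6 become indices 0..5).
UPPER = [
    [1.0, 0.43, -0.5, 0.21, -0.5, 0.15],
    [1.0, -0.5, -0.5, 0.15, 0.95],
    [1.0, -0.5, -0.5, -0.5],
    [1.0, 0.21, -0.5],
    [1.0, 0.43],
    [1.0],
]
X = [[0.0] * 6 for _ in range(6)]
for i, row in enumerate(UPPER):
    for offset, value in enumerate(row):
        X[i][i + offset] = X[i + offset][i] = value

# Assumption: every -0.5 entry marks a conflict edge of weight 1.
conflict_edges = [(i, j, 1.0) for i in range(6) for j in range(i + 1, 6)
                  if X[i][j] == -0.5]
group_of = merge_groups(X, threshold=0.9)  # 0.9 is a hypothetical cutoff
best_cut, colors = three_way_max_cut(group_of, conflict_edges)
print(group_of, best_cut, colors)
```

With this cutoff only vertices 2 and 6 are merged, and the two conflict edges from the merged vertex to vertex 3 and to vertex 4 each accumulate weight 2, which is in line with the weights shown on the merged graph.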

slide-21
SLIDE 21

Mapping: Extend to Density Balance

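[Figure: decomposition graph with vertices a1, a2, b, c, d1, d2, and its merged graph with vertex weights c (5), a2+d2 (100), b (5), a1 (20), d1 (15); edge weights 1 and 2, plus two edges labeled 0.1 whose leading symbol was lost in extraction]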

12 / 25

slide-23
SLIDE 23

Graphs Construction and Simplification Density Balanced Color Assignment Density Balanced Recovery Output Masks

Input Layout Local Bins Info

◮ Implement techniques from [Yu+,ICCAD’11], [Fang+,DAC’12], [Kuang+,DAC’13]
◮ 3 new techniques
◮ Integrate density balance

13 / 25

slide-24
SLIDE 24

Fast Color Assignment Trial

◮ Iteratively remove vertex with:
    ◮ Conflict degree < 3
    ◮ Stitch degree < 2
◮ Linear runtime
◮ Keep conflict # optimality

[Figure sequence: example graph with vertices 1 to 6; removed vertices are pushed onto a stack and later popped]

14 / 25
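The removal rule above can be sketched as a worklist algorithm. This is an illustrative reading, not the authors' implementation: the function name, the `core_colors` parameter, and the rule for choosing among free colors (prefer the color of an already colored stitch neighbor) are assumptions.

```python
from collections import deque


def fast_color_trial(n, conflict_edges, stitch_edges, core_colors=None):
    """Peel low-degree vertices onto a stack, then color them in reverse order.

    Returns (colors, core). Vertices left in `core` need another solver;
    their colors, if known, are passed in through `core_colors`.
    """
    cadj = [set() for _ in range(n)]
    sadj = [set() for _ in range(n)]
    for u, v in conflict_edges:
        cadj[u].add(v)
        cadj[v].add(u)
    for u, v in stitch_edges:
        sadj[u].add(v)
        sadj[v].add(u)

    cdeg = [len(a) for a in cadj]
    sdeg = [len(a) for a in sadj]
    removed, stack, work = set(), [], deque(range(n))
    while work:
        v = work.popleft()
        if v in removed or cdeg[v] >= 3 or sdeg[v] >= 2:
            continue
        removed.add(v)
        stack.append(v)
        for u in cadj[v]:
            if u not in removed:
                cdeg[u] -= 1
                work.append(u)  # u may have become removable
        for u in sadj[v]:
            if u not in removed:
                sdeg[u] -= 1
                work.append(u)

    core = set(range(n)) - removed
    colors = dict(core_colors or {})
    while stack:
        v = stack.pop()
        # At most two conflict neighbors are colored here, so a color is free.
        used = {colors[u] for u in cadj[v] if u in colors}
        free = [c for c in range(3) if c not in used]
        wanted = [colors[u] for u in sadj[v] if u in colors and colors[u] in free]
        colors[v] = wanted[0] if wanted else free[0]
    return colors, core


# Hypothetical graph: two conflict triangles joined by an edge, one stitch edge.
conflicts = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
stitches = [(1, 4)]
colors, core = fast_color_trial(6, conflicts, stitches)
print(colors, core)
```

Each vertex has fewer than three remaining conflict neighbors when it is removed, so when it is popped one of the three colors is always free and no conflict is added. Each edge triggers a constant number of worklist pushes, which keeps the pass linear in the graph size. A graph such as K4, where every conflict degree is 3, is left entirely in `core`.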

slide-35
SLIDE 35

Cut Vertex Stitch Forbiddance

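Figure: Cut Computation [Fang+,DAC’12]

[Figure labels: vertex a with pieces a1 and a2; a graph on features a to g; a graph with vertices a1, a2, b, c, d1, d2, e, f, g1, g2; the same graph with a in place of a1 and a2; two graphs DG1 and DG2, together containing a, a', b, c, d1, d2, e, f, g1, g2]

15 / 25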

slide-36
SLIDE 36

Vertex Clustering

[Figure: (a) graph with vertices a, b, c, d1, d2, e; (b) the same graph after clustering a and d1 into one vertex a+d1]

16 / 25

slide-37
SLIDE 37

Experimental Results – Setup

◮ Implemented in C++, run on a 3.0 GHz Linux machine
◮ CSDP as SDP solver
◮ Benchmarks
    ◮ ISCAS benchmarks from [Yu+,ICCAD’11]
    ◮ Two OpenSPARC T1 benchmarks: mul_top and exu_ecc
    ◮ Six dense benchmarks: c9_total – s5_total

17 / 25

slide-38
SLIDE 38

With or Without Density Balance?

[Bar charts: EPE value per benchmark, without vs. with density balance, for C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288, C7552, S1488, S38417, S35932, S38584, S15850]

◮ Considering balance, -14% EPE#
◮ EPE: Edge placement error

18 / 25

slide-39
SLIDE 39

With or Without Density Balance? (cont.)

[Bar charts: per-benchmark comparison without vs. with density balance, for C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288, C7552, S1488, S38417, S35932, S38584, S15850]

◮ Considering balance, +4% cost penalty
◮ Cost = conflict# + 0.1 × stitch#
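The cost metric can be computed from a finished coloring as in the sketch below. It assumes that a conflict is a conflict edge whose endpoints share a mask and that a stitch is a stitch edge whose endpoints land on different masks; the function name and the sample data are hypothetical.

```python
def decomposition_cost(colors, conflict_edges, stitch_edges, stitch_weight=0.1):
    """Return (conflicts, stitches, cost) with cost = conflicts + 0.1 * stitches."""
    conflicts = sum(1 for u, v in conflict_edges if colors[u] == colors[v])
    stitches = sum(1 for u, v in stitch_edges if colors[u] != colors[v])
    return conflicts, stitches, conflicts + stitch_weight * stitches


# Hypothetical coloring of four vertices: one conflict and one stitch.
result = decomposition_cost([0, 1, 0, 2], [(0, 1), (0, 2)], [(2, 3)])
print(result)
```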

19 / 25

slide-40
SLIDE 40

Vs. Other Decomposers ([Yu+,ICCAD’11] cases)

[Bar charts: per-benchmark comparison of [ICCAD’11], [DAC’12], [DAC’13] and Ours, for C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288, C7552, S1488, S38417, S35932, S38584, S15850]

◮ Cost: -55% (vs. ICCAD’11), -25% (vs. DAC’12), -5% (vs. DAC’13)

20 / 25

slide-41
SLIDE 41

Vs. Other Decomposers ([Yu+,ICCAD’11] cases)

[Bar charts: runtime (s) per benchmark for [ICCAD’11], [DAC’12] and Ours, for C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288, C7552, S1488, S38417, S35932, S38584, S15850; the values 88, 92 and 80 are printed on the chart]

21 / 25

slide-42
SLIDE 42

Vs. Other Decomposers (other cases)

Cost comparison

[Bar chart: [ICCAD’11], [DAC’12] and Ours on mul_top, exu_ecc, c9_total, c10_total, s2_total, s3_total, s4_total, s5_total; the values 6851, 6116, 5231 and 6265 and two N/A entries are printed on the chart]

◮ -60% (vs. ICCAD’11), -60% (vs. DAC’12)

Runtime comparison

◮ ICCAD’11 : DAC’12 : ours = 3600s : 6s : 134s

22 / 25

slide-43
SLIDE 43

Scalability of SDP

[Plot: runtime complexity of SDP; runtime (sec) versus number of nodes, showing the runtime of SDP together with O(x^2.2) and O(x^2.4) curves]

23 / 25

slide-44
SLIDE 44

Conclusions

◮ Integrate density balance
    ◮ SDP formulation
    ◮ Mapping
    ◮ Graph simplification
◮ High performance
    ◮ Mapping
    ◮ Graph model
◮ Faster
    ◮ A set of graph simplification techniques

24 / 25

slide-45
SLIDE 45

Thank You!

Acknowledgement

Supported by IBM Scholarship, NSF, CCF, SRC, and NSFC

25 / 25