
SLIDE 1

Physical Design Considerations of One-level RRAM-based Routing Multiplexers

Xifan Tang, Edouard Giacomin, Giovanni De Micheli and Pierre-Emmanuel Gaillardon

March 20th, 2017, ISPD’17

SLIDE 2

Motivation

▲Resistive memory (RRAM) technology can offer:
▽Low on-resistance: a source of high performance
▽Resistance independent of VDD: a source of low power

▲Challenges in RRAM-based multiplexer design:
▽Co-integration of a low-voltage VDD and a high-voltage Vprog (Vprog for the RRAM programming circuits, VDD for the datapath circuits)
▽Eliminating crosstalk current between the datapath and the programming structures
▽Accounting for physical design aspects: multiple wells, parasitic capacitances and the physical location of the RRAMs

SLIDE 3

Resistive Memory

▲Fabrication
▽Sandwiched structure
▽Compatible with the back-end-of-line (BEoL) process
▽Located between metal layers

▲Two stable resistance states
▽Filamentary conduction
▽High Resistance State (HRS)
▽Low Resistance State (LRS)

▲Adjustable set process
▽A larger set current Iset yields a lower LRS resistance
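The "larger Iset gives lower LRS" rule can be made concrete with a simple model. This is a minimal sketch assuming the empirical approximation R_LRS ≈ K / Iset that is often used for filamentary RRAM; the slide states only the trend, not this formula. K is calibrated from the single operating point listed in the experimental setup (RLRS = 1.6kΩ at Iset = 500µA).

```python
# Hedged sketch: ASSUMES R_LRS ~ K / I_set (an empirical approximation, not from the slides).
# K is calibrated from the setup values: R_LRS = 1.6 kOhm at I_set = 500 uA.

R_LRS_REF = 1.6e3          # ohms (from the slides)
I_SET_REF = 500e-6         # amperes (from the slides)
K = R_LRS_REF * I_SET_REF  # volts; 0.8 V under this assumption


def lrs_resistance(i_set: float) -> float:
    """Estimated LRS resistance (ohms) for a set current i_set (amperes)."""
    if i_set <= 0:
        raise ValueError("set current must be positive")
    return K / i_set


if __name__ == "__main__":
    for i_set_ua in (100, 250, 500):
        r = lrs_resistance(i_set_ua * 1e-6)
        print(f"I_set = {i_set_ua:4d} uA -> R_LRS ~ {r / 1e3:.2f} kOhm")
```

The inverse form only captures the qualitative trend on the slide; a real device should be characterized with a compact model such as the one named in the experimental methodology.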

SLIDE 4

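4T1R Programming Structure

▲4T(ransistor)1R(RAM) programming structure
▽Delivers 1.4x the programming current of the 2T1R structure
▽Achieves a smaller LRS resistance

▲Set and reset are controlled independently by two pairs of programming transistors

[Figure: 4T1R programming structure. PMOS transistors P1 and P2 connect to Vprog and NMOS transistors N1 and N2 connect to GND; their gates are driven by Vset,ENb, Vreset,ENb, Vreset,EN and Vset,EN. The RRAM (parasitic capacitance CP) sits between datapath,in and datapath,out inside a deep N-well; Iset and Ireset mark the set and reset current paths.]

[1] X. Tang et al., “A Study on the Programming Structures for RRAM-Based FPGA Architectures,” IEEE TCAS-I, Vol. 63, No. 4, pp. 503-516, April 2016.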
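Independent set/reset control can be pictured as a small truth table. The sketch below is hypothetical: the terminal names A/B and the gate-to-transistor pairing (P1: Vset,ENb, N2: Vset,EN, P2: Vreset,ENb, N1: Vreset,EN) are our reading of the figure labels, not something the slide states. It also checks that no PMOS/NMOS pair on the same terminal conducts at once.

```python
# Hypothetical model of 4T1R control. ASSUMPTIONS (not stated on the slide):
# P1/N1 attach to RRAM terminal "A", P2/N2 to terminal "B";
# set drives current A->B via P1 and N2, reset drives B->A via P2 and N1.
from dataclasses import dataclass


@dataclass(frozen=True)
class GateState:
    p1_on: bool  # PMOS, assumed gate Vset,ENb (active low)
    p2_on: bool  # PMOS, assumed gate Vreset,ENb (active low)
    n1_on: bool  # NMOS, assumed gate Vreset,EN
    n2_on: bool  # NMOS, assumed gate Vset,EN


def gate_state(mode: str) -> GateState:
    """Transistor on/off pattern for 'idle', 'set' or 'reset'."""
    if mode not in ("idle", "set", "reset"):
        raise ValueError(f"unknown mode: {mode}")
    set_en = mode == "set"
    reset_en = mode == "reset"
    return GateState(p1_on=set_en, p2_on=reset_en, n1_on=reset_en, n2_on=set_en)


def current_direction(state: GateState) -> str:
    """Direction of programming current through the RRAM."""
    # A PMOS and an NMOS on the same terminal must never conduct together.
    if (state.p1_on and state.n1_on) or (state.p2_on and state.n2_on):
        raise RuntimeError("Vprog shorted to GND through one terminal")
    if state.p1_on and state.n2_on:
        return "A->B"  # set current Iset
    if state.p2_on and state.n1_on:
        return "B->A"  # reset current Ireset
    return "none"


for mode in ("idle", "set", "reset"):
    print(mode, current_direction(gate_state(mode)))
```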

SLIDE 5

Potential of RRAM-based Multiplexers

▲High performance
▽Small capacitance on the critical path

▲Low power
▽RRAM LRS resistance is independent of VDD

CMOS MUX: $C_{\mathrm{path}} = (2n + 1)\,C_{\mathrm{trans}}$

RRAM MUX: $C_{\mathrm{path}} = 2\,C_{\mathrm{prog,trans}} + N \cdot C_P$
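The two path-capacitance expressions can be evaluated side by side. In this sketch C_trans and C_prog,trans are left as caller-supplied inputs because the slides give no values for them, n is used exactly as it appears in the CMOS formula, and CP = 4.5aF is taken from the experimental setup.

```python
# Sketch of the two path-capacitance expressions on the slide.
# c_trans and c_prog_trans are caller-supplied (no values are given in the slides);
# C_P = 4.5 aF comes from the experimental setup.

C_P = 4.5e-18  # farads


def cmos_path_cap(n: int, c_trans: float) -> float:
    """C_path = (2n + 1) * C_trans for the CMOS multiplexer."""
    return (2 * n + 1) * c_trans


def rram_path_cap(num_inputs: int, c_prog_trans: float, c_p: float = C_P) -> float:
    """C_path = 2 * C_prog,trans + N * C_P for the RRAM multiplexer."""
    return 2 * c_prog_trans + num_inputs * c_p


# The N * C_P term stays small: even N = 32 adds only 32 * 4.5 aF.
print(f"N*C_P for N=32: {32 * C_P * 1e18:.0f} aF")
```

The point of the comparison is that the RRAM path carries only two programming-transistor capacitances plus N tiny RRAM parasitics.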

SLIDE 6

Naïve 4T1R-based Multiplexer

▲Limitation 1: the datapath inverters contribute to the programming current (red arrows)

[Figure: naïve 4T1R-based multiplexer. Input inverters drive in[0]…in[N-1] through RRAMs R0…RN-1 (parasitic capacitances CP,0…CP,N-1) to an output inverter driving out; programming transistors P0, N0, P1, P2, N1, N2 with bit lines BL[0]…BL[N] and word lines WL[0]…WL[N]; supplies VDD, GND, VDD,well and GND,well; a deep N-well and regular wells joined by metal wire groups 1 and 2; arrows mark the programming current and the crosstalk current, and nodes A, B, C are labeled.]

SLIDE 7

Naïve 4T1R-based Multiplexer

▲Limitation 2: breakdown threats to the datapath transistors (highlighted in red)
▽Deep N-well supply VDD,well (3.0V) >> regular-well VDD (0.9V)

[Figure: the naïve 4T1R-based multiplexer schematic from slide 6, with the at-risk datapath transistors highlighted in red]

SLIDE 8

Naïve 4T1R-based Multiplexer

▲Limitation 3: long interconnecting wires between the deep N-well and the regular wells
▽Large parasitic resistances and capacitances

[Figure: the naïve 4T1R-based multiplexer schematic from slide 6; metal wire groups 1 and 2 connect the deep N-well to the regular wells]

SLIDE 9

Improved 4T1R-based Multiplexer

▲Addresses the limitations of the naïve design:
▽Cuts off the programming current from the datapath inverters
▽Avoids transistor breakdown
▽Shortens the interconnecting wires

[Figure: the improved 4T1R-based multiplexer alongside the naïve design from slide 6. In the improved design, inputs in[0]…in[N-1] drive RRAMs RA, RB (parasitic capacitances CP,A, CP,B) in a deep N-well biased by VDD,well and GNDwell, with bit lines BL[0]…BL[N], word lines WL[0]…WL[N] and EN signals; a single metal wire group 1 connects to the output in the regular well.]

SLIDE 10

Improved 4T1R-based Multiplexer

▲Three modes:
▽Operating: VDD,well = VDD, GNDwell = GND
▽Set RRAM: VDD,well = -Vprog + 2VDD, GNDwell = -Vprog + VDD
▽Reset RRAM: VDD,well = Vprog, GNDwell = Vprog - VDD

[Figure: the improved multiplexer in (a) operating, (b) set and (c) reset mode, showing the deep N-well supply levels of each mode, the EN signals, the programming-current path, and the RRAMs RA, RB with parasitic capacitances CP,A, CP,B]
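The three-mode table can be checked numerically. This sketch encodes the slide's expressions and shows that the deep N-well supply difference VDD,well - GNDwell stays equal to VDD in every mode; only the well as a whole is shifted. The VDD and Vprog numbers are illustrative placeholders, not values from the paper.

```python
# Sketch: deep N-well supplies per mode, using the expressions on the slide.
# VDD and VPROG below are illustrative placeholders, not values from the paper.
from __future__ import annotations


def well_supplies(mode: str, vdd: float, vprog: float) -> tuple[float, float]:
    """Return (VDD_well, GND_well) for 'operating', 'set' or 'reset'."""
    if mode == "operating":
        return vdd, 0.0
    if mode == "set":
        return -vprog + 2 * vdd, -vprog + vdd
    if mode == "reset":
        return vprog, vprog - vdd
    raise ValueError(f"unknown mode: {mode}")


VDD, VPROG = 0.7, 3.0  # placeholder values (volts)
for mode in ("operating", "set", "reset"):
    vdd_well, gnd_well = well_supplies(mode, VDD, VPROG)
    # The well-to-well difference equals VDD in every mode.
    print(f"{mode:9s} VDD_well={vdd_well:+.2f} V GND_well={gnd_well:+.2f} V "
          f"diff={vdd_well - gnd_well:.2f} V")
```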

SLIDE 11

Improved 4T1R-based Multiplexer

▲Advantage 1: zero programming current from the datapath inverters
▽The input inverters are power-gated

[Figure: the improved 4T1R-based multiplexer schematic from slide 9]

SLIDE 12

Improved 4T1R-based Multiplexer

▲Advantage 2: datapath transistors are protected from the high programming voltages
▽The large voltage difference shifts from the transistors to the RRAMs
▽Standard transistors can be used in the programming structures

Higher density and smaller transistor capacitances!

[Figure: the three-mode schematic from slide 10: (a) operating, (b) set, (c) reset]
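One way to see where the programming voltage ends up is to compute the voltage across the RRAM being programmed. This is a hypothetical sketch: it assumes that during set the well-side RRAM terminal sits at GNDwell while the regular-well side is held at VDD, and that during reset the well-side terminal sits at VDD,well while the regular-well side is at GND. Under that assumption the RRAM sees the full Vprog (with opposite polarity for set and reset) while the well supplies stay only VDD apart.

```python
# Hypothetical sketch. ASSUMPTION (ours, not stated on the slide): during set the
# well-side RRAM terminal is at GND_well and the regular-well side is at VDD;
# during reset the well-side terminal is at VDD_well and the regular-well side is at GND.
from __future__ import annotations


def well_supplies(mode: str, vdd: float, vprog: float) -> tuple[float, float]:
    """(VDD_well, GND_well) from the three-mode table (slide 10)."""
    if mode == "set":
        return -vprog + 2 * vdd, -vprog + vdd
    if mode == "reset":
        return vprog, vprog - vdd
    raise ValueError(f"unknown programming mode: {mode}")


def rram_voltage(mode: str, vdd: float, vprog: float) -> float:
    """Regular-well side minus deep-N-well side, across the RRAM being programmed."""
    vdd_well, gnd_well = well_supplies(mode, vdd, vprog)
    if mode == "set":
        return vdd - gnd_well  # regular side at VDD, well side at GND_well
    return 0.0 - vdd_well      # regular side at GND, well side at VDD_well


VDD, VPROG = 0.7, 3.0  # placeholder values (volts)
for mode in ("set", "reset"):
    hi, lo = well_supplies(mode, VDD, VPROG)
    print(f"{mode:5s} V_RRAM={rram_voltage(mode, VDD, VPROG):+.2f} V, "
          f"well supply difference={hi - lo:.2f} V")
```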

SLIDE 13

Improved 4T1R-based Multiplexer

▲Advantage 3: only one interconnection between the regular well and the deep N-well
▽Smaller parasitic capacitances!

[Figure: the improved design (a single metal wire group 1) alongside the naïve design (metal wire groups 1 and 2), as on slide 9]

SLIDE 14

Improved 4T1R-based Multiplexer

▲Advantage 3: only one interconnection between the regular well and the deep N-well (cross-section view)
▽Smaller parasitic capacitances!

[Figure: cross-sections (a) and (b) of the multiplexer layouts: P-wells, N-wells and a deep N-well with N+/P+ diffusions and N++/P++ well taps, contacts, MET1/MET2 wires, vias and the RRAM between metal layers; supplies VDD, GND, VDD,well, GNDwell and Vprog; signals in[0], out, BL[0]/WL[0] and BL[N]/WL[N]; well spacing L along the x axis]

SLIDE 15

Improved 4T1R-based Multiplexer

▲Share deep N-wells between cascaded multiplexers
▽CMOS logic gates can be located in deep N-wells

[Figure: two cascaded multiplexers, MUX0 (inputs inA[0]…inA[N-1], output outA) and MUX1 (inputs inB[0]…inB[N-1], output outB), sharing one deep N-well together with CMOS logic gates; labels include VDD, GND, VDD,well, GND,well, BL[0]…BL[N], WL[0]…WL[N] and EN]

SLIDE 16

Physical Location of RRAMs

▲Should the RRAMs be placed close to the input inverters or close to the output inverters?

[Figure: the layout cross-sections (a) and (b) from slide 14, with well spacing L along the x axis]

SLIDE 17

Optimal Location of RRAM

▲RC modelling and minimization of the Elmore delay

▲Depends on technology parameters
▽Rinv, RLRS, CP, R□, C□, L

▲Depends on design parameters
▽N and the RRAM position xopt

The RRAMs should be placed close to the input inverters!
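The slide names Elmore delay as the optimization metric but does not give the RC network. The sketch below is a generic Elmore-delay routine for an RC ladder, which is the technique named; it does not reproduce the authors' model or their xopt result. The example ladder and the values of R_INV and C_OUT are hypothetical; only RLRS, CP and N = 16 come from the slides.

```python
# Generic Elmore delay of an RC ladder: each series resistor is weighted by all
# capacitance downstream of it. The example ladder is HYPOTHETICAL and does not
# reproduce the authors' RC model or their x_opt result.
from __future__ import annotations


def elmore_delay(resistances: list[float], capacitances: list[float]) -> float:
    """Elmore delay (s) to the last node of an RC ladder.

    resistances[i] is the series resistance feeding node i;
    capacitances[i] is the capacitance from node i to ground.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c
    return delay


# Hypothetical ladder: driver R_INV -> node with C_P -> R_LRS -> output with N*C_P + C_OUT.
R_INV, C_OUT = 5e3, 50e-18       # made-up values for illustration
R_LRS, C_P, N = 1.6e3, 4.5e-18, 16
delay = elmore_delay([R_INV, R_LRS], [C_P, N * C_P + C_OUT])
print(f"Elmore delay: {delay * 1e12:.2f} ps")
```

A position study would add wire segments (R□ and C□ per unit length times their lengths) into the ladder and sweep x; which topology to use is exactly what the paper's RC model defines.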

SLIDE 18

Experimental Methodology

▲Use the ASAP 7nm FinFET Process Design Kit
▽W/L = 28/20 nm, regular VDD = 0.7V
▽Logic transistors: regular Vt
▽Programming transistors: super-low Vt

▲RRAM: 10nm feature size
▽Stanford RRAM compact model
▽RHRS = 27MΩ, RLRS = 1.6kΩ, Iset = Ireset = 500µA, Vset = Vreset = 0.9V, CP = 4.5aF

▲HSPICE simulation: delay and power results
▲Layout: area results

[1] X. Tang et al., “Accurate power analysis for near-Vt RRAM-based FPGA,” FPL, pp. 1-4, 2015.
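A few back-of-the-envelope figures follow directly from the RRAM parameters above. Only the parameter values come from the slide; the derived quantities are plain arithmetic for intuition, not results reported by the authors. The HRS current assumes the full VDD appears across an unselected device, which is an upper-bound style assumption.

```python
# Arithmetic on the RRAM parameters listed on this slide.
# Derived quantities are for intuition only; they are not reported results.

R_HRS = 27e6    # ohms
R_LRS = 1.6e3   # ohms
C_P = 4.5e-18   # farads
VDD = 0.7       # volts, regular VDD of the ASAP7 setup

on_off_ratio = R_HRS / R_LRS  # how strongly an HRS device blocks its input
rc_intrinsic = R_LRS * C_P    # RRAM's own RC time constant (seconds)
i_hrs = VDD / R_HRS           # current if the full VDD sat across one HRS device

print(f"HRS/LRS ratio      : {on_off_ratio:,.0f}")
print(f"R_LRS * C_P        : {rc_intrinsic * 1e15:.1f} fs")
print(f"I through HRS @VDD : {i_hrs * 1e9:.1f} nA")
```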

SLIDE 19

Experimental Methodology

▲Baseline CMOS multiplexers
▽Input sizes from 2 to 32
▽Transmission gates: 3 fins per FinFET
▽N <= 12: one-level structure
▽N > 12: two-level structure

▲RRAM multiplexers
▽Input sizes from 2 to 32
▽Number of fins per FinFET swept from 1 to 3
▽One-level structure

[1] X. Tang et al., “Circuit Designs of High-Performance and Low-Power RRAM-Based Multiplexers Based on 4T(ransistor)1R(RAM) Programming Structure,” accepted to IEEE TCAS-I, 2016.
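The experiment matrix on this slide can be enumerated programmatically. This is a sketch under stated assumptions: even input sizes (as the plotted MUX-size axis suggests), the three VDD values shown in the results legend, and the same VDD sweep for both multiplexer types; only the CMOS one-level/two-level rule and the fin counts come from the slide.

```python
# Sketch of the design-space sweep. ASSUMPTIONS: even input sizes, VDD in
# {0.5, 0.6, 0.7} V for both MUX types. The CMOS structure rule and the fin
# counts follow the slide.
from itertools import product

INPUT_SIZES = range(2, 33, 2)
RRAM_FINS = (1, 2, 3)
CMOS_FINS = 3
VDDS = (0.5, 0.6, 0.7)


def cmos_structure(n_inputs: int) -> str:
    """Baseline CMOS MUX: one level up to 12 inputs, two levels above."""
    return "one-level" if n_inputs <= 12 else "two-level"


def sweep_points():
    """Yield (mux type, N, fins, structure, VDD) for every configuration."""
    for n, vdd in product(INPUT_SIZES, VDDS):
        yield ("cmos", n, CMOS_FINS, cmos_structure(n), vdd)
    for n, fins, vdd in product(INPUT_SIZES, RRAM_FINS, VDDS):
        yield ("rram", n, fins, "one-level", vdd)


points = list(sweep_points())
print(len(points), "configurations")
```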

SLIDE 20

Transient Analysis

▲The RRAMs can be set and reset successfully
▲The multiplexer functions correctly

SLIDE 21

Best Number of Fins

▲Programming transistor sizing technique
▽Trades off the LRS resistance against the capacitances of the programming transistors

▲Metric: best delay
▽Three fins per FinFET is the best choice
▽LRS ≈ 4.3kΩ (the lowest is 1.6kΩ)
▽Two fins give the best PDP

[Figure: (a) delay (ps) versus MUX size (2 to 32) for 1, 2 and 3 fins per FinFET at VDD = 0.5V, 0.6V and 0.7V, with annotations of 14%, 21% and 15%]

[1] X. Tang et al., “A High-performance Low-power Near-Vt RRAM-based FPGA,” ICFPT, pp. 207-214, 2014.
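The fin-count choice on this slide is a selection under two metrics: lowest delay and lowest power-delay product (PDP = power × delay). This sketch shows only the selection step; the dictionaries are caller-supplied, and no measured data is embedded.

```python
# Selection step only: best fin count by delay and by power-delay product.
# Inputs are caller-supplied; no measured data is embedded here.
from __future__ import annotations


def best_by_delay(delay_ps: dict[int, float]) -> int:
    """Fin count with the smallest delay."""
    return min(delay_ps, key=delay_ps.get)


def best_by_pdp(delay_ps: dict[int, float], power_uw: dict[int, float]) -> int:
    """Fin count with the smallest power-delay product."""
    return min(delay_ps, key=lambda fins: delay_ps[fins] * power_uw[fins])
```

Because extra fins lower the LRS resistance but add programming-transistor capacitance, the delay optimum and the PDP optimum need not coincide, which is why the slide reports different fin counts for the two metrics.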

SLIDE 22

Naïve vs. Improved RRAM-based Multiplexers

▲The improved design is more delay-efficient
▲RRAM location: on top of the input inverters!

SLIDE 23

RRAM-based vs. CMOS Multiplexers

▲Area reduction
▽1.4x more area-efficient at input size = 16

[Figure: layouts of the RRAM MUX (1.73µm × 1.12µm; input inverters, programming circuits, output inverter; total area = 1.94µm²) and the SRAM-based CMOS MUX (1.62µm × 1.66µm; SRAMs, first level, second level, output inverter; total area = 2.70µm²)]
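The 1.4x figure follows from the two total areas in the layout figure. This short arithmetic check also confirms that the RRAM MUX footprint dimensions multiply out to its stated total area.

```python
# Arithmetic check of the area claim, using values from the layout figure.
RRAM_MUX_AREA_UM2 = 1.94
CMOS_MUX_AREA_UM2 = 2.70

rram_from_dims = 1.73 * 1.12  # um^2, RRAM MUX footprint
ratio = CMOS_MUX_AREA_UM2 / RRAM_MUX_AREA_UM2
saving = 1 - RRAM_MUX_AREA_UM2 / CMOS_MUX_AREA_UM2

print(f"RRAM MUX from dimensions: {rram_from_dims:.2f} um^2")
print(f"CMOS/RRAM area ratio: {ratio:.2f}x, area saving: {saving:.0%}")
```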

SLIDE 24

RRAM-based vs. CMOS Multiplexers

▲Delay efficiency at the same VDD
▽2x better on average
▽Smaller capacitances

▲Delay efficiency with CMOS at VDD = 0.7V and RRAM at VDD = 0.5V and 0.6V
▽Still a 30% delay reduction!
▽The RRAM LRS is not affected by VDD

SLIDE 25

RRAM-based vs. CMOS Multiplexers

▲Power efficiency at the same VDD
▽2x better on average
▽Smaller capacitances

▲Power efficiency with CMOS at VDD = 0.7V and RRAM at VDD = 0.5V and 0.6V
▽Up to 5.8x better
▽Without any performance loss!
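First-order intuition for the low-VDD gains: dynamic switching power scales as P = α·C·VDD²·f, so VDD scaling alone helps quadratically. This textbook formula ignores leakage and short-circuit current, and it is not the HSPICE methodology behind the reported "up to 5.8x"; it only shows the VDD-dependent part of the saving at equal C, α and f.

```python
# First-order dynamic power: P = alpha * C * VDD^2 * f.
# Ignores leakage and short-circuit current; not the paper's HSPICE methodology.


def dynamic_power(alpha: float, cap: float, vdd: float, freq: float) -> float:
    """Dynamic switching power in watts."""
    return alpha * cap * vdd ** 2 * freq


def vdd_scaling_gain(vdd_ref: float, vdd_new: float) -> float:
    """Power reduction factor from VDD scaling alone (same alpha, C, f)."""
    return dynamic_power(1.0, 1.0, vdd_ref, 1.0) / dynamic_power(1.0, 1.0, vdd_new, 1.0)


for vdd in (0.6, 0.5):
    print(f"VDD 0.7V -> {vdd}V: {vdd_scaling_gain(0.7, vdd):.2f}x lower dynamic power")
```

For a CMOS multiplexer, lowering VDD would also slow it down; the slide's point is that the RRAM path keeps its LRS resistance at low VDD, so the saving comes without a delay penalty.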

SLIDE 26

Conclusion

▲Naïve 4T1R-based multiplexers have serious limitations
▽Well organization, transistor breakdown, long interconnecting wires

▲The proposed 4T1R-based multiplexers address all of these limitations
▽Study of the optimal location of the RRAMs

▲4T1R-based multiplexers are both high-performance (2x) and low-power (2x) w.r.t. their CMOS counterparts
▽Low power is achieved without any performance loss!
