Multi-ported Memories for FPGAs via XOR



SLIDE 1

Multi-ported Memories for FPGAs via XOR

Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan
ECE, University of Toronto

SLIDE 2

Multi-Ported Memories (MPM)

  • MPM: Memory with more than 2 ports
  • Many uses:

– register files
– queues/buffers

  • FPGA BRAMs:

– have only 2 ports

  • Building MPMs:

– multiple BRAMs
– logic elements (ALMs/LEs)
– clever combinations

[Figure: a memory with multiple write ports and multiple read ports]

SLIDE 3

Example: 2W/2R MPM

How can we build this?

SLIDE 4

2W/2R: Pure-ALMs/LEs

Scales very poorly with memory depth

SLIDE 5

1W/2R: Replicated BRAMs

Fairly efficient, but only one write port

[Figure: BRAMs M0 and M1, write port W0, read ports R0 and R1]
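
The replication idea can be sketched as a small behavioural model. This is an illustration only, not the authors' implementation: the class name `Replicated1W2R` and its methods are hypothetical, and each Python list stands in for one BRAM used with one write port and one read port.

```python
class Replicated1W2R:
    """Behavioural model: one write port, two read ports, two replicas."""

    def __init__(self, depth):
        # m0 and m1 stand in for BRAMs M0 and M1 (assumed naming).
        self.m0 = [0] * depth
        self.m1 = [0] * depth

    def write(self, addr, data):
        # The single write port W0 updates both replicas so they stay identical.
        self.m0[addr] = data
        self.m1[addr] = data

    def read0(self, addr):
        # Read port R0 uses the read side of M0.
        return self.m0[addr]

    def read1(self, addr):
        # Read port R1 uses the read side of M1.
        return self.m1[addr]


mem = Replicated1W2R(depth=8)
mem.write(3, 0xAB)
print(mem.read0(3), mem.read1(3))  # prints: 171 171
```

Adding read ports this way costs one more replica each, but a second write port cannot be added because every replica's write side is already taken by W0.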

SLIDE 6

2W/2R Banked BRAMs

Multiple write ports, but fragmented data

[Figure: BRAMs M0 and M1, write ports W0 and W1, read ports R0 and R1]
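
A minimal sketch of why banking fragments data, assuming each port is wired to exactly one bank (W0 and R0 to M0, W1 and R1 to M1). The class name `Banked2W2R` and that wiring are assumptions made for illustration.

```python
class Banked2W2R:
    """Behavioural model: two independent banks, each with its own W and R port."""

    def __init__(self, depth):
        self.m0 = [0] * depth  # bank reachable only by W0 and R0
        self.m1 = [0] * depth  # bank reachable only by W1 and R1

    def _bank(self, port):
        return self.m0 if port == 0 else self.m1

    def write(self, port, addr, data):
        self._bank(port)[addr] = data

    def read(self, port, addr):
        return self._bank(port)[addr]


mem = Banked2W2R(depth=8)
mem.write(0, 3, 0xAB)      # W0 writes into M0
print(mem.read(0, 3))      # R0 sees it: prints 171
print(mem.read(1, 3))      # R1 reads M1 and does not: prints 0
```

Both writes can proceed in the same cycle, but a value written through one port is invisible to the other bank's read port.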

SLIDE 7

2W/2R “Multipumping”

No fragmentation, but divides clock speed

[Figure: a single BRAM M0 shared by write ports W0, W1 and read ports R0, R1]
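
The clock-speed cost of multipumping can be illustrated with a rough model: a dual-port BRAM can serve two accesses per internal cycle, so four port accesses need two internal cycles per external cycle. The class name `Multipumped2W2R` and the choice to serve reads before writes are assumptions of this sketch.

```python
class Multipumped2W2R:
    """Behavioural model: one dual-port BRAM time-shared by 2W + 2R ports."""

    def __init__(self, depth):
        self.mem = [0] * depth      # stands in for the single BRAM M0
        self.internal_cycles = 0    # cycles of the faster internal clock

    def external_cycle(self, writes, read_addrs):
        """writes: up to two (addr, data) pairs; read_addrs: up to two addresses."""
        # Internal cycle 1: both BRAM ports serve the two reads (assumed order).
        data = [self.mem[addr] for addr in read_addrs]
        self.internal_cycles += 1
        # Internal cycle 2: both BRAM ports serve the two writes.
        for addr, value in writes:
            self.mem[addr] = value
        self.internal_cycles += 1
        return data


mem = Multipumped2W2R(depth=8)
first = mem.external_cycle(writes=[(1, 10), (2, 20)], read_addrs=[1, 2])
second = mem.external_cycle(writes=[], read_addrs=[1, 2])
print(first, second, mem.internal_cycles)  # prints: [0, 0] [10, 20] 4
```

Every external cycle consumes two internal cycles, so the ports see half of the BRAM's own clock rate.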

SLIDE 8

Review: The Live Value Table (LVT) Approach (FPGA’10)

Efficient Multi-Ported Memories for FPGAs, Eric LaForest and J. Gregory Steffan, International Symposium on Field-Programmable Gate Arrays, Monterey, CA, February 2010.

SLIDE 9

LVT-Based MPM

  • Bank for two write ports, replicate to provide read ports
  • Muxes select bank to read from
  • LVT: remembers which bank has the most recently written value

[Figure: banked and replicated BRAMs whose outputs are selected by the LVT]
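
The three points above can be sketched as a behavioural model. The class name `LVT2W2R` and the data layout are hypothetical; the real design builds the LVT from logic elements, which this sketch reduces to a plain list.

```python
class LVT2W2R:
    """Behavioural model of a 2W/2R memory using a Live Value Table."""

    def __init__(self, depth, n_write=2, n_read=2):
        # banks[w][r]: the replica written by write port w and read by read port r.
        self.banks = [[[0] * depth for _ in range(n_read)] for _ in range(n_write)]
        # Live Value Table: for each address, which write port wrote it last.
        self.lvt = [0] * depth

    def write(self, port, addr, data):
        # A write updates every replica in its own bank and records itself in the LVT.
        for replica in self.banks[port]:
            replica[addr] = data
        self.lvt[addr] = port

    def read(self, port, addr):
        # The LVT entry acts as the select input of the read port's output mux.
        live_bank = self.lvt[addr]
        return self.banks[live_bank][port][addr]


mem = LVT2W2R(depth=16)
mem.write(0, 5, 11)
mem.write(1, 5, 22)            # W1 overwrites the same location later
print(mem.read(0, 5), mem.read(1, 5))  # prints: 22 22
```

The stale value 11 is still stored in bank 0; only the LVT entry keeps the read ports from returning it.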

SLIDE 10

LVT-Based MPM

Significant Multiplexing! Many ALMs!

Punchline: LVT is a big freq win, but...

SLIDE 11

An XOR Approach

SLIDE 12

XOR

  • XOR basics:

A ⊕ 0 = A
B ⊕ B = 0

  • Implication:

A ⊕ B ⊕ B = A

Can we exploit XOR to build better MPMs?
Intuition: avoid the LVT table and multiplexing
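
The XOR identities on this slide can be checked directly with Python's `^` operator. The helper name `xor_roundtrip` is hypothetical and exists only for this illustration.

```python
def xor_roundtrip(a, b):
    """Store a XOR b, then XOR with b again to recover a."""
    stored = a ^ b
    return stored ^ b


a, b = 0x5A, 0x3C
print(a ^ 0 == a)                # A xor 0 = A: prints True
print(b ^ b == 0)                # B xor B = 0: prints True
print(xor_roundtrip(a, b) == a)  # A xor B xor B = A: prints True
```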

SLIDE 13

2W/2R XOR Design

Goal: a read is only an XOR operation

[Figure: read port R0]

SLIDE 14

2W/2R XOR Design

Focus on one location for now

[Figure: one bank holds OLD, the other holds A ⊕ OLD; read port R0 returns (A ⊕ OLD) ⊕ OLD = A]

SLIDE 15

2W/2R XOR Design

XOR new value with old value

[Figure: W0 writes A; its bank stores A ⊕ OLD, where OLD is the value held in the other bank]

SLIDE 16

2W/2R XOR Design

Support multiple locations, two write ports

[Figure: ports R0 and W0]

SLIDE 17

2W/2R XOR Design

Most recently written bank holds new value XOR old(s)

[Figure: ports R0, W0, W1; W0 writes A, stored as A ⊕ OLD1; W1 writes B, stored as B ⊕ OLD2]
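
A minimal sketch of the write and read rules built up so far, for two write ports and one read port. The class name `Xor2W1R` is hypothetical, and each bank is modelled as a single list even though real hardware would need extra copies so that every port has its own BRAM port to read from.

```python
class Xor2W1R:
    """Behavioural model: two write banks, a read is the XOR of both banks."""

    def __init__(self, depth):
        self.bank = [[0] * depth, [0] * depth]  # bank[0] for W0, bank[1] for W1

    def write(self, port, addr, data):
        # Store the new value XORed with the old value held by the other bank.
        other = 1 - port
        self.bank[port][addr] = data ^ self.bank[other][addr]

    def read(self, addr):
        # The other bank's contribution cancels out, leaving the last value written.
        return self.bank[0][addr] ^ self.bank[1][addr]


mem = Xor2W1R(depth=16)
mem.write(0, 5, 0xA)
print(mem.read(5))   # prints: 10
mem.write(1, 5, 0xB)
print(mem.read(5))   # prints: 11
```

No table records which bank is live: whichever port wrote last has folded the other bank's contents into what it stored.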

SLIDE 18

2W/2R XOR Design

Add support for second read port: done! (almost)

[Figure: ports R0, R1, W0, W1]
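
One way to picture the structure with every port given its own copy is the sketch below. It assumes each copy is a memory with one write side and one read side, so a row needs one copy per read port plus one per other write port. The class name `XorMPM`, the indexing scheme, and the copy count `m * (m - 1 + n)` are assumptions of this model, and a deep memory would need several physical BRAMs per copy.

```python
class XorMPM:
    """Behavioural model of an mW/nR XOR memory with explicit per-port copies."""

    def __init__(self, depth, n_write, n_read):
        self.m, self.n = n_write, n_read
        copies = n_read + n_write - 1
        # rows[w][c]: copy c of the bank written by write port w.
        # Copies 0..n-1 serve the read ports, the rest serve the other write ports.
        self.rows = [[[0] * depth for _ in range(copies)] for _ in range(n_write)]

    def copy_count(self):
        return self.m * (self.m - 1 + self.n)

    def _copy_for_writer(self, row, writer):
        # Copy in `row` that write port `writer` reads old values from (writer != row).
        idx = writer if writer < row else writer - 1
        return self.rows[row][self.n + idx]

    def write(self, port, addr, data):
        encoded = data
        for row in range(self.m):
            if row != port:
                encoded ^= self._copy_for_writer(row, port)[addr]
        for copy in self.rows[port]:
            copy[addr] = encoded

    def read(self, port, addr):
        value = 0
        for row in range(self.m):
            value ^= self.rows[row][port][addr]
        return value


mem = XorMPM(depth=16, n_write=2, n_read=2)
mem.write(0, 4, 0x11)
mem.write(1, 4, 0x22)
print(mem.read(0, 4), mem.read(1, 4), mem.copy_count())  # prints: 34 34 6
```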

SLIDE 19

2W/2R XOR Design

Writing requires reading: hence 2 cycles to write!

Solution: need pipelining to avoid stalling

[Figure: ports R0, R1, W0, W1; a write of A takes two cycles, TICK then TOCK]

SLIDE 20

2W/2R XOR Design

What if a location is read one cycle after it is written?

Solution: bypass with forwarding logic

[Figure: ports R0, R1, W0, W1; A is written on TICK, then the same location is read]
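
The hazard and its fix can be sketched with a cycle-by-cycle model. This is a simplification under stated assumptions, not the authors' pipeline: a write is latched in one cycle and committed to its bank in the next, a read samples the banks before that commit, and the two write ports never target the same address in the same cycle. The class name `PipelinedXor2W1R` and the `forwarding` flag are hypothetical.

```python
class PipelinedXor2W1R:
    """Two-stage write pipeline for an XOR memory, with optional read bypass."""

    def __init__(self, depth, forwarding=True):
        self.banks = [[0] * depth, [0] * depth]
        self.pending = []            # writes latched last cycle: (port, addr, data)
        self.forwarding = forwarding

    def step(self, writes=(), read_addr=None):
        """Advance one clock cycle and return the read data (None if no read)."""
        result = None
        if read_addr is not None:
            # Bank contents at the start of the cycle, before in-flight writes land.
            result = self.banks[0][read_addr] ^ self.banks[1][read_addr]
            if self.forwarding:
                for _port, addr, data in self.pending:
                    if addr == read_addr:
                        result = data    # bypass: return the in-flight value
        # Second stage: commit last cycle's writes, XORed with the other bank.
        for port, addr, data in self.pending:
            self.banks[port][addr] = data ^ self.banks[1 - port][addr]
        # First stage: latch this cycle's writes.
        self.pending = list(writes)
        return result


fixed = PipelinedXor2W1R(depth=8, forwarding=True)
fixed.step(writes=[(0, 2, 7)])
stale = PipelinedXor2W1R(depth=8, forwarding=False)
stale.step(writes=[(0, 2, 7)])
print(fixed.step(read_addr=2), stale.step(read_addr=2))  # prints: 7 0
```

Without the bypass, the read issued one cycle after the write returns the old contents; one cycle later both models agree.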

SLIDE 21

Generalized XOR Design

SLIDE 22

Generalized XOR Design

SLIDE 23

LVT vs XOR

SLIDE 24

Methodology

Use Quartus 10.0 to target Stratix IV

− Favor speed over area, optimize
− Average over 10 seeds to get Fmax

Measure area as Total Equivalent Area (TEA)

− Expresses area in a single unit (ALMs)
− 1 M9K == 28.7 ALMs **

Measure a large design space

− Depth: 32-8192 memory locations
− Ports: 2W/4R, 4W/8R, 8W/16R

** H. Wong, J. Rose and V. Betz, "Comparing FPGA vs. Custom CMOS and the Impact on Processor Microarchitecture," ACM Int. Symp. on FPGAs, 2011
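
The Total Equivalent Area metric can be written as a one-line conversion using the 28.7 ALMs per M9K factor quoted above. The function name `total_equivalent_area` and the resource counts in the usage lines are hypothetical and are not results from the talk.

```python
ALMS_PER_M9K = 28.7  # conversion factor quoted on the slide


def total_equivalent_area(alms, m9ks):
    """Return area in ALM equivalents: ALMs plus M9K blocks converted to ALMs."""
    return alms + ALMS_PER_M9K * m9ks


# Made-up resource counts, for illustration only.
logic_heavy = total_equivalent_area(alms=2000, m9ks=8)
bram_heavy = total_equivalent_area(alms=500, m9ks=24)
print(logic_heavy > bram_heavy)  # prints: True
```

Collapsing both resource types into one number is what makes a logic-heavy design and a BRAM-heavy design directly comparable on a single area axis.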

SLIDE 25

Example Layout: 8192-deep 2W/4R

Significant resource diversity!

[Figure: chip layouts of the LVT and XOR designs]

SLIDE 26

2W/4R

LVT better for small designs, XOR better for large

8192-deep XOR: 15% faster, 2x smaller

[Figure: speed vs. area plot, with directions labelled Faster and Smaller (log); one point is marked as a CAD anomaly]

SLIDE 27

Navigating the Design Space (2W/4R)

Which is best? That depends...

SLIDE 28

Summary

[Figure panels: 2W/4R, 4W/8R, 8W/16R]

Use LVT when:

  • you want to minimize BRAMs
  • building designs of depth <= 128

else use XOR, i.e. when:

  • depth >= 256 and there are spare BRAMs
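
The rule of thumb on this slide can be written as a small selection function. The name `choose_mpm_design` and the `spare_brams` flag are hypothetical. The sketch assumes power-of-two depths as in the measured range (32 to 8192), so depths strictly between 128 and 256 are not addressed by the slides, and the thresholds reflect the Stratix IV results only.

```python
def choose_mpm_design(depth, spare_brams=True):
    """Pick 'LVT' or 'XOR' following the summary rule of thumb."""
    if depth <= 128 or not spare_brams:
        # Small memories, or designs that must conserve BRAMs.
        return "LVT"
    # Deeper memories with BRAMs to spare.
    return "XOR"


for depth in (32, 128, 256, 8192):
    print(depth, choose_mpm_design(depth))
```
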
SLIDE 29

Conclusions

  • Use LVT when

– building up to 128-entry designs
– you want to minimize BRAM usage

  • Use XOR when

– building 256-entry or larger designs
– you want to minimize ALM usage

  • Interesting Library/Generator?

– help the designer automatically navigate this space

  • Further work

– Exploring “true-dual-port” mode, stalls, power
– Results on other vendors’ FPGAs