Multi-ported Memories for FPGAs via XOR. Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan. ECE, University of Toronto
Multi-Ported Memories (MPM)
• MPM: memory with more than 2 ports
• Many uses:
– register files
– queues/buffers
• FPGA BRAMs:
– have only 2 ports
• Building MPMs (with multiple write ports and read ports):
– multiple BRAMs
– logic elements (ALMs/LEs)
– clever combinations
Example: 2W/2R MPM. How can we build this?
2W/2R: Pure ALMs/LEs. Scales very poorly with memory depth.
1W/2R: Replicated BRAMs. [Figure: write port W0 writes both BRAMs M0 and M1; read ports R0 and R1 each read one copy.] Fairly efficient, but only one write port.
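As a minimal behavioral sketch of the replication idea on this slide, the Python model below broadcasts every write to one copy of the memory per read port. The class name `ReplicatedMemory` and its methods are hypothetical, and the model ignores clock timing.

```python
class ReplicatedMemory:
    """Behavioral model of a 1W/nR memory built from replicated copies."""

    def __init__(self, depth, num_read_ports):
        # One full copy of the memory per read port.
        self.copies = [[0] * depth for _ in range(num_read_ports)]

    def write(self, addr, value):
        # The single write port updates every copy, keeping them identical.
        for copy in self.copies:
            copy[addr] = value

    def read(self, port, addr):
        # Each read port reads only its own copy.
        return self.copies[port][addr]


rmem = ReplicatedMemory(depth=8, num_read_ports=2)
rmem.write(4, 0x5A)
print(hex(rmem.read(0, 4)), hex(rmem.read(1, 4)))
```

Adding read ports only costs more copies, but every copy must see every write, which is why this scheme alone cannot provide a second write port.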
2W/2R: Banked BRAMs. [Figure: banks M0 and M1 with write ports W0, W1 and read ports R0, R1.] Multiple write ports, but fragmented data.
2W/2R: “Multipumping”. [Figure: a single memory M0 serving W0, W1, R0, and R1.] No fragmentation, but divides clock speed.
Review: The Live Value Table (LVT) Approach (FPGA’10). Efficient Multi-Ported Memories for FPGAs, Eric LaForest and J. Gregory Steffan, International Symposium on Field-Programmable Gate Arrays, Monterey, CA, February 2010.
LVT-Based MPM. [Figure callouts] Banks: one bank for each of the two write ports, replicated to provide read ports. LVT: remembers which bank has the most recently written value. Muxes: select the bank to read from.
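The LVT mechanism described on this slide can be sketched behaviorally as follows. This is an illustrative Python model, not the authors' implementation: the class name `LvtMemory` and its methods are hypothetical, and the model leaves out the per-read-port replication and all clock timing.

```python
class LvtMemory:
    """Behavioral model of an LVT-based multi-ported memory (illustrative only)."""

    def __init__(self, depth, num_write_ports):
        # One bank per write port. In hardware each bank is also replicated
        # per read port, which does not change the stored values.
        self.banks = [[0] * depth for _ in range(num_write_ports)]
        # Live Value Table: for each address, the bank written most recently.
        self.lvt = [0] * depth

    def write(self, port, addr, value):
        self.banks[port][addr] = value
        self.lvt[addr] = port

    def read(self, addr):
        # The LVT entry plays the role of the mux select signal.
        return self.banks[self.lvt[addr]][addr]


lvt_mem = LvtMemory(depth=8, num_write_ports=2)
lvt_mem.write(0, 3, 0xAA)
lvt_mem.write(1, 3, 0xBB)  # later write to the same address via the other port
lvt_mem.write(0, 5, 0xCC)
print(hex(lvt_mem.read(3)), hex(lvt_mem.read(5)))
```

The table itself needs as many ports as the whole memory, so in an FPGA it has to be built from logic elements rather than BRAMs.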
LVT-Based MPM. Many ALMs! Significant multiplexing! Punchline: LVT is a big frequency win, but...
An XOR Approach
XOR
• XOR basics: A ⊕ 0 = A, B ⊕ B = 0
• Implication: A ⊕ B ⊕ B = A
Can we exploit XOR to build better MPMs?
Intuition: avoid the LVT table and multiplexing
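The identities on this slide are easy to check numerically. The short Python snippet below is only a sanity check using Python's bitwise `^` operator; the helper name `xor_identities_hold` and the sample values are made up for illustration.

```python
import random


def xor_identities_hold(a, b):
    """Return True if the three identities from the slide hold for a and b."""
    return (a ^ 0 == a) and (b ^ b == 0) and (a ^ b ^ b == a)


random.seed(0)  # fixed seed so the check is repeatable
all_hold = all(
    xor_identities_hold(random.getrandbits(32), random.getrandbits(32))
    for _ in range(1000)
)

# Storing A xor B and later XORing with B again recovers A.
stored = 0x1234 ^ 0x00FF
recovered = stored ^ 0x00FF
print(all_hold, hex(recovered))
```

The last two lines are the core of the approach: a value can be stored mixed with another value and still be recovered exactly, as long as the other value remains available.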
2W/2R XOR Design. [Figure: read port R0.] Goal: a read is only an XOR operation.
2W/2R XOR Design. [Figure: one bank holds A ⊕ OLD, the other holds OLD; R0 reads A ⊕ OLD ⊕ OLD = A.] Focus on one location for now.
2W/2R XOR Design. [Figure: W0 writes A, storing A ⊕ OLD in its bank; the other bank holds OLD; read port R0.] XOR the new value with the old value.
2W/2R XOR Design. [Figure: write port W0, read port R0.] Support multiple locations, two write ports.
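2W/2R XOR Design. [Figure: W0 writes A and W1 writes B to different locations; the W0 bank holds A ⊕ OLD1 and OLD2, the W1 bank holds OLD1 and B ⊕ OLD2; read port R0.] The most recently written bank holds the new value XOR the old value(s).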
2W/2R XOR Design. [Figure: write ports W0, W1 and read ports R0, R1.] Add support for a second read port: done! (almost)
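The design built up over the preceding slides can be sketched as a behavioral model: a write stores the new value XORed with the old contents of every other bank at that address, and a read XORs all banks together. The class name `XorMemory` is hypothetical. The model keeps one logical bank per write port and omits the replication that provides extra read ports, as well as all timing.

```python
from functools import reduce
from operator import xor


class XorMemory:
    """Behavioral model of an XOR-based multi-ported memory (illustrative only)."""

    def __init__(self, depth, num_write_ports):
        # One bank per write port, all initially zero.
        self.banks = [[0] * depth for _ in range(num_write_ports)]

    def write(self, port, addr, value):
        # Read the old contents of all *other* banks at this address...
        old = reduce(
            xor,
            (bank[addr] for i, bank in enumerate(self.banks) if i != port),
            0,
        )
        # ...and store the new value XORed with them in this port's bank.
        self.banks[port][addr] = value ^ old

    def read(self, addr):
        # XOR of all banks: the old values cancel, leaving the latest write.
        return reduce(xor, (bank[addr] for bank in self.banks), 0)


xmem = XorMemory(depth=8, num_write_ports=2)
xmem.write(0, 3, 0xA)
xmem.write(1, 3, 0xB)  # overwrite the same location via the other port
xmem.write(1, 5, 0xC)
xmem.write(0, 5, 0xD)
print(hex(xmem.read(3)), hex(xmem.read(5)))
```

No table records which bank was written last, and no multiplexer selects a bank: the XOR of all banks always yields the most recently written value.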
2W/2R XOR Design. [Figure: ports W0, W1, R0, R1; the write of A is labeled with cycles TICK and TOCK.] Writing requires reading, hence 2 cycles to write! Solution: pipeline the writes to avoid stalling.
2W/2R XOR Design. [Figure: ports W0, W1, R0, R1; W0 writes A (TICK) and a read of the same location is marked "Read?".] What if a location is read one cycle after it is written? Solution: bypass with forwarding logic.
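The hazard and its fix can be illustrated with a small cycle-level sketch. It assumes a simplified pipeline in which a write accepted in one cycle reaches the banks during the next cycle, and in which no two ports write the same address at once. The class name `PipelinedXorMemory`, the `cycle` method, and the `forwarding` flag are all hypothetical, and the model collapses the old-value read and the store into a single step.

```python
from functools import reduce
from operator import xor


class PipelinedXorMemory:
    """Cycle-level sketch: a write accepted in cycle t lands in the banks in cycle t+1."""

    def __init__(self, depth, num_write_ports, forwarding=True):
        self.banks = [[0] * depth for _ in range(num_write_ports)]
        self.pending = {}  # port -> (addr, value) accepted in the previous cycle
        self.forwarding = forwarding

    def _stored(self, addr):
        return reduce(xor, (bank[addr] for bank in self.banks), 0)

    def cycle(self, writes=None, reads=()):
        """Advance one clock. `writes` maps port -> (addr, value); `reads` lists addresses."""
        # Reads see the banks as they were at the start of the cycle.
        in_flight = {addr: value for addr, value in self.pending.values()}
        results = []
        for addr in reads:
            if self.forwarding and addr in in_flight:
                results.append(in_flight[addr])  # bypass the banks
            else:
                results.append(self._stored(addr))
        # Second write stage: store value XOR the other banks' old contents.
        for port, (addr, value) in self.pending.items():
            old = reduce(
                xor,
                (b[addr] for i, b in enumerate(self.banks) if i != port),
                0,
            )
            self.banks[port][addr] = value ^ old
        # First write stage: accept this cycle's writes.
        self.pending = dict(writes or {})
        return results


def read_after_write(forwarding):
    mem = PipelinedXorMemory(depth=8, num_write_ports=2, forwarding=forwarding)
    mem.cycle(writes={0: (3, 0xAA)})     # cycle 0: write issued
    one_later = mem.cycle(reads=[3])[0]  # cycle 1: read one cycle after the write
    two_later = mem.cycle(reads=[3])[0]  # cycle 2: the write has reached the banks
    return one_later, two_later


print(read_after_write(forwarding=False), read_after_write(forwarding=True))
```

Without forwarding, the read issued one cycle after the write returns the stale contents. With forwarding, it returns the in-flight value, so the pipelined write is not visible to the reader.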
Generalized XOR Design
LVT vs XOR
Methodology
Use Quartus 10.0 to target Stratix IV
− Favor speed over area, optimize
− Average over 10 seeds to get Fmax
Measure area as Total Equivalent Area (TEA)
− Expresses area in a single unit (ALMs)
− 1 M9K == 28.7 ALMs **
Measure a large design space
− Depth: 32-8192 memory locations
− Ports: 2W/4R, 4W/8R, 8W/16R
** H. Wong, J. Rose and V. Betz, "Comparing FPGA vs. Custom CMOS and the Impact on Processor Microarchitecture," ACM Int. Symp. on FPGAs, 2011
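The Total Equivalent Area metric above amounts to a weighted sum, sketched here in Python. The function name `total_equivalent_area` and the resource counts in the example are made up for illustration; only the 28.7 ALMs-per-M9K factor comes from the slide.

```python
ALMS_PER_M9K = 28.7  # conversion factor quoted on the slide (Wong, Rose and Betz)


def total_equivalent_area(alms, m9ks):
    """Express a design's area in ALM-equivalents: ALMs plus M9K blocks scaled to ALMs."""
    return alms + ALMS_PER_M9K * m9ks


# Hypothetical resource counts, not measurements from the talk.
example_tea = total_equivalent_area(alms=1000, m9ks=10)
print(round(example_tea, 1))
```

Folding both resource types into one number lets designs that trade ALMs for BRAMs, as LVT and XOR do, be compared on a single area axis.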
Example Layout: 8192-deep 2W/4R. [Figure: layouts of the XOR and LVT designs.] Significant resource diversity!
2W/4R. [Plot: speed ("Faster") versus area ("Smaller", log scale); annotations: "8192 XOR: CAD anomaly" and "15% faster, 2x smaller".] LVT better for small designs, XOR better for large.
Navigating the Design Space (2W/4R). Which is best? That depends...
Summary [Plots: 2W/4R, 4W/8R, 8W/16R]
Use LVT when:
• you want to minimize BRAMs
• building depths <= 128
Otherwise use XOR, i.e. when:
• depth >= 256 and there are spare BRAMs
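The rule of thumb in this summary can be written as a small helper, which is roughly what a generator library would need to decide. This is only a sketch: the function name `choose_design` and its parameters are hypothetical. The slides give guidance only for depths up to 128 and from 256 upward, so treating every depth above 128 as "large" is an assumption.

```python
def choose_design(depth, minimize_brams):
    """Pick "LVT" or "XOR" following the summary rule of thumb.

    depth: number of memory locations.
    minimize_brams: True if BRAMs are the scarce resource.
    """
    # Assumption: depths between 128 and 256 are treated like the larger case.
    if depth <= 128 or minimize_brams:
        return "LVT"
    return "XOR"


for depth, scarce in [(64, False), (8192, False), (8192, True)]:
    print(depth, scarce, choose_design(depth, scarce))
```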
Conclusions
• Use LVT when
– building designs of up to 128 entries
– you want to minimize BRAM usage
• Use XOR when
– building designs of 256 entries or more
– you want to minimize ALM usage
• Interesting: a library/generator?
– help the designer automatically navigate this space
• Further work
– Exploring “true-dual-port” mode, stalls, power
– Results on other vendors' FPGAs