Implementing Logic in FPGA Embedded Memory Arrays: Heterogeneous Memory Architectures

Steve Wilton, University of British Columbia, Vancouver, B.C., Canada (stevew@ece.ubc.ca)


  1. Implementing Logic in FPGA Embedded Memory Arrays: Heterogeneous Memory Architectures. Steve Wilton, University of British Columbia, Vancouver, B.C., Canada, stevew@ece.ubc.ca

  2. As FPGAs Get Bigger... Embedded memory is becoming critical. Implementing storage on-chip is important for integration, relaxed I/O constraints, speed, and flexibility. Today, most FPGAs have large embedded memory arrays.

  3. Problem: If a circuit doesn't need all memory blocks, valuable chip area is wasted. Solution: Configure the unused memory blocks as ROMs and use them to implement logic.
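
The idea of using a ROM as logic can be sketched in a few lines: the function's inputs form the address and the stored word is the output. This is only an illustration; `build_rom`, `read_rom`, and the `majority` function are hypothetical names chosen for the example, not part of the talk.

```python
def build_rom(func, n_inputs):
    """Tabulate func for every input combination; input i is address bit i."""
    rom = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        rom.append(func(*bits))
    return rom


def read_rom(rom, *bits):
    """Evaluate the logic function by reading the word at the input address."""
    address = sum(bit << i for i, bit in enumerate(bits))
    return rom[address]


def majority(a, b, c):
    """Hypothetical 3-input logic function used only for this example."""
    return int(a + b + c >= 2)


rom = build_rom(majority, 3)      # 8 words of 1 bit each
result = read_rom(rom, 1, 0, 1)   # same value as majority(1, 0, 1)
```

A memory with several output bits can hold several such functions of the same inputs, one per output column.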

  4. Implementing Logic in Memory: [Figure: example circuit with nodes labeled A to Q]

  5. Implementing Logic in Memory: [Figure: example circuit with nodes labeled A to Q, shown in two versions] Two published algorithms: SMAP, EMB_Pack

  6. The ability of memory arrays to implement logic depends on the memory array architecture. Previous work: 2-Kbit arrays with 8 outputs work well.
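
As a worked example of what "architecture" means here, the number of address inputs follows from the array size and the output width. The sketch below assumes the word count is a power of two; `address_inputs` is a hypothetical helper, and the list of widths is chosen only for illustration.

```python
def address_inputs(total_bits, outputs):
    """Number of address lines for an array of total_bits with `outputs` data bits."""
    words, leftover = divmod(total_bits, outputs)
    if leftover or words < 1 or words & (words - 1):
        raise ValueError("word count must be a power of two")
    return words.bit_length() - 1


# Possible shapes of a 2048-bit array (widths are illustrative assumptions).
shapes = {width: address_inputs(2048, width) for width in (1, 2, 4, 8, 16)}
```

With 8 outputs, a 2048-bit array holds 256 words and so has 8 address inputs, which is consistent with the 8-input memory shown later in the talk.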

  7. Heterogeneous Memory Architectures. Altera Stratix has three types of memories: MegaRAM, M4K blocks, and M512 blocks.

  8. This Talk: A given: for storage, several types of memories on a single chip is a good idea. In this paper, for logic: 1. Are heterogeneous memory architectures a good idea? 2. How much do they help? 3. What memory sizes are best?

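  9. Methodology: [Flow diagram: the benchmark circuits and the architecture feed SMAP, which packs as much logic as possible into the memory arrays and reports the amount of logic packed; the architecture also feeds an area model, which reports the area.] Packing Ratio = (Amount of logic packed) / (Area)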
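
The metric itself is a single division. A minimal sketch follows, with area measured in equivalent logic blocks as on the results slides; the numbers passed in are made up for illustration and are not results from the talk.

```python
def packing_ratio(logic_blocks_packed, area_equiv_logic_blocks):
    """Packing ratio = amount of logic packed / area of the memory arrays."""
    if area_equiv_logic_blocks <= 0:
        raise ValueError("area must be positive")
    return logic_blocks_packed / area_equiv_logic_blocks


# Made-up inputs: 300 logic blocks packed into arrays costing 120 blocks of area.
ratio = packing_ratio(300, 120)
```

A ratio above 1 means the arrays absorb more logic than the same area of logic blocks would hold.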

  10. SMAP Algorithm: Overall approach: 1. Map to 4-LUTs using Flowmap. 2. Pack as many 4-LUTs as possible into arrays. [Figure: example circuit with nodes labeled A to Q] Goal: Maximize the number of LUTs that can be packed.

  11. SMAP Algorithm: Goal: Maximize the number of LUTs that can be packed. Four steps: 1. Choose a “seed node”. 2. Choose signals that will become array inputs. 3. Choose signals that will become array outputs. 4. Insert the memory into the circuit, and remove the 4-LUTs that are no longer needed.

  12. Choosing Inputs of Memory Array: Find the maximum-volume d-feasible cut (Flowpack). [Figure: an 8-input memory; the cut is drawn around the seed node, and the cut edges become the memory array inputs.]
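
To make "maximum-volume d-feasible cut" concrete, the sketch below finds the largest cone rooted at a seed that needs at most `d` input signals. It is a brute-force stand-in that tries every subset of the seed's fan-in LUTs, not the flow-based method named on the slide, and it is only practical for tiny circuits. The circuit, the node names, and the function names are hypothetical.

```python
from itertools import combinations


def transitive_fanin(fanins, seed):
    """All nodes (LUTs and primary inputs) that can reach the seed."""
    seen, stack = set(), [seed]
    while stack:
        node = stack.pop()
        for src in fanins.get(node, ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


def max_volume_cone(fanins, seed, d):
    """Largest cone rooted at seed with at most d inputs, or None if none exists."""
    luts = sorted(n for n in transitive_fanin(fanins, seed) if fanins.get(n))
    for size in range(len(luts), -1, -1):          # try the biggest cones first
        for extra in combinations(luts, size):
            cone = set(extra) | {seed}
            # Every non-seed node must drive some node inside the cone.
            if any(not any(n in fanins.get(m, ()) for m in cone) for n in extra):
                continue
            inputs = {src for n in cone for src in fanins[n] if src not in cone}
            if len(inputs) <= d:
                return cone, inputs                # cut edges = array inputs
    return None


# Hypothetical circuit: each LUT maps to its fan-in signals; a..e are primary inputs.
example = {"x": ["a", "b"], "y": ["b", "c"], "z": ["x", "y", "d"], "s": ["z", "e"]}
cone, inputs = max_volume_cone(example, "s", 4)
```

With `d = 4` the whole circuit does not fit (it needs five inputs), so the search settles for the cone of `z` and `s`, fed by `x`, `y`, `d`, and `e`.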

  13. Choosing Outputs of Memory Array: A bad way to choose the output signal: [Figure: example circuit with nodes labeled A to Q, shown before and after the memory is inserted] Since D and F fan out outside the fan-in cone, we still need D and F (and their predecessors).

  14. Suppose there are two memory outputs: [Figure: example circuit with nodes labeled A to Q, comparing a “Bad Solution” with a “Better Solution”]

  15. Choosing Outputs of Memory Array: Goal: We want to select the w nodes such that the largest number of nodes can be deleted. Problem: For w > 1, it is computationally expensive to check all combinations of w potential outputs. Heuristic: 1. For each potential output individually, find that node's maximum fanout-free cone (MFFC). 2. Choose the w nodes with the largest MFFCs.
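
The heuristic above can be sketched as follows, assuming the circuit is given as a map from each LUT to its fan-in signals and that `region` holds the LUTs inside the chosen cut. The circuit, the names, and the alphabetical tie-break are assumptions made for this example.

```python
from collections import defaultdict


def build_fanouts(fanins):
    """Invert the fan-in map: signal -> set of nodes that read it."""
    fanouts = defaultdict(set)
    for node, sources in fanins.items():
        for src in sources:
            fanouts[src].add(node)
    return fanouts


def mffc(fanouts, root, region):
    """Maximum fanout-free cone of root: nodes whose every fanout stays in the cone."""
    cone = {root}
    changed = True
    while changed:
        changed = False
        for node in region - cone:
            outs = fanouts.get(node, set())
            if outs and outs <= cone:      # used only inside the cone
                cone.add(node)
                changed = True
    return cone


def choose_outputs(fanins, region, w):
    """Rank each candidate by its own MFFC size and keep the w largest."""
    fanouts = build_fanouts(fanins)
    sizes = {n: len(mffc(fanouts, n, region)) for n in region}
    return sorted(region, key=lambda n: (-sizes[n], n))[:w]


# Hypothetical circuit: t lies outside the cut and also reads h.
circuit = {"g": ["a", "b"], "h": ["g", "c"], "m": ["c", "d"],
           "n": ["m", "e"], "s": ["h", "n"], "t": ["h", "e"]}
region = {"g", "h", "m", "n", "s"}
chosen = choose_outputs(circuit, region, 2)
```

Here `h` fans out to `t` outside the cut, so with `s` as the only output, `h` and `g` would have to stay as LUTs. Picking `h` as the second output lets all five LUTs in the region be removed.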

  16. Choosing a Seed Node: It turns out that the choice of seed node is very important. Try all nodes as potential seeds and choose whichever gives the best results. There are ways to speed this up, especially if there are many arrays.

  17. Results: Homogeneous Architectures. [Plot: packed logic blocks (0 to 350) versus bits per array (128 to 8192)]

  18. Results: Homogeneous Architectures. [Plot: packed logic blocks and area in equivalent logic blocks (both 0 to 350) versus bits per array (128 to 8192)]

  19. Results: Homogeneous Architectures. [Plot: packing ratio (0.5 to 3.0) versus bits per array (128 to 8192)] Packing Ratio = (Logic Blocks Packed) / (Area in Equiv. Logic Blocks)

  20. Modifying SMAP for Heterogeneous Archs: SMAP fills arrays sequentially. We have looked at two strategies: 1. Fill all large arrays first. 2. Fill all small arrays first. Strategy 1 gives better results.
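
A minimal sketch of the sequential fill, combining the array ordering with the try-every-seed search, is shown below. The `evaluate_seed` callback stands in for the input, output, and removal steps of SMAP and is entirely hypothetical, as are the toy scores; dropping a used seed from the candidate set is a simplification, since a real flow would update the circuit after each array is filled.

```python
def fill_arrays(array_bits, candidate_seeds, evaluate_seed, large_first=True):
    """Return [(bits, best_seed, luts_removed), ...] in the order arrays are filled."""
    plan = []
    remaining = set(candidate_seeds)
    for bits in sorted(array_bits, reverse=large_first):   # Strategy 1: large first
        if not remaining:
            break
        # Try every remaining node as the seed and keep the best one.
        best = max(sorted(remaining), key=lambda seed: evaluate_seed(seed, bits))
        plan.append((bits, best, evaluate_seed(best, bits)))
        remaining.discard(best)   # simplification, see the note above
    return plan


# Toy scores (made up): LUTs removed for each (seed, array size) pair.
toy_scores = {("s", 2048): 5, ("s", 128): 2, ("t", 2048): 3, ("t", 128): 3}
plan = fill_arrays([128, 2048], ["s", "t"], lambda seed, bits: toy_scores[(seed, bits)])
```

Passing `large_first=False` gives Strategy 2, which makes it easy to compare the two orders on the same circuit.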

  21. Two Sizes: Four Arrays of Each. [Plot: packing density (1.0 to 3.5) versus Array 1 size and Array 2 size (128 to 8192 bits each), with the homogeneous results marked] Best: 2048 bits / 128 bits, a 23% improvement.

  22. Observations from our Results: Trend 1: A combination of 2048-bit and 128-bit arrays is always the best choice. Trend 2: The more arrays, the higher the gain seen by using a heterogeneous architecture.

  23. One Type-1 Array and Two Type-2 Arrays: [Plot: packing density (1.0 to 4.0) versus Array 1 size (one of these) and Array 2 size (two of these), 128 to 8192 bits each]

  24. Four Type-1 Arrays and Eight Type-2 Arrays: [Plot: packing density (1.0 to 3.0) versus Array 1 size (four of these) and Array 2 size (eight of these), 128 to 8192 bits each]

  25. One Type-1 Array and Three Type-2 Arrays: [Plot: packing density (1.5 to 4.0) versus Array 1 size (one of these) and Array 2 size (three of these), 128 to 8192 bits each]

  26. Three Type-1 Arrays and Nine Type-2 Arrays: [Plot: packing density (1.0 to 2.5) versus Array 1 size (three of these) and Array 2 size (nine of these), 128 to 8192 bits each]

  27. Observations from our Results: Trend 1: A combination of 2048-bit and 128-bit arrays is always the best choice. Trend 2: The more arrays, the higher the gain seen by using a heterogeneous architecture. Trend 3: From above, we should have 2048-bit arrays and 128-bit arrays. As the number of arrays increases, more of the arrays should be small.

  28. One Type-1 Array and Three Type-2 Arrays: [Plot: packing density (1.5 to 4.0) versus Array 1 size (one of these) and Array 2 size (three of these), 128 to 8192 bits each, annotated with the regions “One large array and 3 small arrays” and “Three large arrays and one small array” and an arrow marked “Better”]

  29. Three Type-1 Arrays and Nine Type-2 Arrays: [Plot: packing density (1.0 to 2.5) versus Array 1 size (three of these) and Array 2 size (nine of these), 128 to 8192 bits each, annotated with the regions “3 large arrays and 9 small arrays” and “Nine large arrays and 3 small arrays” and an arrow marked “Better”]

  30. Things we haven't taken into account: Speed: Heterogeneous architectures are likely to give gains in speed (compared to homogeneous) since an array of "just the right size" can be used. Right now, SMAP doesn't optimize for speed, but for homogeneous architectures there is little impact on speed. Routing: With heterogeneous architectures, there may be longer routes to get to the right memory, but this is not too bad if there are only a few memory types.

  31. Summary: Heterogeneous memory architectures are efficient when implementing logic. Compared to homogeneous architectures, a 23% improvement is typical. The more arrays, the higher the gain. A combination of 2048-bit and 128-bit arrays is always the best choice. As the number of arrays increases, more of the arrays should be small.
