Implementing Logic in FPGA Embedded Memory Arrays: Heterogeneous Memory Architectures



SLIDE 1

Implementing Logic in FPGA Embedded Memory Arrays: Heterogeneous Memory Architectures

Steve Wilton University of British Columbia Vancouver, B.C., Canada stevew@ece.ubc.ca

SLIDE 2

As FPGAs Get Bigger...

Embedded memory is becoming critical.

Implementing storage on-chip is important:

  • Integration
  • Relax I/O Constraints
  • Speed
  • Flexibility

Today, most FPGAs have large embedded memory arrays

SLIDE 3

Problem: If a circuit doesn’t need all memory blocks, valuable chip area is wasted.

Solution: Configure memory blocks as ROMs and use them to implement logic.
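To make the ROM idea concrete, here is a minimal Python sketch. The circuit inputs drive the address lines, and each data bit of the stored word is one logic output. The names `build_rom` and `read_rom` and the full-adder example are illustrative assumptions, not taken from the talk.

```python
from itertools import product

def build_rom(num_inputs, output_funcs):
    """Return ROM contents: one word per address, one bit per output function."""
    rom = []
    # product() enumerates input combinations with the first input as the MSB.
    for bits in product((0, 1), repeat=num_inputs):
        word = 0
        for k, func in enumerate(output_funcs):
            word |= (func(*bits) & 1) << k  # output k is stored in data bit k
        rom.append(word)
    return rom

def read_rom(rom, bits):
    """Look up the stored word; the first input is the most significant address bit."""
    address = 0
    for b in bits:
        address = (address << 1) | b
    return rom[address]

# Illustrative 3-input, 2-output block (a full adder): bit 0 = sum, bit 1 = carry.
full_adder = build_rom(3, [
    lambda a, b, c: a ^ b ^ c,
    lambda a, b, c: (a & b) | (a & c) | (b & c),
])
```

A ROM with 3 address inputs and 2 data outputs needs 8 x 2 = 16 bits, whatever the two functions are.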

SLIDE 4

Implementing Logic in Memory:

[Figure: example circuit with nodes A, B, C, D, E, F, G, H, J, K, L, M, N, P, Q]

SLIDE 5

Implementing Logic in Memory:

Two published algorithms: SMAP, EMB_Pack

[Figure: the example circuit (nodes A through Q) beside a second version whose labels are N, M, P, A, E, Q, C]

SLIDE 6

The ability of memory arrays to implement logic depends on the memory array architecture.

Previous work: 2-Kbit arrays with 8 outputs are good.
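The relation between array size, outputs, and inputs is simple arithmetic: an array of B bits with w outputs holds B / w words, so it has log2(B / w) address inputs. The sketch below lists the shapes of one array. The function name and the default list of widths are assumptions for illustration.

```python
def array_configs(total_bits, widths=(1, 2, 4, 8, 16)):
    """List (address_inputs, data_outputs) shapes for an array of total_bits bits."""
    configs = []
    for w in widths:
        if total_bits % w:
            continue                      # width does not divide the array evenly
        words = total_bits // w
        if words < 1 or words & (words - 1):
            continue                      # word count must be a power of two
        configs.append((words.bit_length() - 1, w))
    return configs

shapes_2k = array_configs(2048)
```

For a 2048-bit array, the 8-output shape has 256 words and therefore 8 address inputs.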

SLIDE 7

Heterogeneous Memory Architectures

Altera Stratix: Three types of memories

  • M512 Blocks
  • M4K Blocks
  • MegaRAM

SLIDE 8

This Talk:

A given: For storage, several types of memories on a single chip is a good idea.

In this paper: For logic,

  • 1. Heterogeneous memory architectures: a good idea?
  • 2. How much does it help?
  • 3. What memory sizes are best?
SLIDE 9

Methodology:

[Diagram: the benchmark circuits and the architecture are the inputs. SMAP packs as much logic as possible into the memory arrays and reports the amount of logic packed. An area model reports the area.]

Packing Ratio = Amount of logic packed / Area
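As a small worked example of the metric, the sketch below divides the logic packed by the area, with both measured in equivalent logic blocks. The function name and the numbers are invented for illustration and are not results from the talk.

```python
def packing_ratio(logic_blocks_packed, area_in_logic_blocks):
    """Logic blocks packed into the arrays per logic-block-equivalent of array area."""
    if area_in_logic_blocks <= 0:
        raise ValueError("area must be positive")
    return logic_blocks_packed / area_in_logic_blocks

# Invented numbers: arrays occupying the area of 12 logic blocks absorb 30 logic blocks.
example_ratio = packing_ratio(30, 12)
```

A ratio above 1.0 means the arrays implement more logic than the same area of logic blocks would.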

SLIDE 10

SMAP Algorithm:

Overall approach:

  • 1. Map to 4-LUTs using Flowmap
  • 2. Pack as many 4-LUTs as possible into arrays

Goal: Maximize number of LUTs that can be packed

[Figure: the example circuit (nodes A through Q) beside a second version whose labels are N, M, P, A, E, Q, C]

SLIDE 11

SMAP Algorithm:

Goal: Maximize number of LUTs that can be packed

Four Steps:

  • 1. Choose a “seed node”
  • 2. Choose signals that will become array inputs
  • 3. Choose signals that will become array outputs
  • 4. Insert memory into circuit, and remove 4-LUTs no longer needed

SLIDE 12

Choosing Inputs of Memory Array:

Find the maximum-volume d-feasible cut (Flowpack).

Cut edges become memory array inputs.

[Figure: a cut around the seed node feeding an 8-input memory]
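The sketch below illustrates what "d-feasible" means: a cone of nodes ending at the seed is d-feasible if at most d signals enter it from outside. It is not Flowpack. It grows the cone greedily, so it may miss the maximum-volume cut. The netlist format (a dict from each node to its fan-in signals), the function names, and the tiny circuit are assumptions for illustration.

```python
def cut_inputs(fanins, cone):
    """Signals outside `cone` that drive a node inside it."""
    return {src for node in cone for src in fanins.get(node, ()) if src not in cone}

def grow_cone(fanins, seed, d):
    """Greedily absorb predecessors of the seed while the cut has at most d inputs."""
    cone = {seed}
    if len(cut_inputs(fanins, cone)) > d:
        return None                       # even the seed alone needs too many inputs
    changed = True
    while changed:
        changed = False
        for src in sorted(cut_inputs(fanins, cone)):
            if src not in fanins:
                continue                  # primary input: nothing to absorb
            trial = cone | {src}
            if len(cut_inputs(fanins, trial)) <= d:
                cone = trial
                changed = True
                break
    return cone

# Hypothetical 4-LUT netlist: node -> fan-in signals; a, b, c, d are primary inputs.
netlist = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "d"],
}
cone = grow_cone(netlist, "g4", 4)
```

With d = 4 the whole example fits behind the inputs a, b, c, d. With d = 2 only the seed itself fits.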

SLIDE 13

Choosing Outputs of Memory Array:

A bad way to choose the output signal:

[Figure: the example circuit (nodes A through Q) before and after packing with a poorly chosen output]

Since D and F fan out outside the fan-in cone, we still need D and F (and their predecessors).

SLIDE 14

Suppose there are two memory outputs:

[Figure: two choices of memory outputs for the example circuit, labeled "Bad Solution" and "Better Solution"]

SLIDE 15

Choosing Outputs of Memory Array:

Goal: We want to select the w nodes such that the largest number of nodes can be deleted.

Problem: For w > 1, it is computationally expensive to check all combinations of w potential outputs.

Heuristic:

  • 1. For each potential output individually, find that node’s maximum fanout-free cone (MFFC)
  • 2. Choose the w nodes with the largest MFFCs.
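The heuristic can be sketched in Python. A node's MFFC is the node plus every predecessor whose fan-out all stays inside the cone, which are the nodes that become unnecessary once the node is produced by the memory. The netlist format, the function names, and the example circuit are assumptions for illustration. The sketch does not restrict the MFFC to the chosen cut, which a real implementation may need to do.

```python
def build_fanouts(fanins):
    """Invert a node -> fan-in map into a signal -> fan-out map."""
    fanouts = {}
    for node, srcs in fanins.items():
        for src in srcs:
            fanouts.setdefault(src, []).append(node)
    return fanouts

def mffc(fanins, fanouts, root, primary_outputs=()):
    """Root plus all logic nodes whose every fan-out path goes through root."""
    cone = {root}
    changed = True
    while changed:
        changed = False
        for node in list(cone):
            for src in fanins.get(node, ()):
                if src in cone or src not in fanins or src in primary_outputs:
                    continue              # already in cone, primary input, or visible output
                if all(dst in cone for dst in fanouts.get(src, ())):
                    cone.add(src)
                    changed = True
    return cone

def choose_outputs(fanins, candidates, w, primary_outputs=()):
    """Pick the w candidates with the largest MFFCs (ties broken by name)."""
    fanouts = build_fanouts(fanins)
    size = {n: len(mffc(fanins, fanouts, n, primary_outputs)) for n in candidates}
    return sorted(candidates, key=lambda n: (-size[n], n))[:w]

# Hypothetical netlist: g2 fans out to both g3 and g4.
netlist = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "d"],
    "g5": ["g3", "g4"],
}
picked = choose_outputs(netlist, list(netlist), 2)
```

Because each candidate is scored on its own, the cost is one MFFC computation per candidate rather than one per combination of w candidates.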
SLIDE 16

Choosing a Seed Node:

It turns out that the choice of seed node is very important.

  • Try all nodes as potential seeds, choose whichever gives the best results
  • There are ways to speed this up, especially if there are many arrays
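The exhaustive seed search amounts to a simple loop. In this sketch, `pack_from_seed` is a hypothetical callback that runs the input and output selection for one seed and returns how many LUTs could be removed. A toy lookup table stands in for it here.

```python
def best_seed(nodes, pack_from_seed):
    """Try every node as the seed and keep the one that removes the most LUTs."""
    best_node, best_removed = None, -1
    for node in nodes:
        removed = pack_from_seed(node)    # hypothetical: LUTs removed for this seed
        if removed > best_removed:
            best_node, best_removed = node, removed
    return best_node, best_removed

# Toy stand-in for the packing step: made-up LUT counts per seed.
toy_results = {"g1": 3, "g2": 7, "g3": 5}
seed, removed = best_seed(list(toy_results), toy_results.get)
```

The loop runs one packing attempt per node for each array, which is why speedups matter when there are many arrays.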

SLIDE 17

Results: Homogeneous Architectures

[Figure: packed logic blocks versus bits per array (128 to 8192)]

SLIDE 18

Results: Homogeneous Architectures

[Figure: area (equivalent logic blocks) and packed logic blocks versus bits per array (128 to 8192)]

SLIDE 19

Results: Homogeneous Architectures

Packing Ratio = Logic Blocks Packed / Area (Equiv. Logic Blocks)

[Figure: packing ratio versus bits per array (128 to 8192)]

SLIDE 20

Modifying SMAP for Heterogeneous Architectures:

SMAP fills arrays sequentially.

We have looked at two strategies:

  • 1. Fill all large arrays first
  • 2. Fill all small arrays first

Strategy 1 gives better results
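The two strategies differ only in the order in which the arrays are visited. In this sketch, `pack_one_array` is a hypothetical callback that packs logic into one array of the given size and returns the number of LUTs removed. The toy callback and its numbers are invented and carry no experimental meaning.

```python
def fill_arrays(array_sizes_bits, pack_one_array, large_first=True):
    """Fill arrays one at a time, largest first (strategy 1) or smallest first (strategy 2)."""
    order = sorted(array_sizes_bits, reverse=large_first)
    total_removed = 0
    for bits in order:
        total_removed += pack_one_array(bits)  # hypothetical packing step
    return order, total_removed

# Toy callback: pretend each 128 bits of array absorbs one LUT.
order, total = fill_arrays([128, 2048, 128, 2048], lambda bits: bits // 128)
```

In a real flow the result depends on the order, because each packing step changes the circuit that the next array sees. The toy callback ignores that effect.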

SLIDE 21

Two Sizes: Four Arrays of Each

[Figure: packing density versus Array 1 size and Array 2 size, each from 128 to 8192 bits; the homogeneous results are marked on the plot]

Best: 2048 bits / 128 bits, a 23% improvement

SLIDE 22

Observations from our Results:

Trend 1: A combination of 2048 / 128 bit arrays is always the best choice.

Trend 2: The more arrays, the higher the gain seen by using a heterogeneous architecture.

SLIDE 23

One Type-1 array and Two Type-2 Arrays:

[Figure: packing density versus Array 1 size (one of these) and Array 2 size (two of these), each from 128 to 8192 bits]

SLIDE 24

Four Type-1 arrays and Eight Type-2 Arrays:

[Figure: packing density versus Array 1 size (four of these) and Array 2 size (eight of these), each from 128 to 8192 bits]

SLIDE 25

One Type-1 array and Three Type-2 Arrays:

[Figure: packing density versus Array 1 size (one of these) and Array 2 size (three of these), each from 128 to 8192 bits]

SLIDE 26

Three Type-1 arrays and Nine Type-2 Arrays:

[Figure: packing density versus Array 1 size (three of these) and Array 2 size (nine of these), each from 128 to 8192 bits]

SLIDE 27

Observations from our Results:

Trend 1: A combination of 2048 / 128 bit arrays is always the best choice.

Trend 2: The more arrays, the higher the gain seen by using a heterogeneous architecture.

Trend 3: From above, we should have 2048 bit arrays and 128 bit arrays. As the number of arrays increases, more of the arrays should be small.

SLIDE 28

One Type-1 array and Three Type-2 Arrays:

[Figure: packing density versus Array 1 size (one of these) and Array 2 size (three of these), each from 128 to 8192 bits. Annotations on the plot: "Three large arrays and one small array", "One large array and 3 small arrays", and "Better"]

SLIDE 29

Three Type-1 arrays and Nine Type-2 Arrays:

[Figure: packing density versus Array 1 size (three of these) and Array 2 size (nine of these), each from 128 to 8192 bits. Annotations on the plot: "Nine large arrays and 3 small arrays", "3 large arrays and 9 small arrays", and "Better"]

SLIDE 30

Things we haven't taken into account:

Speed:

  • Heterogeneous architectures are likely to give gains in speed (compared to homogeneous) since an array of "just the right size" can be used
  • Right now, SMAP doesn't optimize for speed, but for homogeneous architectures, there is little impact on speed

Routing:

  • With heterogeneous architectures, there may be longer routes to get to the right memory
  • But not too bad, if only a few memory types
SLIDE 31

Summary

Heterogeneous Memory Architectures are efficient when implementing logic

  • Compared to homogeneous architectures, 23% improvement is typical
  • The more arrays, the higher the gain
  • A combination of 2048 / 128 bit arrays is always the best choice
  • As the number of arrays increases, more of the arrays should be small.