Implementing Logic in FPGA Embedded Memory Arrays: Heterogeneous - PowerPoint PPT Presentation

Implementing Logic in FPGA Embedded Memory Arrays: Heterogeneous Memory Architectures Steve Wilton University of British Columbia Vancouver, B.C., Canada stevew@ece.ubc.ca

As FPGAs Get Bigger... Embedded Memory is becoming critical Implementing Storage on-chip is important: • Integration • Relax I/O Constraints • Speed • Flexibility Today, most FPGAs have large embedded memory arrays

Problem: If a circuit doesn't need all memory blocks, valuable chip area is wasted. Solution: Configure the unused memory blocks as ROMs and use them to implement logic
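A minimal sketch of the idea, assuming a memory with n address inputs and w data outputs: the array stores one w-bit word per input combination, so it can implement any w Boolean functions of the same n inputs. The names build_rom and rom_read and the full-adder example are illustrative assumptions, not taken from the talk.

```python
from itertools import product

def build_rom(num_inputs, functions):
    """Return ROM contents: one word (tuple of output bits) per address."""
    rom = []
    # product() enumerates input combinations in address order, MSB first.
    for bits in product((0, 1), repeat=num_inputs):
        rom.append(tuple(int(f(*bits)) for f in functions))
    return rom

def rom_read(rom, bits):
    """Look up the word stored at the address formed by the input bits (MSB first)."""
    address = 0
    for b in bits:
        address = (address << 1) | b
    return rom[address]

# Illustrative example: a full adder's sum and carry as two ROM outputs.
full_adder = [
    lambda a, b, c: a ^ b ^ c,
    lambda a, b, c: (a & b) | (a & c) | (b & c),
]
rom = build_rom(3, full_adder)
```

Here a 3-input, 2-output ROM (8 words of 2 bits) takes the place of the gates that would otherwise compute the sum and carry.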

Implementing Logic in Memory: [Figure: example circuit with nodes A, B, C, D, E, F, G, H, J, K, L, M, N, P, Q]

Implementing Logic in Memory: [Figure: example circuit diagrams] Two published algorithms: SMAP, EMB_Pack

The ability of memory arrays to implement logic depends on the memory array architecture Previous Work: 2Kbit arrays with 8 outputs are good
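As a small worked example of what "2Kbit arrays with 8 outputs" implies, the sketch below derives the number of words and address inputs from the array size and output width. It assumes the depth is a power of two; the function name array_shape is hypothetical.

```python
def array_shape(total_bits, num_outputs):
    """Return (words, address_inputs) for an array of the given size and width."""
    if total_bits % num_outputs != 0:
        raise ValueError("array size must be a multiple of the output width")
    depth = total_bits // num_outputs
    num_inputs = depth.bit_length() - 1
    # Assumption: the depth is a power of two, so it maps onto whole address bits.
    if 1 << num_inputs != depth:
        raise ValueError("depth must be a power of two")
    return depth, num_inputs
```

Under these assumptions, array_shape(2048, 8) gives 256 words and 8 address inputs, so such an array can hold up to 8 functions of the same 8 signals.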

Heterogeneous Memory Architectures Altera Stratix: Three types of memories MegaRAM M4K Blocks M512 Blocks

This Talk: A given: For storage: Several types of memories on a single chip is a good idea In this paper: For logic: 1. Heterogeneous memory architectures: a good idea? 2. How much does it help? 3. What memory sizes are best?

Methodology: Benchmark circuits and an architecture are given to SMAP, which packs as much logic as possible into the memory arrays and reports the amount of logic packed. An area model gives the area. Packing Ratio = (Amount of logic packed) / Area
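The packing ratio can be written as a one-line helper. This is only a sketch: the function name is hypothetical, area is assumed to be expressed in equivalent logic blocks, and the numbers in the usage note are made up for illustration, not results from the talk.

```python
def packing_ratio(logic_blocks_packed, area_equiv_logic_blocks):
    """Logic packed into the memory arrays per unit of array area."""
    if area_equiv_logic_blocks <= 0:
        raise ValueError("area must be positive")
    return logic_blocks_packed / area_equiv_logic_blocks
```

For instance, arrays that absorb 30 logic blocks while occupying the area of 12 logic blocks would have a packing ratio of 2.5; a ratio above 1 means the arrays implement more logic than the same area of logic blocks would.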

SMAP Algorithm: Overall approach: 1. Map to 4-LUTs using Flowmap 2. Pack as many 4-LUTs as possible into arrays [Figure: example circuit] Goal: Maximize number of LUTs that can be packed

SMAP Algorithm: Goal: Maximize number of LUTs that can be packed Four Steps: 1. Choose a “seed node” 2. Choose signals that will become array inputs 3. Choose signals that will become array outputs 4. Insert memory into circuit, and remove 4-LUTs no longer needed

Choosing Inputs of Memory Array: Find maximum-volume d-feasible cut (Flowpack) [Figure: fanin cone of the seed node, with a cut drawn for an 8-input memory] Cut edges become memory array inputs
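To make "d-feasible" concrete, the sketch below checks one candidate cone: its inputs are the signals that come from outside the cone and feed a node inside it, and the cone is d-feasible when there are at most d such signals. This only tests a given cone; it does not search for the maximum-volume cut. The function names and the tiny netlist are assumptions for illustration.

```python
def cone_inputs(fanins, cone):
    """Signals outside the cone that drive at least one node inside it."""
    return {src for node in cone for src in fanins.get(node, ()) if src not in cone}

def is_d_feasible(fanins, cone, d):
    """True if the cone needs no more than d memory-array inputs."""
    return len(cone_inputs(fanins, cone)) <= d

# Hypothetical netlist: seed node "s" with two predecessors "g" and "h".
fanins = {"g": ["a", "b"], "h": ["b", "c"], "s": ["g", "h"]}
cone = {"s", "g", "h"}
```

With this netlist the cone has inputs a, b and c, so it fits a 3-input memory but not a 2-input one.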

Choosing Outputs of Memory Array: A bad way to choose output signal: [Figure: example circuit with one chosen output] Since D and F fan out outside the fanin cone, we still need D and F (and their predecessors)

Suppose there are two memory outputs: [Figure: two mappings of the example circuit, labeled "Better Solution" and "Bad Solution"]

Choosing Outputs of Memory Array: Goal: We want to select the w nodes such that the largest number of nodes can be deleted Problem: For w > 1, it is computationally expensive to check all combinations of w potential outputs Heuristic: 1. For each potential output individually, find that node's maximum fanout-free cone (MFFC) 2. Choose the w nodes with the largest MFFCs.
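The heuristic can be sketched as follows, assuming the circuit is given as fanin and fanout maps and that `candidates` is the set of nodes inside the cut. A node joins the root's MFFC only if every one of its fanouts is already in the cone, which is what makes it safe to delete. All names and the small netlist are hypothetical, and ties are broken alphabetically purely for determinism.

```python
def mffc(root, fanins, fanouts, candidates):
    """Maximum fanout-free cone of root, restricted to the candidate nodes."""
    cone = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for src in fanins.get(node, ()):
            if src in cone or src not in candidates:
                continue
            # src can be removed only if nothing outside the cone uses it.
            if all(dst in cone for dst in fanouts.get(src, ())):
                cone.add(src)
                stack.append(src)
    return cone

def choose_outputs(fanins, fanouts, candidates, w):
    """Pick the w candidates with the largest individual MFFCs."""
    sizes = {n: len(mffc(n, fanins, fanouts, candidates)) for n in candidates}
    return sorted(candidates, key=lambda n: (-sizes[n], n))[:w]

# Hypothetical netlist: D also drives X, a node outside the cut.
fanins = {"A": ["i1", "i2"], "C": ["A", "i3"], "E": ["A", "i4"],
          "D": ["C", "E"], "N": ["D", "E"]}
fanouts = {"A": ["C", "E"], "C": ["D"], "E": ["D", "N"],
           "D": ["N", "X"], "N": ["OUT"]}
candidates = {"A", "C", "D", "E", "N"}
```

In this netlist, choosing N as the only output deletes nothing else, because D is still needed by X; D's own MFFC contains D and C, so the heuristic ranks D first.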

Choosing a Seed Node: It turns out that the choice of seed node is very important - Try all nodes as potential seeds, choose whichever gives the best results - There are ways to speed this up, especially if there are many arrays

Results: Homogeneous Architectures [Plot: Packed Logic Blocks (0 to 350) vs. Bits Per Array (128 to 8192)]

Results: Homogeneous Architectures [Plot: Packed Logic Blocks and Area (equiv. logic blocks), both 0 to 350, vs. Bits Per Array (128 to 8192)]

Results: Homogeneous Architectures [Plot: Packing Ratio (0.5 to 3.0) vs. Bits Per Array (128 to 8192)] Packing Ratio = Logic Blocks Packed / Area (Equiv Logic Blocks)

Modifying SMAP for Heterogeneous Archs: SMAP fills arrays sequentially We have looked at two strategies: 1. Fill all large arrays first 2. Fill all small arrays first Strategy 1 gives better results
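The two fill strategies differ only in the order in which arrays are handed to the packer. The sketch below shows that ordering, assuming a callback `pack_one_array` that stands in for one SMAP pass; the callback, the function name, and the return values are placeholders, not the tool's interface.

```python
def pack_sequentially(array_sizes_bits, pack_one_array, large_first=True):
    """Fill arrays one at a time, largest first (Strategy 1) or smallest first."""
    order = sorted(array_sizes_bits, reverse=large_first)
    # pack_one_array is a hypothetical stand-in for one SMAP pass on one array.
    packed = [pack_one_array(bits) for bits in order]
    return order, packed
```

Calling it with sizes [128, 2048, 128, 2048] and large_first=True visits the two 2048-bit arrays before the two 128-bit arrays.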

Two Sizes: Four Arrays of Each: 23% Improvement. Best: 2048 bits / 128 bits [Plot: Packing Density (1.0 to 3.5) vs. Array 1 Size and Array 2 Size (128 to 8192 bits), with the homogeneous results marked]

Observations from our Results: Trend 1: A combination of 2048 / 128 bit arrays is always the best choice Trend 2: The more arrays, the higher the gain seen by using a heterogeneous architecture

One Type-1 array and Two Type-2 Arrays: [Plot: Packing Density (1.0 to 4.0) vs. Array 1 Size (one of these) and Array 2 Size (two of these), 128 to 8192 bits]

Four Type-1 arrays and Eight Type-2 Arrays: [Plot: Packing Density (1.0 to 3.0) vs. Array 1 Size (four of these) and Array 2 Size (eight of these), 128 to 8192 bits]

One Type-1 array and Three Type-2 Arrays: [Plot: Packing Density (1.5 to 4.0) vs. Array 1 Size (one of these) and Array 2 Size (three of these), 128 to 8192 bits]

Three Type-1 arrays and Nine Type-2 Arrays: [Plot: Packing Density (1.0 to 2.5) vs. Array 1 Size (three of these) and Array 2 Size (nine of these), 128 to 8192 bits]

Observations from our Results: Trend 1: A combination of 2048 / 128 bit arrays is always the best choice Trend 2: The more arrays, the higher the gain seen by using a heterogeneous architecture Trend 3: From above, we should have 2048 bit arrays and 128 bit arrays. As the number of arrays increases, more of the arrays should be small.

One Type-1 array and Three Type-2 Arrays: [Plot: Packing Density (1.5 to 4.0) vs. Array 1 Size (one of these) and Array 2 Size (three of these), 128 to 8192 bits; annotations: "Better", "One large array and 3 small arrays", "Three large arrays and one small array"]

Three Type-1 arrays and Nine Type-2 Arrays: [Plot: Packing Density (1.0 to 2.5) vs. Array 1 Size (three of these) and Array 2 Size (nine of these), 128 to 8192 bits; annotations: "Better", "3 large arrays and 9 small arrays", "Nine large arrays and 3 small arrays"]

Things we haven't taken into account: Speed: - Heterogeneous architectures are likely to give gains in speed (compared to homogeneous) since an array of "just the right size" can be used - Right now, SMAP doesn't optimize for speed, but for homogeneous architectures, there is little impact on speed Routing: - With heterogeneous architectures, there may be longer routes to get to the right memory - But not too bad, if only a few memory types

Summary Heterogeneous Memory Architectures are efficient when implementing logic - Compared to homogeneous architectures 23 % improvement is typical - The more arrays, the higher the gain - A combination of 2048 / 128 bit arrays is always the best choice - As the number of arrays increases, more of the arrays should be small.
