Optimizing Under Abstraction: Using Prefetching to Improve FPGA Performance - PowerPoint PPT Presentation



SLIDE 1

Optimizing Under Abstraction:

Using Prefetching to Improve FPGA Performance

Hsin-Jung Yang†, Kermin E. Fleming‡, Michael Adler‡, and Joel Emer†‡

† Massachusetts Institute of Technology ‡ Intel Corporation

September 3rd, FPL 2013

SLIDE 2

Motivation

  • Moore’s Law

– Increasing FPGA size and capability

  • Use case for FPGA:

[Diagram: User Program A running on FPGA A]


SLIDE 7

Motivation

  • Moore’s Law

– Increasing FPGA size and capability

  • Use case for FPGA:

[Diagram: User Program A (circuit verification) on FPGA A with Ethernet and SRAM; User Program B (algorithm acceleration) on FPGA B with DRAM; porting B to FPGA C, which offers DRAM, PCIe, Ethernet, multiple SRAMs, and LUTs, yields User Program B']

SLIDE 8

Abstraction

[Diagram: software stack on Processor A: C++/Python/Perl application, software library, operating system (config. A), running over the CPU, memory, and devices]

  • Goal: making FPGAs easier to use
SLIDE 9

Abstraction

[Diagram: the same software stack on Processor B: application, software library, operating system (config. B), running over CPU', Memory', and Device']

  • Goal: making FPGAs easier to use
SLIDE 10

Abstraction

[Diagram: User Program on FPGA A over an interface and abstraction layer (config. A) with Ethernet and SRAM, shown beside the Processor B software stack]

  • Goal: making FPGAs easier to use
SLIDE 11

Abstraction

[Diagram: User Program on FPGA B over an interface and abstraction layer (config. B) with PCIe and DRAM; some FPGA resources remain unused]

  • Goal: making FPGAs easier to use
SLIDE 12
  • Goal: making FPGAs easier to use
  • Optimization under abstraction

– Automatically accelerate FPGA applications – Provide FREE performance gain

Abstraction

[Diagram: User Program on FPGA B over the abstraction layer (config. B), with an optimization layer using the PCIe and DRAM resources]

SLIDE 13

Memory Abstraction

[Diagram: three clients, each wired to its own block RAM]

FPGA Block RAMs

[Waveform: block RAM ports addr, din, wen, dout, clocked by clk; address A1 produces dout D1 on the next cycle]

interface MEMORY_INTERFACE
  input:
    readReq(addr);
    write(addr, din);
  output:
    // dout is available at the next cycle of readReq
    readResp() if (readReq fired previous cycle);
endinterface

SLIDE 14

Memory Abstraction

[Diagram: three clients, each wired to its own block RAM]

FPGA Block RAMs

[Waveform: ports addr, din, wen, dout, valid, clocked by clk; dout D1 for address A1 arrives when valid is asserted]

interface MEMORY_INTERFACE
  input:
    readReq(addr);
    write(addr, din);
  output:
    // dout is available when the response is ready
    readResp() if (valid == True);
endinterface
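Taken together, the two slides describe a shift from fixed-latency block RAM to a variable-latency, valid-signaled memory. As an illustration only, here is a small Python model of that interface; the class name, the fixed latency, and the FIFO of in-flight reads are our assumptions, not LEAP's API:

```python
from collections import deque

class ScratchpadModel:
    """Toy model of the valid-signaled memory interface: readResp()
    only fires once the oldest outstanding response is ready."""
    def __init__(self, latency=3):
        self.mem = {}                   # backing store
        self.latency = latency          # assumed fixed; real latency varies
        self.inflight = deque()         # (cycle when valid, data)
        self.cycle = 0

    def write(self, addr, din):
        self.mem[addr] = din

    def read_req(self, addr):
        # The response becomes valid `latency` cycles from now.
        self.inflight.append((self.cycle + self.latency, self.mem.get(addr)))

    def read_resp(self):
        # Guarded like `readResp() if (valid == True)` on the slide.
        if self.inflight and self.inflight[0][0] <= self.cycle:
            return self.inflight.popleft()[1]
        return None                     # response not valid yet

    def tick(self):
        self.cycle += 1
```

The guard is the point: a client written against this interface keeps working unchanged whatever the actual latency is, which is what lets an optimizer change the memory system underneath it.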

SLIDE 15

[Diagram: LEAP scratchpad hierarchy: each client has a scratchpad interface and private cache; a connector links them to the scratchpad controller, the platform central cache, and host memory]

LEAP Scratchpads

Scratchpads

  • M. Adler et al., “LEAP Scratchpads,” in FPGA, 2011.
SLIDE 16

[Diagram: LEAP scratchpad hierarchy: each client has a scratchpad interface and private cache; a connector links them to the scratchpad controller, the platform central cache, and host memory]

LEAP Scratchpads

Processor Scratchpads

  • M. Adler et al., “LEAP Scratchpads,” in FPGA, 2011.
SLIDE 17

Scratchpad Optimization

Automatically accelerate memory-using FPGA programs

  • Reduce scratchpad latency
  • Leverage unused resources
  • Learn from optimization techniques in processors

– Larger caches, greater associativity – Better cache policies – Cache prefetching

SLIDE 19

Talk Outline

  • Motivation
  • Introduction to LEAP Scratchpads
  • Prefetching in FPGAs vs. in processors
  • Scratchpad Prefetcher Microarchitecture
  • Evaluation and Prefetch Optimization
  • Conclusion
SLIDE 20

Comparison of prefetching techniques and platforms

Prefetching Techniques

                      Static Prefetching               Dynamic Prefetching
Platform              Processor       FPGA             Processor              FPGA
How?                  User/Compiler   User             Hardware manufacturer  Compiler

Criteria compared (the per-cell check marks did not survive transcription): no code change, high prefetch accuracy, no instruction overhead, runtime information.

SLIDE 22

Dynamic Prefetching in Processors

Classic processor dynamic prefetching policies

  • When to prefetch

– Prefetch on cache miss – Prefetch on cache miss and prefetch hit

  • Also called tagged prefetch
  • What to prefetch

– Always prefetch next memory block – Learn stride-access patterns
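To make the "tagged prefetch" bullet concrete, here is an illustrative Python sketch (function names and the set-based cache model are ours) of next-block tagged prefetching: prefetch on a miss, and again on the first demand hit to a block that a prefetch brought in, so a sequential stream misses only once.

```python
def access(cache, prefetched, addr, issue_prefetch):
    """One cache access under tagged next-block prefetching.
    Returns True on a hit, False on a miss."""
    if addr in cache:
        if addr in prefetched:           # first demand hit on a prefetched block
            prefetched.discard(addr)     # clear the tag
            issue_prefetch(addr + 1)     # keep running ahead of the stream
        return True
    issue_prefetch(addr + 1)             # miss: also prefetch the next block
    cache.add(addr)                      # demand fetch fills the cache
    return False

def make_prefetcher(cache, prefetched):
    """Toy fill path: a prefetch installs the block and tags it."""
    def issue(addr):
        cache.add(addr)
        prefetched.add(addr)
    return issue
```

Walking addresses 0, 1, 2, ... through this model shows the tagged policy's appeal for streaming: only the very first access misses, and each prefetch hit triggers the next prefetch.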

SLIDE 23

Dynamic Prefetching in Processors

Stride prefetching

  • L1 cache: PC-based stride prefetching
  • L2 cache: address-based stride prefetching

(fully associative cache of learners)

           Tag      Previous Address   Stride   State
learner 1  0xa001   0x1008             4        Steady
learner 2  0xa002   0x2000             -        Initial
learner 3  -        -                  -        -
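A single learner entry from the table above can be sketched as follows. This is an illustrative Python model with a simplified two-state (Initial/Steady) machine; real stride prefetchers often add more states and confidence counters.

```python
class StrideLearner:
    """One stride-table entry (Tag, Previous Address, Stride, State)."""
    def __init__(self, tag, addr):
        self.tag = tag
        self.prev = addr                 # previous address seen for this tag
        self.stride = None               # no stride learned yet
        self.state = "Initial"

    def observe(self, addr):
        """Feed the next address; returns True when it is safe to prefetch."""
        stride = addr - self.prev
        if stride == self.stride:
            self.state = "Steady"        # same stride confirmed again
        else:
            self.stride = stride         # new pattern: start relearning
            self.state = "Initial"
        self.prev = addr
        return self.state == "Steady"
```

In hardware the learners are direct-mapped by address rather than held in a fully associative (CAM-like) structure, for the FPGA-efficiency reasons discussed on the following slides.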

SLIDE 24

Dynamic Prefetching on FPGAs

  • Easier:

– Cleaner streaming memory accesses – No need for PC as a filter to separate streams – Fixed (usually plenty of) resources

  • Harder:

– Back-to-back memory accesses – Inefficient to implement CAM

SLIDE 25

Dynamic Prefetching on FPGAs

  • Easier:

– Cleaner streaming memory accesses – No need for PC as a filter to separate streams – Fixed (usually plenty of) resources

  • Harder:

– Back-to-back memory accesses – Inefficient to implement CAM

Scratchpad prefetcher uses address-based stride prefetching with a larger set of direct-mapped learners

SLIDE 26

Talk Outline

  • Motivation
  • Introduction to LEAP Scratchpads
  • Prefetching in FPGAs vs. in processors
  • Scratchpad Prefetcher Design
  • Evaluation and Prefetch Optimization
  • Conclusion
SLIDE 27

Scratchpad Prefetcher

[Diagram: scratchpad hierarchy without prefetchers: clients, scratchpad interfaces, private caches, connector, scratchpad controller, platform central cache, host memory]

SLIDE 28

Scratchpad Prefetcher

[Diagram: the same scratchpad hierarchy with a prefetcher added alongside each client's private cache]

SLIDE 29

Scratchpad Prefetching Policy

  • When to prefetch

– Cache line miss / prefetch hit – Prefetcher learns the stride pattern

  • What to prefetch

– Prefetch address: P = L + s * d – Cache line address: L – Learned stride: s – Look-ahead distance: d
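The "what to prefetch" rule is just the arithmetic above; a minimal sketch (the function name is ours):

```python
def prefetch_addr(line_addr, stride, distance):
    """P = L + s * d: cache-line address L, learned stride s,
    look-ahead distance d."""
    return line_addr + stride * distance
```

For example, a stream currently at line 0x1000 with learned stride 4 and look-ahead distance 8 prefetches line 0x1020.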

SLIDE 31

Scratchpad Prefetching Policy

  • Look-ahead distance:

– Small distance: limited prefetch benefit – Large distance: risk of cache pollution – Suitable distance for different programs & platforms?

SLIDE 32

Scratchpad Prefetching Policy

  • Look-ahead distance:

– Small distance: limited prefetch benefit – Large distance: risk of cache pollution – Suitable distance for different programs & platforms?

Dynamically adjust look-ahead distance


SLIDE 38

Scratchpad Prefetching Policy

  • Look-ahead distance:

– Small distance: limited prefetch benefit – Large distance: risk of cache pollution – Suitable distance for different programs & platforms?

Dynamically adjust look-ahead distance

Classification of issued prefetches:

Issued prefetch
  – Dropped: by busy / by hit
  – To Memory:
      – Useless (untimely)
      – Usable: Timely / Late (untimely)
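These categories are exactly the feedback a distance controller needs. One plausible control loop (our sketch, not necessarily the paper's exact rule) in Python: widen the distance when prefetches come back late, narrow it when they turn out useless and pollute the cache.

```python
def adjust_distance(d, timely, late, useless, d_min=1, d_max=64):
    """Adapt the look-ahead distance d from per-interval prefetch counters.
    The doubling/halving step and the bounds are illustrative choices."""
    if late > timely:                 # responses arrive after the demand access
        return min(d * 2, d_max)      # look further ahead
    if useless > timely:              # prefetched lines evicted before use
        return max(d // 2, d_min)     # back off to limit cache pollution
    return d                          # distance looks right; keep it
```

Evaluating the counters over fixed intervals (rather than per request) keeps the control loop cheap in hardware and smooths out noise.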

SLIDE 39

Talk Outline

  • Motivation
  • Introduction to LEAP Scratchpads
  • Prefetching in FPGAs vs. in processors
  • Scratchpad Prefetcher Microarchitecture
  • Evaluation and Prefetch Bandwidth Control
  • Conclusion
SLIDE 41

Evaluation

  • Blocked matrix-matrix multiplication (MMM)
  • Prefetching has larger gains in smaller matrices.
  • Prefetching helps in edge conditions.
  • N. Dave et al., “Hardware Acceleration of Matrix Multiplication on a Xilinx FPGA,” in MEMOCODE, 2007.
SLIDE 46

Memory Bandwidth Control

  • MMM with memory bandwidth control

Prefetcher automatically stops issuing requests when there are too many requests in flight
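The rule can be sketched as a simple credit counter (the class name and threshold are illustrative, not from the paper): a prefetch is dropped as "busy" whenever the in-flight count has reached the cap, so demand requests keep their share of memory bandwidth.

```python
class PrefetchThrottle:
    """Drop prefetches when too many memory requests are in flight."""
    def __init__(self, max_inflight=16):
        self.max_inflight = max_inflight  # illustrative cap
        self.inflight = 0

    def try_issue_prefetch(self):
        if self.inflight >= self.max_inflight:
            return False                  # memory busy: drop the prefetch
        self.inflight += 1                # take a credit
        return True

    def on_response(self):
        self.inflight -= 1                # return the credit
```

Dropping (rather than queueing) throttled prefetches matches the "dropped by busy" category of the classification slides: a stale prefetch is cheap to regenerate later from the learner state.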

SLIDE 48

Prefetch Performance Summary

  • K. Fleming et al., “H.264 Decoder: A Case Study in Multiple Design Points,” in MEMOCODE, 2008.
SLIDE 49

Prefetcher Resource Utilization

Area of different prefetching logic implementations

                          Slice Registers   Slice LUTs   BRAM   fmax
32 learners, LUTRAM       333               1045         -      127 MHz
32 learners, BRAM         419               1275         2      131 MHz
H.264, Baseline Profile   60770             86364        99     80 MHz

  • Area requirements of FPGA prefetching are small.

― less than 0.5% of total chip area on the ML605 board

SLIDE 50

Conclusion

  • FPGA programs are hard to write
  • FPGA programs usually do not fully utilize resources
  • Optimizations under abstraction leverage unused resources and provide FREE performance gain
  • Adding prefetching to LEAP Scratchpads speeds up existing streaming applications

– No program code changes in target design – 15% average runtime improvement

  • There are many other possible optimizations to the FPGA memory system

SLIDE 51

Thank You