An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware
Tao Chen, Shreesha Srinath, Christopher Batten, G. Edward Suh
Computer Systems Laboratory, School of Electrical and Computer Engineering, Cornell
Outline: Motivation; Computation Model; Accelerator Architecture; Design Methodology; Evaluation
Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware 2 / 18
[Figure: accelerator architecture.
System-level labels: CPU, L1$, Cache Coherent Interconnect, L2 Cache, Off-Chip DRAM, FPGA, Tile, Interface, Networks.
Processing Element labels: Worker, TMU, Pending Task Store, Arg & Task Router, Stealing Net IF, Arg/Task Net IF.
Connection labels: task in, task, steal, succ.]
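The "steal" connection and the "Stealing Net IF" in the figure suggest that an idle processing element (PE) takes work from a busy one. The following is a minimal software sketch of that idea, not the hardware design: `TaskDeque` and `try_steal` are hypothetical names, tasks are reduced to integer ids, and the policy (owner pops the newest task, thief takes the oldest) is the classic one from software work-stealing runtimes, assumed here because the slides do not state which policy the hardware uses.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical per-PE task queue (illustrative only, not the hardware TMU).
struct TaskDeque {
    std::deque<int> tasks;  // task ids; back = most recently spawned

    void push(int t) { tasks.push_back(t); }

    // The owning PE takes its newest task (LIFO order).
    bool pop_local(int& out) {
        if (tasks.empty()) return false;
        out = tasks.back();
        tasks.pop_back();
        return true;
    }

    // A thief takes the oldest task (FIFO order) from the other end.
    bool steal(int& out) {
        if (tasks.empty()) return false;
        out = tasks.front();
        tasks.pop_front();
        return true;
    }
};

// An idle PE scans the other PEs in round-robin order, starting with its
// right-hand neighbour, and steals the first task it finds.
// Victim selection is an assumption made for this sketch.
bool try_steal(std::vector<TaskDeque>& pes, std::size_t self, int& out) {
    for (std::size_t i = 1; i < pes.size(); ++i) {
        std::size_t victim = (self + i) % pes.size();
        if (pes[victim].steal(out)) return true;
    }
    return false;
}
```

With three PEs and tasks 10, 11, 12 pushed onto PE 1 in that order, a steal by PE 0 returns task 10 (the oldest), while PE 1 itself would next pop task 12 (the newest).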
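[Figure: the same system diagram with a reduced Processing Element.
System-level labels: CPU, L1$, Cache Coherent Interconnect, L2 Cache, Off-Chip DRAM, FPGA, Tile, Interface, Networks, TMU.
Processing Element labels: Worker, Arg/Task Net IF.
Connection labels: task in, task.]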
[Figure: design methodology, showing the accelerator with Empty Worker slots.
Labels: Networks, Interface, Tile, L1$, TMU, Empty Worker, Pending Task Store, Arg & Task Router, Stealing Net IF, Arg/Task Net IF.
Connection labels: task in, task, steal, succ.]
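// Fibonacci worker written for high-level synthesis.
// Each call reads one task and either answers it or splits it.
void FibWorkerHLS( TaskInPort<FibTask> tin, TaskOutPort<FibTask> tout,
                   SuccReqPort sreq, SuccRespPort sresp, ArgOutPort aout )
{
  FibTask  task = tin.read();  // read the next task from the input port
  task_k_t k    = task.k;      // continuation that receives this task's result

  if (task.type == FIB) {
    int n = task.x;
    if (n < 2) {
      // Base case: fib(n) = n, sent straight to the continuation.
      send_arg( Arg(k, n), aout );
    } else {
      // Request a SUM successor that waits for 2 arguments, then spawn
      // the two children; their third field (1 or 0) appears to select
      // which SUM argument each result fills.
      k = make_succ(SUM, k, 2, sreq, sresp);
      spawn(FibTask(FIB, k, 1, n-2), tout);
      spawn(FibTask(FIB, k, 0, n-1), tout);
    }
  } else if (task.type == SUM) {
    // Both child results have arrived: add them and pass the sum on.
    int sum = task.x + task.y;
    send_arg( Arg(k, sum), aout );
  }
}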
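The worker above depends on framework ports that the slides do not define, so it cannot be run on its own. As a rough way to follow the control flow, here is a single-threaded software sketch of the same continuation-passing pattern. Everything in it is an assumption made for illustration: `Sim`, `Task`, and `Pending` are hypothetical names, a plain vector stands in for the Pending Task Store, continuations are reduced to an index plus an argument slot, and the helper signatures differ from the HLS ones.

```cpp
#include <cassert>
#include <vector>

enum TaskType { FIB, SUM };

// A ready task: its type, its continuation (succ, slot), and its operands.
struct Task {
    TaskType type;
    int succ;  // index of the pending successor that gets the result; -1 = final result
    int slot;  // which argument of that successor the result fills
    int x, y;
};

// A pending successor that waits until `missing` arguments have arrived.
struct Pending {
    TaskType type;
    int succ, slot;  // continuation of the successor itself
    int args[2];
    int missing;
};

struct Sim {
    std::vector<Task> ready;       // stands in for the task queues
    std::vector<Pending> pending;  // stands in for the Pending Task Store
    int result = 0;

    // Deliver a value to a continuation; once the successor has all of
    // its arguments, move it to the ready list.
    void send_arg(int succ, int slot, int value) {
        if (succ < 0) { result = value; return; }
        Pending& p = pending[succ];
        p.args[slot] = value;
        if (--p.missing == 0)
            ready.push_back({p.type, p.succ, p.slot, p.args[0], p.args[1]});
    }

    // Allocate a successor that waits for `nargs` arguments; return its index.
    int make_succ(TaskType type, int succ, int slot, int nargs) {
        pending.push_back({type, succ, slot, {0, 0}, nargs});
        return static_cast<int>(pending.size()) - 1;
    }

    // Same branching as the HLS worker: base case, split, or sum.
    void run_one(const Task& t) {
        if (t.type == FIB) {
            if (t.x < 2) { send_arg(t.succ, t.slot, t.x); return; }
            int s = make_succ(SUM, t.succ, t.slot, 2);
            ready.push_back({FIB, s, 1, t.x - 2, 0});
            ready.push_back({FIB, s, 0, t.x - 1, 0});
        } else {
            send_arg(t.succ, t.slot, t.x + t.y);
        }
    }

    int fib(int n) {
        ready.push_back({FIB, -1, 0, n, 0});
        while (!ready.empty()) {
            Task t = ready.back();  // copy before popping; run_one may grow `ready`
            ready.pop_back();
            run_one(t);
        }
        return result;
    }
};
```

Calling `Sim().fib(10)` walks the full task tree with one worker and returns 55. In the accelerator the ready tasks would presumably be spread across many processing elements rather than drained by a single loop.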
[Figure: two evaluation systems.
System 1 labels: ARM A9 (two), L1$ 32KB, Cache Coherent Interconnect, L2 Cache, Off-Chip DRAM, FPGA, Tile 4PEs, SBuf, Arbiter, Interface, Networks.
System 2 labels: ARM OOO (8x), L1$ 32KB (8x), Cache Coherent Interconnect, L2 Cache, Off-Chip DRAM, FPGA, Tile 4PEs (8x), Interface, Networks.]