
HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA

HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA. Jason Lau*, Aishwarya Sivaraman*, Qian Zhang*, Muhammad Ali Gulzar, Jason Cong, Miryung Kim. University of California, Los Angeles. *Equal co-first authors in alphabetical order.


  1. HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA. Jason Lau*, Aishwarya Sivaraman*, Qian Zhang*, Muhammad Ali Gulzar, Jason Cong, Miryung Kim. University of California, Los Angeles. *Equal co-first authors in alphabetical order.

  2. HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA. Jason Lau, Aishwarya Sivaraman, Qian Zhang, Muhammad Ali Gulzar, Jason Cong, Miryung Kim.

  3. FPGA*-based Acceleration. Fast. Efficient. (* FPGA: Field Programmable Gate Array)

  4. FPGA*-based Acceleration. Fast. Efficient. Effort. (* FPGA: Field Programmable Gate Array)

  5. Evolution of Programming Model: Verilog HDL*. Annotations: typeless; registers; instructions; goto-style control. (* HDL: Hardware Description Language)

         module vecdot(a, b, c, clk, rst);
           input [67:0] a, b;
           output [16:0] c;
           reg [5:0] s;
           reg [16:0] prod [0:7];
           ...
           always @(posedge clk or posedge rst)
             if (!rst) begin
               if (s == 6'b00001)
                 prod[0] = a[..] * b[..]; prod[1] =...
                 s = 6'b00010;
               else if (s == 6'b00010)
                 reg1 = prod[0] + prod[1] + prod[2];
                 s = 6'b00100; // goto L00100;
               else if (s == 6'b00100)
                 reg1 = reg1 + prod[3] + prod[4];
                 s = 6'b01000;
               else ... ;
             ...
         endmodule

  6. Evolution of Programming Model: Merlin HLS*, etc. Annotations: typed; auto schedule; auto resource; auto optimization. (* HLS: High-Level Synthesis)

         fpga_float<8,15> vecdot(
             fpga_float<8,15> a[],
             fpga_float<8,15> b[],
             fpga_int<31> n) {
           for (fpga_int<31> i = 0; i < n; i++)
             sum += a[i] * b[i];
           return sum;
         }

  7. Something is missing... Bit-width. Same vecdot code as slide 6 (Merlin HLS*, etc.), annotated: bitwidth = 31 on fpga_int<31>; waste scarce memory! FPGA memory: < 100 MB. (* HLS: High-Level Synthesis)
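As a rough illustration of the waste noted on slide 7, the sketch below computes the smallest unsigned bit-width that can hold a set of observed values. It is not taken from the slides: the function names `bits_needed` and `bits_for_profile` are hypothetical, and it assumes the observed values are representative of every future input.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest number of bits that can represent the unsigned value v.
// By convention here, the value 0 still occupies 1 bit.
unsigned bits_needed(std::uint64_t v) {
    unsigned bits = 1;
    while (v >>= 1) ++bits;
    return bits;
}

// Bit-width sufficient for every value in a (hypothetical) profile.
unsigned bits_for_profile(const std::vector<std::uint64_t>& observed) {
    assert(!observed.empty());
    unsigned width = 1;
    for (std::uint64_t v : observed) {
        unsigned b = bits_needed(v);
        if (b > width) width = b;
    }
    return width;
}
```

For instance, if the profiled values of `n` never exceeded 1000, `bits_for_profile` would report 10 bits rather than the 31 declared on the slide. Whether such a reduction is safe depends on inputs outside the profile.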

  8. Something is missing... Bit-width. Floating-point precision. Same vecdot code as slide 6 (Merlin HLS*, etc.), annotated on fpga_float<8,15>: exponent 8 bits, fraction 15 bits. Precision? Memory? (* HLS: High-Level Synthesis)
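To get a feel for the precision question on slide 8, the following sketch emulates a 15-bit fraction on the CPU by clearing the low bits of an ordinary `float`. It does not use `fpga_float` itself: the helper name `truncate_fraction` is hypothetical, it assumes `float` is a 32-bit IEEE-754 value, and it truncates rather than rounds.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Keep only the top `frac_bits` of the 23 fraction bits of a
// single-precision value (truncation, no rounding). The sign bit and
// the 8-bit exponent are left unchanged.
float truncate_fraction(float x, unsigned frac_bits) {
    assert(frac_bits <= 23);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);            // read the raw encoding
    std::uint32_t mask = ~((std::uint32_t{1} << (23 - frac_bits)) - 1u);
    bits &= mask;                                   // drop the low fraction bits
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

Values such as 1.0f survive unchanged because their low fraction bits are already zero, while 0.1f comes back slightly smaller. Comparing a kernel's output with and without this truncation is one simple way to judge whether the reduced fraction is acceptable.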

  9. Something is missing... 4 errors in 14 lines of code (Merlin HLS*, etc.). Annotations: bit-width; floating-point precision; recursive data structure; nested pointers. (* HLS: High-Level Synthesis)

         struct Node {
           Node *left, *right;
           int val;
         };
         void init(Node **root) {
           *root = (Node *)malloc(sizeof(Node));
         }
         void insert(Node **root, int *arr);
         void delete_tree(Node *root) {
           // …
           free(root);
         }
         void traverse(Node *curr) {
           if (curr == NULL) return;
           int ret = visit(curr->val);
           traverse(curr->left);
           traverse(curr->right);
         }

  10. Something is missing... Same code and annotations as slide 9, adding: preallocated size?; dynamic mem mgmt.

  11. Something is missing... Same code and annotations as slide 10, adding: pointer operations.

  12. Something is missing... Same code and annotations as slide 11, adding: recursive functions.

  13. Evolution of Programming Model (chart; credit: "A Multi-Paradigm Programming Infrastructure for Heterogeneous Architectures" by Cong et al.). Vertical axis: programmability (language features, programming difficulty, etc.). Horizontal axis: year (1989, 2003, 2011, 2014, 2017). CPU curve (evolving): ANSI C, C++03, C++11, C++14, C++17. FPGA curve (evolving) labels: HDL, untimed descriptions, Vivado HLS C/C++, Merlin HLS pragma-simplified C/C++. A gap separates the two curves.

  14. Evolution of Programming Model (same chart as slide 13). Added annotation: significant human effort & error-prone.

  15. Evolution of Programming Model (same chart as slide 14). Added annotation: waste scarce memory.

  16. I want it to run!

  17. I want it to run efficiently!

  18. Automation!

  19. HeteroRefactor (overview diagram). One-click pipeline stages: C++ Inputs; Instrumentation; Refactoring; Selective Offloading; Vivado HLS / Merlin. Targets: Recursive Data Structures (Support and Optimization); Integers (Bitwidth Optimization); Floating Points (Bitwidth Optimization).

  20. Part 1. Dynamic Data Structures (same overview diagram as slide 19).

  21. Dynamic Data Structures: Instrumentation. One-click pipeline stages: C++ Inputs; Instrumentation; Refactoring; Selective Offloading; Vivado HLS / Merlin. For Recursive Data Structures, Instrumentation records: Data Structure Size; Recursion Depth.
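The two quantities named on slide 21, data structure size and recursion depth, could be collected along the following lines. This is a minimal sketch rather than the tool's actual instrumentation: the `Profile` record, `g_profile`, `new_node`, and `delete_node` are hypothetical names, and `traverse` returns a sum only so that the example is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Node {
    Node *left, *right;
    int val;
};

// Hypothetical profile record; the field names are illustrative.
struct Profile {
    std::size_t live_nodes = 0;  // nodes currently allocated
    std::size_t max_nodes = 0;   // peak data structure size
    std::size_t depth = 0;       // current recursion depth
    std::size_t max_depth = 0;   // deepest recursion observed
};

Profile g_profile;

// Instrumented allocation: allocate as before, then count the node.
Node *new_node(int val) {
    Node *n = static_cast<Node *>(std::malloc(sizeof(Node)));
    assert(n != nullptr);
    n->left = n->right = nullptr;
    n->val = val;
    if (++g_profile.live_nodes > g_profile.max_nodes)
        g_profile.max_nodes = g_profile.live_nodes;
    return n;
}

// Instrumented deallocation of a single node.
void delete_node(Node *n) {
    --g_profile.live_nodes;
    std::free(n);
}

// Instrumented traversal: bump the depth on entry, restore it on return.
int traverse(Node *curr) {
    if (curr == nullptr) return 0;
    if (++g_profile.depth > g_profile.max_depth)  // entry
        g_profile.max_depth = g_profile.depth;
    int sum = curr->val + traverse(curr->left) + traverse(curr->right);
    --g_profile.depth;                            // return
    return sum;
}
```

After running representative inputs, `max_nodes` and `max_depth` give candidate sizes for a preallocated node array and an explicit call stack.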

  22. Dynamic Data Structures: Refactoring. One-click pipeline stages: C++ Inputs; Instrumentation; Refactoring; Selective Offloading; Vivado HLS / Merlin. Recursive Data Structures Support and Optimization: Instrumentation records Data Structure Size and Recursion Depth; Refactoring applies three rules: Rewrite Memory Management; Modify Pointer Access; Convert Recursion.

  23. Example Program

         void init(Node **root) {
           *root = (Node *)malloc(sizeof(Node));
         }
         void delete_tree(Node *root) {
           // …
           free(root);
         }
         void traverse(Node *curr) {
           // entry
           if (curr == NULL) return;
           int ret = visit(curr->val);
           traverse(curr->left);
           traverse(curr->right);
           // return
         }
         // top-level function
         float kernel(float input[], int n) {
           float value = computation(float(..), ..);
         }

  24. Refactoring Rule 1: Rewrite Mem. Mgmt.

     Before:

         void init(Node **root) {
           *root = (Node *)malloc(sizeof(Node));
         }
         void delete_tree(Node *root) {
           // …
           free(root);
         }

     After:

         void init(Node_ptr *root) {
           *root = (Node_ptr)Node_malloc(sizeof(Node));
         }
         void delete_tree(Node_ptr root) {
           // …
           Node_free(root);
         }

  25. Refactoring Rule 1: Rewrite Mem. Mgmt. (repeats the code of slide 24).
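The slides name `Node_malloc`, `Node_free`, and `Node_ptr` but do not show their bodies. One plausible shape is a fixed-size pool with a free list, sketched below. The assumptions here are that `Node_ptr` is an array index, that index 0 stands in for NULL, and that the pool size would come from profiling. `NODE_NULL`, `free_head`, and `next_unused` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned Node_ptr;             // index into Node_arr; 0 plays the role of NULL
const Node_ptr NODE_NULL = 0;
const std::size_t NODE_ARR_SIZE = 8;   // illustrative; would come from profiling

struct Node {
    Node_ptr left, right;
    int val;
};

Node Node_arr[NODE_ARR_SIZE];          // preallocated backing store
Node_ptr free_head = NODE_NULL;        // free list, threaded through .left
Node_ptr next_unused = 1;              // slot 0 is reserved as the null index

// Returns a free slot, or NODE_NULL when the pool is exhausted.
// The size argument is kept only to mirror the malloc call on the slide.
Node_ptr Node_malloc(std::size_t /*size*/) {
    if (free_head != NODE_NULL) {      // reuse a previously freed slot first
        Node_ptr p = free_head;
        free_head = Node_arr[p].left;
        return p;
    }
    if (next_unused < NODE_ARR_SIZE) return next_unused++;
    return NODE_NULL;
}

// Puts a slot back on the free list.
void Node_free(Node_ptr p) {
    assert(p != NODE_NULL && p < NODE_ARR_SIZE);
    Node_arr[p].left = free_head;
    free_head = p;
}
```

Because every allocation is an index into a statically sized array, the memory footprint is known at synthesis time, which is the property dynamic `malloc` and `free` lack.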

  26. Refactoring Rule 2: Rewrite Pointer Access

     Before:

         void traverse(Node_ptr curr) {
           if (curr == NULL) return;
           int ret = visit(curr->val);
           traverse(curr->left);
           traverse(curr->right);
         }

     After:

         Node Node_arr[NODE_ARR_SIZE];
         void traverse(Node_ptr curr) {
           if (curr == NULL) return;
           int ret = visit(Node_arr[curr].val);
           traverse(Node_arr[curr].left);
           traverse(Node_arr[curr].right);
         }

  27. Refactoring Rule 3: Convert Recursion

     Before:

         void traverse(Node_ptr curr) {
           traverse(Node_arr[curr].left);
           traverse(Node_arr[curr].right);
         }

     After:

         void traverse_converted(Node_ptr curr) {
           stack<context> s(STACK_SIZE);
           while (!s.empty()) {
             context c = s.pop();
             goto c.location;
             L0: // traverse(Node_arr[curr].left);
               c.location = L1;
               s.push(c);
               s.push({curr: Node_arr[curr].left});
               continue;
             L1: // ...
           }
         }
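The `goto c.location` on slide 27 reads as pseudocode, since standard C++ cannot jump to a label stored in a variable. The sketch below swaps in an integer `location` field and plain `if` tests to get the same effect with a bounded explicit stack. It is an illustration under stated assumptions, not the tool's generated code: index 0 stands in for NULL, `STACK_SIZE` is an arbitrary small bound, and the function accumulates a sum instead of calling `visit` so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned Node_ptr;             // array index; 0 stands in for NULL
const std::size_t NODE_ARR_SIZE = 8;
const std::size_t STACK_SIZE = 4;      // illustrative bound on recursion depth

struct Node { Node_ptr left, right; int val; };
Node Node_arr[NODE_ARR_SIZE];

struct Context {
    Node_ptr curr;   // the argument of this "call"
    int location;    // resume point: 0 = entry, 1 = after left, 2 = after right
};

// Visits nodes in the same order as the recursive traverse (node, left,
// right) and adds their values into sum. Returns false if the explicit
// stack would overflow, true otherwise.
bool traverse_converted(Node_ptr root, int &sum) {
    Context stack[STACK_SIZE];
    std::size_t top = 0;
    sum = 0;
    stack[top++] = Context{root, 0};
    while (top > 0) {
        Context &c = stack[top - 1];
        if (c.curr == 0) { --top; continue; }       // if (curr == NULL) return;
        if (c.location == 2) { --top; continue; }   // both calls done: return
        if (c.location == 0) sum += Node_arr[c.curr].val;  // stands in for visit()
        Node_ptr child = (c.location == 0) ? Node_arr[c.curr].left
                                           : Node_arr[c.curr].right;
        ++c.location;                               // remember where to resume
        if (top == STACK_SIZE) return false;        // depth bound exceeded
        stack[top++] = Context{child, 0};           // the "recursive call"
    }
    return true;
}
```

The overflow result marks the point where a profiled depth bound turns out to be too small for a given input, so a caller can detect it rather than silently truncating the traversal.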

  28. Part 2. Integers (overview diagram). One-click pipeline stages: C++ Inputs; Instrumentation; Refactoring; Selective Offloading; Vivado HLS / Merlin. Targets: Recursive Data Structures (Support and Optimization); Integers (Bitwidth Optimization); Floating Points (Bitwidth Optimization).
