
Microarchitectural Mechanisms to Exploit Value Structure in SIMT Architectures

Ji Kim, Christopher Torng, Shreesha Srinath, Derek Lockhart, and Christopher Batten, Cornell University. IEEE/ACM International Symposium on Computer Architecture 2013 (ISCA-40).


  1. Microarchitectural Mechanisms to Exploit Value Structure in SIMT Architectures. Ji Kim, Christopher Torng, Shreesha Srinath, Derek Lockhart, and Christopher Batten, Cornell University. IEEE/ACM International Symposium on Computer Architecture 2013 (ISCA-40).

  2. Motivation
     • SIMT architectures exploit:
       • Control structure (i.e., common instruction fetch/decode/issue)
       • Memory-access structure (i.e., memory coalescing)
     Value structure occurs when the same operation uses values across threads that can be represented as a compact function.
     • Primary research questions:
       • How does value structure impact control and memory-access structure?
       • How can we realistically implement hardware mechanisms to exploit value structure to improve performance and energy efficiency?

  3. Presentation Outline
     • General-Purpose vs. Fine-Grain SIMT
     • Characterizing Value Structure
     • FG-SIMT Baseline Architecture
     • Compact Affine Execution
     • Evaluation

  4. Why GP-SIMT and FG-SIMT?
     • Holistic approach for evaluating on different SIMT architectures
     • GP-SIMT as a model for traditional SIMT architecture
       • Focus on exploiting inter-warp parallelism
     • FG-SIMT as our own alternative SIMT architecture that we are building from the ground up
       • Targeting flexible, compute-focused data-parallel accelerators
       • Focus on exploiting intra-warp parallelism, area-efficiency
       • Build credibility with FG-SIMT with cycle time, area, and energy analysis

  5. GP-SIMT Programming Model / FG-SIMT Programming Model

       __global__ void vsadd( int y[], int a )
       {
         int idx = // get thread index

         y[idx] = y[idx] + a;
         if ( y[idx] > THRESHOLD )
           y[idx] = Y_MAX_VALUE;
       }

     • Key difference is in how the kernel is launched
       • GP-SIMT: HW-managed, coarse-grain kernel launch
       • FG-SIMT: HW/SW-managed, fine-grain kernel launch
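
The kernel on slide 5 leaves the thread-index computation and both constants unspecified. As a rough host-side sketch in C++, the per-thread body can be emulated by calling it once per index. This is not the launch mechanism of either GP-SIMT or FG-SIMT; the names `vsadd_thread` and `vsadd_emulate` and the values chosen for `THRESHOLD` and `Y_MAX_VALUE` are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed values; the slides do not define these constants.
const int THRESHOLD   = 100;
const int Y_MAX_VALUE = 100;

// Body of the vsadd kernel for one logical thread, with idx passed in explicitly.
void vsadd_thread(std::vector<int>& y, int a, std::size_t idx) {
    assert(idx < y.size());
    y[idx] = y[idx] + a;
    if (y[idx] > THRESHOLD) {
        y[idx] = Y_MAX_VALUE;
    }
}

// Sequential stand-in for a kernel launch: one logical thread per element.
void vsadd_emulate(std::vector<int>& y, int a) {
    for (std::size_t idx = 0; idx < y.size(); ++idx) {
        vsadd_thread(y, a, idx);
    }
}
```

On SIMT hardware the threads of a warp would execute this body in lockstep rather than in a loop; the loop only reproduces the per-element result.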

  6. GP-SIMT Microarchitecture vs. FG-SIMT Microarchitecture
     GP-SIMT:
     • Multi-warp execution
     • Single-ported register file
     • Wide, unbanked L1 cache
     • Integrated fetch/decode/issue
     • Distinct memory space
     FG-SIMT:
     • Single warp execution
     • Multi-ported register file
     • Shared, banked L1 cache
     • SW-programmable control processor
     • Unified memory space

  7. Presentation Outline
     • General-Purpose vs. Fine-Grain SIMT
     • Characterizing Value Structure
     • FG-SIMT Baseline Architecture
     • Compact Affine Execution
     • Evaluation

  8. Identifying Value Structure

       __global__ void
       vsadd( int y[], int a ) {
         int idx = // get thread index

         y[idx] = y[idx] + a;
         if ( y[idx] > THRESHOLD )
           y[idx] = Y_MAX_VALUE;
       }

       vsadd:
         ld.sh  R_a, M[A]
         ld.sh  R_ybase, M[Y]
         add    R_yptr, R_ybase, IDX
         load   R_y, M[R_yptr]
         add    R_y, R_y, R_a
         store  R_y, M[R_yptr]
         branch R_y, THRESHOLD
         imm    R_max, Y_MAX_VALUE
         store  R_max, M[R_yptr]
         stop

     [Figure: example per-thread operand values for four threads: 2 2 2 2; 32 32 32 32; 40 40 40 40; 0 1 2 3; 32 36 40 44; 19 89 8 127]

     Affine Value Structure: V(i) = b + i × s
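
The affine form V(i) = b + i × s on slide 8 can be checked mechanically for any vector of per-thread values. The following is a minimal C++ sketch of such a check, not the detection hardware described in the talk; the `Affine` struct and the `detect_affine` helper are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compact form of an affine per-thread value: V(i) = base + i * stride.
struct Affine {
    long long base;
    long long stride;
};

// Hypothetical helper: returns true and fills `out` if every thread's value
// fits V(i) = base + i * stride; returns false (leaving `out` untouched) otherwise.
bool detect_affine(const std::vector<long long>& values, Affine& out) {
    assert(!values.empty());
    const long long base = values[0];
    // With a single thread any value is trivially affine with stride 0.
    const long long stride = values.size() > 1 ? values[1] - values[0] : 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] != base + static_cast<long long>(i) * stride) {
            return false;
        }
    }
    out.base = base;
    out.stride = stride;
    return true;
}
```

Applied to the example vectors from the slide, `2 2 2 2` fits with b = 2 and s = 0, `32 36 40 44` fits with b = 32 and s = 4, and `19 89 8 127` does not fit any (b, s) pair.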

  9. Why does value structure occur?
     • Operating on or loading constants
     • Common control flow (e.g., inner loops)
     • Manipulating addresses for structured memory access
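
The address-manipulation case on slide 9 follows from simple arithmetic: adding two values of the form b + i × s gives another value of the same form, with the bases and strides summed. The C++ sketch below illustrates this for a pointer computation. It assumes 4-byte elements, so that thread i has byte offset 4 × i; `Affine`, `affine_add`, and `expand` are hypothetical names, and this is not a description of the authors' hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compact form of an affine per-thread value: V(i) = base + i * stride.
struct Affine {
    long long base;
    long long stride;
};

// (b1 + i*s1) + (b2 + i*s2) = (b1 + b2) + i*(s1 + s2), so the sum is affine.
Affine affine_add(const Affine& x, const Affine& y) {
    return Affine{x.base + y.base, x.stride + y.stride};
}

// Expand the compact form into explicit per-thread values.
std::vector<long long> expand(const Affine& v, int num_threads) {
    assert(num_threads >= 0);
    std::vector<long long> out;
    out.reserve(static_cast<std::size_t>(num_threads));
    for (int i = 0; i < num_threads; ++i) {
        out.push_back(v.base + i * v.stride);
    }
    return out;
}
```

For example, a uniform array base `{32, 0}` plus an assumed byte offset `{0, 4}` gives `{32, 4}`, which expands to 32, 36, 40, 44 for four threads.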

  10. How often does value structure occur?
      • GP-SIMT hardware detection, Collange et al. HPPC-2009
        • On average, 34% of register reads and 22% of register writes are affine
      • GP-SIMT software detection, Lee et al. CGO-2013
        • On average, 31% of combined register reads/writes are affine
      • Our own FG-SIMT functional simulation:
        • 30-80% of register reads and 20-70% of register writes are affine

  11. Presentation Outline
      • General-Purpose vs. Fine-Grain SIMT
      • Characterizing Value Structure
      • FG-SIMT Baseline Architecture
      • Compact Affine Execution
      • Evaluation
