  1. Data-Centric Execution of Speculative Parallel Programs
     Mark Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, Daniel Sanchez
     MICRO 2016

  2. Executive summary
     Many-cores must exploit cache locality to scale
     Current speculative systems, e.g., thread-level speculation (TLS) or transactional memory (TM), do not exploit locality
     Spatial Hints: run tasks likely to access the same data in the same place
     ◦ A software-given hint denotes the data a new task is likely to access
     ◦ Hardware maps tasks with the same hint to the same place
     ◦ Hardware uses hints to perform locality-aware load balancing
     Our techniques make speculative parallelism practical at large scale
     ◦ It is easy to modify programs to convey locality through hints
     ◦ Performance improves by 3.3x at 256 cores
     ◦ We reduce network traffic by 6.4x and wasted work by 3.5x
     DATA-CENTRIC EXECUTION OF SPECULATIVE PARALLEL PROGRAMS
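To make "hardware maps tasks with the same hint to the same place" concrete, here is a minimal C++ sketch of one possible hint-to-tile function. The name `tileForHint`, the 64-tile constant (taken from the 64-tile chip described on the architecture slide), and the use of MurmurHash3's 64-bit finalizer as the hash are all assumptions for illustration; the slides do not say how the hardware hashes hints.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tile count, matching the 64-tile chip on the architecture slide.
constexpr uint32_t kNumTiles = 64;

// Hypothetical hint-to-tile mapping. A spatial hint is any 64-bit value the
// programmer associates with the data a task will touch (e.g., an address or
// an object ID). Hashing sends equal hints to the same tile every time while
// spreading distinct hints across tiles. The mixing steps are MurmurHash3's
// 64-bit finalizer, chosen here only as an illustrative hash.
uint32_t tileForHint(uint64_t hint) {
  uint64_t x = hint;
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33;
  uint32_t tile = static_cast<uint32_t>(x % kNumTiles);
  assert(tile < kNumTiles);
  return tile;
}
```

Under this sketch, two tasks that pass the same object ID (say, the same graph vertex) as their hint always land on the same tile, so the second task is likely to find that object in the tile's caches.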

  3.–4. Prior speculative systems scale poorly
     TRANSACTIONAL MEMORY (TM) SCHEDULERS
     ◦ Reduce wasted work of coarse-grain txns
     ◦ Limit concurrency: When to run a task?
     SPATIAL HINTS
     ◦ Make accesses local for fine-grain tasks
     ◦ Less data movement: Where to run a task?
     Spatially map tasks for improved locality and less waste

  5. Prior non-speculative locality techniques do not work for speculation
     STATIC TASK MAPPING
     Data dependences known a priori
     ◦ Linear algebra, Anton 2 [ASPLOS ’13]
     ◦ Limited application types
     Graph partitioning
     ◦ Localizes communication and scheduling
     ◦ Slow preprocessing step
     ◦ Cannot adapt to imbalance
     DYNAMIC TASK MAPPING
     Work stealing
     ◦ Cheap, local enqueues
     ◦ Steals to adapt to imbalance
     ◦ Stealing interferes with speculation

  6. Baseline Architecture: Swarm [MICRO ’15]

  7. Baseline Swarm execution model
     Programs consist of timestamped tasks
     ◦ Tasks can create child tasks with timestamps >= their own
     ◦ Tasks appear to execute in timestamp order
     swarm::enqueue(function_pointer, timestamp, arguments...);
     The general execution model supports both ordered and unordered parallelism
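The semantics on this slide can be modeled sequentially: a priority queue ordered by timestamp, where running a task may enqueue children whose timestamps are no earlier than its own. The C++ sketch below is such a reference model, not Swarm itself. The class name `SerialTaskModel`, its `enqueue(timestamp, task)` signature, and the first-come order among equal timestamps are assumptions; the real call on the slide is `swarm::enqueue(function_pointer, timestamp, arguments...)`, and Swarm runs tasks speculatively and out of order while making them appear to run in this order.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sequential reference model of timestamp-ordered tasks.
class SerialTaskModel {
 public:
  // A task receives the model (so it can enqueue children) and its timestamp.
  using Task = std::function<void(SerialTaskModel&, uint64_t)>;

  void enqueue(uint64_t timestamp, Task task) {
    // Children may not be scheduled before the task that creates them.
    assert(!running_ || timestamp >= currentTs_);
    queue_.push(Entry{timestamp, nextSeq_++, std::move(task)});
  }

  // Runs all tasks, including children created along the way, in timestamp order.
  void run() {
    running_ = true;
    while (!queue_.empty()) {
      Entry e = queue_.top();
      queue_.pop();
      currentTs_ = e.timestamp;
      e.task(*this, e.timestamp);
    }
    running_ = false;
  }

 private:
  struct Entry {
    uint64_t timestamp;
    uint64_t seq;  // creation order; breaks ties between equal timestamps
    Task task;
  };
  // std::priority_queue is a max-heap, so this comparator means "runs later".
  struct RunsLater {
    bool operator()(const Entry& a, const Entry& b) const {
      return a.timestamp != b.timestamp ? a.timestamp > b.timestamp : a.seq > b.seq;
    }
  };
  std::priority_queue<Entry, std::vector<Entry>, RunsLater> queue_;
  uint64_t nextSeq_ = 0;
  uint64_t currentTs_ = 0;
  bool running_ = false;
};
```

For example, if a task at timestamp 1 enqueues a child at timestamp 3, that child still runs before an independently enqueued task at timestamp 5, so the observed order is 1, 3, 5.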

  8. Baseline Swarm architecture
     Speculatively executes tasks out of order
     ◦ Large hardware task queues
     ◦ Scalable ordered speculation
     ◦ Scalable ordered commits
     Efficiently supports tiny speculative tasks
     [Figure: 64-tile, 256-core chip with Mem / IO controllers; tile organization: four cores, each with its own L1I/D caches, plus an L2, an L3 slice, a router, and a task unit]
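The chip configuration on this slide implies four cores per tile (256 / 64), which matches the four cores drawn in the tile diagram. The C++ sketch below spells out that arithmetic under a hypothetical contiguous numbering in which global cores 0 to 3 sit on tile 0, cores 4 to 7 on tile 1, and so on; the slides do not specify a numbering, and `locateCore` and `sameTile` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Configuration from the slide: 64 tiles, 256 cores in total.
constexpr uint32_t kTiles = 64;
constexpr uint32_t kCores = 256;
static_assert(kCores % kTiles == 0, "cores must split evenly across tiles");
constexpr uint32_t kCoresPerTile = kCores / kTiles;  // = 4

struct CoreLocation {
  uint32_t tile;
  uint32_t localCore;  // 0 .. kCoresPerTile - 1 within the tile
};

// Hypothetical contiguous numbering: cores 0-3 on tile 0, 4-7 on tile 1, ...
CoreLocation locateCore(uint32_t globalCore) {
  assert(globalCore < kCores);
  return {globalCore / kCoresPerTile, globalCore % kCoresPerTile};
}

// Per the tile diagram, cores on the same tile share its L2, L3 slice,
// and task unit; under this numbering that holds when tile indices match.
bool sameTile(uint32_t coreA, uint32_t coreB) {
  return locateCore(coreA).tile == locateCore(coreB).tile;
}
```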

  9. Spatial Hints in Action
     COMBINING SPECULATION AND LOCALITY

  10.–31. Example: Discrete event simulation (DES)
     [Figure, built up over slides 10–31: a circuit with inputs r and s, gates A–E, and output t = r XOR s; its truth table; and a task timeline whose order is simulated time (ns), 0 to 6. The builds add one task per signal change, in this order: r=1, A=1, D0=1, C0=0, E1=1, t=1, s=1, C1=1, B=1, D1=0, E1=0, t=0.]
     r s | t = r XOR s
     0 0 | 0
     0 1 | 1
     1 0 | 1
     1 1 | 0
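The DES example can be sketched in software to show why each signal change maps naturally to one timestamped task: processing an event updates one gate input, and if the gate's output changes, it creates child events one gate delay later. The C++ sketch below is a sequential model under several assumptions: it uses a textbook four-NAND XOR (g1 = NAND(r, s), g2 = NAND(r, g1), g3 = NAND(s, g1), t = NAND(g2, g3)) rather than the five-gate circuit on the slides, whose wiring is not recoverable here; every gate delay is 1 ns; and the names `simulateXor`, `Event`, and `InputChange` are hypothetical. The destination gate carried by each event is the kind of value a programmer could pass as a spatial hint, so that all tasks touching one gate would run on the same tile.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

constexpr uint64_t kGateDelayNs = 1;  // assumed uniform gate delay

struct Gate {
  int in[2];                                // current input values
  int out;                                  // current output value
  std::vector<std::pair<int, int>> fanout;  // (destination gate, input port)
};

// One DES task: "input `port` of gate `gate` becomes `value` at `time`".
struct Event {
  uint64_t time;  // task timestamp = simulated time in ns
  uint64_t seq;   // creation order, to break ties deterministically
  int gate;       // destination gate; a natural spatial hint
  int port;
  int value;
};

struct RunsLater {  // makes std::priority_queue a min-heap on (time, seq)
  bool operator()(const Event& a, const Event& b) const {
    return a.time != b.time ? a.time > b.time : a.seq > b.seq;
  }
};

struct InputChange {
  uint64_t time;
  char signal;  // 'r' or 's'
  int value;
};

// Hypothetical four-NAND XOR, starting from r = s = 0 (so t = 0).
// Returns every (time, value) transition of the output t.
std::vector<std::pair<uint64_t, int>> simulateXor(const std::vector<InputChange>& changes) {
  std::vector<Gate> gates = {
      {{0, 0}, 1, {{1, 1}, {2, 1}}},  // 0: g1 = NAND(r, s)
      {{0, 1}, 1, {{3, 0}}},          // 1: g2 = NAND(r, g1)
      {{0, 1}, 1, {{3, 1}}},          // 2: g3 = NAND(s, g1)
      {{1, 1}, 0, {}},                // 3: g4 = NAND(g2, g3), drives t
  };
  std::priority_queue<Event, std::vector<Event>, RunsLater> pq;
  uint64_t seq = 0;
  for (const InputChange& c : changes) {
    assert(c.signal == 'r' || c.signal == 's');
    if (c.signal == 'r') {  // r feeds g1 port 0 and g2 port 0
      pq.push({c.time, seq++, 0, 0, c.value});
      pq.push({c.time, seq++, 1, 0, c.value});
    } else {                // s feeds g1 port 1 and g3 port 0
      pq.push({c.time, seq++, 0, 1, c.value});
      pq.push({c.time, seq++, 2, 0, c.value});
    }
  }
  std::vector<std::pair<uint64_t, int>> tTransitions;
  while (!pq.empty()) {
    Event e = pq.top();
    pq.pop();
    Gate& gate = gates[e.gate];
    gate.in[e.port] = e.value;
    int newOut = !(gate.in[0] && gate.in[1]);
    if (newOut == gate.out) continue;  // output unchanged: no child tasks
    gate.out = newOut;
    uint64_t when = e.time + kGateDelayNs;
    if (e.gate == 3) tTransitions.push_back({when, newOut});
    for (const auto& edge : gate.fanout) {
      pq.push({when, seq++, edge.first, edge.second, newOut});  // child task
    }
  }
  return tTransitions;
}
```

In this model, raising r at 0 ns and s at 4 ns produces two output transitions: t becomes 1 at 2 ns and returns to 0 at 7 ns, matching r XOR s for both input combinations.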
