
Parallel Programming and Heterogeneous Computing: Shared-Memory Hardware



  1. Parallel Programming and Heterogeneous Computing: Shared-Memory Hardware
     Max Plauth, Sven Köhler, Felix Eberhardt, Lukas Wenzel and Andreas Polze
     Operating Systems and Middleware Group

  2. Recap: Types of Parallelism
     ■ Data Level Parallelism: the same operation is applied in parallel to multiple units of data.
     ■ Task Level Parallelism: multiple operations are executed in parallel.
       □ Instruction Level Parallelism (ILP): between operations in a task
       □ Thread Level Parallelism (TLP): between multiple tasks within a workload
       □ Request Level Parallelism: between multiple workloads
     ParProg 2020 B3, Shared-Memory Hardware, Lukas Wenzel (Chart 2)
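
The two top-level categories in this recap can be illustrated with a minimal Python sketch. The helper names (`square`, `total`, `largest`) and the input list are hypothetical and not from the slides. Note that CPython threads do not speed up CPU-bound work because of the global interpreter lock, so this only shows how the two kinds of parallelism are structured, not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    return x * x


def total(values) -> int:
    return sum(values)


def largest(values) -> int:
    return max(values)


data = [1, 2, 3, 4, 5, 6, 7, 8]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data-level parallelism: the same operation applied to many units of data.
    squares = list(pool.map(square, data))

    # Task-level parallelism: different operations submitted to run concurrently.
    sum_future = pool.submit(total, data)
    max_future = pool.submit(largest, data)
    data_sum = sum_future.result()
    data_max = max_future.result()
```

In the slide's terms, the `pool.map` call corresponds to data-level parallelism and the two `submit` calls to thread-level parallelism between tasks within one workload.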

  3. Shared-Memory Hardware: Exploiting Instruction Level Parallelism
     ■ ILP arises naturally within a workload
       □ Programmers think in terms of a single instruction sequence
     ■ TLP is explicitly encoded within a workload
       □ Programmers designate parallel operations using multiple tasks
     Why consider ILP in a parallel programming lecture?
     Knowledge of common ILP mechanisms and assumptions enables performance optimization at single-thread granularity! (Chart 3)

  4. Exploiting Instruction Level Parallelism: Pipelining
     ■ Instruction execution phases (e.g. Instruction Fetch, Decode, Execute, Memory Access, Writeback) employ distinct hardware units
       □ Without pipelining only one unit would operate each clock cycle
     ■ Pipelining increases throughput by utilizing all units in every cycle
     ■ Latency per instruction remains the same
     Figure: three instructions, each passing through the stages F D E M W. Without pipelining: 15 cycles, 20% utilization. With pipelining: 7 cycles, approaching 100% utilization. (Chart 4)
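
The cycle counts in the figure follow from simple arithmetic, sketched below in Python. The function names are hypothetical, and the model assumes an ideal pipeline with one cycle per stage and no stalls.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Without pipelining, each instruction occupies the datapath for n_stages cycles."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction takes n_stages cycles; each further one completes one cycle later."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def utilization(n_instructions: int, n_stages: int, cycles: int) -> float:
    """Fraction of (stage, cycle) slots that perform useful work."""
    return (n_instructions * n_stages) / (n_stages * cycles)


if __name__ == "__main__":
    for n in (3, 100, 10_000):
        seq = unpipelined_cycles(n, 5)
        pipe = pipelined_cycles(n, 5)
        print(n, seq, utilization(n, 5, seq), pipe, round(utilization(n, 5, pipe), 4))
```

For the three instructions in the figure this gives 15 and 7 cycles. The pipelined utilization for only three instructions is 15/35, roughly 43%; it approaches 100% as the instruction sequence grows, because the fill and drain cycles are amortized.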

  5. Pipelining Example (Data Hazards): Cycle 1 (Chart 5.1)
     Stages: Fetch, Decode, Execute, Memory, Writeback
     Program: MOV R0,#1; ADD R1,R0,#3; LD R2,[R1]; LD R3,[R0]; ADD R0,R0,R3; LD R3,[R1]
     In flight:
       MOV R0,#1: R0 ← 0x01
     Registers: R0: 0x00, R1: 0x00, R2: 0x00, R3: 0x00
     ParProg 2019, Shared-Memory Hardware, Lukas Wenzel

  6. Pipelining Example (Data Hazards): Cycle 2 (Chart 5.2)
     In flight:
       MOV R0,#1: R0 ← 0x01
       ADD R1,R0,#3: R1 ← R0 + 0x03
     Registers: R0: 0x00, R1: 0x00, R2: 0x00, R3: 0x00

  7. Pipelining Example (Data Hazards): Cycle 3 (Chart 5.3)
     Annotation: Forward
     In flight:
       MOV R0,#1: R0 ← 0x01
       ADD R1,R0,#3: R1 ← R0 + 0x03, R1 ← 0x04
       LD R2,[R1]: R2 ← [R1]
     Registers: R0: 0x00, R1: 0x00, R2: 0x00, R3: 0x00

  8. Pipelining Example (Data Hazards): Cycle 4 (Chart 5.4)
     Annotation: Forward
     In flight:
       MOV R0,#1: R0 ← 0x01
       ADD R1,R0,#3: R1 ← 0x04
       LD R2,[R1]: R2 ← [R1], R2 ← [0x04]
       LD R3,[R0]: R3 ← [R0]
     Registers: R0: 0x01, R1: 0x00, R2: 0x00, R3: 0x00

  9. Pipelining Example (Data Hazards): Cycle 5 (Chart 5.5)
     Annotations: Operand Fetch, Dependency
     In flight:
       ADD R1,R0,#3: R1 ← 0x04
       LD R2,[R1]: R2 ← [0x04], R2 ← 0xd4
       LD R3,[R0]: R3 ← [R0], R3 ← [0x01]
       ADD R0,R0,R3: R0 ← R0 + R3 (Dependency)
     Registers: R0: 0x01, R1: 0x04, R2: 0x00, R3: 0x00

  10. Pipelining Example (Data Hazards): Cycle 6 (Chart 5.6)
     Annotation: Bubble
     In flight:
       LD R2,[R1]: R2 ← 0xd4
       LD R3,[R0]: R3 ← [0x01], R3 ← 0xd1
       ADD R0,R0,R3: R0 ← R0 + R3
     Registers: R0: 0x01, R1: 0x04, R2: 0xd4, R3: 0x00

  11. Pipelining Example (Data Hazards): Cycle 7 (Chart 5.7)
     Annotations: Operand Fetch, Bubble
     In flight:
       LD R3,[R0]: R3 ← 0xd1
       ADD R0,R0,R3: R0 ← R0 + R3, R0 ← 0xd2
       LD R3,[R1]: R3 ← [R1]
     Registers: R0: 0x01, R1: 0x04, R2: 0xd4, R3: 0xd1

  12. Pipelining Example (Data Hazards): Cycle 8 (Chart 5.8)
     Annotation: Operand Fetch
     In flight:
       ADD R0,R0,R3: R0 ← 0xd2
       LD R3,[R1]: R3 ← [R1], R3 ← [0x04]
     Registers: R0: 0x01, R1: 0x04, R2: 0xd4, R3: 0xd1

  13. Pipelining Example (Data Hazards): Cycle 9 (Chart 5.9)
     In flight:
       ADD R0,R0,R3: R0 ← 0xd2
       LD R3,[R1]: R3 ← [0x04], R3 ← 0xd4
     Registers: R0: 0xd2, R1: 0x04, R2: 0xd4, R3: 0xd1

  14. Pipelining Example (Data Hazards): Cycle 10 (Chart 5.10)
     In flight:
       LD R3,[R1]: R3 ← 0xd4
     Registers: R0: 0xd2, R1: 0x04, R2: 0xd4, R3: 0xd4
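
The dependency that causes the bubble in this example can be found mechanically. The following Python sketch lists the read-after-write dependencies of the six-instruction program and flags the consumers that directly follow the load producing their operand. The `Instr` tuple and the function names are hypothetical. The sketch also assumes full forwarding, so that only such load-use pairs need a bubble; that is a simplification, not a statement about the exact pipeline on the slides.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    is_load: bool = False


# The program from the data hazard example.
PROGRAM = [
    Instr("MOV R0,#1", "R0", ()),
    Instr("ADD R1,R0,#3", "R1", ("R0",)),
    Instr("LD R2,[R1]", "R2", ("R1",), is_load=True),
    Instr("LD R3,[R0]", "R3", ("R0",), is_load=True),
    Instr("ADD R0,R0,R3", "R0", ("R0", "R3")),
    Instr("LD R3,[R1]", "R3", ("R1",), is_load=True),
]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}
    for i, ins in enumerate(program):
        for reg in ins.srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return deps


def load_use_stalls(program):
    """Indices of instructions that directly follow the load producing one of their operands."""
    return [c for p, c, _ in raw_dependencies(program)
            if program[p].is_load and c == p + 1]


if __name__ == "__main__":
    for producer, consumer, reg in raw_dependencies(PROGRAM):
        print(f"{PROGRAM[consumer].text} reads {reg} written by {PROGRAM[producer].text}")
    print("needs a bubble:", [PROGRAM[i].text for i in load_use_stalls(PROGRAM)])
```

Under these assumptions the only flagged instruction is `ADD R0,R0,R3`, which reads R3 right after `LD R3,[R0]`. This is the instruction marked "Dependency" in cycle 5 and delayed by the bubble in cycles 6 and 7.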

  15. Pipelining Example (Control Hazard): Cycle 1 (Chart 6.1)
     Labels: Fetch, Decode, Execute, Memory, Writeback, Branch
     Program: LD R0,[#1]; MOV R1,#108; BEQ R0,R1,L1; LD R1,[#2]; ADD R0,R0,R1; L1: ST R0,[#4]
     In flight:
       LD R0,[#1]: R0 ← [0x01]
     Registers: R0: 0x00, R1: 0x00

  16. Pipelining Example (Control Hazard): Cycle 2 (Chart 6.2)
     In flight:
       LD R0,[#1]: R0 ← [0x01]
       MOV R1,#108: R1 ← 0x6c
     Registers: R0: 0x00, R1: 0x00

  17. Pipelining Example (Control Hazard): Cycle 3 (Chart 6.3)
     In flight:
       LD R0,[#1]: R0 ← [0x01], R0 ← 0x6c
       MOV R1,#108: R1 ← 0x6c
       BEQ R0,R1,L1: R1 - R0 = 0: L1
     Registers: R0: 0x00, R1: 0x00

  18. Pipelining Example (Control Hazard): Cycle 4 (Chart 6.4)
     In flight:
       LD R0,[#1]: R0 ← 0x6c
       MOV R1,#108: R1 ← 0x6c
       BEQ R0,R1,L1: R1 - R0 = 0: L1, 0x6c - 0x6c = 0: L1
       LD R1,[#2]: R1 ← [0x02]
     Registers: R0: 0x6c, R1: 0x00

  19. Pipelining Example (Control Hazard): Cycle 5 (Chart 6.5)
     In flight:
       MOV R1,#108: R1 ← 0x6c
       BEQ R0,R1,L1: 0x6c - 0x6c = 0: L1, TRUE: L1
       LD R1,[#2]: R1 ← [0x02]
       ADD R0,R0,R1: R0 ← R0 + R1
     Registers: R0: 0x6c, R1: 0x6c

  20. Pipelining Example (Control Hazard): Cycle 6 (Chart 6.6)
     Annotation: FETCH L1 | FLUSH
     In flight:
       BEQ R0,R1,L1: TRUE: L1
       LD R1,[#2]: R1 ← [0x02], R1 ← 0x12
       ADD R0,R0,R1: R0 ← R0 + R1, R0 ← 0x6c + 0x12
       L1: ST R0,[#4]: [0x04] ← R0
     Registers: R0: 0x6c, R1: 0x6c

  21. Pipelining Example (Control Hazard): Cycle 7 (Chart 6.7)
     In flight:
       L1: ST R0,[#4]: [0x04] ← R0
     Registers: R0: 0x6c, R1: 0x6c

  22. Pipelining Example (Control Hazard): Cycle 8 (Chart 6.8)
     In flight:
       L1: ST R0,[#4]: [0x04] ← R0, [0x04] ← 0x6c
     Registers: R0: 0x6c, R1: 0x6c
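
The architectural outcome of the control hazard example can be checked with a small sequential interpreter, sketched below in Python. The tuple encoding of the instructions and the `run` function are hypothetical. The memory contents `[0x01] = 0x6c` and `[0x02] = 0x12` are the values shown in the trace. The interpreter does not model pipeline stages; it only reports which instructions a taken branch skips, which are the ones a pipeline has to flush if it fetched them sequentially before the branch resolved.

```python
# The program from the control hazard example, one tuple per instruction.
PROGRAM = [
    ("LD", ("R0", 1)),            # LD  R0,[#1]
    ("MOV", ("R1", 108)),         # MOV R1,#108
    ("BEQ", ("R0", "R1", "L1")),  # BEQ R0,R1,L1
    ("LD", ("R1", 2)),            # LD  R1,[#2]
    ("ADD", ("R0", "R0", "R1")),  # ADD R0,R0,R1
    ("ST", ("R0", 4)),            # L1: ST R0,[#4]
]
LABELS = {"L1": 5}
MEMORY = {1: 0x6C, 2: 0x12}


def run(program, labels, memory):
    """Execute sequentially; return (registers, memory, indices skipped by taken branches)."""
    regs = {"R0": 0, "R1": 0}
    mem = dict(memory)
    skipped = []
    pc = 0
    while pc < len(program):
        op, args = program[pc]
        next_pc = pc + 1
        if op == "LD":
            reg, addr = args
            regs[reg] = mem[addr]
        elif op == "MOV":
            reg, value = args
            regs[reg] = value
        elif op == "ADD":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "ST":
            reg, addr = args
            mem[addr] = regs[reg]
        elif op == "BEQ":
            a, b, label = args
            if regs[a] == regs[b]:
                next_pc = labels[label]
                skipped.extend(range(pc + 1, next_pc))
        pc = next_pc
    return regs, mem, skipped


if __name__ == "__main__":
    regs, mem, skipped = run(PROGRAM, LABELS, MEMORY)
    print(regs, mem, [PROGRAM[i][0] for i in skipped])
```

With the memory contents from the trace the branch is taken, `LD R1,[#2]` and `ADD R0,R0,R1` are skipped, and 0x6c is stored to address 0x04. In the trace, these two instructions are already in the pipeline when the branch resolves and are discarded at the FLUSH in cycle 6.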
