CSC2/458 Parallel and Distributed Systems: Automated Parallelization


CSC2/458 Parallel and Distributed Systems
Automated Parallelization in Software
Sreepathi Pai
January 30, 2018
URCS

Outline
• Out-of-order Superscalars and their Limitations
• Static Instruction Scheduling

How will a processor parallelize this?
    for(i = 0; i < A; i++) {
        sum1 = sum1 + i;
    }
    for(j = 0; j < A; j++) {
        sum2 = sum2 + j;
    }

Dynamic Instruction Stream
    i = 0
    i < A (true)
    sum1 = sum1 + 0
    i++
    i < A (true)
    sum1 = sum1 + 1
    i++
    ...
    i < A (false)
    j = 0
    j < A (true)
    sum2 = sum2 + 0
    j++
    j < A (true)
    sum2 = sum2 + 1
    j++
    j < A (true)
    ...

An Intel Processor Pipeline (figure; source: Intel)

Instruction Pipeline
• Instructions flow into “issue window”
    • from dynamic instruction stream
• Dependences are calculated and resources allocated
• Independent instructions are dispatched to backend out-of-order
• Instructions are retired in-order using a “reorder buffer”

Outline Out-of-order Superscalars and their Limitations Static Instruction Scheduling
VLIW Processors
• Very Long Instruction Word Processors
• Can execute multiple instructions at the same time
    • So superscalar
• But leaves independence checking to the compiler
• Compiler packs instructions into “long words”
• Example:
              Slot 1    Slot 2
    VLIW1:    ins1      ins2
    VLIW2:    ins3      [empty]

VLIW example
Consider static code below:
    for(i = 0; i < A; i++) {
        sum1 = sum1 + i;
    }
    for(j = 0; j < A; j++) {
        sum2 = sum2 + j;
    }
For a 2-wide VLIW, one packing could be:
    Slot 1              Slot 2
    i = 0               j = 0
    i < A               j < A
    sum1 = sum1 + i     sum2 = sum2 + j
    i++                 j++

Program Semantics
• When processors commit in-order, they preserve appearance of executing in program order
    • Not always true when multiple processors are involved
• But when compilers emit code, they change order from what is in program
    • Which orders in the original program must be preserved?
    • Which orders do not need to be preserved?

Our Ordering Principles
• Preserve Data Dependences
• Preserve Control Dependences
What about:
    printf("hello");
    printf("world");

Basic Block Scheduling
• Basic block is a single-entry, single-exit code block
• Instructions in basic block have the same control dependence
• All can execute together if they have no dependence
• Is there an advantage in reordering instructions within a basic block?

Instruction Scheduling
Consider:
    A = 1       // takes 1 cycle
    B = A + 1   // takes 1 cycle
    C = A * 3   // takes 2 cycles and 2 ALUs
    D = A + 5   // takes 1 cycle
Assume you have 2 ALUs. How should you schedule these instructions to lower total time?

Increasing the size of Basic Blocks
• Basic blocks are usually small
    • Not many opportunities to schedule instructions
• How can we increase size of basic blocks?
    • Remember out-of-order processors do speculation ...
