Fork-Join Parallelism



CSE373: Data Structures and Algorithms
Fork-Join Parallelism
Steve Tanimoto, Autumn 2016
This lecture material represents the work of multiple instructors at the University of Washington. Thank you to all who have contributed!

Changing a major assumption
• So far most or all of your study of computer science has assumed that one thing happened at a time
  – Called sequential programming – everything part of one sequence
• Removing this assumption creates major challenges & opportunities
  – Programming: Divide work among threads of execution and coordinate (synchronize) among them
  – Algorithms: How can parallel activity provide speed-up (more throughput: work done per unit time)?
  – Data structures: May need to support concurrent access (multiple threads operating on data at the same time)

A simplified view of history
• Writing correct and efficient multithreaded code is often much more difficult than for single-threaded (i.e., sequential) code
  – Especially in common languages like Java and C
  – So typically stay sequential if possible
• From roughly 1980-2005, desktop computers got exponentially faster at running sequential programs
  – About twice as fast every couple years
• But nobody knows how to continue this
  – Increasing clock rate generates too much heat
  – Relative cost of memory access is too high
  – But we can keep making "wires exponentially smaller" (Moore's "Law"), so put multiple processors on the same chip ("multicore")

What to do with multiple processors?
• Next computer you buy will likely have 4 processors (your current one might already)
  – Wait a few years and it will be 8, 16, 32, …
  – The chip companies have decided to do this (not a "law")
• What can you do with them?
  – Run multiple totally different programs at the same time
    • Already do that? Yes, but with time-slicing
  – Do multiple things at once in one program
    • Our focus – more difficult
    • Requires rethinking everything from asymptotic complexity to how to implement data-structure operations

Parallelism vs. Concurrency
• Note: Terms not yet standard, but the perspective is essential
  – Many programmers confuse these concepts
• Parallelism: Use extra resources to solve a problem faster
• Concurrency: Correctly and efficiently manage access to shared resources
• There is some connection:
  – Common to use threads for both
  – If parallel computations need access to shared resources, then the concurrency needs to be managed
• We will just do a little parallelism, avoiding concurrency issues

An analogy
• CS1 idea: A program is like a recipe for a cook
  – One cook who does one thing at a time! (Sequential)
• Parallelism:
  – Have lots of potatoes to slice? Hire helpers, hand out potatoes and knives
  – But too many chefs and you spend all your time coordinating
• Concurrency:
  – Lots of cooks making different things, but only 4 stove burners
  – Want to allow access to all 4 burners, but not cause spills or incorrect burner settings

Shared memory
• The model we will assume is shared memory with explicit threads
  – Not the only approach, may not be best, but time for only one
• Old story: A running program has
  – One program counter (current statement executing)
  – One call stack (with each stack frame holding local variables)
  – Objects in the heap created by memory allocation (i.e., new)
    • (nothing to do with the data structure called a heap)
  – Static fields – belong to the class and not an instance (or object) of the class; only one for all instances of a class
• New story:
  – A set of threads, each with its own program counter & call stack
    • No access to another thread's local variables
  – Threads can (implicitly) share static fields / objects
    • To communicate, write somewhere another thread reads
• Threads each have their own unshared call stack and current statement (pc for "program counter")
  – Local variables are numbers, null, or heap references
• Any objects can be shared, but most are not
(Figure: each thread's pc, locals, and control are unshared; heap objects and static fields are shared.)

Our needs
• To write a shared-memory parallel program, need new primitives from a programming language or library
• Ways to create and run multiple things at once
  – Let's call these things threads
• Ways for threads to share memory
  – Often just have threads with references to the same objects
• Ways for threads to coordinate (a.k.a. synchronize)
  – A way for one thread to wait for another to finish
  – [Other features needed in practice for concurrency]

Java basics
• Learn a couple basics built into Java via java.lang.Thread
  – But for the style of parallel programming we'll advocate, do not use these threads; use Java 7's ForkJoin Framework instead
• To get a new thread running:
  1. Define a subclass C of java.lang.Thread, overriding run
  2. Create an object of class C
  3. Call that object's start method
     • start sets off a new thread, using run as its "main"
• What if we instead called the run method of C?
  – This would just be a normal method call, in the current thread
• Let's see how to share memory and coordinate via an example…

Parallelism idea
• Example: Sum elements of a large array
• Idea: Have 4 threads simultaneously sum 1/4 of the array
  – Warning: This is an inferior first approach, but it's usually good to start with something naïve that works
• Approach (diagram: four partial results ans0, ans1, ans2, ans3 combined with + into ans):
  – Create 4 thread objects, each given a portion of the work
  – Call start() on each thread object to actually run it in parallel
  – Wait for threads to finish using join()
  – Add together their 4 answers for the final result

First attempt, part 1

class SumThread extends java.lang.Thread {
  int lo;       // arguments
  int hi;
  int[] arr;
  int ans = 0;  // result

  SumThread(int[] a, int l, int h) {
    lo = l; hi = h; arr = a;
  }

  public void run() { // override must have this type
    for (int i = lo; i < hi; i++)
      ans += arr[i];
  }
}

• Because we must override a no-arguments/no-result run, we use fields to communicate across threads
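To round out the idea, here is a minimal sketch of how the four SumThread objects described above might be created, started, and combined into the final answer. The sum helper method, the enclosing class name, and the fixed split into four equal ranges are assumptions for illustration, not the lecture's own continuation of the example.

// First attempt, part 2 (sketch): divide the array, run four threads, combine.
// Assumption: integer division makes the four consecutive ranges cover the
// whole array with no gaps or overlaps, even when length is not a multiple of 4.
class SumExample {
  static int sum(int[] arr) throws InterruptedException {
    int len = arr.length;
    int ans = 0;
    SumThread[] ts = new SumThread[4];
    for (int i = 0; i < 4; i++) {
      ts[i] = new SumThread(arr, (i * len) / 4, ((i + 1) * len) / 4);
      ts[i].start();            // start() runs run() in a new thread
    }
    for (int i = 0; i < 4; i++) {
      ts[i].join();             // wait for thread i to finish
      ans += ts[i].ans;         // only read the result field after join()
    }
    return ans;
  }
}

Calling run() directly instead of start() would execute each quarter sequentially in the calling thread, and it is join() that makes it safe to read each helper thread's ans field.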
