SLIDE 1

Network Calculus for Parallel Processing George Kesidis

The Pennsylvania State University kesidis@gmail.com

Dagstuhl Seminar on Network Calculus

March 8-11, 2015, at Schloss Dagstuhl

March 9, 2015 George Kesidis 1

SLIDE 2

Outline of the talk

  • Introduction
  • Review of two results from the 1980s for Markovian models
    – A two-server Markovian system: two M/M/1 queues with coupled arrivals
    – A multi-server Markovian system
  • Single-stage, fork-join system
  • Network calculus applications (in collaboration with Y. Shan, B. Urgaonkar & Jorg)
    – Simple deterministic result
    – Stationary analysis via gSBB
    – Numerical example using Facebook data
  • Discussion
    – Load balancing in a single processing stage
    – Workload transformation for tandem processing stages
    – Dynamic scheduling
    – Applications with feedback, e.g., distributed simulation
  • References

SLIDE 3

Parallel processing systems - overview

  • Decades of study on concurrent programming and parallel processing (including cluster computing), often in highly application-specific settings.
  • Challenges include:
    – resource allocation and load balancing to reduce delays at barrier (synchronization, join) points,
    – redundancy for robustness/protection, and
    – maintaining consistent shared memory/state across processors while minimizing communication overhead,
    – especially when dealing with feedback in the application itself.
  • Techniques may be proactive or reactive/dynamic in nature.
  • Today, popular platforms use Virtual Machines (VMs) mounted on multi-processor servers of a single data-center, or a group of data-centers forming a cloud.

SLIDE 4

Feed-forward parallel-processing systems

  • A certain family of jobs is best served by a particular arrangement of VMs/processors for parallel execution.
  • In the following, we consider jobs that lend themselves to feed-forward parallel-processing systems, e.g., many search/data-mining applications.
  • In a single parallel-processing stage, a job is partitioned into tasks (i.e., the job is “forked” or the tasks are demultiplexed); the tasks are then worked upon in parallel by different processors.
  • Within parallel-processing systems, there are often processing barriers (points of synchronization, or “joins”) at which some or all component tasks of a job need to be completed before the next stage of processing of the job can commence.
  • The terminus of the entire parallel-processing system is typically a barrier.
  • Thus, the latency of a stage (between barriers, or between the exogenous job arrivals and the first barrier) is the greatest latency among the processing paths through it.

SLIDE 5

MapReduce

  • Google’s MapReduce template for parallel processing with VMs (especially its open-source implementation, Apache Hadoop) is a very popular such framework for handling a sequence of search tasks.
  • MapReduce is a multi-stage parallel-processing framework where each processor is a VM (again, mounted on a server of a data-center).
  • In MapReduce, jobs arrive and are partitioned into tasks.
  • Each task is then assigned to a mapper VM for initial processing (first stage).
  • The results of the mappers are transmitted (shuffled), pipelined with the mappers’ operation, to reducers (second stage).
  • Reducer VMs combine the mapper results they have received and perform additional processing.
  • A barrier exists before each reducer (after its mapper-shuffler stage) and after all the reducers (after the reducer stage).

SLIDE 6

Simple MapReduce example of a word-search application

  • Two mappers that search and one reducer that combines their results.
  • Document corpus to be searched is divided between the mappers.

SLIDE 7

Single-stage, fork-join systems - a Markovian analysis

  • Jobs sequentially arrive to a parallel processing system of K identical servers.
  • The ith job arrives at time ti and spawns (forks) K tasks.
  • Let xj,i be the service-duration of the task assigned to server j by job i.
  • The tasks assigned to a server are queued in FIFO fashion.
  • The sojourn (or response) time D_{j,i} − t_i of the ith task of server j is the sum of its service time x_{j,i} and its queueing delay:

    D_{j,i} = x_{j,i} + max{D_{j,i−1}, t_i},  ∀ i ≥ 1, 1 ≤ j ≤ K,  with D_{j,0} = 0.

  • The response time of the ith job is

    max_{1≤j≤K} D_{j,i} − t_i.
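The recursion above can be sketched directly in a few lines (a minimal illustration; the function name and list-based bookkeeping are mine, and the system is assumed to start empty, so D_{j,0} = 0):

```python
def fork_join_response_times(arrivals, service, K):
    """Simulate the fork-join recursion D_{j,i} = x_{j,i} + max(D_{j,i-1}, t_i).

    arrivals:   nondecreasing job arrival times t_i.
    service:    service[i][j] = x_{j,i}, service time of job i's task at server j.
    Returns the per-job response times max_j D_{j,i} - t_i.
    """
    D = [0.0] * K  # D_{j,0} = 0: all servers start empty
    responses = []
    for t, x in zip(arrivals, service):
        # each server's task departs after its service time, which begins
        # once the server's previous task departs or the job arrives
        D = [x[j] + max(D[j], t) for j in range(K)]
        responses.append(max(D) - t)  # job done when its last task departs
    return responses
```

For instance, a job arriving at t = 1 behind a job whose slower task departs at t = 3 only starts that task at t = 3.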

SLIDE 8

Two-server (K = 2) system

  • Suppose that jobs arrive according to a Poisson process with intensity λ > 0, i.e.,

    t_i − t_{i−1} ∼ exp(λ),  so that E(t_i − t_{i−1}) = λ^{−1}.

  • Also, assume that the task service-times x_{j,i} are mutually independent and exponentially distributed:

    x_{1,i} ∼ exp(α)  and  x_{2,i} ∼ exp(β),  ∀ i ≥ 1.

  • Let Q_j(t) be the number of tasks in server j at time t.
  • (Q_1, Q_2) is a continuous-time Markov chain.

SLIDE 9

Transition rates of (Q1, Q2) with m, n ≥ 0

SLIDE 10

Stationary distribution of (Q1, Q2)

  • Assume that the system is stable, i.e., λ < min{α, β}.
  • For the Markov process (Q_1, Q_2) in steady state, let the stationary probabilities be

    p_{m,n} = P((Q_1, Q_2) = (m, n)).

  • The balance equations are

    (λ + α 1{m > 0} + β 1{n > 0}) p_{m,n} = λ 1{m > 0, n > 0} p_{m−1,n−1} + α p_{m+1,n} + β p_{m,n+1},  ∀ m, n ∈ Z≥0,

    where Σ_{m=0}^∞ Σ_{n=0}^∞ p_{m,n} = 1.

SLIDE 11

Stationary distribution of (Q1, Q2) (cont)

  • The balance equations can be solved via the two-dimensional moment generating function (Z-transform) [Flatto & Hahn 1984]:

    P(z, w) = Σ_{m=0}^∞ Σ_{n=0}^∞ p_{m,n} z^m w^n,  z, w ∈ C.

  • Multiplying the balance equations by z^m w^n and summing over m, n gives P(z, w) in terms of the boundary values P(z, 0) and P(0, w).
  • In the load-balanced case where α = β, with ρ := λ/α < 1 [eq. (6.5) of FH’84],

    P(z, 0) = (1 − ρ)^{3/2} / √(1 − ρz).

  • From this, we can find the first two moments of p_{m,0}:

    Σ_{m=0}^∞ m p_{m,0} = (d/dz) P(z, 0) |_{z=1} = ρ/2,

    Σ_{m=0}^∞ m² p_{m,0} = (d/dz) z (d/dz) P(z, 0) |_{z=1} = ρ/2 + (3/4) · ρ²/(1 − ρ).
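These two moments can be corroborated numerically from the binomial series (1 − ρz)^{−1/2} = Σ_m C(2m, m) (ρz/4)^m, whose coefficients give the boundary probabilities p_{m,0} directly (a verification sketch; function names are mine):

```python
def series_moments(rho, n_terms=4000):
    """First two moments of p_{m,0} from the Taylor coefficients of
    P(z,0) = (1-rho)^{3/2} (1-rho z)^{-1/2}.  The coefficient c_m of z^m
    in (1-rho z)^{-1/2} obeys c_m = c_{m-1} * rho * (2m-1)/(2m), c_0 = 1.
    """
    scale = (1.0 - rho) ** 1.5
    m1 = m2 = 0.0
    coef = 1.0
    for m in range(1, n_terms):
        coef *= rho * (2 * m - 1) / (2 * m)
        p = scale * coef          # p_{m,0}
        m1 += m * p
        m2 += m * m * p
    return m1, m2

# closed forms from the slide: rho/2  and  rho/2 + (3/4) rho^2/(1 - rho)
```

At ρ = 0.5, the partial sums converge to ρ/2 = 0.25 and ρ/2 + (3/4)ρ²/(1 − ρ) = 0.625.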

SLIDE 12

Job sojourn times

  • Recall that a job is completed (departs the system) only when all of its tasks have been served.
  • Some jobs have arrived with none of their tasks completed, while others have had only one task completed.
  • So, in the two-server (K = 2) case, |Q_1 − Q_2| is the number of jobs queued in the system with exactly one task completed.
  • Let q_k = P(Q_1 − Q_2 = k) in steady state, for k ∈ Z.
  • Note that ∀ k ≥ 0,

    q_k = Σ_{m=k}^∞ p_{m,m−k}.

SLIDE 13

Job sojourn times in the load-balanced case

  • Summing the balance equations for (Q_1, Q_2) from m = k ≥ 0 with n = m − k gives

    (λ + α + β) q_k − β p_{k,0} = λ q_k + α q_{k+1} + β q_{k−1} − β p_{k−1,0}
    ⇒ α(q_{k+1} − q_k) − β(q_k − q_{k−1}) = −β p_{k,0} + β p_{k−1,0}.

  • In the symmetric case (i.e., the servers are load-balanced) where α = β > λ, this implies

    q_{k+1} − q_k = −p_{k,0},  ∀ k ≥ 0,

    where q_k = q_{−k}, ∀ k ∈ Z.

  • Thus,

    q_k = Σ_{m=k}^∞ p_{m,0},  ∀ k ≥ 0.

SLIDE 14

Job sojourn times in the load-balanced case (cont)

  • Consider jobs with no tasks completed, and completed tasks whose sibling tasks are not yet completed, in the load-balanced (α = β) case.
  • By Little’s theorem, the mean sojourn time of a job is:

    EQ_1/λ + E|Q_1 − Q_2|/(2λ)
    = 1/(α − λ) + (1/λ) Σ_{k=1}^∞ k q_k
    = 1/(α − λ) + (1/λ) Σ_{k=1}^∞ k Σ_{m=k}^∞ p_{m,0}
    = 1/(α − λ) + (1/λ) Σ_{m=1}^∞ p_{m,0} Σ_{k=1}^m k
    = 1/(α − λ) + (1/λ) Σ_{m=1}^∞ p_{m,0} (m² + m)/2
    = 1/(α − λ) + ρ/(4λ) + (3/(8λ)) · ρ²/(1 − ρ) + ρ/(4λ),

    where (α − λ)/λ = (1 − ρ)/ρ, and we have used the first two moments of p_{m,0} computed above.

SLIDE 15

Job sojourn times in the load-balanced case - main result

  • So, the mean sojourn time of a job in the load-balanced (α = β) case is:

    EQ_1/λ + E|Q_1 − Q_2|/(2λ) = (1/(α − λ)) (3/2 − ρ/8),

    where 1/(α − λ) is just the mean sojourn time in a stationary M/M/1 queue.

  • Note that the delay factor above M/M/1 satisfies:

    11/8 ≤ 3/2 − ρ/8 ≤ 3/2.
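This closed form is easy to corroborate by simulating the task-sojourn recursion from earlier (a sketch; the sample size, seed, and tolerance below are arbitrary choices of mine):

```python
import random

def mean_sojourn_k2(lam, alpha, n_jobs=200_000, seed=1):
    """Monte Carlo mean job sojourn time for the K=2 fork-join system:
    Poisson(lam) job arrivals, i.i.d. exp(alpha) task service at each server."""
    rng = random.Random(seed)
    t = D1 = D2 = 0.0
    total = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)                  # next job arrival time
        D1 = rng.expovariate(alpha) + max(D1, t)   # task departure, server 1
        D2 = rng.expovariate(alpha) + max(D2, t)   # task departure, server 2
        total += max(D1, D2) - t                   # job sojourn time
    return total / n_jobs

lam, alpha = 0.5, 1.0
rho = lam / alpha
predicted = (1.5 - rho / 8) / (alpha - lam)  # (3/2 - rho/8)/(alpha - lam)
```

At ρ = 0.5 the prediction is 2.875, and the long-run simulation average should agree to within a few percent.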

SLIDE 16

Bounds for K > 2 servers - Associated RVs

  • Again, consider the load-balanced (i.i.d. exp(α) task service times) and stable (λ < α) case.
  • To obtain an upper bound, it was argued in [Nelson and Tantawi 1988] that, for each job i, its task sojourn times {S_{j,i} := D_{j,i} − t_i}_{j=1}^K form an “associated” group of random variables.
  • Applying any monotonic function g to each member of a group of associated random variables {X_j} yields a group of random variables {g(X_j)} with (pairwise) non-negative covariance, cov(g(X_j), g(X_l)) ≥ 0.
  • The following useful maximal inequality follows: ∀ x > 0,

    P(max_{1≤j≤K} S_{j,i} > x) ≤ 1 − Π_{j=1}^K P(S_{j,i} ≤ x),

    i.e., the Bernoulli random variables 1{S_{j,i} ≤ x} (each a monotonically decreasing function of S_{j,i}) have non-negative covariance, since

    P(max_{1≤j≤K} S_{j,i} > x) = 1 − P(max_{1≤j≤K} S_{j,i} ≤ x).

SLIDE 17

Bounds for K > 2 servers (cont)

  • The stationary sojourn time S(K) of a job has distribution satisfying, ∀ x > 0:

    P(S(K) > x) = lim_{i→∞} P(max_{1≤j≤K} S_{j,i} > x) ≤ 1 − Π_{j=1}^K lim_{i→∞} P(S_{j,i} ≤ x),

    where each limit is the stationary sojourn-time distribution of an M/M/1 queue.

  • Using PASTA and conditioning on the number of jobs in a stationary M/M/1 queue (∼ geom(ρ)), one can show that the stationary sojourn time of a task at each server ∼ exp(α − λ), so that

    P(S(K) > x) ≤ 1 − (1 − e^{−(α−λ)x})^K.

  • Thus, using ES(K) = ∫_0^∞ P(S(K) > x) dx, one obtains

    ES(K) ≤ ∫_0^∞ (1 − (1 − e^{−(α−λ)x})^K) dx = H_K/(α − λ),

    where H_K := Σ_{k=1}^K 1/k is the Kth harmonic number.

SLIDE 18

Bounds for K > 2 servers - main result

  • From the previous display (for the load-balanced case α = β), the mean sojourn time satisfies ES(K) ≤ H_K/(α − λ), where H_K := Σ_{k=1}^K 1/k is the Kth harmonic number.
  • Since H_K = O(log K), this gives ES(K) = O(log K).
  • Ignoring queueing delays (a job then waits only for the longest of its K i.i.d. exp(α) task service times), we get a simple lower bound

    ES(K) ≥ H_K/α,

    giving some measure of tightness to the previous upper bound.
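The identity ∫_0^∞ (1 − (1 − e^{−cx})^K) dx = H_K/c underlying both bounds (it is the mean of the maximum of K i.i.d. exp(c) variables) can be checked numerically (a sketch; the step size and integration cutoff are arbitrary):

```python
import math

def harmonic(K):
    """H_K = sum_{k=1}^K 1/k."""
    return sum(1.0 / k for k in range(1, K + 1))

def max_of_exponentials_mean(c, K, dx=1e-4, x_max=60.0):
    """Midpoint-rule integral of P(max of K i.i.d. exp(c) > x)
    = 1 - (1 - e^{-c x})^K over [0, x_max]."""
    s = 0.0
    for i in range(int(x_max / dx)):
        x = (i + 0.5) * dx
        s += (1.0 - (1.0 - math.exp(-c * x)) ** K) * dx
    return s
```

For example, with c = α − λ = 0.5 and K = 4, the integral matches H_4/0.5 ≈ 4.1667.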

SLIDE 19

Single-stage, fork-join systems - a deterministic analysis

  • Consider a bank of K parallel queues, where queue/processor k is provisioned with service capacity s_k.
  • Here, let A be the (fluid, defined for t ≥ 0) cumulative input process of work, divided among the queues so that the kth queue has arrivals a_k and departures d_k with

    A(t) = Σ_k a_k(t),  ∀ t ≥ 0.

  • Define the virtual delay process of queue k for hypothetical departures at time t ≥ 0 as

    δ_k(t) = t − a_k^{−1}(d_k(t)),

    where the inverses a_k^{−1} of the non-decreasing functions a_k are taken continuous from the left, so that a_k(a_k^{−1}(v)) ≡ a_k^{−1}(a_k(v)) ≡ v.

  • The following definition of the cumulative departures D is such that the output ready for processing in the subsequent (reducer) stage is determined by the most “lagging” queue/processor: ∀ t ≥ 0,

    D(t) = A(t − max_k δ_k(t)) = A(min_k a_k^{−1}(d_k(t))).
SLIDE 20
SLIDE 21

Delay bound under service and input-burstiness curves

  • The convolution (⊗) of two non-decreasing functions f and g, with f(t) = g(t) = 0 for t ≤ 0, is

    (f ⊗ g)(t) = inf_{0≤τ≤t} { f(τ) + g(t − τ) }.

  • Define a delay function ∆_v for any v ≥ 0 as

    ∆_v(t) = 0 if t ≤ v,  +∞ if t > v.

  • So, for any function f, constant v ≥ 0, and time t,

    f(t − v) = (f ⊗ ∆_v)(t).

  • For a queue with cumulative arrival and departure functions a(t) and c(t), respectively, the queue has lower service curve s_min if, for all times t and arrivals a,

    c(t) ≥ (s_min ⊗ a)(t).

  • A lower service curve is a non-decreasing function that describes a service guarantee of the queue.
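A discrete-time sketch of ⊗ and ∆_v (sampling at integer times; the function names are mine) exhibits the shift property f(t − v) = (f ⊗ ∆_v)(t) for non-decreasing f with f(0) = 0:

```python
INF = float("inf")

def conv(f, g):
    """Min-plus convolution on samples t = 0..n-1:
    (f ⊗ g)[t] = min over 0 <= s <= t of f[s] + g[t-s]."""
    n = min(len(f), len(g))
    return [min(f[s] + g[t - s] for s in range(t + 1)) for t in range(n)]

def delay(v, n):
    """Sampled delay function Δ_v: 0 for t <= v, +inf for t > v."""
    return [0.0 if t <= v else INF for t in range(n)]

f = [float(t) for t in range(6)]  # f(t) = t, non-decreasing, f(0) = 0
# (f ⊗ Δ_2)(t) = f(t - 2), with f understood to be 0 for negative arguments
```

Convolving with ∆_2 shifts the samples of f right by two steps, padding with zeros.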

SLIDE 22

Delay bound under service and input-burstiness curves (cont)

  • We assume that the arrivals to queue k are bounded by a burstiness curve (traffic envelope) b_{in,k} in the sense that, for all t ≥ 0,

    a_k(t) ≤ (a_k ⊗ b_{in,k})(t),

    i.e., b_{in,k}(x) is an upper bound on the arrivals to queue k in any time interval of length x.

  • If a queue with lower service curve s_{min,k} has arrivals with burstiness curve b_{in,k}, an upper bound on the delay is given by

    d_{max,k} = min{ z ≥ 0 : ∀ x ≥ 0, s_{min,k}(x) ≥ (b_{in,k} ⊗ ∆_z)(x) }.   (1)

  • Here, d_{max,k} is the largest horizontal distance between b_{in,k} and s_{min,k}.
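For instance, with a token-bucket envelope b_in(x) = σ + ρx and a rate-latency service curve s_min(x) = R·(x − T)^+ with ρ ≤ R, the horizontal distance in (1) is T + σ/R; a brute-force grid search agrees (a sketch; the curves, grids, and parameter values are illustrative choices of mine):

```python
def horizontal_deviation(b, s, xs, zs):
    """For each x on the grid, find the smallest z with s(x+z) >= b(x);
    return the largest such z (the horizontal gap between b and s)."""
    worst = 0.0
    for x in xs:
        z = next((z for z in zs if s(x + z) >= b(x)), None)
        if z is not None:
            worst = max(worst, z)
    return worst

SIGMA, RHO, R, T = 2.0, 1.0, 2.0, 1.0  # illustrative parameters

def b_in(x):
    """Token-bucket envelope: sigma + rho*x."""
    return SIGMA + RHO * x

def s_min(x):
    """Rate-latency service curve: R*(x - T)^+."""
    return R * max(x - T, 0.0)

xs = [0.05 * i for i in range(200)]  # interval lengths x in [0, 10)
zs = [0.01 * i for i in range(401)]  # candidate delays z in [0, 4]
# analytic horizontal deviation: T + SIGMA/R = 2.0
```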

SLIDE 23

Simple deterministic delay-bound claim

  • Claim: A lower service curve of the fork-join system is given by

    s_min(t) = ∆_{max_k d_{max,k}}(t).

  • Remark: The claim simply states that the maximum delay of the whole system is the maximum delay among its queues.

SLIDE 24

Proof of deterministic delay-bound claim

  • By hypothesis, ∀ t ≥ v ≥ 0 and ∀ k,

    s_{min,k}(t − v) ≥ (b_{in,k} ⊗ ∆_{d_{max,k}})(t − v) = b_{in,k}(t − v − d_{max,k}) ≥ a_k(t − d_{max,k}) − a_k(v).

  • Thus, ∀ t ≥ v ≥ 0 and ∀ k,

    a_k(v) + s_{min,k}(t − v) ≥ a_k(t − d_{max,k})
    ⇒ (a_k ⊗ s_{min,k})(t) ≥ a_k(t − d_{max,k})
    ⇒ a_k^{−1}((a_k ⊗ s_{min,k})(t)) ≥ t − d_{max,k},

    where we have used the fact that each a_k is non-decreasing.

  • Thus,

    D(t) = A(min_k a_k^{−1}(d_k(t)))
    ≥ A(min_k a_k^{−1}((a_k ⊗ s_{min,k})(t)))
    ≥ A(min_k { t − d_{max,k} })
    = A(t − max_k d_{max,k}) = (A ⊗ ∆_{max_k d_{max,k}})(t),

    where we have used the fact that A is non-decreasing.

SLIDE 25

Single-stage, fork-join systems - a stationary analysis

  • Claim: In the stationary regime at t ≥ 0, if

    A1: the service to queue k satisfies s_k ≥ s_{min,k}, where s_{min,k}(v) := vμ_k, ∀ v ≥ 0;

    A2: the demux/mapper divides arriving work roughly in proportion to the minimum allocated service rates μ_k (strong load balancing), i.e., ∀ k, ∃ small ε_k > 0 such that ∀ v ≤ t,

    | a_k(t) − a_k(v) − (μ_k/M)(A(t) − A(v)) | ≤ ε_k  a.s.,

    where M := Σ_k μ_k;

    A3: the total arrivals have generalized (strong) stochastically bounded burstiness (gSBB):

    P( max_{v≤t} A(t) − A(v) − M(t − v) ≥ x ) ≤ Φ(x),

    where Φ is decreasing in x > 0;

    then, ∀ x > 2M max_k ε_k/μ_k,

    P(A(t) − D(t) ≥ x) ≤ Φ(x − 2M max_k ε_k/μ_k).
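Assumption A3 can be checked empirically on a trace: max_{v≤t} A(t) − A(v) − M(t − v) is just the backlog of a virtual queue of rate M fed by A, computable by a Lindley recursion (a sketch over unit time steps; the function names are mine):

```python
def backlog_path(increments, M):
    """B(t) = max_{v<=t} A(t) - A(v) - M*(t-v) over unit time steps,
    via the Lindley recursion B(t) = max(0, B(t-1) + a(t) - M),
    where a(t) is the work arriving in step t and B starts at 0."""
    B, path = 0.0, []
    for a in increments:
        B = max(0.0, B + a - M)
        path.append(B)
    return path

def empirical_tail(path, x):
    """Empirical estimate of P(backlog >= x) along the sample path,
    usable as a data-driven candidate for the gSBB bound Phi(x)."""
    return sum(1 for b in path if b >= x) / len(path)
```

For instance, alternating step arrivals of 3 and 0 against service rate M = 2 give a backlog path alternating between 1 and 0.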

SLIDE 26

Single-stage, fork-join systems - a stationary analysis (figure)
SLIDE 27

A stationary analysis - proof of claim

    P(A(t) − D(t) ≥ x)
    = P(A(t) − A(min_k a_k^{−1}(d_k(t))) ≥ x)
    = P(min_k a_k^{−1}(d_k(t)) ≤ A^{−1}(A(t) − x) =: t − z)
    = P(∃ k s.t. d_k(t) ≤ a_k(t − z))
    = P(∃ k s.t. a_k(t) − d_k(t) ≥ a_k(t) − a_k(t − z) =: x_k)
    ≤ P(∃ k s.t. max_{v≤t} a_k(t) − a_k(v) − (t − v)μ_k ≥ x_k),

  • where we have used the fact that A and the a_k are non-decreasing (cumulative arrivals), and the inequality is by assumption A1.
  • Also, we have defined non-negative random variables z and x_k such that

    Σ_k x_k = x = A(t) − A(t − z).

SLIDE 28

A stationary analysis - proof of claim (cont)

So, by using A2 and then A3, we get

    P(A(t) − D(t) ≥ x)
    ≤ P(∃ k s.t. max_{v≤t} (μ_k/M)(A(t) − A(v)) + ε_k − (t − v)μ_k ≥ (μ_k/M)x − ε_k)
    = P(∃ k s.t. max_{v≤t} (A(t) − A(v)) − (t − v)M ≥ x − 2(M/μ_k)ε_k)
    = P(max_{v≤t} (A(t) − A(v)) − (t − v)M ≥ x − 2M max_k ε_k/μ_k)
    ≤ Φ(x − 2M max_k ε_k/μ_k).

SLIDE 29

Numerical example based on a Facebook dataset

  • Figure 3 of [Chen et al. 2011] depicts a week-long trace of the total number of jobs arriving to a MapReduce system operated by Facebook.
  • Clearly, the job rate exhibits “time-of-day” periodicity in its mean and variance.
  • It can be simply modeled as a bounded AR(1) (two-parameter autoregressive) process with (deterministically) time-varying parameters.
  • A day-long trace of the individual-job data from which Figure 3 of [Chen et al. 2011] was partially derived is publicly available.
  • From this dataset, we depict the aggregate job arrival rate, in ten-minute intervals (i.e., 144 time samples), in the following figure.
  • Here, we see that the data is roughly stationary.
  • Rather than interpolating, we took the workload as zero during what was likely an hour-long observational outage starting at hour 14.
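A bounded AR(1) job-rate model of the kind mentioned above can be sketched as follows (the clipping to [lo, hi], Gaussian innovations, and all parameter values are my assumptions; the deck only names the model class):

```python
import random

def bounded_ar1(n, phi, mean, sigma, lo, hi, seed=0):
    """Sample path of a bounded AR(1) job-rate model:
    r_t = mean + phi*(r_{t-1} - mean) + N(0, sigma^2), clipped to [lo, hi]."""
    rng = random.Random(seed)
    r, path = mean, []
    for _ in range(n):
        r = mean + phi * (r - mean) + rng.gauss(0.0, sigma)
        r = min(hi, max(lo, r))  # keep the rate within physical bounds
        path.append(r)
    return path

# e.g., 144 ten-minute samples covering one day, as in the figure below
```

Time-of-day periodicity would be captured by letting `mean` and `sigma` vary deterministically with t.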

SLIDE 30

Aggregate job arrival rate

(figure: number of jobs arriving per ten-minute interval vs. hours)

SLIDE 31

Facebook job types

  • Moreover, Table 1 of [Chen et al. 2011] identifies ten different Facebook job types (i.e., ten rows), found through clustering based on the features (columns) given in that table.
  • In column 1, the number n_j of observed jobs of type j is given.
  • Also, the mean number of “task-seconds” per type-j job for the mapper stage, w_j, is given in the “Map time” column (we divided “Map time” by 600 s, consistent with the ten-minute sampling of the aggregate number of jobs in the above figure).
  • With this information, we can develop an aggregate workload model for the mapper stage, A, assuming that at each point in time the types of arriving jobs are distributed as in column 1 of Table 1 of [Chen et al. 2011].
  • Timing information associated with the workloads, including the total execution durations of individual jobs, is not given in the publicly available datasets.
  • Execution times are provided for individual jobs of CMU’s OpenCloud Hadoop cluster (so the previous assumption would not be necessary were we to model that dataset).

SLIDE 32

Cumulative mapper workload A - typical generated trace

(figure: cumulative arrivals and service curve; workload vs. hours)

SLIDE 33

Queue process Q for A with service rate M = 600 (line’s slope)

(figure: queue backlog vs. hours)

SLIDE 34

gSBB bound Φ at service rate M = 600

(figure: mean estimated probability Φ(x) vs. x, with confidence bars)

  • Recall assumption A3 of the previous claim.
  • We used the day-long raw trace given above and multiple samplings of the average “Map time” data of Table 1 of [Chen et al. 2011].
  • The vertical lines are 95% confidence bars based on 30 independent trials.

SLIDE 35

Discussion - load balancing in a single processing stage

  • Typically, the amount of parallelism allocated to a job at a stage is based on the size of the job’s input data-set for that stage, as that information is readily available operationally online.
  • The execution time of the component tasks will, of course, also greatly depend on other factors such as algorithmic/computational complexity.
  • E.g., in rows 4 and 5 of Table 1 of [Chen et al. 2011], two Facebook job types have about the same mean input data size (≈ 400 KB in the Input column) but significantly different mean Map times (one is roughly double the other).
  • This said, it is likely that the same algorithm will be applied to all tasks of a given job, so that effective load balancing from job to tasks typically may be achieved, i.e., when μ_k = μ_l ∀ k, l in the previous claim (which otherwise allows for processors of different capacities μ, as considered in [Ghodsi et al. 2011]).

SLIDE 36

Discussion - tandem processing stages

  • In MapReduce applications for search, the first (mapper) stage performs the search on input files, with:
  • the workload partitioned among the mapper’s parallel processors to allow pipelined operation with the shuffler (communication with the following reducer stage), and
  • the reducer stage combining the results of the mapper.
  • The incident workloads of the two stages may be very different (again, see Table 1 of [Chen et al. 2011]).
  • Such tandem parallel-processing stages can be simply modeled by suitably selecting the available service capacities μ at each stage and by suitably transforming the workload between stages.
  • For example, one can compute the gSBB bound Φ_2 for the incident workload of the reducer (second) stage, just as Φ_1 for the mapper (first) stage was computed above using the data from Figure 3 and Table 1 of [Chen et al. 2011], the latter also containing reducer-workload information.
  • That is, the departing workflow of the first (mapper) stage is transformed so as to have a different gSBB bound for the next (reducer) stage.

SLIDE 37

Workload transformation for tandem processing stages

  • More precisely, suppose J_i is the counting process of jobs incident to the ith stage, and that the kth task of the jth job at that stage has workload w_{i,j,k}.
  • Thus, the cumulative workload to the ith stage is simply

    A_i(t) = Σ_{j ≤ J_i(t)} Σ_k w_{i,j,k} = Σ_k a_{i,k}(t),

    where the workload to the kth processor of the ith stage is

    a_{i,k}(t) = Σ_{j ≤ J_i(t)} w_{i,j,k}.
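In code, evaluating A_i(t) is a job-count lookup followed by a prefix sum (a sketch; the data layout and function name are my choices):

```python
from bisect import bisect_right

def cumulative_workload(arrival_times, task_work, t):
    """A_i(t): total work of all stage-i jobs arrived by time t.
    arrival_times: sorted job arrival times of this stage.
    task_work[j]:  list of per-task workloads (w_{i,j,k})_k of job j."""
    m = bisect_right(arrival_times, t)  # J_i(t): number of jobs arrived by t
    return sum(sum(w) for w in task_work[:m])
```

Summing only the kth entries of `task_work` instead would give the per-processor process a_{i,k}(t).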

SLIDE 38

Workload transformation between stages (cont)

  • To use the workload data of Table 1 of [Chen et al. 2011] for the reducer stage, we need the counting process J_2 of jobs incident to the reducer stage (J_1 for the mapper stage was given in their Figure 3).
  • We can obtain it by considering the cumulative work done (“departures”) D_1 of the first stage, an object central to our previous claims.
  • The arrival time of the mth job to the second stage is

    J_2^{−1}(m) := D_1^{−1}( Σ_{j≤m} Σ_k w_{1,j,k} ).

  • Note that the service rates M, μ of the first stage will affect J_2, and hence A_2.
  • Given such gSBB-bound transformations between stages, the results of our previous claims can be generalized to obtain end-to-end results across tandem processing stages,
  • including for in-tree networks and certain more complex networks with multiclass workflows or feedback [C.-S. Chang 1999].
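The transformation J_2^{−1}(m) = D_1^{−1}(Σ_{j≤m} Σ_k w_{1,j,k}) can be sketched on a sampled departure curve, using the same continuous-from-the-left inverse convention as earlier (names and the sampling approximation are mine):

```python
def stage2_arrival_times(D1_times, D1_values, job_work):
    """For each m, the first sampled time at which stage-1 departures D_1
    reach the cumulative mapper work of jobs 1..m: an approximation of
    J_2^{-1}(m) = D_1^{-1}(sum_{j<=m} sum_k w_{1,j,k}).
    D1_times/D1_values: nondecreasing samples of the departure curve D_1.
    job_work[j]: total mapper work of job j."""
    out, cum = [], 0.0
    for w in job_work:
        cum += w
        # left-continuous inverse: earliest sample with D_1 >= cum
        t = next(t for t, d in zip(D1_times, D1_values) if d >= cum)
        out.append(t)
    return out
```

For example, with D_1 increasing at rate 2 per unit time, jobs of mapper work 2, 2, 3 reach the second stage at times 1, 2, and 4 (the last waiting for D_1 to reach 7).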

SLIDE 39

Job/task specialization for performance improvements

  • It is difficult to optimally set up the processor topology and provision it for a job (or job family) completely proactively.
  • Resource allocation is often modified within the existing MapReduce/Hadoop template, with customizations for specific job/task types.
  • These changes can be proactive or reactive/dynamic, to deal with performance degradation due to:
    – excessive stragglers (overdue tasks causing delays at barrier points), prompting cancel-and-relaunch or just relaunch of tasks (thus increasing the associated workload),
    – excessive communication overhead, including that needed to maintain consistent shared memory/state among processes of different stages,
    – faults.
  • One can use redundant mapper/reducer functionality, e.g., the same dataset assigned to multiple mappers.

SLIDE 40

Job/task specialization for performance improvements (cont)

  • Currently, under MapReduce, such redundancy is applied in “uniform” fashion.
  • Alternatively, redundancy could be based on recognition of hotspots or congestion points.
  • For example, more mappers could be allocated according to the success of a search of a particular data subset (which is duplicated for each assigned mapper).
  • This can also be done proactively, via cloud-computing templates customized for specific jobs.
  • Task prioritization can be added: non-FIFO scheduling, or a greater likelihood of certain types of tasks being relaunched when delayed by smaller amounts of time.
  • Moreover, certain types of jobs may only require “soft” synchronization (not hard barriers) at the join points of certain of their tasks.
  • None of these methods are new to the general problem space of parallel computation.

SLIDE 41

Applications with feedback

  • So far, we have considered applications that map to feed-forward processor topologies.
  • Processor topologies with feedback are needed for, e.g., distributed simulation of:
    – communication networks,
    – manufacturing systems with “re-entrant lines”.
  • Rather than “hard” synchronization, one can use rollback when an inconsistency is detected in shared memory/state.
  • Other application-specific tricks:
    – modeling choices, e.g., packet or fluid traffic models (with a ripple effect for the latter), importance sampling,
    – dynamic time-warp.

SLIDE 42

Summary

  • Parallel processing is a classical area, with techniques from concurrent programming, cluster computing, and (now trending) cloud computing.
  • Markov models of fork-join systems were studied in the 1980s under highly idealized assumptions.
  • It is possible to apply methods of network calculus.
  • Workloads naturally change as jobs progress through the system.
  • Workloads associated with component tasks also change with the application of proactive/reactive techniques for performance improvement.