

  1. Network Calculus for Parallel Processing. George Kesidis, The Pennsylvania State University, kesidis@gmail.com. Dagstuhl Seminar on Network Calculus, March 8-11, 2015, Schloss Dagstuhl.

  2. Outline of the talk
  • Introduction
  • Review of two results from the 1980s for Markovian models
    – A two-server Markovian system: two M/M/1 queues with coupled arrivals
    – Multi-server Markovian system
  • Single-stage, fork-join system
  • Network calculus applications (in collaboration with Y. Shan, B. Urgaonkar & Jorg)
    – Simple deterministic result
    – Stationary analysis via gSBB
    – Numerical example using Facebook data
  • Discussions
    – Load balancing in a single processing stage
    – Workload transformation for tandem processing stages
    – Dynamic scheduling
    – Applications with feedback, e.g., distributed simulation
  • References

  3. Parallel processing systems - overview
  • Decades of study on concurrent programming and parallel processing (including cluster computing), often in highly application-specific settings.
  • Challenges include
    – resource allocation and load balancing so as to reduce delays at barrier (synchronization, join) points,
    – redundancy for robustness/protection, and
    – maintaining consistent shared memory/state across processors while minimizing communication overhead,
    – especially when dealing with feedback in the application itself.
  • Techniques may be proactive or reactive/dynamic in nature.
  • Today, popular platforms use Virtual Machines (VMs) mounted on multi-processor servers of a single data-center, or a group of data-centers forming a cloud.

  4. Feed-forward parallel-processing systems
  • A given family of jobs is often best served by a particular arrangement of VMs/processors for parallel execution.
  • In the following, we consider jobs that lend themselves to feed-forward parallel-processing systems, e.g., many search/data-mining applications.
  • In a single parallel-processing stage, a job is partitioned into tasks (i.e., the job is "forked" or the tasks are demultiplexed); the tasks are then worked upon in parallel by different processors.
  • Within parallel-processing systems, there are often processing barriers (points of synchronization, or "joins") wherein some or all component tasks of a job need to be completed before the next stage of processing of the job can commence.
  • The terminus of the entire parallel-processing system is typically a barrier.
  • Thus, the latency of a stage (between barriers, or between the exogenous job arrivals and the first barrier) is the greatest latency among the processing paths through it.

  5. MapReduce
  • Google's MapReduce template for parallel processing with VMs (especially its open-source implementation, Apache Hadoop) is a very popular such framework to handle a sequence of search tasks.
  • MapReduce is a multi-stage parallel-processing framework where each processor is a VM (again, mounted on a server of a data-center).
  • In MapReduce, jobs arrive and are partitioned into tasks.
  • Each task is then assigned to a mapper VM for initial processing (first stage).
  • The results of the mappers are transmitted (shuffled), in pipelined fashion with the mappers' operation, to reducers (second stage).
  • Reducer VMs combine the mapper results they have received and perform additional processing.
  • A barrier exists before each reducer (after its mapper-shuffler stage) and after all the reducers (after the reducer stage).

  6. Simple MapReduce example of a word-search application
  • Two mappers that search and one reducer that combines their results.
  • The document corpus to be searched is divided between the mappers.
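A minimal sketch of this two-mapper, one-reducer word search, assuming a plain-Python rendering rather than actual Hadoop/MapReduce API code; the function names and the "query" parameter are illustrative only.

```python
# Illustrative sketch of the slide's word-search example (not Hadoop code).
from collections import Counter
from typing import List

def mapper(documents: List[str], query: str) -> Counter:
    # Each mapper searches its share of the corpus and emits partial counts.
    counts = Counter()
    for doc in documents:
        counts[query] += doc.lower().split().count(query.lower())
    return counts

def reducer(partial_counts: List[Counter]) -> Counter:
    # The reducer runs only after the barrier: all mappers must have finished.
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return total

if __name__ == "__main__":
    corpus = ["the quick brown fox", "the lazy dog", "quick quick fox"]
    half = len(corpus) // 2            # corpus divided between the two mappers
    m1 = mapper(corpus[:half], "quick")
    m2 = mapper(corpus[half:], "quick")
    print(reducer([m1, m2]))           # Counter({'quick': 3})
```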

  7. Single-stage, fork-join systems - a Markovian analysis
  • Jobs sequentially arrive to a parallel processing system of K identical servers.
  • The i-th job arrives at time $t_i$ and spawns (forks) K tasks, one per server.
  • Let $x_{j,i}$ be the service-duration of the task assigned to server j by job i.
  • The tasks assigned to a server are queued in FIFO fashion.
  • The departure time $D_{j,i}$ of the i-th task of server j satisfies
    $D_{j,i} = x_{j,i} + \max\{D_{j,i-1},\, t_i\}$ for all $i \ge 1$, $1 \le j \le K$, with $D_{j,0} = 0$;
    i.e., the task's sojourn (response) time $D_{j,i} - t_i$ is the sum of its service time $x_{j,i}$ and its queueing delay.
  • The response time of the i-th job is $\max_{1 \le j \le K} D_{j,i} - t_i$.
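A minimal sketch of this recursion, assuming arrival times and per-server task service times are given as arrays; the function name and interface are illustrative, not from the talk.

```python
# Fork-join recursion from this slide: compute task departure times D[j][i]
# and each job's response time max_j D[j][i] - t[i].
from typing import List, Sequence

def job_response_times(t: Sequence[float], x: Sequence[Sequence[float]]) -> List[float]:
    K = len(x)              # number of parallel servers
    n = len(t)              # number of jobs
    D = [0.0] * K           # D[j] = departure time of server j's previous task (D_{j,0} = 0)
    responses = []
    for i in range(n):
        for j in range(K):
            # D_{j,i} = x_{j,i} + max{ D_{j,i-1}, t_i }  (FIFO queue at each server)
            D[j] = x[j][i] + max(D[j], t[i])
        # A job departs only when all K of its tasks have departed (the join/barrier).
        responses.append(max(D) - t[i])
    return responses
```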

  8. Two-server (K = 2) system
  • Suppose that jobs arrive according to a Poisson process with intensity $\lambda > 0$, i.e., $t_i - t_{i-1} \sim \exp(\lambda)$ so that $E(t_i - t_{i-1}) = \lambda^{-1}$.
  • Also, assume that the task service-times $x_{j,i}$ are mutually independent and exponentially distributed: $x_{1,i} \sim \exp(\alpha)$ and $x_{2,i} \sim \exp(\beta)$ for all $i \ge 1$.
  • Let $Q_j(t)$ be the number of tasks in server j at time t.
  • $(Q_1, Q_2)$ is a continuous-time Markov chain.
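A Monte Carlo sketch of this two-server model, reusing the hypothetical job_response_times() function above (an assumption of this sketch); the closed form it is compared against is the one derived on the final slide of this analysis.

```python
# Simulate the K = 2 fork-join system with Poisson arrivals and exponential services.
import random

def simulate_mean_sojourn(lam: float, alpha: float, beta: float,
                          n_jobs: int = 200_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    t, clock = [], 0.0
    for _ in range(n_jobs):
        clock += rng.expovariate(lam)                         # inter-arrivals ~ exp(lambda)
        t.append(clock)
    x = [[rng.expovariate(alpha) for _ in range(n_jobs)],     # server-1 tasks ~ exp(alpha)
         [rng.expovariate(beta) for _ in range(n_jobs)]]      # server-2 tasks ~ exp(beta)
    r = job_response_times(t, x)
    return sum(r) / len(r)     # long-run average (small bias from starting empty)

# Load-balanced example: compare with (3/2 - rho/8) / (alpha - lambda).
lam, alpha = 0.5, 1.0
print(simulate_mean_sojourn(lam, alpha, alpha))
print((1.5 - (lam / alpha) / 8) / (alpha - lam))   # ~ 2.875
```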

  9. Transition rates of $(Q_1, Q_2)$ with $m, n \ge 0$ (transition-rate diagram on the original slide).
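The rate diagram itself did not survive extraction. For reference, the transition rates implied by the balance equations on the next slide would be (a reconstruction, not copied from the slide):

$(m,n) \to (m+1,\, n+1)$ at rate $\lambda$ (a job arrival forks one task to each server),
$(m,n) \to (m-1,\, n)$ at rate $\alpha$ if $m > 0$ (server-1 task completion),
$(m,n) \to (m,\, n-1)$ at rate $\beta$ if $n > 0$ (server-2 task completion).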

  10. Stationary distribution of $(Q_1, Q_2)$
  • Assume that the system is stable, i.e., $\lambda < \min\{\alpha, \beta\}$.
  • For the Markov process $(Q_1, Q_2)$ in steady state, let the stationary distribution be $p_{m,n} = P((Q_1, Q_2) = (m,n))$.
  • The balance equations are, for all $m, n \in \mathbb{Z}_{\ge 0}$,
    $(\lambda + \alpha \mathbf{1}\{m>0\} + \beta \mathbf{1}\{n>0\})\, p_{m,n} = \lambda \mathbf{1}\{m>0,\, n>0\}\, p_{m-1,n-1} + \alpha\, p_{m+1,n} + \beta\, p_{m,n+1},$
    where $\sum_{m=0}^{\infty} \sum_{n=0}^{\infty} p_{m,n} = 1$.
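A numerical sanity check of these balance equations, assuming a state space truncated at a hypothetical level M (arrivals that would leave the grid are dropped, so the result is only an approximation for M large relative to the load); the solver name is illustrative.

```python
# Approximate the stationary distribution on a truncated (M+1) x (M+1) grid
# by power iteration on the uniformized chain.
import numpy as np

def stationary_truncated(lam: float, alpha: float, beta: float,
                         M: int = 60, iters: int = 5000) -> np.ndarray:
    Lam = lam + alpha + beta                       # uniformization rate
    p = np.zeros((M + 1, M + 1))
    p[0, 0] = 1.0                                  # start from the empty state
    for _ in range(iters):
        q = np.zeros_like(p)
        q[1:, 1:] += (lam / Lam) * p[:-1, :-1]     # arrivals: (m,n) -> (m+1,n+1)
        q[:-1, :] += (alpha / Lam) * p[1:, :]      # server-1 completions: (m,n) -> (m-1,n)
        q[:, :-1] += (beta / Lam) * p[:, 1:]       # server-2 completions: (m,n) -> (m,n-1)
        q[0, :]  += (alpha / Lam) * p[0, :]        # self-loops where server 1 is idle
        q[:, 0]  += (beta / Lam) * p[:, 0]         # self-loops where server 2 is idle
        q[-1, :] += (lam / Lam) * p[-1, :]         # arrivals dropped at the truncation edge
        q[:-1, -1] += (lam / Lam) * p[:-1, -1]
        p = q
    return p / p.sum()

p = stationary_truncated(lam=0.5, alpha=1.0, beta=1.0)
print(p[0, 0], p.sum())   # estimated mass of the empty state; total mass ~ 1
```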

  11. Stationary distribution of $(Q_1, Q_2)$ (cont)
  • The balance equations can be solved via the two-dimensional moment generating function (Z transform) [Flatto & Hahn 1984]:
    $P(z, w) = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} p_{m,n}\, z^m w^n$, $\quad z, w \in \mathbb{C}$.
  • Multiplying the previous balance equations by $z^m w^n$ and summing over $m, n$ gives $P(z, w)$ in terms of the boundary values $P(z, 0)$ and $P(0, w)$.
  • In the load-balanced case where $\alpha = \beta$ with $\rho := \lambda/\alpha < 1$ [eqn (6.5) of FH'84],
    $P(z, 0) = (1-\rho)^{3/2} / \sqrt{1 - \rho z}$.
  • From this, we can find the first two moments of $p_{m,0}$:
    $\sum_{m=0}^{\infty} m\, p_{m,0} = \left.\tfrac{d}{dz} P(z,0)\right|_{z=1} = \tfrac{1}{2}\rho$
    $\sum_{m=0}^{\infty} m^2 p_{m,0} = \left.\tfrac{d}{dz}\!\left(z\, \tfrac{d}{dz} P(z,0)\right)\right|_{z=1} = \tfrac{1}{2}\rho + \tfrac{3}{4}\cdot\tfrac{\rho^2}{1-\rho}$
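A short symbolic check of these two moments, assuming sympy is available; it simply differentiates the boundary transform P(z,0) quoted above.

```python
# Verify the first two moments of p_{m,0} obtained from P(z,0).
import sympy as sp

z, rho = sp.symbols('z rho', positive=True)
P = (1 - rho)**sp.Rational(3, 2) / sp.sqrt(1 - rho * z)    # P(z,0), eqn (6.5) of FH'84

m1 = sp.simplify(sp.diff(P, z).subs(z, 1))                  # sum_m m p_{m,0}
m2 = sp.simplify(sp.diff(z * sp.diff(P, z), z).subs(z, 1))  # sum_m m^2 p_{m,0}

print(m1)                                                               # rho/2
print(sp.simplify(m2 - (rho/2 + sp.Rational(3, 4) * rho**2 / (1 - rho))))  # 0
```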

  12. Job sojourn times
  • Recall that a job is completed (departs the system) only when all of its tasks are completed (have been served).
  • Among the jobs currently in the system, some have had none of their tasks completed, while others have had exactly one task completed.
  • So, in the two-server (K = 2) case, $|Q_1 - Q_2|$ is the number of jobs in the system with exactly one task completed.
  • Let $q_k = P(Q_1 - Q_2 = k)$ in steady state, for $k \in \mathbb{Z}$.
  • Note that for all $k \ge 0$, $q_k = \sum_{m=k}^{\infty} p_{m,m-k}$.

  13. Job sojourn times in the load-balanced case
  • Summing the balance equations for $(Q_1, Q_2)$ over $m \ge k$ with $n = m - k$, for $k \ge 1$, gives
    $(\lambda + \alpha + \beta)\, q_k - \beta\, p_{k,0} = \lambda\, q_k + \alpha\, q_{k+1} + \beta\, q_{k-1} - \beta\, p_{k-1,0}$
    $\Rightarrow\ \alpha (q_{k+1} - q_k) - \beta (q_k - q_{k-1}) = -\beta\, p_{k,0} + \beta\, p_{k-1,0}.$
  • In the symmetric case (i.e., the servers are load balanced) where $\alpha = \beta > \lambda$, this implies that $q_{k+1} - q_k + p_{k,0}$ does not depend on $k$; since $q_k \to 0$ and $p_{k,0} \to 0$ as $k \to \infty$, this constant is zero, i.e.,
    $q_{k+1} - q_k = -p_{k,0}$ for all $k \ge 0$, where $q_k = q_{-k}$ for all $k \in \mathbb{Z}$ by symmetry.
  • Thus, $q_k = \sum_{m=k}^{\infty} p_{m,0}$ for all $k \ge 0$.

  14. Job sojourn times in the load-balanced case (cont)
  • The number of jobs in the system is the number with no task yet completed plus the number with exactly one task completed, i.e., $\min\{Q_1, Q_2\} + |Q_1 - Q_2| = \max\{Q_1, Q_2\}$; in the load-balanced ($\alpha = \beta$) case its mean is $E Q_1 + \tfrac{1}{2} E|Q_1 - Q_2|$.
  • Each server in isolation is an M/M/1 queue (Poisson arrivals of rate $\lambda$, $\exp(\alpha)$ services), so $E Q_1 / \lambda = 1/(\alpha - \lambda)$ by Little's theorem.
  • By Little's theorem, the mean sojourn time of a job is
    $\frac{E Q_1}{\lambda} + \frac{E|Q_1 - Q_2|}{2\lambda} = \frac{1}{\alpha - \lambda} + \frac{1}{\lambda}\sum_{k=1}^{\infty} k\, q_k = \frac{1}{\alpha - \lambda} + \frac{1}{\lambda}\sum_{k=1}^{\infty} k \sum_{m=k}^{\infty} p_{m,0}$
    $= \frac{1}{\alpha - \lambda} + \frac{1}{\lambda}\sum_{m=1}^{\infty} p_{m,0} \sum_{k=1}^{m} k = \frac{1}{\alpha - \lambda} + \frac{1}{\lambda}\sum_{m=1}^{\infty} p_{m,0}\, \frac{m^2 + m}{2}$
    $= \frac{1}{\alpha - \lambda} + \frac{1}{2\lambda}\left(\frac{\rho}{2} + \frac{3}{4}\cdot\frac{\rho^2}{1-\rho}\right) + \frac{\rho}{4\lambda},$
    where $\frac{\alpha - \lambda}{\lambda} = \frac{1-\rho}{\rho}$, and we have used the first two moments of $p_{m,0}$ computed above.
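For completeness, substituting $\lambda = \rho\alpha$ and collecting terms gives the closed form quoted on the next slide (a short worked step added here; it is not on the original slide):

$$\frac{1}{\alpha - \lambda} + \frac{1}{2\lambda}\left(\frac{\rho}{2} + \frac{3}{4}\cdot\frac{\rho^2}{1-\rho}\right) + \frac{\rho}{4\lambda}
= \frac{1}{\alpha(1-\rho)}\left(1 + \frac{1-\rho}{2} + \frac{3\rho}{8}\right)
= \frac{1}{\alpha - \lambda}\left(\frac{3}{2} - \frac{\rho}{8}\right).$$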

  15. Job sojourn times in the load-balanced case - main result
  • So, the mean sojourn time of a job in the load-balanced ($\alpha = \beta$) case is
    $\frac{E Q_1}{\lambda} + \frac{E|Q_1 - Q_2|}{2\lambda} = \frac{1}{\alpha - \lambda}\left(\frac{3}{2} - \frac{1}{8}\rho\right),$
    where $\frac{1}{\alpha - \lambda} = \frac{E Q_1}{\lambda}$ is just the mean sojourn time in a stationary M/M/1 queue.
  • Note that the delay factor above M/M/1 satisfies
    $\frac{11}{8} \le \frac{3}{2} - \frac{1}{8}\rho \le \frac{3}{2}.$
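A tiny helper evaluating this closed form against the plain M/M/1 sojourn time; the function names are illustrative only.

```python
# Mean job sojourn time for the load-balanced two-server fork-join system (this slide),
# and the resulting delay factor relative to M/M/1.
def forkjoin2_mean_sojourn(lam: float, alpha: float) -> float:
    rho = lam / alpha
    assert 0 < rho < 1, "stability requires lambda < alpha"
    return (1.5 - rho / 8) / (alpha - lam)

def mm1_mean_sojourn(lam: float, alpha: float) -> float:
    return 1.0 / (alpha - lam)

for rho in (0.1, 0.5, 0.9):
    lam, alpha = rho, 1.0
    print(rho, forkjoin2_mean_sojourn(lam, alpha) / mm1_mean_sojourn(lam, alpha))
# The printed ratio is the delay factor 3/2 - rho/8, which stays between 11/8 and 3/2.
```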
