Network Calculus for Parallel Processing
George Kesidis
The Pennsylvania State University, kesidis@gmail.com
Dagstuhl Seminar on Network Calculus, March 8-11, 2015, Schloss Dagstuhl
March 9, 2015
Outline of the talk
- Introduction
- Review of two results from the 1980s for Markovian models
– A two-server Markovian system: two M/M/1 queues with coupled arrivals
– Multi-server Markovian system
- Single-stage, fork-join system
- Network calculus applications - in collaboration with Y. Shan, B. Urgaonkar & Jorg
– Simple deterministic result
– Stationary analysis via gSBB
– Numerical example using Facebook data
- Discussions
– Load balancing in a single processing stage
– Workload transformation for tandem processing stages
– Dynamic scheduling
– Applications with feedback, e.g., distributed simulation
- References
Parallel processing systems - overview
- Decades of study on concurrent programming and parallel processing (including cluster computing), often in highly application-specific settings.
- Challenges include
– resource allocation and load balancing so as to reduce delays at barrier (synchronization, join) points,
– redundancy for robustness/protection, and
– maintaining consistent shared memory/state across processors while minimizing communication overhead,
– especially when dealing with feedback in the application itself.
- Techniques may be proactive or reactive/dynamic in nature.
- Today, popular platforms use Virtual Machines (VMs) mounted on multi-processor servers of a single data-center, or a group of data-centers forming a cloud.
Feed-forward parallel-processing systems
- A certain family of jobs is best served by a particular arrangement of VMs/processors for parallel execution.
- In the following, we consider jobs that lend themselves to feed-forward parallel-processing
systems, e.g., many search/data-mining applications.
- In a single parallel-processing stage, a job is partitioned into tasks (i.e., the job is “forked” or the tasks are demultiplexed); the tasks are then worked upon in parallel by different processors.
- Within parallel-processing systems, there are often processing barriers (points of synchronization or “joins”) wherein some or all component tasks of a job need to be completed before the next stage of processing of the job can commence.
- The terminus of the entire parallel-processing system is typically a barrier.
- Thus, the latency of a stage (between consecutive barriers, or between the exogenous job arrivals and the first barrier) is the greatest latency among the processing paths through it.
MapReduce
- Google’s MapReduce template for parallel processing with VMs (especially its open-source implementation, Apache Hadoop) is a very popular framework for handling sequences of search tasks.
- MapReduce is a multi-stage parallel-processing framework where each processor is a VM
(again, mounted on a server of a data-center).
- In MapReduce, jobs arrive and are partitioned into tasks.
- Each task is then assigned to a mapper VM for initial processing (first stage).
- The results of the mappers are transmitted (shuffled), in pipelined fashion with the mappers’ operation, to reducers (second stage).
- Reducer VMs combine the mapper results they have received and perform additional processing.
- A barrier exists before each reducer (after its mapper-shuffler stage) and after all the reducers
(after the reducer stage).
Simple MapReduce example of a word-search application
- Two mappers that search and one reducer that combines their results.
- Document corpus to be searched is divided between the mappers.
Single-stage, fork-join systems - a Markovian analysis
- Jobs sequentially arrive to a parallel processing system of K identical servers.
- The ith job arrives at time ti and spawns (forks) K tasks.
- Let xj,i be the service-duration of the task assigned to server j by job i.
- The tasks assigned to a server are queued in FIFO fashion.
- The sojourn (or response) time Dj,i − ti of the ith task of server j is the sum of its service time xj,i and its queueing delay:
Dj,i = xj,i + max{Dj,i−1, ti}, ∀ i ≥ 1, 1 ≤ j ≤ K, with Dj,0 = 0.
- The response time of the ith job is max_{1≤j≤K} (Dj,i − ti).
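This recursion is straightforward to simulate directly; the following sketch (function name hypothetical) computes the task departure times Dj,i and the job response times from the recursion above:

```python
# Simulate the fork-join recursion D[j][i] = x[j][i] + max(D[j][i-1], t[i]),
# where t[i] is the arrival time of job i and x[j][i] the service time of
# the task that job i assigns to server j. Job response = max_j D[j][i] - t[i].

def fork_join_response_times(t, x):
    """t: non-decreasing job arrival times.
    x: x[j][i] = service time of job i's task on server j.
    Returns the list of job response times."""
    K, n = len(x), len(t)
    D_prev = [0.0] * K          # D[j][0] = 0 (system starts empty)
    responses = []
    for i in range(n):
        D_cur = [x[j][i] + max(D_prev[j], t[i]) for j in range(K)]
        responses.append(max(D_cur) - t[i])
        D_prev = D_cur
    return responses

# Example with K = 2 servers: job 0 arrives at t=0 with task times (1, 3),
# so its response is 3; job 1 arrives at t=2, finds server 1 busy until 3,
# and both its tasks finish at 4, so its response is 2.
print(fork_join_response_times([0.0, 2.0], [[1.0, 2.0], [3.0, 1.0]]))
# → [3.0, 2.0]
```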
Two-server (K = 2) system
- Suppose that jobs arrive according to a Poisson process with intensity λ > 0, i.e.,
ti − ti−1 ∼ exp(λ)
so that E(ti − ti−1) = λ−1.
- Also, assume that the task service-times xj,i are mutually independent and exponentially
distributed:
x1,i ∼ exp(α)
and
x2,i ∼ exp(β) ∀i ≥ 1.
- Let Qi(t) be the number of tasks in server i at time t.
- (Q1, Q2) is a continuous-time Markov chain.
Transition rates of (Q1, Q2) with m, n ≥ 0
Stationary distribution of (Q1, Q2)
- Assume that the system is stable, i.e., λ < min{α, β}.
- For the Markov process (Q1, Q2) in steady state, let the stationary distribution be
pm,n = P((Q1, Q2) = (m, n)).
- The balance equations are
(λ + α1{m > 0} + β1{n > 0}) pm,n = λ1{m > 0, n > 0} pm−1,n−1 + α pm+1,n + β pm,n+1, ∀ m, n ∈ Z≥0,
where Σ_{m=0}^∞ Σ_{n=0}^∞ pm,n = 1.
Stationary distribution of (Q1, Q2) (cont)
- The balance equations can be solved via a two-dimensional moment generating function (Z-transform) [Flatto & Hahn 1984]:
P(z, w) = Σ_{m=0}^∞ Σ_{n=0}^∞ pm,n z^m w^n, z, w ∈ C.
- Multiplying the previous balance equations by z^m w^n and summing over m, n gives P(z, w) in terms of the boundary values P(z, 0) and P(0, w).
- In the load-balanced case where α = β, with ρ := λ/α < 1 [eqn (6.5) of FH’84],
P(z, 0) = (1 − ρ)^{3/2} / √(1 − ρz).
- From this, we can find the first two moments of pm,0:
Σ_{m=0}^∞ m pm,0 = (d/dz) P(z, 0) |_{z=1} = ρ/2,
Σ_{m=0}^∞ m² pm,0 = (d/dz)(z (d/dz) P(z, 0)) |_{z=1} = ρ/2 + (3/4) · ρ²/(1 − ρ).
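As a check on these moments, note that expanding (1 − ρz)^{−1/2} = Σ_m C(2m, m)(ρz/4)^m gives pm,0 = (1 − ρ)^{3/2} C(2m, m)(ρ/4)^m, so the two moment formulas can be verified numerically; a sketch (helper name hypothetical):

```python
def moments_pm0(rho, N=2000):
    """Numerically sum the first two moments of p_{m,0}, the series
    coefficients of P(z,0) = (1-rho)^{3/2} / sqrt(1 - rho*z), i.e.
    p_{m,0} = (1-rho)^{3/2} C(2m,m) (rho/4)^m.  Successive terms are
    computed by their ratio rho*(2m+1)/(2(m+1)) to avoid huge binomials."""
    p = (1 - rho) ** 1.5            # p_{0,0}
    m1 = m2 = 0.0
    for m in range(N):              # terms decay geometrically for rho < 1
        m1 += m * p
        m2 += m * m * p
        p *= rho * (2 * m + 1) / (2 * (m + 1))
    return m1, m2

rho = 0.5
m1, m2 = moments_pm0(rho)
print(m1, rho / 2)                                # first moment: rho/2
print(m2, rho / 2 + 0.75 * rho**2 / (1 - rho))    # second moment formula
```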
Job sojourn times
- Recall that a job is completed (departs the system) only when all of its tasks are completed
(have been served).
- Some jobs have arrived but none of their tasks completed, while others have had only one
task completed.
- So, in the two-server (K = 2) case, |Q1 − Q2| represents the number of jobs queued in
the system with just one task completed.
- Let qk = P(Q1 − Q2 = k) in steady-state for k ∈ Z.
- Note that, ∀k ≥ 0,
qk = Σ_{m=k}^∞ pm,m−k.
Job sojourn times in the load-balanced case
- Summing the balance equations for (Q1, Q2) over m ≥ k ≥ 0 with n = m − k gives
(λ + α + β) qk − β pk,0 = λ qk + α qk+1 + β qk−1 − β pk−1,0
⇒ α(qk+1 − qk) − β(qk − qk−1) = −β pk,0 + β pk−1,0.
- In the symmetric case (i.e., the servers are load balanced) where α = β > λ, this implies
qk+1 − qk = −pk,0, ∀k ≥ 0
where ∀k ∈ Z, qk = q−k.
- Thus,
qk = Σ_{m=k}^∞ pm,0, ∀k ≥ 0.
Job sojourn times in the load-balanced case (cont)
- Consider jobs with no tasks completed and those completed tasks whose siblings are not
completed for the load-balanced (α = β) case.
- By Little’s theorem, the mean sojourn time of a job is:
EQ1/λ + E|Q1 − Q2|/(2λ)
= 1/(α − λ) + (1/λ) Σ_{k=1}^∞ k qk
= 1/(α − λ) + (1/λ) Σ_{k=1}^∞ k Σ_{m=k}^∞ pm,0
= 1/(α − λ) + (1/λ) Σ_{m=1}^∞ pm,0 Σ_{k=1}^m k
= 1/(α − λ) + (1/λ) Σ_{m=1}^∞ pm,0 (m² + m)/2
= 1/(α − λ) + ρ/(4λ) + (3/(8λ)) · ρ²/(1 − ρ) + ρ/(4λ),
where (α − λ)/λ = (1 − ρ)/ρ, and we have used the first two moments of pm,0 computed above.
Job sojourn times in the load-balanced case - main result
- So, the mean sojourn time of a job in the load-balanced (α = β) case is:
EQ1/λ + E|Q1 − Q2|/(2λ) = (1/(α − λ)) (3/2 − ρ/8),
where 1/(α − λ) is just the mean sojourn time in a stationary M/M/1 queue.
- Note that the delay factor above M/M/1 satisfies:
11/8 ≤ 3/2 − ρ/8 ≤ 3/2.
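A quick Monte Carlo sanity check of this closed form (parameters and names illustrative): simulate the K = 2 fork-join recursion with Poisson(λ) arrivals and exp(α) task services, and compare the empirical mean sojourn time with (1/(α − λ))(3/2 − ρ/8):

```python
import random

def mean_sojourn_sim(lam, alpha, n_jobs, seed=1):
    """Monte Carlo estimate of the mean job sojourn time of the K = 2
    fork-join system: Poisson(lam) arrivals, i.i.d. exp(alpha) services."""
    rng = random.Random(seed)
    t = 0.0
    D1 = D2 = 0.0   # departure time of the last task on each server
    total = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)                  # next job arrival
        D1 = rng.expovariate(alpha) + max(D1, t)   # server-1 task
        D2 = rng.expovariate(alpha) + max(D2, t)   # server-2 task
        total += max(D1, D2) - t                   # job sojourn time
    return total / n_jobs

lam, alpha = 0.5, 1.0
rho = lam / alpha
exact = (1.0 / (alpha - lam)) * (1.5 - rho / 8)   # = 2.875 for rho = 0.5
est = mean_sojourn_sim(lam, alpha, n_jobs=200_000)
print(exact, est)   # the estimate should be close to 2.875
```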
Bounds for K > 2 servers - Associated RVs
- Again, consider the load balanced (i.i.d. exp(α) task service times) and stable (λ < α)
case.
- To obtain an upper bound, it was argued in [Nelson and Tantawi 1988] that, for each job i, its task sojourn times {Sj,i := Dj,i − ti}, 1 ≤ j ≤ K, form an “associated” group of random variables.
variables.
- Applying any monotonic function g to each member of a group of “associated” random variables {Xj} yields a group of random variables {g(Xj)} with (pairwise) non-negative covariance, cov(g(Xj), g(Xl)) ≥ 0.
- The following useful maximal inequality follows: ∀x > 0,
P(max_{1≤j≤K} Sj,i > x) ≤ 1 − Π_{j=1}^K P(Sj,i ≤ x),
since P(max_{1≤j≤K} Sj,i > x) = 1 − P(max_{1≤j≤K} Sj,i ≤ x) and the Bernoulli random variables 1{Sj,i ≤ x} (monotonically decreasing functions of the Sj,i) have non-negative covariance.
Bounds for K > 2 servers (cont)
- The stationary sojourn time S(K) of a job has distribution satisfying, ∀x > 0:
P(S(K) > x) = lim_{i→∞} P(max_{1≤j≤K} Sj,i > x) ≤ 1 − Π_{j=1}^K lim_{i→∞} P(Sj,i ≤ x),
where each of the last limits is for an M/M/1 queue.
- Using PASTA and conditioning on the number of jobs in a stationary M/M/1 queue (∼ geom(ρ)), one can show that the sojourn time of a job in steady state is ∼ exp(α − λ), so that
P(S(K) > x) ≤ 1 − (1 − exp(−(α − λ)x))^K.
- Thus, using ES(K) = ∫_0^∞ P(S(K) > x) dx, one can show
ES(K) ≤ ∫_0^∞ (1 − (1 − exp(−(α − λ)x))^K) dx =: HK.
Bounds for K > 2 servers - main result
- From the previous display (for the load-balanced case α = β), the mean sojourn time
ES(K) ≤ HK.
- One can also show HK = (Σ_{k=1}^K 1/k)/(α − λ) = O(log K), so that ES(K) = O(log K).
- Ignoring queueing delays, a job’s sojourn time is at least the maximum of its K i.i.d. exp(α) task service times, giving the simple lower bound
ES(K) ≥ (1/α) Σ_{k=1}^K 1/k = ((α − λ)/α) HK,
giving some measure of tightness to the previous upper bound.
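The integral defining HK has a closed form: substituting u = 1 − exp(−(α − λ)x) yields HK = (1/(α − λ)) Σ_{k=1}^K 1/k, the Kth harmonic number scaled by the M/M/1 mean sojourn time, which is O(log K). A numerical sketch of this identity (function names hypothetical):

```python
from math import exp

def hk_integral(K, mu, dx=1e-4, x_max=40.0):
    """Midpoint-rule evaluation of int_0^inf (1 - (1 - e^{-mu x})^K) dx;
    the integrand's tail decays like K e^{-mu x}, so truncation is safe."""
    s, x = 0.0, dx / 2
    while x < x_max:
        s += (1.0 - (1.0 - exp(-mu * x)) ** K) * dx
        x += dx
    return s

def hk_closed_form(K, mu):
    # Substituting u = 1 - e^{-mu x} turns the integral into
    # (1/mu) int_0^1 (1 - u^K)/(1 - u) du = (1/mu) sum_{k=1..K} 1/k.
    return sum(1.0 / k for k in range(1, K + 1)) / mu

K, mu = 10, 0.5                # e.g., alpha - lambda = 0.5
print(hk_integral(K, mu), hk_closed_form(K, mu))   # both ≈ 5.858
```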
Single-stage, fork-join systems - a deterministic analysis
- Consider a bank of K parallel queues, with queue/processor k provisioned with service capacity sk.
- Here, let A be the (fluid, positive-time) cumulative input process of work that is divided among the queues, so that the kth queue has arrivals ak and departures dk with, ∀t ≥ 0,
A(t) = Σ_k ak(t).
- Define the virtual delay process, for hypothetical departures at time t ≥ 0 from queue k, as
δk(t) = t − ak^{−1}(dk(t)),
where we define the inverses ak^{−1} of the non-decreasing functions ak as continuous from the left, so that ak(ak^{−1}(v)) ≡ ak^{−1}(ak(v)) ≡ v.
- The following definition of the cumulative departures D is such that the output ready for processing in the subsequent (reducer) stage is determined by the most “lagging” queue/processor: ∀t ≥ 0,
D(t) = A(t − max_k δk(t)) = A(min_k ak^{−1}(dk(t))).
Delay bound under service and input-burstiness curves
- The (min-plus) convolution ⊗ of two non-decreasing functions f and g, with f(t) = g(t) = 0 for t ≤ 0, is
(f ⊗ g)(t) = inf_{0≤τ≤t} {f(τ) + g(t − τ)}.
- Define a delay function ∆v, for any v ≥ 0, as
∆v(t) = 0 if t ≤ v, +∞ if t > v.
- So, for any function f, constant v ≥ 0, and time t,
f(t − v) = (f ⊗ ∆v)(t).
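On a discrete time grid, the convolution and the shift identity f(t − v) = (f ⊗ ∆v)(t) can be sketched as follows (grid resolution and names are illustrative):

```python
def min_plus_conv(f, g, T, dt=1.0):
    """(f ⊗ g)(t) = min_{0<=tau<=t} f(tau) + g(t - tau) on the grid
    t = 0, dt, ..., T, for functions with f(t) = g(t) = 0 for t <= 0."""
    n = int(T / dt) + 1
    return [min(f(j * dt) + g((i - j) * dt) for j in range(i + 1))
            for i in range(n)]

def delta(v):
    # Delay function: 0 for t <= v, +infinity for t > v.
    return lambda t: 0.0 if t <= v else float("inf")

f = lambda t: 2.0 * t if t > 0 else 0.0    # e.g., a rate-2 cumulative curve
conv = min_plus_conv(f, delta(3.0), T=10.0)
# (f ⊗ ∆_v)(t) = f(t - v): the convolution shifts f right by v = 3.
print(conv[:6])   # → [0.0, 0.0, 0.0, 0.0, 2.0, 4.0]
```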
- For a queue with cumulative arrival and departure functions a(t) and c(t), respectively, the queue has a lower service curve smin if, for all times t and arrivals a,
c(t) ≥ (smin ⊗ a)(t).
- A lower service curve is a non-decreasing function that describes a service guarantee of the
queue.
March 9, 2015 George Kesidis 20
Delay bound under service and input-burstiness curves (cont)
- We assume that the arrivals to queue k are bounded by a burstiness curve (traffic envelope) bin,k in the sense that, for all t ≥ 0,
ak(t) ≤ (ak ⊗ bin,k)(t),
i.e., bin,k(x) is an upper bound on the arrivals to queue k in any time interval of length x.
- If a queue with lower service curve smin,k has arrivals with burstiness curve bin,k, an upper bound on the delay is given by
dmax,k = min{z ≥ 0 : ∀x ≥ 0, smin,k(x) ≥ (bin,k ⊗ ∆z)(x)}. (1)
- Here, dmax,k is the largest horizontal difference between bin,k and smin,k.
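For instance, with a token-bucket envelope bin,k(x) = σ + ρx (for x > 0) and a rate-latency service curve smin,k(x) = R·max(x − T, 0) with ρ ≤ R, the horizontal deviation (1) is T + σ/R. A numerical sketch of (1) on a grid (parameter values illustrative):

```python
from bisect import bisect_left

def horizontal_deviation(b, s, x_max=50.0, dx=0.001):
    """Largest horizontal distance from envelope b up to the non-decreasing
    service curve s: sup_x inf{ d >= 0 : s(x + d) >= b(x) }, as in eqn (1)."""
    grid = [i * dx for i in range(int(2 * x_max / dx) + 1)]
    s_vals = [s(x) for x in grid]          # sorted, since s is non-decreasing
    best, x = 0.0, 0.0
    while x <= x_max:
        i = bisect_left(s_vals, b(x))      # first grid point with s >= b(x)
        best = max(best, grid[i] - x)
        x += dx
    return best

# Token-bucket envelope and rate-latency service curve:
sigma, rho_r, R, T = 5.0, 1.0, 2.0, 3.0
b = lambda x: sigma + rho_r * x if x > 0 else 0.0
s = lambda x: R * max(x - T, 0.0)
print(horizontal_deviation(b, s))   # analytic bound: T + sigma/R = 5.5
```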
Simple deterministic delay-bound claim
- Claim: A lower service curve of the fork-join system is given by
smin(t) = ∆_{max_k dmax,k}(t).
- Remark: The claim simply states that the maximum delay of the whole system is the
maximum delay among the queues.
Proof of deterministic delay-bound claim
- By hypothesis, ∀t ≥ v ≥ 0 and ∀k,
smin,k(t − v) ≥ (bin,k ⊗ ∆dmax,k)(t − v) = bin,k(t − v − dmax,k) ≥ ak(t − dmax,k) − ak(v).
- Thus, ∀t ≥ v ≥ 0 and ∀k,
ak(v) + smin,k(t − v) ≥ ak(t − dmax,k)
⇒ (ak ⊗ smin,k)(t) ≥ ak(t − dmax,k)
⇒ ak^{−1}((ak ⊗ smin,k)(t)) ≥ t − dmax,k,
where we have used the fact that each ak is non-decreasing.
- Thus,
D(t) = A(min_k {ak^{−1}(dk(t))})
≥ A(min_k {ak^{−1}((ak ⊗ smin,k)(t))})
≥ A(min_k {t − dmax,k})
= A(t − max_k dmax,k) = (A ⊗ ∆_{max_k dmax,k})(t),
where we have used the fact that A is non-decreasing.
Single-stage, fork-join systems - a stationary analysis
- Claim: In the stationary regime at t ≥ 0, if
A1: the service to queue k satisfies sk ≥ smin,k, where ∀v ≥ 0, smin,k(v) := v µk;
A2: the demux/mapper divides arriving work roughly in proportion to the minimum allocated service rates µk (strong load balancing), i.e., ∀k, ∃ small εk > 0 such that, ∀v ≤ t,
|ak(t) − ak(v) − (µk/M)(A(t) − A(v))| ≤ εk a.s., where M := Σ_k µk;
A3: the total arrivals have generalized (strong) stochastically bounded burstiness (gSBB),
P(max_{v≤t} A(t) − A(v) − M(t − v) ≥ x) ≤ Φ(x),
where Φ decreases in x > 0;
then, ∀x > 2M max_k εk/µk,
P(A(t) − D(t) ≥ x) ≤ Φ(x − 2M max_k εk/µk).
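Assumption A3 can be checked empirically on a slotted trace: max_{v≤t} A(t) − A(v) − M(t − v) is exactly the backlog at time t of a virtual rate-M queue fed by A, computable with the Lindley recursion. A sketch with made-up per-slot workloads:

```python
def virtual_backlogs(work, M):
    """Backlog of a virtual rate-M queue fed by per-slot workloads 'work':
    q(t) = max_{v<=t} A(t) - A(v) - M(t - v), via the Lindley recursion."""
    q, backlogs = 0.0, []
    for a in work:
        q = max(q + a - M, 0.0)   # Lindley recursion
        backlogs.append(q)
    return backlogs

def phi_hat(backlogs, x):
    # Empirical gSBB tail: fraction of slots with backlog >= x.
    return sum(1 for q in backlogs if q >= x) / len(backlogs)

# Tiny deterministic example (per-slot work, service rate M = 2):
backlogs = virtual_backlogs([1.0, 4.0, 0.0, 3.0, 1.0], M=2.0)
print(backlogs)                 # → [0.0, 2.0, 0.0, 1.0, 0.0]
print(phi_hat(backlogs, 1.0))   # → 0.4
```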
A stationary analysis - proof of claim
P(A(t) − D(t) ≥ x)
=
P(A(t) − A(min
k
a−1
k (dk(t))) ≥ x)
=
P(min
k
a−1
k (dk(t)) ≤ A−1(A(t) − x) =: t − z)
=
P(∃k s.t. dk(t) ≤ ak(t − z))
=
P(∃k s.t. ak(t) − dk(t) ≥ ak(t) − ak(t − z) =: xk),
≤
P(∃k s.t. max
v≤t ak(t) − ak(v) − (t − v)µk ≥ xk)
- where we have used the fact that A and the ak are nondecreasing (cumulative arrivals) and
the inequality is by assumption A1.
- Also, we have defined non-negative random variables z and xk such that
- k
xk = x = A(t) − A(t − z).
A stationary analysis - proof of claim (cont)
So, by using A2 and then A3, we get
P(A(t) − D(t) ≥ x)
≤ P(∃k s.t. max_{v≤t} (µk/M)(A(t) − A(v)) + εk − (t − v)µk ≥ (µk/M)x − εk)
= P(∃k s.t. max_{v≤t} (A(t) − A(v)) − (t − v)M ≥ x − 2M εk/µk)
= P(max_{v≤t} (A(t) − A(v)) − (t − v)M ≥ x − 2M max_k εk/µk)
≤ Φ(x − 2M max_k εk/µk).
Numerical example based on a Facebook dataset
- Figure 3 of [Chen et al. 2011] depicts a week-long trace of the total number of arriving
jobs to a MapReduce system operated by Facebook.
- Clearly, the job rate exhibits “time-of-day” periodicity in its mean and variance.
- It can be simply modeled as a bounded AR(1) (two-parameter autoregressive) process with
(deterministically) time-varying parameters.
- A day-long trace of the data of individual jobs from which Figure 3 of [Chen et al. 2011]
was partially derived is publicly available.
- From this dataset, we depict the aggregate job arrival rate, by ten-minute intervals (i.e.,
144 time samples), in the following figure.
- Here, we see that the data is roughly stationary.
- Rather than interpolating, we took the workload as zero during what was likely an hour-long observational outage starting at hour 14.
Aggregate job arrival rate
[Figure: number of jobs per ten-minute interval vs. hours.]
Facebook job types
- Moreover, Table 1 of [Chen et al. 2011] identifies ten different Facebook job types (i.e., ten rows), obtained through clustering based on the features (columns) given in that table.
- In column 1, the number of observed jobs nj of type j is given.
- Also, the mean number of “task-seconds” per type-j job for the mapper stage, wj, is given
in the “Map time” column (we divided “Map time” by 600s consistent with the ten-minute sampling of the aggregate number of jobs in the above figure).
- With this information, we can develop an aggregate workload model to the mapper stage,
A, assuming that at each point in time the types of jobs arriving are distributed as in
column 1 of Table 1 of [Chen et al. 2011].
- Timing information associated with the workloads, including the total execution durations of individual jobs, is not given in the publicly available datasets.
- Execution times are provided for individual jobs of CMU’s OpenCloud Hadoop cluster (so
the previous assumption would not be necessary were we to model this dataset).
Cumulative mapper workload A - typical generated trace
[Figure: cumulative arrivals (workload, ×10^4) and service curve vs. hours.]
Queue process Q for A with service rate M = 600 (line’s slope)
[Figure: queue backlog vs. hours.]
gSBB bound Φ at service rate M = 600
[Figure: empirical probability Φ(x), with mean and confidence bars, vs. x.]
- Recall assumption A3 of the previous Claim
- Used the day-long raw trace given above and multiple samplings of the average “Map time”
data of Table 1 [Chen et al. 2011].
- The vertical lines are 95% confidence bars based on 30 independent trials
Discussion - load balancing in a single processing stage
- Typically, the amount of parallelism allocated to a job at a stage is based on the size of the job’s input data-set to that stage, as that information is readily available operationally online.
- The execution time for the component tasks will, of course, greatly depend on other factors
such as algorithmic/computational complexity.
- E.g., Facebook data in rows 4 and 5 of Table 1 of [Chen et al. 2011], where two jobs
have about the same mean input data size (≈ 400KB in Input column) but significantly different mean Map times (one is roughly double the other).
- This said, it’s likely that the same algorithm will be applied to all tasks of a given job, so that effective job-to-task load balancing may typically be achieved, i.e., ∀k, l, µk = µl in the previous claim (which allows for processors of different capacities µ, as considered in [Ghodsi et al. 2011]).
Discussion - tandem processing stages
- In MapReduce applications for search:
– the first (mapper) stage performs the search on input files, with the workload partitioned among the mapper’s parallel processors to allow pipelined operation with the shuffler (communication with the following reducer stage), and
– the reducer stage combines the results of the mapper.
- The incident workloads for the two stages may be very different (again, Table 1 of [Chen
et al. 2011]).
- Such tandem parallel processing stages can be simply modeled by suitably selecting the
available service capacities µ at each stage and by suitably transforming the workload between stages.
- For example, one can compute the gSBB bound Φ2 for the incident workload of the reducer
(second) stage, as Φ1 for the mapper (first) stage was computed above using the data from Figure 3 and Table 1 of [Chen et al. 2011], the latter also having reducer-workload information.
- That is, the departing workflow of the first mapper stage is transformed so as to have a
different gSBB bound for the next reducer stage.
Workload transformation for tandem processing stages
- More precisely, suppose Ji is the counting process of jobs incident to the ith stage and that the kth task of the jth job at that stage has workload wi,j,k.
- Thus, the cumulative workload to the ith stage is simply
Ai(t) = Σ_{j ≤ Ji(t)} Σ_k wi,j,k = Σ_k ai,k(t),
where the workload to the kth processor of the ith stage is
ai,k(t) = Σ_{j ≤ Ji(t)} wi,j,k.
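A minimal sketch of these definitions (array names and numbers hypothetical), computing Ai(t) and the per-processor workloads ai,k(t) from job arrival times and task workloads:

```python
def cumulative_workloads(arrival_times, task_work, t):
    """Cumulative stage workload A(t) = sum over jobs j arrived by t of
    sum_k w[j][k], and per-processor workloads a_k(t).
    arrival_times: non-decreasing job arrival times to this stage.
    task_work: task_work[j][k] = workload of job j's task on processor k."""
    K = len(task_work[0])
    a = [0.0] * K
    for tj, w in zip(arrival_times, task_work):
        if tj <= t:                    # job j has arrived by time t
            for k in range(K):
                a[k] += w[k]
    return sum(a), a

# Two jobs on K = 2 processors; only job 0 has arrived by t = 2:
A, a = cumulative_workloads([1.0, 3.0], [[2.0, 1.0], [4.0, 5.0]], t=2.0)
print(A, a)   # → 3.0 [2.0, 1.0]
```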
Workload transformation between stages (cont)
- To use the workload data of Table 1 of [Chen et al. 2011] for the reducer stage, we need
the counting process of jobs J2 incident to the reducer stage (J1 for the mapper stage was given in their Figure 3).
- We can do this by considering the cumulative work done (“departures”) D1 from the first
stage, an object central to our previous claims above.
- The arrival time of the mth job to the second stage is
D1^{−1}(Σ_{j≤m} Σ_k w1,j,k) =: J2^{−1}(m).
- Note that the service rates M, µ of the first stage will affect J2, and hence A2.
- Given such gSBB-bound transformations between stages, the results of our previous claims
can be generalized to obtain end-to-end results across tandem processing stages,
- including for in-tree networks and certain more complex networks with multiclass workflows or feedback [C.-S. Chang 1999].
Job/task specialization for performance improvements
- It is difficult to optimally set up the processor topology and provision it for a job (or job family) completely proactively.
- Resource allocation is often modified within the existing MapReduce/Hadoop template, with customizations for specific job/task types.
- These changes can be proactive or reactive/dynamic to deal with performance degradation due to
– excessive stragglers (overdue tasks causing delays at barrier points), causing cancel-and-relaunch or just relaunch of tasks (so increasing the associated workload),
– excessive communication overhead, including to maintain consistent shared memory/state among processes of different stages,
– faults.
- One can use redundant mapper/reducer functionality, e.g., the same dataset assigned to multiple mappers.
Job/task specialization for performance improvements (cont)
- Currently under MapReduce, such redundancy is done in “uniform” fashion.
- Alternatively, redundancy could be based on recognition of hotspots or congestion points.
- For example, more mappers could be allocated according to the success of a search of a
particular data subset (which is duplicated for each assigned mapper).
- This can also be done proactively by customized cloud-computing templates for specific
jobs.
- Task prioritization can be added - non-FIFO scheduling, or a greater likelihood of certain types of tasks being relaunched when delayed by smaller amounts of time.
- Moreover, certain types of jobs may only require “soft” synchronization (not hard barriers)
at join-points of certain of their tasks.
- None of these methods are new to the general problem space of parallel computation.
Applications with feedback
- So far, we have considered applications that map to feed-forward processor topologies.
- Processor topologies with feedback are needed for, e.g., distributed simulation of
– communication networks
– manufacturing systems with “re-entrant lines”
- Rather than “hard” synchronization, can use rollback when inconsistency is detected in
shared memory/state.
- Other application-specific tricks:
– modeling, e.g., packet or fluid traffic models (ripple effect for the latter), importance sampling
– dynamic time-warp
Summary
- Classical area of parallel processing - techniques of concurrent programming, cluster computing, cloud computing (now trending).
- Markov models of fork-join systems studied in the 1980s under highly idealized assumptions.
- Possible to apply methods of network calculus.
- Workloads naturally change as jobs progress through the system.
- Workloads associated with component tasks change with the application of proactive/reactive methods.