 
              Day 3
Agenda for Today
• Formulate simple problem statement.
• Revisit the workload characterization problem.
• Present detailed (step by step) derivation of the workload.
Workload Modeling
• Workload modeling is used to generate synthetic workloads based on real-life job execution observations.
• The goal is typically to be able to create workloads that can be used in performance evaluation studies.
[Figure: workload pipeline with the labels Workload Generation, Workload description (may include QoS), Negotiation, SLA, Workload Submission (J1 … Jn), Consolidate result, Analysis]
Problem Formulation
• Given $P$ available processors and $m$ jobs, $\mathcal{J} = \{J_1, J_2, \dots, J_m\}$, waiting in a queue to be processed.
• Problem: allocate $p_j$ processors to job $J_j$ such that the overall execution time is minimized:
  minimise $\sum_{j=1}^{m} T_j(p_j)$
  subject to $\sum_{j=1}^{m} p_j \le P$, $p_j \in \{1, 2, \dots, p_{\max}\}$
• Execution starts only when $p = p_{\max}$ processors are allocated.
• Here
– $T_j(\cdot)$ is the execution time function of job $j$,
– $p_{\max}$ is the maximum parallelism job $j$ can have,
– $p_j$ is the unknown processor allocation to job $j$.
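The optimization above can be explored by brute force on tiny instances. The following is a minimal sketch, not the method of the slides: it assumes a stand-in execution time of the form W/p + alpha + beta*p for every job, ignores the rule that a job only starts at full parallelism, and uses hypothetical names (`exec_time`, `best_allocation`) and hypothetical job parameters.

```python
from itertools import product


def exec_time(p, W, alpha, beta):
    """Assumed execution time of one job on p processors."""
    return W / p + alpha + beta * p


def best_allocation(jobs, total_procs):
    """Enumerate every allocation and keep the one with the smallest sum of times.

    jobs: list of (W, alpha, beta, p_max) tuples (hypothetical values).
    Returns (total_time, allocation) or None if no allocation is feasible.
    """
    ranges = [range(1, p_max + 1) for (_, _, _, p_max) in jobs]
    best = None
    for alloc in product(*ranges):
        if sum(alloc) > total_procs:  # constraint: sum of p_j <= P
            continue
        total = sum(exec_time(p, W, a, b)
                    for p, (W, a, b, _) in zip(alloc, jobs))
        if best is None or total < best[0]:
            best = (total, alloc)
    return best


# Two identical jobs with no overhead terms, four processors in total.
print(best_allocation([(20, 0, 0, 3), (20, 0, 0, 3)], 4))
```

Exhaustive enumeration grows exponentially with the number of jobs, so this is only useful for checking small cases by hand.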
Problem Formulation
• Assume that we have four jobs, $\mathcal{J} = \{J_1, J_2, J_3, J_4\}$.
• For simplicity, assume that each job requests 3 processors and that the service demand of each job is 20 time units.
• Suppose we have $P = 7$ available homogeneous processors to be assigned to the 4 jobs.
• The assignment of processors to jobs must minimize the overall completion time of the jobs.
• Assume space-sharing allocation only.
Allocation Possibility
• One possible assignment is
– $J_1 = 3$
– $J_2 = 3$
– $J_3 = 1$
– $J_4 = 0$
[Timeline: after an initialization of, say, 3 time units, J_1 and J_2 are started; when they complete, J_3 and J_4 are started; the time axis is marked at 20 and 40]
• The total execution time is 43 units.
• Can we do better?
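The 43-unit figure can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: a job runs only once it holds all the processors it requests, identical jobs are run in successive batches, each job takes 20 time units once started, and initialization happens once at the beginning. The function name `batch_makespan` is hypothetical.

```python
import math


def batch_makespan(num_jobs, procs_per_job, run_time, total_procs, init_time=0):
    """Completion time when identical jobs run in batches under space sharing."""
    jobs_at_once = total_procs // procs_per_job  # jobs that fit side by side
    if jobs_at_once == 0:
        raise ValueError("not enough processors to start any job")
    batches = math.ceil(num_jobs / jobs_at_once)
    return init_time + batches * run_time


# Four jobs, 3 processors each, 20 time units each, 7 processors, 3 units of setup.
print(batch_makespan(4, 3, 20, 7, init_time=3))  # 43
```

With 7 processors only two 3-processor jobs fit at a time, so two batches of 20 units follow the 3-unit initialization.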
Job Description
• We have a set of jobs to be executed, $\mathcal{J} = \{J_1, J_2, \dots, J_n\}$.
• Number of tasks per job
– Each job $J_i$ has a set of tasks, $J_i = \{T_1, T_2, \dots, T_m\}$.
– Note that a task represents a part of the work that must be done serially by a single processor.
– Interdependence among job tasks is important to consider.
• A job is said to be “small” (or “large”) if it consists of a small (or large) number of tasks.
• Tasks with a large service demand may introduce large queuing delays for queued jobs.
Job Description
[Figure: large jobs and small jobs]
• Based on the analysis of real workload logs used in production
– the percentage of small jobs, with a small number of tasks, is higher than that of large jobs, with a large number of tasks.
– For this reason, we examine the following distribution for the number of tasks per job.
Job Description
• A job is completely described by the following parameters:
– Cumulative job service demand ($W$)
– The arrival time
– The number of tasks
• A job with one task is called a sequential job, and a job with multiple tasks is called a parallel job.
Workload Modeling
• We will consider the following workloads:
– WK1 consists of curves with relatively good speedup.
– WK2 consists of curves with speedup not as good as that of WK1.
– WK3 consists of curves with poor speedup.
– WK4 contains jobs with all three speedup types, each appearing with approximately equal frequency.
Workload Generation
• Sevcik proposed the following model to represent the execution time function of a job that can run on $p$ processors:
  $T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$
• It has been shown that a wide range of representative applications can be modeled by utilizing the above execution time function.
Workload Generation
• The execution time function captures both the scale-up and the overhead:
  $T(p) = \underbrace{\phi(p)\,\frac{W}{p}}_{\text{scale-up}} + \underbrace{\alpha + \beta \cdot p}_{\text{overhead}}$
• The runtime function allows different workloads to be created by choosing different values for the parameters φ, W, β, and α.
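As a minimal sketch of the execution time function T(p) = φ(p)·W/p + α + β·p, the snippet below treats φ as a constant and evaluates the function for a few processor counts. The function name `execution_time` and the parameter values (W = 100, α = 2, β = 0.5) are illustrative assumptions, not values from the slides.

```python
def execution_time(p, W, alpha, beta, phi=1.0):
    """Execution time on p processors: phi * W / p + alpha + beta * p."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return phi * W / p + alpha + beta * p


# Hypothetical job: the W/p term shrinks with p while the beta*p term grows.
for p in (1, 2, 4, 8, 16, 32):
    print(p, execution_time(p, W=100, alpha=2, beta=0.5))
```

For these values the execution time first falls and then rises again as p grows, which is the trade-off between scale-up and overhead that the function is meant to capture.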
Workload Generation
• There is a certain level of load imbalance when running a job on multiple processors.
• The $\phi(p)$ parameter in the equation $T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$
– $\phi(p)$ represents the degree to which the work is not evenly spread across the $p$ processors (i.e., load imbalance).
– Real measurements conducted by Wu show that its value is in the range $1.1 \le \phi(p) \le 1.2$.
– Therefore, $\phi(p)$ can be considered equal to 1.0.
Workload Generation
• Note that adding processors to a job reduces computation time, but there is a certain increase in completion time due to sequential execution.
• The $\alpha$ parameter in the equation captures this: $T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$
– $\alpha$ represents the increase of the work per processor due to parallelization (i.e., overhead).
Workload Generation
• Note that adding processors to a job reduces computation time but increases communication time.
• The $\beta$ parameter in the equation captures this: $T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$
– $\beta$ represents the communication and congestion delays that increase with the number of processors assigned to the job.
– In other words, the more processors a job is given, the higher its communication cost and congestion delays.
Workload Generation
• $W$ in the runtime function represents the total service demand of the job: $T(p) = \phi(p)\,\frac{W}{p} + \alpha + \beta \cdot p$
– Small jobs: average service demand = 1.3; they account for 7/8 of the jobs in the system.
– Large jobs: average service demand = 101; they account for 1/8 of the jobs in the system.
– The mean value of $W$ is 13.76, and the coefficient of variation must be greater than one (e.g., 3.5, 10.0).
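The stated mean and a coefficient of variation near 3.5 can be checked from the small-job and large-job figures. This sketch assumes each class is exponentially distributed with the stated average (a two-stage hyper-exponential mixture); the function name `hyperexp_mean_cv` is hypothetical.

```python
import math


def hyperexp_mean_cv(p_large, mean_large, mean_small):
    """Mean and coefficient of variation of a two-branch exponential mixture."""
    p_small = 1.0 - p_large
    mean = p_large * mean_large + p_small * mean_small
    # The second moment of an exponential with mean m is 2 * m**2.
    second_moment = 2.0 * (p_large * mean_large ** 2 + p_small * mean_small ** 2)
    variance = second_moment - mean ** 2
    return mean, math.sqrt(variance) / mean


mean, cv = hyperexp_mean_cv(p_large=1 / 8, mean_large=101.0, mean_small=1.3)
print(round(mean, 2), round(cv, 2))  # about 13.76 and about 3.53
```

Under that assumption the mixture of 7/8 small jobs and 1/8 large jobs reproduces the mean of 13.76 and gives a coefficient of variation greater than one.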
Workload Speed Up
• We can determine the speedup of the application on $p$ processors as $S(p) = T(1)/T(p)$.
[Equation: $S(p)$ written as a ratio of two three-term sums in $\mu$, $p$, and $p_{\max}$, the numerator being the denominator evaluated at $p = 1$]
• Where
– $\mu \in \{\infty, 0.2, 0.4\}$
– $p_{\max}$ is the maximum number of processors assigned to the workload (WK1, WK2, WK3).
Workload Speed Up
• We can determine the speedup for WK3 with $\mu = 0.2$, $p_{\max} = 1, 4, 6, 9$, and $p = 32$.
• Substituting $p_{\max} = 4$ and $p = 32$ into the expression for $S(p)$:
[Equation: the speedup ratio with $p_{\max} = 4$ and $p = 32$ substituted]
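Speedup can also be computed directly from its definition, S(p) = T(1)/T(p). The sketch below assumes the execution time form W/p + α + β·p with φ(p) = 1 and takes W, α, and β as direct inputs; it does not use the μ and p_max parameterization of the slides, and the function names and parameter values are hypothetical.

```python
def execution_time(p, W, alpha, beta):
    """Assumed execution time on p processors, with phi(p) taken as 1."""
    return W / p + alpha + beta * p


def speedup(p, W, alpha, beta):
    """Speedup on p processors relative to a single processor."""
    return execution_time(1, W, alpha, beta) / execution_time(p, W, alpha, beta)


# Hypothetical job: speedup rises, peaks, and then falls as overhead dominates.
for p in (1, 4, 16, 32, 64):
    print(p, round(speedup(p, W=100, alpha=2, beta=0.5), 3))
```

Setting the derivative of W/p + α + β·p to zero gives a minimum execution time at p = sqrt(W/β), about 14 processors for these values, which is why the speedup at 32 is lower than at 16.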
Workload Speed Up
• The results for the speedup curve when $p_{\max} = 4$
Workload Speed Up
• The results for the speedup curve when $p_{\max} = 16$
Workload Speed Up
• The results for the speedup curve when $p_{\max} = 64$
Workload WK1
• Consists of curves with relatively good speedup.
• They correspond to $\mu = +\infty$.
• An example is a matrix multiplication application.
[Figure: Structure of matrix multiplication application (Task 1, Task 2, Task 3, …, Task n-1, Task n)]
• The number of possible tasks is given by $n = W/\beta$.
• β reflects the communication and congestion delays that increase with the number of processors.
Workload WK1
• The actual job service demand value is obtained from a two-stage hyper-exponential distribution, depending on the coefficient of variation of the service time:
  $f(w) = P\,(1 - e^{-w/101}) + (1 - P)\,(1 - e^{-w/1.3})$
  The first term corresponds to large jobs and the second to small jobs.
Workload WK1
• In the service demand distribution $f(w) = P\,(1 - e^{-w/101}) + (1 - P)\,(1 - e^{-w/1.3})$:
– $P = 0.125$
– The mean value of $W$ is 13.76.
Workload WK1
• The service demand of WK1 can now be computed as
  $f(w) = 0.125\,(1 - e^{-13.76/101}) + (1 - 0.125)\,(1 - e^{-13.76/1.3})$
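The expression above can be evaluated numerically, and the same two-branch mixture can be used to draw synthetic service demands. This is a sketch under the assumption that the large-job branch (mean 101) is chosen with probability 0.125 and the small-job branch (mean 1.3) otherwise; the function names are hypothetical.

```python
import math
import random


def service_demand_cdf(w, p_large=0.125, mean_large=101.0, mean_small=1.3):
    """Two-stage hyper-exponential distribution function evaluated at w."""
    return (p_large * (1.0 - math.exp(-w / mean_large))
            + (1.0 - p_large) * (1.0 - math.exp(-w / mean_small)))


def sample_service_demand(rng, p_large=0.125, mean_large=101.0, mean_small=1.3):
    """Draw one service demand: pick a branch, then sample an exponential."""
    mean = mean_large if rng.random() < p_large else mean_small
    return rng.expovariate(1.0 / mean)  # expovariate takes the rate, 1/mean


print(round(service_demand_cdf(13.76), 4))

rng = random.Random(1)  # fixed seed so the run is repeatable
demands = [sample_service_demand(rng) for _ in range(100_000)]
print(sum(demands) / len(demands))  # sample mean, expected to be near 13.76
```

Because the coefficient of variation is large, the sample mean converges slowly, so a generous number of draws is used here.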
Workload WK2
• They correspond to $\mu = 0.4$, which indicates that the speedup is not as good as that of WK1.
• Example: n-body simulations of stellar or planetary movements, in which the movement of each body is governed by the gravitational forces produced by the system as a whole.
• The number of possible tasks is given by $n = W/\beta$.
Workload WK3
• They correspond to $\mu = 0.2$, which indicates that the speedup is poorer than that of WK1 and WK2.
• Example: mean value analysis.
• Applications of this kind exhibit this property.
• The number of possible tasks is given by $n = W/\beta$.
The Arrival Process
• Jobs arrive to the system in a stochastic manner.
[Figure: jobs J1, J2, J3, J4 arriving at the system]
• An exponential distribution for the inter-arrival time can be derived as follows:
  $f(x) = \lambda \cdot e^{-\lambda \cdot x}$
• where $\lambda$ is derived from $E[T(1)]$, $N$, and Load.
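Arrival times with exponentially distributed gaps can be generated by accumulating samples. In this sketch the rate λ is passed in directly as a hypothetical input, since its derivation from E[T(1)], N, and Load is not reproduced here; the function name `arrival_times` is also hypothetical.

```python
import random


def arrival_times(rate, count, seed=None):
    """Return `count` arrival times whose gaps are exponential with the given rate."""
    rng = random.Random(seed)
    now = 0.0
    times = []
    for _ in range(count):
        now += rng.expovariate(rate)  # the mean gap is 1 / rate
        times.append(now)
    return times


# Five arrivals with an assumed rate of 0.5 jobs per time unit.
print(arrival_times(rate=0.5, count=5, seed=42))
```

Each job in a synthetic workload would then pair one of these arrival times with a sampled service demand and a number of tasks.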