SLIDE 1

Near-Optimal Adaptive Control of a Large Grid Application

Det Buaklee, Greg Tracy, Mary Vernon, Steve Wright

Computer Science Department, University of Wisconsin-Madison

SLIDE 2

ICS’02 New York City June 26, 2002 [2]

Talk Outline

  • Condor
  • Stochastic Optimization, ATR
  • ATR Execution Time Analysis
  • Model for Minimum Execution Time
  • Results: Optimized ATR Performance
SLIDE 3

Condor

  • Provides high-throughput computation
  • Manages a heterogeneous & dynamic pool
  • MW layer supports Master-Worker applications
    – Submitting node is the “master” node
    – Condor dynamically allocates “worker” nodes (min, max)
    – Worker nodes can drop out during computation

[Figure: software stack (Application / MW Layer / Condor / PVM/TCP) connected by the communication link]

SLIDE 4

Stochastic Optimization

  • Non-trivial: ~10,000 lines of code plus LP codes
  • Optimization of a model with uncertain data
    – Large number of possible scenarios for the data
    – Arises in planning-under-uncertainty applications
  • x: vector of variables (unknowns)
    – Aim: find the x that optimizes expected model performance over all the scenarios
  • Objective function is an expectation Q(x):

    min_x  cᵀx + Q(x)   subject to  Ax = b,  x ≥ 0

SLIDE 5

Properties of Expectation Q(x)

  • Probability-weighted sum of the objective for each individual scenario ωi, i = 1, 2, …, N:

    Q(x) = ∑_{i=1}^{N} pᵢ Q(x; ωᵢ)

  • N is the number of scenarios evaluated
    – May be sampled from the full set of scenarios
    – Increase N to improve the accuracy
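A minimal Python sketch of this weighted sum; the per-scenario cost function `q` is a stand-in for the LP solve that produces Q(x; ωi) in ATR:

```python
def expected_q(x, scenarios, probs, q):
    """Estimate Q(x) = sum_i p_i * Q(x; w_i) over N sampled scenarios.

    q(x, w) stands in for the per-scenario recourse cost Q(x; w), which
    ATR obtains by solving a linear program for scenario w.
    """
    return sum(p * q(x, w) for p, w in zip(probs, scenarios))

# Toy example: squared distance to each scenario, N = 2.
q = lambda x, w: sum((xi - wi) ** 2 for xi, wi in zip(x, w))
scenarios = [(0.5, 0.5), (1.5, 1.0)]
probs = [0.5, 0.5]
print(expected_q((1.0, 2.0), scenarios, probs, q))  # 1.875
```

Increasing N simply extends `scenarios` and `probs`; the accuracy of the estimate improves while each evaluation stays an independent, parallelizable unit of work.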

SLIDE 6

ATR Parallelism

  • N = 16: number of scenarios evaluated
  • G = 4: number of task groups
  • T = 8: number of tasks per iteration

[Figure: for each iteration, the master partitions the N scenarios into T tasks and G task groups and distributes them to the workers]

SLIDE 7

Goals

Given N and a set of workers:

  • Compute (near-)optimal adaptive values of B, G, T
    – Automated process
    – Fast/simple runtime computation
  • Compare adaptive and non-adaptive B, G, and grouping/scheduling of tasks

Approach: LogP/LogGP/LoPC model

SLIDE 8

ATR in parallel

  • Each task i returns the value of ∑ᵢ Q(x; ω) and a subgradient (slope) for this partial sum
  • Sum over tasks to obtain the complete function Q(x) and its subgradient
  • At the end of each iteration, set the new x to be the minimizer of the latest approximation to Q(x)

[Figure: the master sends x1 to the workers, which return Q(x1; ω1), Q(x1; ω2), Q(x1; ω3); the master then generates the next iterates x2, x3]
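The fork-join iteration can be sketched as follows; `evaluate` and `solve_master` are hypothetical stand-ins for the per-task LP solves and the master's model-minimization step:

```python
def atr_iteration(x, tasks, evaluate, solve_master):
    """One ATR iteration: fan the tasks out, sum the partial values and
    subgradients they return, then compute the next iterate.

    evaluate(x, task)               -> (partial value, partial subgradient)
    solve_master(value, subgrad, x) -> minimizer of the updated model of Q
    """
    total_value = 0.0
    total_subgrad = None
    for task in tasks:  # MW runs these on the workers in parallel
        value, subgrad = evaluate(x, task)
        total_value += value
        total_subgrad = (subgrad if total_subgrad is None
                         else [a + b for a, b in zip(total_subgrad, subgrad)])
    return solve_master(total_value, total_subgrad, x)
```

The sketch is sequential; in the real code the loop body is the work MW dispatches to worker nodes, and the master only performs the accumulation and the final `solve_master` step.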

SLIDE 9

Execution Time Analysis

Measure LogP/LogGP/LoPC model parameters:

  – L (network latency)
  – o (message processing overhead)
  – G (gap per byte, i.e., bandwidth)
  – P (number of processors)

[Figure: total time broken into master execution time, worker execution time, and communication time]

SLIDE 10

Execution Time Measurement

  • One-master, one-worker experiment
  • High variability

  T    G    Update m(x) (msec)    num     Compute new x (sec)    Avg worker
            avg / min / max       it.     avg / min / max        time (sec)
  25   25   6.51 / 3.36 /  915    82      0.38 / 0.01 /  2.06    20.54
  50   50   6.04 / 3.56 / 1405    47      1.32 / 0.01 /  3.05    10.56
  100  50   6.83 / 3.64 / 1936    32      2.42 / 0.05 /  7.60    10.36
  100  100  5.94 / 3.40 / 1162    31      1.57 / 0.05 /  3.41     5.19
  200  200  6.12 / 3.84 / 2092    25      2.25 / 0.03 /  6.25     2.69
  400  400  6.74 / 3.30 / 2411    21      3.33 / 0.05 / 13.27     1.35

SLIDE 11

Worker Execution Times

  • For a given planning problem, tw is linear in:
    – Number of scenarios evaluated (N/G)
    – Processor speed

[Figure: worker execution time (sec) vs. number of scenarios evaluated (N/G), for 600, 780, 1100, and 1700 MIPS processors]

Total worker time = n · (tw)max
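Assuming that linearity, a runtime estimate of tw might look like the following sketch, where `mi_per_scenario` is a hypothetical per-scenario work constant that would be fit from measured runs:

```python
def worker_time(scenarios_evaluated, mips, mi_per_scenario=25.0):
    """Estimated worker execution time (sec) for one task group:
    proportional to the number of scenarios evaluated (N/G) and
    inversely proportional to processor speed.

    mi_per_scenario (millions of instructions per scenario) is a
    hypothetical constant, not a value from the paper.
    """
    return scenarios_evaluated * mi_per_scenario / mips

# 1200 scenarios on a 600 MIPS node:
print(worker_time(1200, 600))  # 50.0
```

Doubling the scenarios per group doubles the estimate, and a proportionally faster processor cancels it out, which is the shape the measurements show.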

SLIDE 12

Master Execution Times

Updating m(x) after each task group (G) returns:

  • Variability in execution time due to:
    – Excessive default debug I/O
    – Interference from Condor administrative tasks
  • Eliminating both makes this execution time < 1 ms, i.e., negligible

[Figure: time to update m(x) (msec) vs. worker completion event count, for a lightly loaded master at the default debug level, a lightly loaded master at a reduced debug level, and an isolated master at a reduced debug level]

SLIDE 13

Master Execution Times

Time to compute the new x:

  • Hard to make a prediction for the next iterate
  • Same characteristic for all planning problems

[Figure: time to compute the new x (sec) vs. iteration number, for the SSN network design problem (T = 200 and T = 100) and for the 20term problem]

SLIDE 14

Master Execution Times

Generating the new x at the end of each iteration:

  • The number of iterations (n) and the time to compute x in each iteration depend on N, T
  • Given N, the total master processing time (tM) is fixed!

[Figure: total master processing time (sec), average time to compute the new x (sec), and number of iterations (n), each plotted against the number of tasks (T)]

Optimize: T is large, but not too large

SLIDE 15

Communication Costs

  • Round-trip time measurement
  • Critical path contains one round-trip time per iterate
  • Round-trip time << worker execution time for the message sizes used in ATR (250–1200 bytes)

[Figure: round-trip time vs. size of data sent (KB); Experiment 1 between local nodes (μsec scale), Experiment 2 between Wisconsin and Bologna, Italy (sec scale)]

SLIDE 16

Effect of Basket Size

  • More iterations (n) needed for larger B: approximately linear relationship between B and n
  • Optimal B = 1

[Figure: maximum, average, and minimum number of iterations (n) vs. basket size (B = 1–6)]

SLIDE 17

Model Vocabulary

  N    number of scenarios in the model
  T    number of tasks per iteration
  G    number of groups of scenarios (units of work)
  B    number of vectors x evaluated in parallel
  tM   total master execution time
  tW   individual worker execution time
  n    total number of iterations

SLIDE 18

Building the Model

Master, worker, and communication times:

  • Total master execution time
    – Varies with N, T, B
    – Includes only the time to generate the new x
  • Worker execution time per iteration
    – Very low variation
    – Consistent from one iteration to another
  • Insignificant contributions from:
    – Communication time
    – Master updating Q(x) (if T not too large)

Model: tM + n(tw)max
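The model tM + n(tw)max is cheap enough to evaluate at runtime; a direct transcription (names are illustrative):

```python
def predicted_runtime(t_master, n_iterations, worker_times):
    """Predicted total execution time: tM + n * (tw)max.

    t_master:     total master time spent generating new iterates (sec)
    n_iterations: total number of iterations n
    worker_times: per-worker times for one iteration (sec); the slowest
                  worker gates each fork-join iteration
    """
    return t_master + n_iterations * max(worker_times)

# ssn benchmark, N = 20,000, T = 400: tM = 441 s, n = 44, tw = 20.96 s
print(round(predicted_runtime(441, 44, [20.96]) / 60, 1))  # 22.7 (min)
```

With a homogeneous pool, `worker_times` collapses to a single value and the model reduces to tM + n·tw.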

SLIDE 19

Model Validation for Homogeneous Worker Pool

Model: tM + n·tw

  Planning   N       T    num      tM: compute    Avg tW   Total execution time   Note
  problem                 it. (n)  new x (sec)    (sec)    (min): model / meas.
  20-terms   5,000   200  597      2762            2.35    69.4 / 70.5            WI pool
  ssn        40,000  100   84       297           30.97    48.8 / 52.2            WI-NM flock
  ssn        20,000   50  108       180           20.91    40.8 / 44.7            WI-Argonne flock
  ssn        20,000  100   84       244           20.89    33.5 / 36.3            WI-Argonne flock
  ssn        20,000  200   61       295           20.88    26.4 / 29.3            WI-Argonne flock
  ssn        20,000  400   44       441           20.96    22.9 / 24.9            WI-Argonne flock
  ssn        10,000  100   44        64            6.32    10.3 / 12.1            WI pool

SLIDE 20

Model Validation for Heterogeneous Worker Pool

Model: tM + n(tw)max

  Workers    num      tM: compute    Worker time tW (sec)    Non-adaptive execution
  requested  it. (n)  new x (sec)    avg / min / max         time (min): model / meas.
  50         70       50.37          7.04 / 4.21 / 28.62     34.23 / 34.65
  50         70       50.02          7.03 / 4.19 / 28.62     34.22 / 35.07
  50         58       35.8           6.62 / 4.18 / 13.82     13.96 / 14.35
  150        42       60.71          2.86 / 1.36 / 13.88     10.73 / 13.75
  150        38       53.18          2.76 / 1.38 /  9.42      6.85 /  9.07
  150        36       46.77          2.86 / 1.37 /  9.78      6.65 /  9.63
  200        36       61.3           2.11 / 1.68 / 10.21      7.15 /  9.67

SLIDE 21

Optimal Configuration for Homogeneous Worker Pool

  • G should be equal to the number of available processors
  • T should be large, up to a point
  • B should be set to 1

  Near-optimized ATR execution time: 18 min
  Original ATR execution time (T = 100, G = 25):
    – Reduced debug: 61 min (B = 3), 68 min (B = 6)
    – Default debug: 92 min (B = 3), 149 min (B = 6)

3x - 6x faster!

SLIDE 22

Heterogeneous task assignment

[Figure: the master node's job queue for one iteration and its worker queue; each worker is labeled with its benchmark time and its expected completion time Ew]

SLIDE 23

Adaptive task assignment

  • Heterogeneous & dynamic worker pool
  • Better utilization of worker nodes

  Workers    Worker time tW (sec)    Original exec   Adaptive exec time     Workers   Estimated
  requested  avg / min / max         time (min)      (min): model / meas.   used      speedup
  50          4.02 / 1.69 /  8.9     12.66            8.03 / 10.50          45        17%
  50          7.76 / 2.58 / 19.4     22.32            9.86 / 11.43          26        49%
  100         2.25 / 1.20 /  9.5      8.67            3.41 /  4.78          86        45%
  100         2.84 / 0.83 /  9.6      9.52            3.24 /  6.78          67        29%
  100        20.33 / 8.06 / 98.6     82.94           35.59 / 39.56          91        52%

SLIDE 24

Conclusion

  • Analyzed the execution time of a grid application
  • Constructed and validated a simple performance model
  • Created an adaptive control scheme guided by our performance model
  • Optimal adaptive parameters give a large speedup (3x-6x) over the original ATR code
  • Adaptive task assignment gives a 15-55% speedup over the original policy, for optimal parameter values

SLIDE 25

Future Work

  • Apply the model to larger data sets
  • Apply the model to more complex objectives, such as controlling processor utilization

  • Apply this model to other grid applications
SLIDE 26

Acknowledgments

  • Jeff Linderoth (ATR)
  • Jichuan Chang (MW)
  • condor-admin@cs.wisc.edu
SLIDE 27

Questions?

SLIDE 28

Stochastic Optimization Example

  • First-month data: demand 10 units, price $1.00/unit, storage cost $0.05/unit
  • Possible second-month scenarios:

  Scenario    Prob.  Demand  Price
  Normal      0.50   10.0    1.00
  Warm        0.30    8.0    0.85
  Cold        0.15   14.0    1.50
  Very cold   0.05   27.0    1.80
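As an illustration (not on the original slide), the probability-weighted sums over this scenario table, the same form as the expectation Q(x), work out to:

```python
# Scenario table from the example: (name, probability, demand, price)
scenarios = [
    ("Normal",    0.50, 10.0, 1.00),
    ("Warm",      0.30,  8.0, 0.85),
    ("Cold",      0.15, 14.0, 1.50),
    ("Very cold", 0.05, 27.0, 1.80),
]

expected_demand = sum(p * demand for _, p, demand, _ in scenarios)
expected_price = sum(p * price for _, p, _, price in scenarios)
print(round(expected_demand, 2))  # 10.85 units
print(round(expected_price, 2))   # 1.07 $/unit
```

A first-month purchase decision x would then be chosen to minimize the first-month cost plus this kind of expectation over the second-month outcomes.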

SLIDE 29

ATR

  • “Asynchronous Trust-Region”: algorithm for minimizing Q(x) subject to the constraints
  • Iterative fork-join synchronization structure
  • Unpredictable number of iterations to converge
  • Adjustable task parameter
  • 15,000 lines of code
SLIDE 30

Even More Parallelism!

  • Possibly generate the new x before all Q(x; ωi) return!
  • Now only have partial info about Q(x), so expect lower-quality estimates of x

[Figure: example in which the master generates x2 after receiving only Q(x1; ω1), Q(x1; ω2), Q(x1; ω3) from the workers]

SLIDE 31

ATR Vocabulary

  N   number of scenarios in the model (possible values for the uncertain data), e.g., N = 5,000 or N = 40,000
  G   number of groups of scenarios (units of work), e.g., G = 50 or G = 100
  T   number of tasks per iteration, e.g., T = 200 or T = 1,000
  B   number of variables x evaluated in parallel, e.g., B = 5

SLIDE 32

Adaptive Control Algorithm

  • Sort the worker list based on benchmarks
    – Benchmark = execution time of a sample task group on this worker
    – Indicates the expected time needed for this worker to complete one task group
  • For each worker w, define

      Ew = (# currently assigned tasks_w + 1) × benchmark_w

  • A new task is assigned to the worker with the lowest Ew
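A small sketch of this policy; the worker names and benchmark values are invented for illustration:

```python
def assign_task(workers):
    """Pick the worker minimizing Ew = (assigned + 1) * benchmark,
    i.e., its expected completion time if given one more task group."""
    best = min(workers, key=lambda w: (w["assigned"] + 1) * w["benchmark"])
    best["assigned"] += 1
    return best["name"]

# Hypothetical pool: benchmark = seconds to run a sample task group.
workers = [
    {"name": "fast",   "benchmark": 9.0,  "assigned": 0},
    {"name": "medium", "benchmark": 13.0, "assigned": 0},
    {"name": "slow",   "benchmark": 20.0, "assigned": 0},
]

# Assign five task groups; faster workers absorb more of them.
schedule = [assign_task(workers) for _ in range(5)]
print(schedule)  # ['fast', 'medium', 'fast', 'slow', 'medium']
```

Because Ew grows with each assignment, slow workers are not starved outright; they simply receive work only when every faster worker is already expected to be busy longer.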