SLIDE 1

Use of a Levy Distribution for Modeling Best Case Execution Time Variation

Jonathan Beard, Roger Chamberlain

SBS

Stream Based Supercomputing Lab

http://sbs.wustl.edu

Work also supported by:

SLIDE 2

Outline

  • Motivation
  • Stream Processing
  • Optimization Goals
  • Methodology
  • Distributions
  • Results

SLIDE 3

Streaming Computing

SLIDE 4

Streaming Computing

Kernel

SLIDE 5

Streaming Computing

[Figure: kernels (Kernel 1, Kernel 2, Kernel 3, Kernel 2) connected by streams]

SLIDE 6

Streaming Languages

StreamIt, Auto-Pipe, Brook, Cg, S-Net, Scala-Pipe, Streams-C, and many others

SLIDE 7

Optimization

[Figure: kernels labeled Fast, Slow, Super Fast, and Medium]

SLIDE 8

Optimization

[Figure: Kernel 1, Kernel 2, Kernel 3, and Kernel 2 placed on cores 1 to 4 of multi-core A and multi-core B]

More allocation choices: each stream can be allocated on NUMA node A or B.

SLIDE 11

Optimization

[Figure: kernels A, B, and C, with queues Q1 and Q2]

Each “stream” is modeled as a queue.

SLIDE 13

Streaming on Multi-core Systems

We want good models for streaming systems on shared multi-core systems (i.e., a cluster)

  • Commodity multi-core timer availability and latency
  • Frequency scaling and core migration
  • Measuring modifies the application behavior

Problem: Accurate measurement is very difficult. Is there a way to decide on a model without it?

SLIDE 15

Derived Information

[Figure: Expected vs. Observed]

Is there a pattern of minimal variation within the systems we’re running on?

  • Avg. Service Time = E[X] + Error

SLIDE 16

Goal

  • Find a distribution that characterizes the minimum expected variation of a hardware and software system
  • Use this characterization to accept or reject models

SLIDE 17

Process

  • Measurement
  • Workload definition
  • Find a distribution
  • Utilize the distribution to aid model selection

SLIDE 18

Timer Mechanism

[Figure: the Code asks the Timer Thread for the time ("Ask for Time") and receives it back ("Receive Time")]

SLIDE 19

Timer Mechanism

Timer Thread

rdtsc

  • x86 assembly
  • varying methods to serialize
  • relatively fast
  • multiple drift issues

clock_gettime

  • POSIX standard
  • relatively accurate
  • portable
  • slower than rdtsc
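
To get a feel for the cost of the clock_gettime path, here is a minimal sketch that reads the clock twice in a row and records the difference. It is an illustration only: it assumes a Unix-like system with Python 3.7 or later, it does not cover rdtsc (which is not reachable from plain Python), and the function name `timer_overhead_ns` is hypothetical rather than part of the authors' timer code.

```python
import time


def timer_overhead_ns(samples=10_000):
    """Return back-to-back clock_gettime deltas in nanoseconds."""
    # CLOCK_MONOTONIC is not affected by wall-clock adjustments.
    clock = time.CLOCK_MONOTONIC
    deltas = []
    for _ in range(samples):
        start = time.clock_gettime_ns(clock)
        stop = time.clock_gettime_ns(clock)
        deltas.append(stop - start)
    return deltas


if __name__ == "__main__":
    d = sorted(timer_overhead_ns())
    print("min:", d[0], "ns  median:", d[len(d) // 2], "ns  max:", d[-1], "ns")
```

The spread between the minimum and the maximum gives a rough idea of how much the act of asking for the time varies on a given machine. Interpreter overhead is included, so the absolute numbers are not comparable to a native timer.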

SLIDE 20

Two Timing Choices

SLIDE 21

NUMA Node Variations

SLIDE 22

Minimize Variation

  • Restrict the timer to a single core
  • Use the x86 rdtsc instruction with the processor-recommended serializers for each processor type
  • Keep processes under test on the same NUMA node as the timer
  • Run the timer thread with altered priority to minimize core context swaps
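
Two of these steps, pinning to a single core and altering priority, can be sketched with standard operating system calls. The sketch below assumes Linux (where `os.sched_setaffinity` is available) and works on the calling process rather than a dedicated timer thread. The helper names `pin_to_core` and `raise_priority` are hypothetical. NUMA placement and rdtsc serialization are not covered.

```python
import os


def pin_to_core(core=None):
    """Pin the calling process to one core and return the new affinity set."""
    allowed = sorted(os.sched_getaffinity(0))
    if core is None:
        core = allowed[0]  # assumption: default to the first allowed core
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


def raise_priority(increment=-5):
    """Try to lower the nice value; return True on success."""
    try:
        os.nice(increment)  # negative increments usually need privileges
        return True
    except OSError:
        return False


if __name__ == "__main__":
    print("affinity after pinning:", pin_to_core())
    print("priority raised:", raise_priority())
```

Without elevated privileges the priority change is expected to fail, in which case the sketch simply reports that and carries on with pinning alone.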

SLIDE 23

Best Case Execution Time Variation

  • no-op instruction implemented in most processors
  • usually takes exactly 1 cycle
  • no real functional units are involved, so least taxing
  • variation observed in execution time should be external to the process

SLIDE 24

Data Collection

  • no-op loops calibrated for various nominal times, tied to a single core and run thousands of times
  • Execution time measured end to end for each run, environment collected
  • Parameters include:
      • Number of processes executing on core
      • Number of context swaps (voluntary, involuntary)
      • Many others
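
The collection loop can be sketched roughly as follows. This is not the authors' test harness: it assumes a Unix-like system, uses an empty Python loop as a stand-in for a calibrated no-op loop, and records only the elapsed time and the context swap counts reported by `getrusage`. The names `run_noop_loop` and `collect` are hypothetical.

```python
import resource
import time


def run_noop_loop(iterations):
    """Busy loop standing in for a calibrated no-op loop."""
    for _ in range(iterations):
        pass


def collect(runs=1000, iterations=10_000):
    """Time each run end to end and record context swap counts."""
    records = []
    for _ in range(runs):
        before = resource.getrusage(resource.RUSAGE_SELF)
        start = time.perf_counter_ns()
        run_noop_loop(iterations)
        elapsed = time.perf_counter_ns() - start
        after = resource.getrusage(resource.RUSAGE_SELF)
        records.append({
            "elapsed_ns": elapsed,
            "voluntary": after.ru_nvcsw - before.ru_nvcsw,
            "involuntary": after.ru_nivcsw - before.ru_nivcsw,
        })
    return records


if __name__ == "__main__":
    data = collect(runs=100)
    times = sorted(r["elapsed_ns"] for r in data)
    print("min:", times[0], "ns  max:", times[-1], "ns")
```

Each record pairs one end-to-end timing with the environment observed during that run, which is the shape of data needed to relate variation to context swaps.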

SLIDE 25

Levy Distribution

[Figure: Execution Time Error (obs - mean)]

SLIDE 26

Levy Distribution

[Figure: Normal Distribution]

SLIDE 27

Levy Distribution

[Figure: Gumbel Distribution]

SLIDE 28

Levy Distribution

[Figure: Levy Distribution]
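
For reference, the standard Levy density with location mu and scale c is f(x) = sqrt(c / (2 pi)) * exp(-c / (2 (x - mu))) / (x - mu)^(3/2) for x > mu, and its CDF is erfc(sqrt(c / (2 (x - mu)))). The sketch below writes both out with the Python standard library. The parameter values in the demo are arbitrary and are not fitted to the data shown in the talk.

```python
import math


def levy_pdf(x, c, mu=0.0):
    """Levy density with location mu and scale c; zero for x <= mu."""
    if x <= mu:
        return 0.0
    z = x - mu
    return math.sqrt(c / (2.0 * math.pi)) * math.exp(-c / (2.0 * z)) / z ** 1.5


def levy_cdf(x, c, mu=0.0):
    """Levy cumulative distribution: erfc(sqrt(c / (2 (x - mu))))."""
    if x <= mu:
        return 0.0
    return math.erfc(math.sqrt(c / (2.0 * (x - mu))))


if __name__ == "__main__":
    c = 1.0  # arbitrary scale, for illustration only
    for x in (1.0, 10.0, 100.0, 1000.0):
        # The survival probability decays slowly: the tail is heavy.
        print(x, levy_pdf(x, c), 1.0 - levy_cdf(x, c))
```

The tail is heavy enough that the untruncated distribution has no finite mean, so a mean can only be computed once the distribution is truncated.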

SLIDE 30

Levy Distribution

  • Truncation enables mean calculation, but requires fitting to each dataset to find where to truncate
  • The truncation parameters are correlated with both the number of processes per core and the expected execution time
  • A roughly linear relationship gives an approximate solution to truncation parameters without refitting
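
To illustrate why truncation makes the mean computable, the sketch below integrates x * f(x) numerically over (0, upper] for a Levy density with location 0 and scale c, then divides by the probability mass that the truncation keeps. It assumes a single upper truncation point, and the name `truncated_levy_mean` and the demo values are hypothetical. How the authors parameterize and fit the truncation is not stated on the slide.

```python
import math


def truncated_levy_mean(c, upper, steps=20_000):
    """Mean of a Levy(mu=0, scale=c) variable conditioned on 0 < X <= upper."""
    # Probability mass kept by the truncation: the Levy CDF at `upper`.
    mass = math.erfc(math.sqrt(c / (2.0 * upper)))
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx  # midpoint rule avoids evaluating at x = 0
        pdf = math.sqrt(c / (2.0 * math.pi)) * math.exp(-c / (2.0 * x)) / x ** 1.5
        total += x * pdf * dx
    return total / mass


if __name__ == "__main__":
    for upper in (10.0, 100.0, 1000.0):
        # The mean keeps growing as the truncation point moves out.
        print(upper, truncated_levy_mean(1.0, upper))
```

Because the mean depends strongly on where the tail is cut, the truncation point has to be chosen per dataset or approximated from other parameters.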

SLIDE 31

Levy Fit

[Figure: Levy fit in four panels: 1 - 5 processes, 6 - 10 processes, 11 - 15 processes, 16 - 20 processes]

SLIDE 32

Test Setup

[Figure: kernels A and B connected by queue Q1]

Question: Can we use an M/M/1 queueing model to estimate the mean queue occupancy of this system?

  • Hypothesis: Lower Kullback-Leibler (KL) divergence between the expected and realized distributions is associated with higher model accuracy.
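
As a reminder of what is being compared, here is a minimal sketch of the discrete KL divergence between two histograms over the same bins. The direction of the divergence (realized relative to expected) and the example counts are assumptions made for illustration. The slide does not specify either.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D(P || Q) in nats for two probability vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


def normalize(counts):
    """Turn a histogram of counts into a probability vector."""
    s = float(sum(counts))
    return [c / s for c in counts]


if __name__ == "__main__":
    expected = normalize([50, 30, 15, 5])   # made-up counts
    realized = normalize([45, 32, 16, 7])   # made-up counts
    print(kl_divergence(realized, expected))
```

A value of zero means the two histograms agree exactly, and larger values mean the realized distribution departs further from the expected one.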

SLIDE 33

Test Setup

[Figure: kernels A and B connected by queue Q1]

  1. A dedicated thread of execution monitors queue occupancy
  2. Calculate the estimated mean queue occupancy using the M/M/1 model
  3. Calculate KL divergence for the arrival process distribution using the truncated Levy distribution noise model
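
Step 2 relies on the textbook M/M/1 results: with utilization rho = arrival rate / service rate, the mean number in the system is rho / (1 - rho) and the mean number waiting is rho^2 / (1 - rho). The sketch below computes both. Which of the two the talk means by "queue occupancy" is not stated, so both are shown, and the function names are hypothetical.

```python
def mm1_mean_occupancy(arrival_rate, service_rate):
    """Mean number of items in an M/M/1 system (queue plus server)."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate  # utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    return rho / (1.0 - rho)


def mm1_mean_waiting(arrival_rate, service_rate):
    """Mean number of items waiting (excluding the one in service)."""
    rho = arrival_rate / service_rate
    return mm1_mean_occupancy(arrival_rate, service_rate) - rho


if __name__ == "__main__":
    # Example rates are arbitrary: one arrival per two service completions.
    print(mm1_mean_occupancy(1.0, 2.0), mm1_mean_waiting(1.0, 2.0))
```

The estimate from the model can then be set against the occupancy observed by the monitoring thread in step 1.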

SLIDE 34

Convolution with Exponential
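
One way to picture this step is a discrete convolution: bin an exponential distribution, bin a truncated Levy noise distribution on the same grid, and convolve the two to approximate the distribution of their sum. The sketch below does this under several assumptions that are not taken from the talk: the noise is additive and independent, the Levy location is 0, and the sum of two binned values is assigned to the bin whose index is the sum of the two indices (a coarse approximation). All names and parameter values are hypothetical.

```python
import math


def exponential_pmf(rate, dx, bins):
    """Probability of each width-dx bin under an exponential distribution."""
    return [math.exp(-rate * i * dx) - math.exp(-rate * (i + 1) * dx)
            for i in range(bins)]


def truncated_levy_pmf(c, dx, bins):
    """Bin probabilities of a Levy(0, c) variable truncated to (0, bins * dx]."""
    def cdf(x):
        return math.erfc(math.sqrt(c / (2.0 * x))) if x > 0 else 0.0
    mass = cdf(bins * dx)
    return [(cdf((i + 1) * dx) - cdf(i * dx)) / mass for i in range(bins)]


def convolve(a, b):
    """Discrete convolution: distribution of the sum of two independent bins."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


if __name__ == "__main__":
    expo = exponential_pmf(rate=2.0, dx=0.1, bins=200)
    noise = truncated_levy_pmf(c=1.0, dx=0.1, bins=50)
    noisy = convolve(expo, noise)
    print(len(noisy), sum(noisy))
```

The convolved bins can then be compared with the plain exponential bins using a divergence measure such as KL.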

SLIDE 35

Conclusions

  • The truncated Levy distribution can be used to approximate best case execution time variation (BCETV)
  • The distribution of BCETV can be used as a tool to accept or reject a stochastic queueing model based on distributional assumptions
  • KL divergence between the expected and convolved distribution correlates highly with queue model accuracy

SLIDE 36

Parting Notes

Slides available here: sbs.wustl.edu

  • Timer C++ template code: http://goo.gl/ItJ3jP
  • Test harness used to collect data: http://goo.gl/U1VG6N