Use of a Levy Distribution for Modeling Best Case Execution Time Variation

  1. Use of a Levy Distribution for Modeling Best Case Execution Time Variation
     Jonathan Beard, Roger Chamberlain
     SBS Stream Based Supercomputing Lab, http://sbs.wustl.edu
     Work also supported by: [sponsor logos not captured]

  2. Outline
     • Motivation
     • Stream Processing
     • Optimization Goals
     • Methodology
     • Distributions
     • Results

  3. Streaming Computing

  4. Streaming Computing
     [Figure: a single kernel]

  5. Streaming Computing
     [Figure: Kernel 1, two instances of Kernel 2, and Kernel 3, connected by four streams]

  6. Streaming Languages
     StreamIt, Auto-Pipe, Brook, Cg, S-Net, Scala-Pipe, Streams-C, and many others

  7. Optimization
     [Figure: a kernel with options labeled Slow, Medium, Fast, and Super Fast]

  8. Optimization
     [Figure: Kernel 1, two instances of Kernel 2, and Kernel 3 mapped onto multi-core A and multi-core B, each with cores 1-4]
     More allocation choices: which NUMA node, A or B, should the stream be allocated on?

  11. Optimization
      A “stream” is modeled as a queue.
      [Figure: kernels A, B, and C connected by queues Q1 and Q2]

  13. Streaming on Multi-core Systems
      We want good models for streaming systems on shared multi-core systems (i.e., a cluster).
      Problem: Accurate measurement is very difficult. Is there a way to decide on a model without it?
      • Commodity multi-core timer availability and latency
      • Frequency scaling and core migration
      • Measuring modifies the application behavior

  14. Derived Information
      [Figure: expected vs. observed]

  15. Derived Information
      [Figure: expected vs. observed]
      Is there a pattern of minimal variation within the systems we’re running on?
      Avg. Service Time = E[X] + Error

  16. Goal
      Find a distribution that characterizes the minimum expected variation of a hardware and software system.
      Use this characterization to accept or reject models.

  17. Process
      • Measurement
      • Workload definition
      • Find a distribution
      • Utilize the distribution to aid model selection

  18. Timer Mechanism
      [Figure: code asks the timer thread for the time and receives the time back]

  19. Timer Mechanism: timer thread options
      rdtsc
      • x86 assembly
      • varying methods to serialize
      • relatively fast
      • multiple drift issues
      clock_gettime
      • POSIX standard
      • relatively accurate
      • portable
      • slower than rdtsc
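
The cost of reading a clock can be estimated by timing back-to-back reads. The sketch below is an illustration only: it is written in Python rather than the C++ used for the talk's timer, it uses `time.perf_counter_ns` as a stand-in for `rdtsc` or `clock_gettime`, and the function name `timer_read_overhead_ns` and the sample count are assumptions made for the example.

```python
import time


def timer_read_overhead_ns(samples=10_000):
    """Return (min, median) gap in ns between two consecutive clock reads."""
    gaps = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        gaps.append(t1 - t0)
    gaps.sort()
    return gaps[0], gaps[len(gaps) // 2]
```

The minimum gap roughly approximates the cost of one clock read. A median well above the minimum suggests interference from the rest of the system.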

  20. Two Timing Choices

  21. NUMA Node Variations

  22. Minimize Variation
      • Restrict the timer to a single core
      • Use the x86 rdtsc instruction with the processor-recommended serializers for each processor type
      • Keep processes under test on the same NUMA node as the timer
      • Run the timer thread with altered priority to minimize core context swaps
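
Two of these steps, pinning to a single core and altering priority, can be sketched with the Python standard library. This is a best-effort illustration, not the talk's C++ timer: `pin_and_prioritize` and its defaults are hypothetical, `os.sched_setaffinity` is only available on Linux, and raising priority usually needs elevated privileges. NUMA placement and `rdtsc` serialization are not covered here.

```python
import os


def pin_and_prioritize(core=0, niceness=-5):
    """Best-effort: pin this process to one core and adjust its priority.

    Returns a dict recording which steps actually took effect.
    """
    applied = {"pinned": False, "reniced": False}
    # sched_setaffinity exists only on Linux, so guard for other platforms.
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {core})  # pid 0 means the calling process
            applied["pinned"] = True
        except OSError:
            pass  # the requested core is not available to this process
    # A negative increment raises priority and normally requires privileges.
    if hasattr(os, "nice"):
        try:
            os.nice(niceness)
            applied["reniced"] = True
        except OSError:
            pass
    return applied
```

Returning the applied steps lets a measurement harness record whether a run was actually pinned, rather than assuming it was.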

  23. Best Case Execution Time Variation (BCETV)
      • no-op instruction implemented in most processors
      • usually takes exactly 1 cycle
      • no real functional units are involved, so least taxing
      • variation observed in execution time should be external to process

  24. Data Collection
      • no-op loops calibrated for various nominal times, tied to a single core and run thousands of times
      • Execution time measured end to end for each run, environment collected
      • Parameters include:
          Number of processes executing on core
          Number of context swaps (voluntary, involuntary)
          Many others
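
A minimal sketch of this collection loop, assuming a Unix system where Python's `resource` module is available. An empty Python loop stands in for the hardware no-op, no calibration to a nominal time is attempted, and the names `run_noop_loop` and `collect` are hypothetical.

```python
import resource
import time


def run_noop_loop(iterations):
    """Time a do-nothing loop end to end and count context switches."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter_ns()
    for _ in range(iterations):
        pass  # stand-in for a hardware no-op instruction
    elapsed = time.perf_counter_ns() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "elapsed_ns": elapsed,
        "voluntary": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary": after.ru_nivcsw - before.ru_nivcsw,
    }


def collect(iterations=10_000, runs=1_000):
    """Repeat the loop; return per-run errors (obs - mean) and raw samples."""
    samples = [run_noop_loop(iterations) for _ in range(runs)]
    mean = sum(s["elapsed_ns"] for s in samples) / runs
    errors = [s["elapsed_ns"] - mean for s in samples]
    return errors, samples
```

The `errors` list is the quantity a noise distribution would be fitted to, and the context-switch counts are examples of the environment parameters recorded alongside each run.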

  25. Levy Distribution
      [Figure: execution time error (obs - mean)]

  26. Levy Distribution
      [Figure: normal distribution]

  27. Levy Distribution
      [Figure: Gumbel distribution]

  28. Levy Distribution
      [Figure: Levy distribution]

  30. Levy Distribution
      • Truncation enables mean calculation, but requires fitting to each dataset to find where to truncate
      • The truncation parameters are correlated with both the number of processes per core and the expected execution time
      • A roughly linear relationship gives an approximate solution for the truncation parameters without refitting
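
The untruncated Levy distribution has an infinite mean, which is why truncation is needed before a mean can be computed. The sketch below uses the standard Levy density with location `mu` and scale `c` and integrates numerically up to an upper truncation point. The parameterization, the midpoint rule, and the function names are assumptions for illustration; the talk's fitting procedure is not shown on the slides.

```python
import math


def levy_pdf(x, mu=0.0, c=1.0):
    """Levy density with location mu and scale c; zero for x <= mu."""
    if x <= mu:
        return 0.0
    d = x - mu
    return math.sqrt(c / (2 * math.pi)) * math.exp(-c / (2 * d)) / d ** 1.5


def levy_cdf(x, mu=0.0, c=1.0):
    """Levy cumulative distribution function."""
    if x <= mu:
        return 0.0
    return math.erfc(math.sqrt(c / (2 * (x - mu))))


def truncated_levy_mean(upper, mu=0.0, c=1.0, steps=20_000):
    """Mean of a Levy distribution truncated to (mu, upper], midpoint rule."""
    mass = levy_cdf(upper, mu, c)  # probability retained after truncation
    h = (upper - mu) / steps
    total = 0.0
    for i in range(steps):
        x = mu + (i + 0.5) * h  # midpoints avoid evaluating at x == mu
        total += x * levy_pdf(x, mu, c)
    return total * h / mass
```

Moving `upper` further out increases the mean without bound, which shows why the choice of truncation point matters for each dataset.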

  31. Levy Fit
      [Figure: four panels of fits, grouped by number of processes: 1-5, 6-10, 11-15, and 16-20]

  32. Test Setup
      [Figure: kernels A and B connected by queue Q1]
      Question: Can we use an M/M/1 queueing model to estimate the mean queue occupancy of this system?
      Hypothesis: Lower Kullback-Leibler (KL) divergence between expected and realized distribution is associated with higher model accuracy.
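
For reference, the standard M/M/1 result for the mean number of items in the system is rho / (1 - rho), where rho is the arrival rate divided by the service rate. The sketch below assumes that this is the occupancy measure intended; the slide does not say whether items in service are counted, and the function name is hypothetical.

```python
def mm1_mean_occupancy(arrival_rate, service_rate):
    """Mean number of items in an M/M/1 system: rho / (1 - rho)."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate  # server utilization
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    return rho / (1 - rho)
```

For example, `mm1_mean_occupancy(1.0, 2.0)` corresponds to a utilization of 0.5.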

  33. Test Setup
      [Figure: kernels A and B connected by queue Q1]
      1. Dedicated thread of execution monitors queue occupancy
      2. Calculate the estimated mean queue occupancy using the M/M/1 model
      3. Calculate KL divergence for the arrival process distribution using the truncated Levy distribution noise model
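
Step 3 relies on the KL divergence between two distributions. A minimal sketch for two discrete distributions given as probabilities over the same bins follows; binning the observed data and choosing the expected distribution are left out, and `kl_divergence` is a hypothetical helper rather than code from the talk.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions on the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a bin with no mass in P contributes nothing
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total
```

The divergence is zero when the two distributions match and grows as they differ. It is not symmetric, so the order of the arguments matters.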

  34. Convolution with Exponential
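
The slide shows only a figure, so the following is an assumption about what is being convolved: presumably the exponential distribution assumed by the M/M/1 model and the truncated Levy noise model. The sketch convolves two densities sampled on the same uniform grid with spacing `h`; both function names are hypothetical.

```python
import math


def exponential_pdf_grid(rate, h, n):
    """Exponential density sampled at x = 0, h, 2h, ... (n points)."""
    return [rate * math.exp(-rate * k * h) for k in range(n)]


def convolve(f, g, h):
    """Discrete approximation of the convolution of two sampled densities."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * h  # h is the grid spacing
    return out
```

The result is sampled on the same grid, starting at the sum of the two starting points, and approximates the density of the sum of the two independent quantities.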

  35. Conclusions
      • The truncated Levy distribution can be used to approximate BCETV
      • The distribution of BCETV can be used as a tool to accept or reject a stochastic queueing model based on distributional assumptions
      • KL divergence between the expected and convolved distributions is highly correlated with queue model accuracy

  36. Parting Notes
      Slides available here: sbs.wustl.edu
      Timer C++ template code: http://goo.gl/ItJ3jP
      Test harness used to collect data: http://goo.gl/U1VG6N
