SLIDE 1

Constructing Dynamic Policies for Paging Mode Selection

Jason Hiebel

jshiebel@mtu.edu

Laura E. Brown

lebrown@mtu.edu

Zhenlin Wang

zlwang@mtu.edu

Department of Computer Science Michigan Technological University

International Conference on Parallel Processing August 2018

0 | 26 Hiebel, Brown, Wang; ICPP ’18

SLIDE 2

Outline

◮ Paging Mode Selection
◮ Contextual Bandits
◮ DSP-OFFSET
◮ Evaluation
◮ Conclusion

SLIDE 3

Virtual Address Translation

Shadow Paging (SP)

◮ virtual addresses → shadow page table → machine addresses (virtual to machine)
◮ the shadow page table is kept synchronized with the guest page table

Hardware-Assisted Paging (HAP)

◮ virtual addresses → guest page table (virtual to physical) → extended page table (physical to machine) → machine addresses
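A toy model makes the two translation paths concrete. This is a sketch under simplifying assumptions: page tables are plain dictionaries from page number to page number, and `build_shadow_table`, `translate_shadow`, and `translate_hap` are hypothetical names. Shadow paging keeps a precomputed virtual-to-machine table that must be rebuilt or patched whenever the guest edits its own table; HAP keeps the two tables separate and composes them on every walk.

```python
from typing import Dict

PageTable = Dict[int, int]  # page number -> page number (toy model, not real PTEs)


def build_shadow_table(guest_pt: PageTable, ept: PageTable) -> PageTable:
    """Compose guest (virtual -> physical) and extended (physical -> machine)
    tables into the single virtual -> machine table shadow paging maintains."""
    return {vpn: ept[gpn] for vpn, gpn in guest_pt.items() if gpn in ept}


def translate_shadow(shadow_pt: PageTable, vpn: int) -> int:
    # One lookup: the hypervisor already did the composition.
    return shadow_pt[vpn]


def translate_hap(guest_pt: PageTable, ept: PageTable, vpn: int) -> int:
    # Two lookups per translation: guest table first, then the extended table.
    return ept[guest_pt[vpn]]


guest_pt = {0: 7, 1: 3}
ept = {3: 42, 7: 19}
shadow = build_shadow_table(guest_pt, ept)
print(translate_shadow(shadow, 0), translate_hap(guest_pt, ept, 0))  # prints: 19 19

# A guest page-table write leaves the shadow copy stale until the hypervisor
# resynchronizes it; in real systems this typically requires intercepting the
# guest, which is where shadow paging's VM exits come from.
guest_pt[0] = 3
shadow = build_shadow_table(guest_pt, ept)
```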

SLIDE 4

Workload Behavior Determines Performance

Execution time (s)

Benchmark    HAP     SP      SP / HAP
gcc          413     632     +53%
tonto        950     1150    +21%
mcf          385     340     −12%
cactusADM    1610    1309    −19%

◮ Shadow Paging: page faults cause expensive context switches and VM exits
◮ Hardware-Assisted Paging: DTLB misses are more expensive due to the extended page table
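The higher DTLB-miss cost under HAP follows from the two-dimensional page walk: every guest page-table entry is located by a guest-physical address that itself needs an extended-page-table walk. A common back-of-the-envelope count, sketched below as an illustration rather than a measurement of the evaluated platform, is that an n-level guest walk over an m-level nested table needs up to n·m + n + m memory references, versus n for a native or shadow walk (ignoring TLBs and paging-structure caches). The function names are hypothetical.

```python
def native_walk_refs(guest_levels: int) -> int:
    """Worst-case memory references for a one-dimensional walk
    (native hardware, or shadow paging's single virtual->machine table)."""
    return guest_levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case references for a two-dimensional (HAP) walk.

    Each of the guest_levels page-table reads first needs its guest-physical
    address translated by a host_levels walk (host_levels + 1 references each),
    and the final guest-physical address needs one more host walk.
    """
    return guest_levels * (host_levels + 1) + host_levels


for n, m in [(2, 4), (4, 4)]:
    print(f"guest {n}-level, host {m}-level: "
          f"shadow {native_walk_refs(n)}, HAP {nested_walk_refs(n, m)}")
```

For 4-level guest and host tables this gives 24 references per miss against 4, which is why HAP suffers on workloads with many DTLB misses.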

SLIDE 7

Paging Mode Selection

Goal

◮ Utilize the paging mode best suited to the current workload

Dynamic Selection

◮ Periodically select a paging mode based on runtime behavior (page fault count, DTLB miss count)
◮ Paging mode performance depends on hardware and software:
   ◮ memory hierarchy
   ◮ address space size

SLIDE 8

Existing Selection Methods

DSP-Manual (Wang et al.; VEE ’11)

◮ Model constructed by domain experts
◮ Requires extensive manual profiling and analysis

ASP-SVM (Kuang et al.; ML ’15)

◮ Model constructed using off-the-shelf machine learning tools (support vector machines)
◮ Requires an enumerative profiling method

SLIDE 9

DSP-OFFSET

Overview

◮ Paging mode selection as a contextual bandit
◮ Construct the model using simple, uniformly random profiling

Advantages

◮ Performance equivalent to the state of the art (ASP-SVM)
◮ Significant (90%) reduction in profiling time

SLIDE 10

Contextual Bandits

SLIDE 11

The Contextual Bandit

◮ Sequential decision making with limited feedback

  • 1. Observe contextual information
  • 2. Select an action
  • 3. Receive reward for selected action
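The three-step protocol can be sketched as a loop. This is an illustration, not the authors' code: `run_bandit`, the one-feature contexts, and `toy_reward` are hypothetical. The point is the limited feedback: each round logs the reward of the action taken and nothing about the alternative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Context = Tuple[float, ...]
LogEntry = Tuple[Context, str, float]


def run_bandit(
    contexts: Sequence[Context],
    actions: Sequence[str],
    policy: Callable[[Context], str],
    reward_of: Callable[[Context, str], float],
) -> List[LogEntry]:
    """Observe -> select -> reward, logging (context, action, reward).

    Only the chosen action's reward is revealed each round; what the other
    action would have earned is never observed (limited feedback).
    """
    log: List[LogEntry] = []
    for x in contexts:                 # 1. observe contextual information
        a = policy(x)                  # 2. select an action
        if a not in actions:
            raise ValueError(f"unknown action {a!r}")
        r = reward_of(x, a)            # 3. receive reward for the selected action only
        log.append((x, a, r))
    return log


# Toy usage: hypothetical one-feature contexts and a made-up reward.
def toy_reward(x: Context, a: str) -> float:
    return x[0] if a == "SP" else -x[0]


rng = random.Random(1)
log = run_bandit([(0.5,), (-0.2,), (0.1,)], ["HAP", "SP"],
                 lambda x: rng.choice(["HAP", "SP"]), toy_reward)
```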

SLIDE 12

Action Selection

Online Selection

◮ Interactive: interleave exploration and exploitation
◮ Techniques not amenable to low-level implementation

Offline Selection

◮ Non-interactive: exploration before exploitation
◮ Learn from logged (random) choices

SLIDE 13

Contextual Bandit Formulation

Contextual Information

◮ Page faults
◮ DTLB misses

Action Space

◮ Hardware-Assisted Paging
◮ Shadow Paging

Reward Function

◮ Throughput (instructions per cycle, IPC)

SLIDE 14

DSP-OFFSET

SLIDE 15

DSP-OFFSET

Pipeline (each step produces the input to the next):

  • 1. Profiling with Random Paging Modes → Logged Random Performance Data
  • 2. Per-Phase Reward Calculation → Contextual Bandit Data
  • 3. Binary-Offset Transformation → Weighted Data
  • 4. Weighted Support Vector Machine → Paging Mode Selection Model

SLIDE 16
  • 1. Profiling with Random Paging Modes

[Figure: logged samples plotted by page faults, DTLB misses, and IPC, colored by paging mode (Hardware-Assisted Paging, Shadow Paging)]
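Step 1 amounts to logging counters and IPC under a coin-flip paging mode. A minimal sketch, assuming a hypothetical `run_interval` hook that switches modes, runs one sampling interval, and returns (page faults, DTLB misses, IPC); the `Sample` record and the fake workload are illustrative only.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

MODES = ("HAP", "SP")


@dataclass
class Sample:
    page_faults: int
    dtlb_misses: int
    mode: str       # paging mode chosen uniformly at random for this interval
    ipc: float      # throughput measured while running in that mode


def profile_random(
    run_interval: Callable[[str], Tuple[int, int, float]],
    n_intervals: int,
    seed: int = 0,
) -> List[Sample]:
    """Log one sample per interval under a uniformly random paging mode."""
    rng = random.Random(seed)
    log: List[Sample] = []
    for _ in range(n_intervals):
        mode = rng.choice(MODES)
        pf, dtlb, ipc = run_interval(mode)
        log.append(Sample(pf, dtlb, mode, ipc))
    return log


# Toy usage with a made-up workload (hypothetical numbers, not measurements).
def fake_interval(mode: str) -> Tuple[int, int, float]:
    return (1000, 2_000_000, 1.1 if mode == "SP" else 0.9)


samples = profile_random(fake_interval, n_intervals=100)
```

Choosing modes uniformly at random keeps the probability of every logged action known (1/2), which is what allows the later steps to learn from the log offline.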

SLIDE 18
  • 2. Per-Phase Reward Calculation

Phase Detection

◮ Determine program phasing from random profiling data
◮ Segment data into phases using IPC change-points
◮ PELT (Pruned Exact Linear Time) change-point detection

Reward

◮ Normalized IPC per phase, where $\text{phase}_i$ is the phase containing sample $i$:

$\text{reward}_i = \log \dfrac{\text{IPC}_i}{\operatorname{mean}(\text{IPC}_{\text{phase}_i})}$
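The slides name PELT but not its cost function or penalty. The sketch below assumes a change-in-mean model with squared-error segment cost and a user-chosen `penalty`, then applies the per-phase reward formula to each sample; `pelt_mean_changepoints`, `phase_labels`, and `per_phase_rewards` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def pelt_mean_changepoints(x: Sequence[float], penalty: float) -> List[int]:
    """PELT for shifts in mean: segment cost = sum of squared deviations
    from the segment mean, plus `penalty` per segment. Returns the sorted
    indices where a new segment starts (excluding 0)."""
    n = len(x)
    s1 = [0.0] * (n + 1)   # prefix sums
    s2 = [0.0] * (n + 1)   # prefix sums of squares
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(s: int, t: int) -> float:
        seg = s1[t] - s1[s]
        return max(0.0, (s2[t] - s2[s]) - seg * seg / (t - s))

    best = [0.0] * (n + 1)
    best[0] = -penalty
    last = [0] * (n + 1)
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # Pruning: a start s already worse than best[t] can never win later.
        candidates = [s for score, s in scores if score - penalty <= best[t]]
        candidates.append(t)

    cps, t = [], n
    while t > 0:                     # backtrack through optimal segment starts
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)


def phase_labels(n: int, cps: Sequence[int]) -> List[int]:
    bounds, labels, phase = set(cps), [], 0
    for i in range(n):
        if i in bounds:
            phase += 1
        labels.append(phase)
    return labels


def per_phase_rewards(ipc: Sequence[float], phases: Sequence[int]) -> List[float]:
    """reward_i = log(IPC_i / mean IPC of the phase containing sample i)."""
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for v, p in zip(ipc, phases):
        totals[p] = totals.get(p, 0.0) + v
        counts[p] = counts.get(p, 0) + 1
    return [math.log(v / (totals[p] / counts[p])) for v, p in zip(ipc, phases)]


ipc = [1.0, 1.1, 0.9, 1.0, 0.6, 0.5, 0.55, 0.5]
rewards = per_phase_rewards(ipc, phase_labels(len(ipc), pelt_mean_changepoints(ipc, 0.05)))
```

The penalty trades sensitivity against over-segmentation: larger values yield fewer, longer phases.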

SLIDE 19
  • 2. Per-Phase Reward Calculation

[Figure: reward plotted against IPC for each logged sample, by paging mode (Hardware-Assisted Paging, Shadow Paging)]

SLIDE 21
  • 3. Binary-Offset Transformation

(Beygelzimer and Langford; SIGKDD ’09)

(context, action, reward) ⇒ (context, label, weight)

◮ Does the selected action perform better than average?
   reward > 0 : (label, weight) ← (action, reward)
◮ Does the selected action perform worse than average?
   reward < 0 : (label, weight) ← (opposite action, |reward|)
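The transformation is a per-sample rewrite. A minimal sketch, assuming actions are the strings "HAP" and "SP" and contexts are tuples; `binary_offset` is a hypothetical name, and dropping zero-reward samples is this sketch's choice, since the slides only cover strictly positive and negative rewards.

```python
from typing import Iterable, List, Tuple

Row = Tuple[Tuple[float, ...], str, float]
OPPOSITE = {"HAP": "SP", "SP": "HAP"}


def binary_offset(samples: Iterable[Row]) -> List[Row]:
    """Turn (context, action, reward) into (context, label, weight)."""
    out: List[Row] = []
    for context, action, reward in samples:
        if reward > 0:      # better than average: keep the chosen action
            out.append((context, action, reward))
        elif reward < 0:    # worse than average: label the other action
            out.append((context, OPPOSITE[action], -reward))
        # reward == 0 carries no preference and is dropped
    return out
```

Each weighted row now says which mode to prefer in that context and how strongly, which is the form a weighted classifier consumes.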

SLIDE 22
  • 3. Binary-Offset Transformation

[Figure: sample weight plotted against reward, by paging mode (Hardware-Assisted Paging, Shadow Paging)]

SLIDE 24
  • 4. Weighted Support Vector Machine

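◮ Construct a linear classifier for $\mathbf{x} = (\text{page faults}, \text{DTLB misses})$ using the weighted support vector machine:
   $f(\mathbf{x}) = \operatorname{sign}(\boldsymbol{\beta} \cdot \mathbf{x} + \beta_0)$
◮ Prevent rapid switching using a margin:
   $\boldsymbol{\beta} \cdot \mathbf{x} + \beta_0 > +0.25$: switch to Shadow Paging
   $\boldsymbol{\beta} \cdot \mathbf{x} + \beta_0 < -0.25$: switch to Hardware-Assisted Paging
   otherwise: keep the current paging mode
◮ Low-overhead Xen VM implementation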
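Step 4 and the switching rule can be sketched together. Assumptions, all hypothetical: labels are +1 for Shadow Paging and −1 for HAP (matching the sign convention above); features are standardized before training; and the weighted SVM is trained by plain full-batch subgradient descent on the weighted hinge loss rather than by a dedicated SVM solver. Because of the standardization, this sketch's scores need not share the scale on which the slide's ±0.25 margin was chosen.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float, List[float], List[float]]  # beta, beta0, means, stds


def train_weighted_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],          # +1 = Shadow Paging, -1 = Hardware-Assisted Paging
    w: Sequence[float],        # per-sample weights from the offset transformation
    lam: float = 0.01,
    lr: float = 0.05,
    epochs: int = 500,
) -> Model:
    """Minimize lam/2 * |beta|^2 + sum_i w_i * hinge_i / sum(w) by subgradient descent."""
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    sd = [math.sqrt(sum((x[j] - mu[j]) ** 2 for x in X) / n) or 1.0 for j in range(d)]
    Z = [[(x[j] - mu[j]) / sd[j] for j in range(d)] for x in X]
    total = sum(w)
    beta, beta0 = [0.0] * d, 0.0
    for _ in range(epochs):
        g = [lam * bj for bj in beta]
        g0 = 0.0
        for z, yi, wi in zip(Z, y, w):
            margin = yi * (sum(bj * zj for bj, zj in zip(beta, z)) + beta0)
            if margin < 1:                     # weighted hinge loss is active
                for j in range(d):
                    g[j] -= wi * yi * z[j] / total
                g0 -= wi * yi / total
        beta = [bj - lr * gj for bj, gj in zip(beta, g)]
        beta0 -= lr * g0
    return beta, beta0, mu, sd


def score(model: Model, x: Sequence[float]) -> float:
    beta, beta0, mu, sd = model
    return sum(bj * (xj - m) / s for bj, xj, m, s in zip(beta, x, mu, sd)) + beta0


def next_mode(current: str, s: float, margin: float = 0.25) -> str:
    """Switch only when the score clears the margin; otherwise stay put."""
    if s > margin:
        return "SP"
    if s < -margin:
        return "HAP"
    return current


# Toy usage: made-up (page faults, DTLB misses) values, not measurements.
X = [(1000, 3.0e7), (1500, 2.8e7), (1200, 2.5e7),
     (50000, 1.0e7), (60000, 1.2e7), (45000, 0.9e7)]
y = [1, 1, 1, -1, -1, -1]
model = train_weighted_linear_svm(X, y, [1.0] * len(X))
mode = next_mode("HAP", score(model, (1100, 2.9e7)))
```

The dead band between the two thresholds is what stops the controller from flipping modes every interval when the score hovers near zero.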

SLIDE 25

Evaluation

SLIDE 26

Evaluation Environment

Component    Size           Associativity
CPU clock    2.8 GHz
L1 cache     64 KB          4-way
L2 cache     512 KB         8-way
L3 cache     8192 KB        16-way
L1 DTLB      64 entries     4-way
L2 DTLB      512 entries    4-way

◮ 1st-generation Intel Core i5 processor (Nehalem)
◮ Xen 3.3.1 with paging mode selection patch
◮ 32-bit guest OS with 1 dedicated core and 3 GB memory

SLIDE 27

DSP-SAMPLE

◮ Simple online direct sampling method (“context-less” bandit)
◮ Alternate between exploration and exploitation:
   t0: explore (HAP, SP, HAP, SP) → exploit the BEST mode
   t1: explore (HAP, SP, HAP, SP) → exploit the BEST mode
   . . .
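For comparison, DSP-SAMPLE's loop ignores the counters entirely. A sketch, assuming a hypothetical `run_interval` hook that returns one interval's IPC; `dsp_sample`, `explore_per_mode`, and `exploit_intervals` are made-up names and parameters, not values from the slides.

```python
from statistics import mean
from typing import Callable, Dict, List

MODES = ("HAP", "SP")


def dsp_sample(
    run_interval: Callable[[str], float],
    rounds: int,
    explore_per_mode: int = 2,
    exploit_intervals: int = 8,
) -> List[str]:
    """Context-free explore-then-exploit loop; returns the modes that were run."""
    trace: List[str] = []
    for _ in range(rounds):
        ipc: Dict[str, List[float]] = {m: [] for m in MODES}
        for _ in range(explore_per_mode):
            for m in MODES:                       # HAP, SP, HAP, SP, ...
                ipc[m].append(run_interval(m))
                trace.append(m)
        best = max(MODES, key=lambda m: mean(ipc[m]))
        for _ in range(exploit_intervals):        # run the BEST mode
            run_interval(best)
            trace.append(best)
    return trace


def fake_ipc(mode: str) -> float:  # hypothetical workload where SP is faster
    return 1.2 if mode == "SP" else 1.0


trace = dsp_sample(fake_ipc, rounds=1, explore_per_mode=2, exploit_intervals=3)
```

Every exploration window spends intervals in the worse mode, and a phase change is only noticed at the next window, which is the cost a context-aware model avoids.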

SLIDE 28

DSP-OFFSET

Benchmark-Specific

◮ One model per SPEC CPU06 benchmark: astar → DSP-OFFSET_{astar}, bwaves → DSP-OFFSET_{bwaves}, . . . , zeusmp → DSP-OFFSET_{zeusmp}

Benchmark-Agnostic

◮ One model trained on all SPEC INT06 benchmarks (astar, bzip2, . . . , xalancbmk) → DSP-OFFSET_{SPEC INT06}

SLIDE 29

Paging Mode Selection Models

Static Models

◮ Hardware-Assisted Paging (baseline)
◮ Shadow Paging

Dynamic Models

◮ ASP-SVM (state of the art)
◮ DSP-SAMPLE
◮ DSP-OFFSET (Benchmark-Agnostic, Benchmark-Specific)

SLIDE 30

Direct Sampling is Insufficient

[Figure: mean execution time normalized to HAP (y-axis 0.8 to 1.4) per benchmark (cactusADM, mcf, astar, omnetpp, zeusmp, xalancbmk, sphinx3, libquantum, sjeng, lbm, gamess, gromacs, h264ref, hmmer, namd, gobmk, perlbench, bzip2, leslie3d, dealII, calculix, GemsFDTD, bwaves, wrf, soplex, milc, tonto, gcc) and the average; series: Shadow Paging, ASP-SVM, DSP-SAMPLE]

SLIDE 32

DSP-OFFSET Matches State-Of-The-Art

[Figure: mean execution time normalized to HAP (y-axis 0.8 to 1.4) for each SPEC CPU06 benchmark and the average; series: Shadow Paging, ASP-SVM, DSP-OFFSET (Benchmark-Agnostic)]

SLIDE 34

DSP-OFFSET Generalizes Well

[Figure: mean execution time normalized to HAP (y-axis 0.8 to 1.4) for each SPEC CPU06 benchmark and the average; series: Shadow Paging, DSP-OFFSET (Benchmark-Specific), DSP-OFFSET (Benchmark-Agnostic)]

SLIDE 36

DSP-OFFSET Reduces Profiling Time

Profiling cost (SPEC INT06)

Model         Benchmark Executions    Profiling (h)    Samples
ASP-SVM       ∼5                      > 24.0           60
DSP-OFFSET    1                       2.5              25000
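The table supports a quick check of the headline saving. Because ASP-SVM's profiling time is given only as a lower bound (> 24.0 h), the ratio below is itself a lower bound on the reduction, which lands at about 90%; the per-sample costs are simple quotients of the table's entries.

```latex
\begin{aligned}
\text{reduction} &\ge 1 - \frac{2.5\,\text{h}}{24.0\,\text{h}} \approx 1 - 0.104 = 0.896 \\
\text{ASP-SVM:}\quad \frac{24.0\,\text{h}}{60\ \text{samples}} &= 24\ \text{min per sample (at least)} \\
\text{DSP-OFFSET:}\quad \frac{2.5\,\text{h}}{25000\ \text{samples}} &= 0.36\ \text{s per sample}
\end{aligned}
```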

SLIDE 37

Action Selection Indicates Program Behavior

[Figure: paging mode selected over time for each benchmark (wrf, tonto, povray, leslie3d, GemsFDTD, calculix, milc, dealII, gobmk, sphinx3, gcc, perlbench, soplex, bzip2, astar, xalancbmk, bwaves, sjeng, mcf, omnetpp, libquantum, lbm, zeusmp, cactusADM); legend: Hardware-Assisted Paging, Shadow Paging, Margin Behavior]

SLIDE 38

Conclusion

SLIDE 39

Conclusion

◮ Comparable performance to the state of the art
◮ Over 90% reduction in profiling time compared with the state of the art
◮ The contextual bandit is a viable selection model

SLIDE 41

References

[1] Alina Beygelzimer and John Langford. The Offset Tree for Learning with Partial Labels. KDD ’09, 2009.
[2] Wei Kuang, Laura E. Brown, and Zhenlin Wang. Selective Switching Mechanism in Virtual Machines via Support Vector Machines and Transfer Learning. Machine Learning, 101(1), 2015.
[3] Xiaolin Wang, Jiarui Zang, Zhenlin Wang, Yingwei Luo, and Xiaoming Li. Selective Hardware/Software Memory Virtualization. VEE ’11, 2011.

SLIDE 42

Future Directions

Extend the approach to similar problems (e.g., hardware prefetchers)

◮ Selecting contextual information
◮ Larger, combinatorial action spaces
◮ Multi-core, co-tenant workloads

Evaluation using Random Profiling

◮ DSP-OFFSET still requires in situ evaluation
◮ Alleviate the cost of model validation
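One plausible route to cheaper validation, offered here as an assumption rather than the authors' plan, is off-policy evaluation: inverse propensity scoring (IPS), a standard estimator, scores a candidate policy directly from uniformly random logs. `ips_estimate` is a hypothetical name; with two paging modes chosen uniformly, each logged action has propensity 1/2.

```python
from typing import Callable, Sequence, Tuple

Context = Tuple[float, ...]
LoggedSample = Tuple[Context, str, float]   # (context, logged random action, reward)


def ips_estimate(
    log: Sequence[LoggedSample],
    policy: Callable[[Context], str],
    propensity: float = 0.5,
) -> float:
    """Inverse-propensity-score estimate of a policy's mean reward.

    Only samples where the candidate policy agrees with the logged action
    contribute, each reweighted by 1 / propensity.
    """
    if not log:
        raise ValueError("empty log")
    total = 0.0
    for context, action, reward in log:
        if policy(context) == action:
            total += reward / propensity
    return total / len(log)
```

IPS is unbiased when the logged propensities are correct and nonzero, but with few matching samples its variance can be high, so it would complement rather than replace in situ runs.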
