Boosting Job-Level Migration by Static Analysis


slide-1
SLIDE 1

Boosting Job-Level Migration by Static Analysis

Workshop on Operating Systems Platforms for Embedded Real-Time Applications
July 09, 2019

Tobias Klaus, Peter Ulbrich, Phillip Raffeck, Benjamin Frank, Lisa Wernet, Maxim Ritter von Onciul, Wolfgang Schröder-Preikschat

Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)

Funding: SCHR 603/9-2, SCHR 603/13-1, SCHR 603/14-2; CRC/TRR 89 Project C1; EU EFRE funds 0704/883 25

slide-7
SLIDE 7

Multi-Core Scheduling

[Figure: schedules of Core 1 (τ1, τ3a) and Core 2 (τ2, τ3b) over time t, axis ticks from 20 to 100]

Multi-Core Systems

  • Static allocation of tasks to cores

→ Poor utilization and schedulability

Solution: Full Migration?

  • Dynamic (re)allocation of tasks
  • Good utilization and schedulability

→ Impractical in real-time systems

Static Allocation Again?

  • Split tasks to appropriate size

slide-11
SLIDE 11

Splitting the Execution

Size versus Costs

    int32_t x = 0;
    uint16_t y = foo();
    for (uint8_t i = 0; i < 5; ++i) {
        x += y * bar[i];
    }
    int64_t z = x * 4711;
    for (uint8_t j = 0; j < 5; ++j) {
        z += baz[j];
    }
    return z;

[Figure: lifespans of the variables x, y, i, z, and j over code lines 1 to 10]

Find Appropriate Split Points

  • Static analysis
  • Consider WCET
  • Minimize migration cost
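The link between variable lifespans and migration cost can be sketched in a few lines of C. This is a minimal illustration, not the authors' tool: the `lifespan` struct, the functions `resident_bits_after` and `cheapest_cut`, and the first/last lines assigned to each variable are assumptions read off the ten-line example by hand, and sizes are counted in bits of the C types rather than LLVM-IR types.

```c
#include <assert.h>
#include <stddef.h>

/* One local variable: its size and the first and last source line
 * (1-based) on which its value is needed. */
struct lifespan {
    const char *name;
    unsigned bits;
    unsigned first_line;
    unsigned last_line;
};

/* Assumed lifespans for the ten-line example (hand-derived, hypothetical). */
static const struct lifespan example_vars[] = {
    {"x", 32, 1, 6},
    {"y", 16, 2, 5},
    {"i",  8, 3, 5},
    {"z", 64, 6, 10},
    {"j",  8, 7, 9},
};

/* Bits that must migrate if the job is cut right after `line`:
 * every variable already defined and still needed later. */
unsigned resident_bits_after(const struct lifespan *vars, size_t n,
                             unsigned line)
{
    unsigned bits = 0;
    for (size_t k = 0; k < n; ++k) {
        if (vars[k].first_line <= line && vars[k].last_line > line)
            bits += vars[k].bits;
    }
    return bits;
}

/* Line in [first, last] with the smallest resident set (first one on ties). */
unsigned cheapest_cut(const struct lifespan *vars, size_t n,
                      unsigned first, unsigned last)
{
    assert(first <= last);
    unsigned best_line = first;
    unsigned best_bits = resident_bits_after(vars, n, first);
    for (unsigned line = first + 1; line <= last; ++line) {
        unsigned bits = resident_bits_after(vars, n, line);
        if (bits < best_bits) {
            best_bits = bits;
            best_line = line;
        }
    }
    return best_line;
}
```

Under these assumed lifespans, a cut inside the first loop (after line 4) would have to move x, y, and i, 56 bits in total, whereas a cut between the two loops (after line 5) moves only x, 32 bits.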

slide-14
SLIDE 14

Migration

Challenges

  • Split tasks to target WCET
  • Reduce migration cost

Approach

→ Job-Level Migration
→ Static Analysis
→ Optimization within two dimensions

slide-15
SLIDE 15

Overview

[Diagram: randomly sized scheduling units → static analysis → split-point graph → optimization within WCET and migration cost (sequential code, branches, loops) → uniformly sized scheduling units]

slide-18
SLIDE 18

Static analysis

[Figure: control-flow graph with basic blocks BB1, BB3, BB4, BB5, BB6, and BB7, together with the label E1]

Basic Procedure

  1. Create control-flow graph
  2. WCET analysis
  3. Lifespan analysis

→ Split-point candidates

slide-19
SLIDE 19

Split-Point Graphs


slide-23
SLIDE 23

General Concept: Split-Point Graphs

[Figure: three stages. Control-flow graph (basic blocks BB1, BB3, BB4, BB5, BB6, BB7, label E1, weights w1 to w4) → intermediate graph (E1) → split-point graph (weights w1 to w5)]

Boosting Job-Level Migration

  • Static analysis of tasks w.r.t. WCET and resident-set size
  • Split-point graphs capture split-point candidates
  • Horizontal cuts: finding split points with low migration cost
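How a split point might be chosen among the candidates can be sketched as follows. The slides do not state the exact objective, so this is only one plausible reading: among all candidates whose accumulated WCET falls within a tolerance window below the target, take the one with the smallest resident set. The `candidate` struct, the function `pick_split_point`, and the `slack` parameter are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A split-point candidate (hypothetical representation). */
struct candidate {
    unsigned wcet_before;   /* accumulated WCET up to this point */
    unsigned resident_bits; /* data that must migrate when cutting here */
};

/* Returns the index of the cheapest candidate whose accumulated WCET
 * lies in [target - slack, target], or -1 if no candidate qualifies. */
int pick_split_point(const struct candidate *c, size_t n,
                     unsigned target, unsigned slack)
{
    assert(c != NULL || n == 0);
    /* Guard against unsigned underflow when slack exceeds target. */
    unsigned lower = slack > target ? 0 : target - slack;
    int best = -1;
    for (size_t k = 0; k < n; ++k) {
        if (c[k].wcet_before < lower || c[k].wcet_before > target)
            continue; /* scheduling unit would be too short or too long */
        if (best < 0 || c[k].resident_bits < c[best].resident_bits)
            best = (int)k;
    }
    return best;
}
```

The two dimensions show up as the window (WCET) and the minimum (migration cost). A wider `slack` trades less uniform scheduling units for a better chance of finding a cheap cut.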

slide-27
SLIDE 27

Splitting Loops

Leave the body untouched!

Original Loop

    LOOP_Bound(x:10);
    for(int i = 0; i < x; ++i)
    { .... }

  • Splitting the loop body?
  • Number of iterations dominates WCET

→ Split by number of iterations!

Loop after Splitting

    int i = 0, C = 5;        /* C: iterations left in this scheduling unit */
    for(; i < x && C; ++i)
    { --C; .... }
    ....
    C = 5;                   /* reset the counter for the next scheduling unit */
    for(; i < x && C; ++i)
    { --C; .... }

General Approach

  • Compute number of iterations to fit target WCET
  • Derive upper bound for the number of cuts
  • Duplicate body and adjust loop condition
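The first two steps of the general approach amount to simple integer arithmetic. The sketch below assumes that the WCET of one loop iteration is known and ignores the per-cut overhead; `loop_split`, `plan_loop_split`, and the example WCET figures are hypothetical and not taken from the slides.

```c
#include <assert.h>

/* Result of planning a loop split (hypothetical helper type). */
struct loop_split {
    unsigned iters_per_unit; /* the constant C in the split loop */
    unsigned cuts;           /* upper bound on the number of cuts */
};

/* loop_bound:  maximum number of iterations (the LOOP_Bound annotation)
 * body_wcet:   WCET of a single iteration
 * target_wcet: desired WCET of one scheduling unit */
struct loop_split plan_loop_split(unsigned loop_bound, unsigned body_wcet,
                                  unsigned target_wcet)
{
    assert(body_wcet > 0 && target_wcet >= body_wcet);
    struct loop_split plan;
    /* Largest number of whole iterations that fits into the target. */
    plan.iters_per_unit = target_wcet / body_wcet;
    /* Units needed for the worst case, rounded up. */
    unsigned units =
        (loop_bound + plan.iters_per_unit - 1) / plan.iters_per_unit;
    plan.cuts = units > 0 ? units - 1 : 0;
    return plan;
}
```

With a loop bound of 10, an assumed body WCET of 20, and an assumed target of 100, this yields 5 iterations per scheduling unit and a single cut, which matches the shape of the split loop shown above (C = 5, two copies of the body).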

slide-28
SLIDE 28

Splitting Branches

The problem with conditional load ...

[Figure: a scheduling unit (SU) containing a branch on cond. The true path costs C = 160 and the false path C = 205, so the WCET of the unsplit SU is 205. A naive split yields two units, SUA and SUB. In SUA the false path costs C = 200 and the true path C = 10, giving a WCET of 200. In SUB the true path costs C = 150 and the false path C = 5, giving a WCET of 150. Analyzed separately, the two units add up to 350.]

Additional Pessimism Caused by Naive Splitting

  • Local optimization may lead to unbalanced cuts in branches
  • Condition is unknown at compile time

→ Overapproximation in timing analysis
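The pessimism can be reproduced with the numbers from the figure. The sketch below compares two bounds: the sum of per-unit maxima, which is what a timing analysis obtains when it looks at each scheduling unit in isolation, and the longest complete path, which would require knowing that both units follow the same branch. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of one scheduling unit on each side of the branch. */
struct unit_costs {
    unsigned if_true;
    unsigned if_false;
};

static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* Bound when every unit is analyzed on its own: sum of per-unit maxima. */
unsigned sum_of_unit_wcets(const struct unit_costs *u, size_t n)
{
    assert(u != NULL || n == 0);
    unsigned total = 0;
    for (size_t k = 0; k < n; ++k)
        total += max_u(u[k].if_true, u[k].if_false);
    return total;
}

/* Bound when the analysis knows all units take the same branch. */
unsigned path_wcet(const struct unit_costs *u, size_t n)
{
    assert(u != NULL || n == 0);
    unsigned t = 0, f = 0;
    for (size_t k = 0; k < n; ++k) {
        t += u[k].if_true;
        f += u[k].if_false;
    }
    return max_u(t, f);
}
```

For the split from the figure, `{10, 200}` for SUA and `{150, 5}` for SUB, the per-unit bound is 200 + 150 = 350, while the longest real path is still max(160, 205) = 205. Balancing the cut between the branches shrinks this gap.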

slide-30
SLIDE 30

Splitting Branches

Global vs. Local Optimization

  • Find suitable points locally
  • Global alignment between branches

→ Minimize size differences

General Approach

  • Add jump
  • Additional logic

slide-34
SLIDE 34

Overheads per Cut

What does the fun cost?

Sequential Code

  $i^{+}_{\text{seq}} = 1$

Branches

  $i^{+}_{\text{if}} = n_{\text{branch}} \cdot 2 + 1 + 3$

  • $n_{\text{branch}} \cdot 2$: marking the active branch
  • $1$: terminating the first scheduling unit
  • $3$: proceeding with the correct branch

Loops

  $i^{+}_{\text{loop}} = (5 + 1) + 2 + 3$

  • $(5 + 1)$: counter for planned iterations
  • $2$: exiting the scheduling unit and resetting the iteration counter
  • $3$: executing the following part of the loop

$i^{+}$: number of additional instructions
$n_{\text{branch}}$: number of branches affected by a horizontal cut

Low overall overhead

  • Only a few additional instructions for all different program constructs

⇒ Minor effects on overall execution time
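The three overhead formulas translate directly into a small helper. The enum and function names are hypothetical; the constants are the ones given on the slide.

```c
#include <assert.h>

/* Program construct affected by a cut (hypothetical enum). */
enum cut_kind { CUT_SEQUENTIAL, CUT_BRANCH, CUT_LOOP };

/* Additional instructions caused by one cut.
 * n_branch is only used for CUT_BRANCH: the number of branches
 * affected by the horizontal cut. */
unsigned extra_instructions(enum cut_kind kind, unsigned n_branch)
{
    switch (kind) {
    case CUT_SEQUENTIAL:
        return 1;
    case CUT_BRANCH:
        /* mark active branch + terminate first unit + resume in branch */
        return n_branch * 2 + 1 + 3;
    case CUT_LOOP:
        /* iteration counter + exit and reset + continue the loop */
        return (5 + 1) + 2 + 3;
    }
    assert(0 && "unknown cut kind");
    return 0;
}
```

A cut through an if/else with two affected branches therefore costs 2 · 2 + 1 + 3 = 8 additional instructions, and a cut through a loop costs 11.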

slide-36
SLIDE 36

Schedulability

[Figure: schedulability (0.0 to 0.8) over utilization (3.5 to 4.0) for the original system and the split system. Caption: Effects on the schedulability of systems with high utilization]

Experimental Setup

  • System with four processor cores
  • 12000 synthetic benchmark systems

Goal

  • Feasible allocation and schedule for each task set

⇒ 70 percent more schedulable task sets for the highest utilization

slide-37
SLIDE 37

Migration Costs

Finding split points with low migration cost

Experimental Setup

  • Real-world benchmarks taken from the TACLeBench suite
  • Creation of OSEK systems: one benchmark task and two load tasks
  • Generate systems which are unschedulable on two cores without migration
  • Only cut benchmark tasks
  • Recording of the resident-set size (in LLVM-IR types)
  • Worst-case migration cost observed in all possible split-point candidates
  • Migration cost of the split point chosen by our approach

slide-39
SLIDE 39

Migration Costs

Benchmark         Worst-case resident-set   Split-point resident-set   Cost improvement
                  size [bits]               size [bits]                [bits]
binarysearch      225                       224                        1
bitonic           65                        64                         1
complex_update    480                       288                        192
countnegative     2176                      1568                       608
filterbank        60736                     60704                      32
iir               432                       400                        32
insertsort        544                       128                        416
minver            17568                     16800                      768
petrinet          5057                      5056                       1

⇒ Lower worst-case migration overhead
⇒ Tighter results from timing analysis

slide-42
SLIDE 42

Conclusion and Outlook

Conclusion

  • Compile time
      • Beneficial size of scheduling units
    ⇒ Systems with high utilization become schedulable

  • Runtime
      • Migration at beneficial points
      • Only if necessary
    ⇒ Reducing overapproximation in the WCET analysis

Current Work and Outlook

  • More accurate WCET estimation
  • Adapt an OS to support migration threshold
  • Consider the OS and system calls within the analysis