Where Have all the Cycles Gone? Investigating the Runtime Overheads of OS-Assisted Replication



SLIDE 1

Björn Döbel, Hermann Härtig TU Dresden Operating Systems Group

Fakultät Informatik Institut für Systemarchitektur, Lehrstuhl Betriebssysteme

Koblenz, 16.09.2013

Where Have all the Cycles Gone?

Investigating the Runtime Overheads of OS-Assisted Replication

SLIDE 2

TU Dresden, 07.05.14 Where have all the cycles gone? Slide 2 / 23

ASTEROID – OS-Assisted Replication

[Figure: system architecture with the labels Fiasco.OC, L4Re, APP (x4), Driver (x3), Romain, Enc. Proc.] [1]

[1] Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012

SLIDE 3

ASTEROID – OS-Assisted Replication

[Figure: system architecture with the labels Fiasco.OC, L4Re, APP (x4), Romain, Enc. Proc.]

  • Interpose on system calls & CPU exceptions
  • Replicate memory (no need for ECC)
  • Unmodified binary applications

SLIDE 4

How Much Does Replicated Execution Cost?

  • Resource overhead
      • Roughly: N x replication → N x resources
      • Optimizations vs. error coverage
  • Fault coverage
      • No complete measurements yet
      • Estimate: matches compiler-assisted approaches
  • Runtime overhead
      • Should be optimized for
      • This paper: SPEC INT 2006

SLIDE 5

Experiment Setup

[Figure: test machine. Intel X5650 @ 2.66 GHz, with L1 and L2 caches]

SLIDE 6

Experiment Setup

[Figure: test machine with two L3 caches and 12 GB RAM]

SLIDE 7

ASTEROID – OS-Assisted Replication

  • L4/Fiasco.OC, 32 bit + Romain
  • SPEC INT 2006: 400.perl, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264, 471.omnet++, 473.astar, 483.xalancbmk

[Figure: test machine with two L3 caches and 12 GB RAM]

SLIDE 8

Engage!

SLIDE 9

SLIDE 10

The Problem: CPU Assignment

[Figure: native execution timeline on CPU0. The App runs, the higher-priority logger runs on print(), then the App runs again.]

SLIDE 11

The Problem: CPU Assignment

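[Figure: two timelines side by side. Native execution: on CPU0 the App runs, the higher-priority logger runs on print(), then the App runs again. Replicated execution: three App replicas and the higher-priority logger are spread over CPU0 to CPU3, with print() between the replicas' execution phases.]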

SLIDE 12

Where does overhead come from?

[Figure: timeline for Replica 1, Replica 2 and the Master. Phases: replica execution, sync time, state validation and system call in the master, notification time, then replica execution again.]
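The "Validate States" step can be illustrated with a small majority vote over replica states. This is only a sketch under assumptions: the slides do not show how Romain compares states, and the function name `validate_states` and the tuple-based state snapshots are hypothetical.

```python
from collections import Counter


def validate_states(states):
    """Majority-vote over replica states (illustrative, not Romain's code).

    Returns the majority state and the indices of replicas that disagree.
    Raises RuntimeError if no state is held by more than half the replicas.
    """
    winner, votes = Counter(states).most_common(1)[0]
    if votes <= len(states) // 2:
        raise RuntimeError("no majority among replica states")
    faulty = [i for i, state in enumerate(states) if state != winner]
    return winner, faulty


# Hypothetical register snapshots; states must be hashable, hence tuples.
good = ("eax=1", "ebx=2")
bad = ("eax=1", "ebx=3")
print(validate_states([good, good, bad]))
```

With three replicas a single deviating replica can be identified; with two replicas a mismatch can only be detected, which the sketch reports as an error.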

SLIDE 13

Where does overhead come from?

Source                                   Overhead vs. native
Per-replica execution (unmodified)       +/- 0
Sync time (no background load)           ~ 0
State comparison                         ~ 100 cycles
System call (mostly unmodified)          ~ 0
Notifications
  Local core                             ~ 2,000 cycles
  On socket                              ~ 6,000 cycles
  Cross-socket                           ~ 14,300 cycles

Rule: Prefer placing replicas on the same CPU socket
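To get a feeling for what the table means per synchronization point, the costs can be added up for each placement. This is a rough sketch, assuming the costs add linearly and that each synchronization triggers a fixed number of notifications (the `notifications` parameter is hypothetical, as are the function names). The 2.66 GHz clock is taken from the experiment setup.

```python
# Approximate cycle costs taken from the table above.
STATE_COMPARISON_CYCLES = 100
NOTIFICATION_CYCLES = {
    "local core": 2_000,
    "on socket": 6_000,
    "cross-socket": 14_300,
}


def sync_overhead_cycles(placement, notifications=1):
    """Cycles added per synchronization point (assumes linear cost)."""
    return STATE_COMPARISON_CYCLES + notifications * NOTIFICATION_CYCLES[placement]


def overhead_seconds(placement, syncs, clock_hz=2.66e9, notifications=1):
    """Total added time for a given number of synchronization points."""
    return syncs * sync_overhead_cycles(placement, notifications) / clock_hz


for placement in NOTIFICATION_CYCLES:
    print(placement, sync_overhead_cycles(placement), "cycles per sync")
```

Under these assumptions, cross-socket placement costs more than twice as many cycles per synchronization as on-socket placement, which is what motivates the rule above.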

SLIDE 14

Replicating SPEC INT 2006

SLIDE 15

Idea: Reduce Memory Management Overhead

  • Assumption: memory management is expensive
  • Idea: reduce overhead by using x86 huge pages (4 MB)
  • Works for microbenchmark
  • SPEC CPU: (nearly) no difference

Microbenchmark   4 kB pages   4 MB pages
Native           0.72 s       0.38 s
1x               0.80 s       0.38 s
2x               2.23 s       0.53 s
3x               3.12 s       0.91 s
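The microbenchmark numbers are easier to compare as slowdown factors relative to native execution with the same page size. The sketch below only does that arithmetic; the dictionary layout and the name `slowdown` are illustrative choices, and the assignment of the first group of timings to 4 kB pages follows the order of the labels on the slide.

```python
# Microbenchmark runtimes in seconds, as listed on the slide.
TIMES = {
    "4 kB pages": {"native": 0.72, "1x": 0.80, "2x": 2.23, "3x": 3.12},
    "4 MB pages": {"native": 0.38, "1x": 0.38, "2x": 0.53, "3x": 0.91},
}


def slowdown(page_size, config):
    """Runtime of a configuration relative to native with the same page size."""
    return TIMES[page_size][config] / TIMES[page_size]["native"]


for page_size, runs in TIMES.items():
    for config in ("1x", "2x", "3x"):
        print(f"{page_size} {config}: {slowdown(page_size, config):.2f}x native")
```

For triple replication this gives roughly 4.3x with 4 kB pages versus roughly 2.4x with 4 MB pages, so huge pages reduce the relative cost in the microbenchmark as well as the absolute runtime.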

SLIDE 16

Secondary Effects: Cache Miss Rates

Benchmark        DMR L2 misses   DMR L3 misses   TMR L2 misses   TMR L3 misses
429.mcf          2,600           1,300,000       11,000,000      5,200,000
462.libquantum   2,500           570             440,000         387,000
471.omnet++      270,000         6,900,000       35,000,000      21,200,000

Annotations: x 130 and x 3 (these match the growth of 471.omnet++ L2 and L3 misses from DMR to TMR)

Rule: Prefer placing replicas on a different CPU socket
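The growth in cache misses from DMR to TMR can be computed directly from the table. This is a small sketch of that arithmetic; the data layout and the name `growth` are illustrative. Reading "x 130" and "x 3" as the 471.omnet++ factors is an inference from the numbers, not something the slide states.

```python
# (L2 misses, L3 misses) per benchmark, as listed in the table above.
MISSES = {
    "429.mcf": {"DMR": (2_600, 1_300_000), "TMR": (11_000_000, 5_200_000)},
    "462.libquantum": {"DMR": (2_500, 570), "TMR": (440_000, 387_000)},
    "471.omnet++": {"DMR": (270_000, 6_900_000), "TMR": (35_000_000, 21_200_000)},
}


def growth(benchmark):
    """Return (L2 factor, L3 factor) by which misses grow from DMR to TMR."""
    dmr_l2, dmr_l3 = MISSES[benchmark]["DMR"]
    tmr_l2, tmr_l3 = MISSES[benchmark]["TMR"]
    return tmr_l2 / dmr_l2, tmr_l3 / dmr_l3


for name in MISSES:
    l2, l3 = growth(name)
    print(f"{name}: L2 x {l2:.0f}, L3 x {l3:.1f}")
```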

SLIDE 17

Secondary Effects: Cache Miss Rates

Benchmark        DMR L2 misses       DMR L3 misses           TMR L2 misses             TMR L3 misses
429.mcf          2,600 → 2,600       1,300,000 → 930,000     11,000,000 → 11,000,000   5,200,000 → 3,600,000
462.libquantum   2,500 → 2,500       570 → 323               440,000 → 385,000         387,000 → 8,700
471.omnet++      270,000 → 290,000   6,900,000 → 5,500,000   35,000,000 → 34,900,000   21,200,000 → 16,400,000
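The arrows can be read as before → after values. The slide does not say what changed between the two, so that reading is an assumption here. Under it, the relative reduction in L3 misses follows from simple arithmetic; the name `reduction_percent` and the data layout are illustrative.

```python
# L3 misses as (before, after) pairs, taken from the table above.
L3_MISSES = {
    "429.mcf": {"DMR": (1_300_000, 930_000), "TMR": (5_200_000, 3_600_000)},
    "462.libquantum": {"DMR": (570, 323), "TMR": (387_000, 8_700)},
    "471.omnet++": {"DMR": (6_900_000, 5_500_000), "TMR": (21_200_000, 16_400_000)},
}


def reduction_percent(before, after):
    """Relative reduction in misses, in percent of the 'before' value."""
    return 100.0 * (before - after) / before


for name, modes in L3_MISSES.items():
    for mode, (before, after) in modes.items():
        print(f"{name} {mode}: {reduction_percent(before, after):.1f}% fewer L3 misses")
```

L2 misses stay nearly constant in the table, so the sketch looks only at L3.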

SLIDE 18

SPEC INT: Improved L3 Miss Rates

SLIDE 19

Reliable Computing Base [2]

And now for something slightly different...

[Figure: system architecture with the labels Fiasco.OC, L4Re, APP (x4), Driver (x3), Romain, Enc. Proc.]

[2] Engel, Döbel: The Reliable Computing Base – A new Paradigm ..., SOBRES 2012

SLIDE 20

Protecting the RCB

  • We have full RCB source, so we can apply compiler-level techniques.
  • Encoded compiler anyone?

→ Approximate overhead

SLIDE 21

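Protecting the RCB: 2nd try

$$t_{\mathrm{prot}} := t_{\mathrm{app}} + C \cdot (t_{\mathrm{kern}} + t_{\mathrm{master}} + t_{\mathrm{kern}'}) + t_{\mathrm{hw}}$$

$$t_{\mathrm{prot}} := t_{\mathrm{native}} + C \cdot t_{\mathrm{Replicated}}$$

$$t_{\mathrm{hw}} = t_{\mathrm{kern}} = t_{\mathrm{kern}'} = 0$$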

SLIDE 22

Protecting the RCB: 2nd try

  • Selected values for C [3]:
      • C_SWIFT = 1.09
      • C_ANBD = 3.89

[3] Schiffel, Schmitt, Süßkraut, Fetzer: Software-Implemented Hardware Error Detection: Costs and Gains, DEPEND 2010
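The simplified estimate tprot := tnative + C * tReplicated can be evaluated for the two listed values of C. The sketch below does this for made-up input times: `t_native = 100.0` and `t_replicated = 5.0` are hypothetical and not measurements from the talk, and the function name is illustrative.

```python
# Values of C as listed on the slide (taken from [3]).
C_SWIFT = 1.09
C_ANBD = 3.89


def protected_runtime(t_native, t_replicated, c):
    """Evaluate tprot := tnative + C * tReplicated."""
    return t_native + c * t_replicated


# Hypothetical inputs, in seconds, chosen only to show the arithmetic.
t_native = 100.0
t_replicated = 5.0

for label, c in (("SWIFT", C_SWIFT), ("ANBD", C_ANBD)):
    t_prot = protected_runtime(t_native, t_replicated, c)
    print(f"{label}: t_prot = {t_prot:.2f} s")
```

Because C multiplies only the replication-related term, the estimate stays close to native runtime as long as that term is small compared to tnative.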

SLIDE 23

Summary

  • Romain: <5% overhead when replicating most of the SPEC INT 2006 benchmarks 3x
  • Replica-core placement matters
  • RCB protection may add additional overheads
  • Combination with compiler-level methods seems feasible