The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors



slide-1
SLIDE 1

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors

Austin T. Clements

Thesis advisors: M. Frans Kaashoek, Nickolai Zeldovich

Robert Morris, Eddie Kohler

slide-8
SLIDE 8

[Figure: x86 CPU trends, 1985 to 2015, log-scale axis from 1 to 100,000. Series: clock speed (MHz), power (watts), cores per socket, total megacycles/sec. Sources: Stanford CPUDB, Intel ARK.]

x86 CPU trends

slide-11
SLIDE 11

Software must be increasingly parallel to keep up with hardware, but scaling with parallelism is notoriously hard

[Figure: Exim mail server throughput in messages/second (2k to 10k) versus cores (1 to 48).]

Problem lies in the OS kernel

Parallelize or perish

slide-12
SLIDE 12

Kernel scalability is important

  • Many applications depend on the OS kernel
  • If the kernel doesn't scale, many applications won't scale

And hard

  • |kernel threads| > ∑|application threads|
  • Diverse and unknown workloads

OS kernel scalability

slide-18
SLIDE 18

Timeline of prior work (2008 to 2014):

  • Corey (OSDI '08)
  • Linux scalability (OSDI '10)
  • Bonsai VM (ASPLOS '12)
  • RadixVM (EuroSys '13)

Each project followed the same loop:

  • Workload
  • Plot scalability
  • Differential profile
  • Fix top bottleneck

Current approach to scalable software development

slide-19
SLIDE 19

Successful in practice because it focuses developer effort

Disadvantages

  • Requires huge amounts of effort
  • New workloads expose new bottlenecks
  • More cores expose new bottlenecks
  • The real bottlenecks may be in the interface design

Current approach to scalable software development

slide-24
SLIDE 24

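Concurrent calls: creat("x"), creat("y"), creat("z")

The file descriptor table already holds stdin, stdout, and stderr. Each creat must return the lowest unused descriptor, so the three calls have to coordinate.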

Solution: Change the interface?

Interface scalability example

slide-27
SLIDE 27

The scalable commutativity rule: Whenever interface operations commute, they can be implemented in a way that scales.

  • creat with lowest FD (e.g., creat → 3, creat → 4): Commutes? Scalable implementation exists?

Approach: Interface-driven scalability

slide-29
SLIDE 29

The scalable commutativity rule: Whenever interface operations commute, they can be implemented in a way that scales.

  • creat with lowest FD: Commutes ✗, Scalable implementation exists ?
  • creat with any FD (e.g., creat → 42, creat → 17)

Approach: Interface-driven scalability

slide-30
SLIDE 30

The scalable commutativity rule: Whenever interface operations commute, they can be implemented in a way that scales.

  • creat with lowest FD
  • creat with any FD: Commutes ✓ ⇒ (rule) Scalable implementation exists ✓

Approach: Interface-driven scalability
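
The difference between the two creat interfaces can be sketched in Python. This is a minimal illustration, not the sv6 implementation: the class names and the per-core striding scheme are assumptions made for the example, and closing descriptors is left out.

```python
class LowestFdTable:
    """creat must return the lowest unused FD, so every call reads shared state."""

    def __init__(self, in_use=(0, 1, 2)):  # stdin, stdout, stderr
        self.in_use = set(in_use)

    def alloc(self):
        fd = 0
        while fd in self.in_use:   # scans state that every other caller also updates
            fd += 1
        self.in_use.add(fd)
        return fd


class AnyFdTable:
    """creat may return any unused FD, so each core allocates from its own range."""

    def __init__(self, ncores, first_fd=3):
        self.ncores = ncores
        # Hypothetical scheme: core i hands out first_fd + i, first_fd + i + ncores, ...
        self.next_fd = [first_fd + core for core in range(ncores)]

    def alloc(self, core):
        fd = self.next_fd[core]    # touches only this core's slot
        self.next_fd[core] += self.ncores
        return fd


lowest = LowestFdTable()
lowest_results = [lowest.alloc(), lowest.alloc()]   # each result depends on call order

any_fd = AnyFdTable(ncores=2)
any_results = {any_fd.alloc(core=0), any_fd.alloc(core=1)}
```

With the lowest-FD table, which caller gets which descriptor depends on who runs first. With the any-FD table, each core gets the same answer regardless of what the other cores do, which is what makes a conflict-free implementation possible.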

slide-31
SLIDE 31

The rule enables reasoning about scalability throughout the software design process

  • Design: Guides design of scalable interfaces
  • Implement: Sets a clear implementation target
  • Test: Systematic, workload-independent scalability testing

Advantages of interface-driven scalability

slide-32
SLIDE 32

The scalable commutativity rule

  • Formalization of the rule and proof of its correctness
  • State-dependent, interface-based commutativity

Commuter: An automated scalability testing tool

sv6: A scalable POSIX-like kernel

Contributions

slide-33
SLIDE 33

Defining the rule

  • Definition of scalability
  • Intuition
  • Formalization

Applying the rule

  • Commuter
  • Evaluation

Outline

slide-35
SLIDE 35

[Figure: normalized throughput (5 to 40) versus cores (1 to 48) for gmake and Exim.]

One contended cache line

A single contended cache line can wreck scalability

A scalability bottleneck

slide-37
SLIDE 37

[Figure: cycles to read (500 to 3.5k) versus number of readers N (1 to 80), for 1 writer + N readers. Annotation: open.]

Cost of a contended cache line

slide-41
SLIDE 41

[Table: combinations of writes (W) and reads (R) of a cache line by Core X and Core Y, each marked ✓ (scales) or ✗ (does not scale).]

We say two or more operations are scalable if they are conflict-free.

Good approximation of current hardware.

What scales on today's multicores?

slide-42
SLIDE 42

Whenever interface operations commute, they can be implemented in a way that scales.

Operations commute
⇒ results independent of order
⇒ communication is unnecessary
⇒ without communication, no conflicts

The intuition behind the rule

slide-52
SLIDE 52

[Diagram: a history on threads T1 to T5 with operations iszero() → F, iszero() → F, dec() → 2, dec() → 1, dec() → 0. Regions R1, R2, and R3 are marked; R2' is the variant of R2 in which each dec() returns ok.]

  • ✓ R1 commutes; conflict-free implementation: shared counter
  • ✗ R2 does not commute because dec() returns counter value
  • ✓ R2' does commute; conflict-free implementation: per-core counter
  • R3 depends on state: ✓ if initial value > 3, ✗ if initial value ≤ 3

Example: Reference counter
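
The two implementations named above can be sketched as follows. This is an illustrative Python model under stated assumptions: the class names are invented, and the claim that each per-core slot sits on its own cache line is something a real kernel would have to arrange; Python does not model cache lines.

```python
class SharedCounter:
    """dec() returns the new value, so every call reads and writes one shared word."""

    def __init__(self, value):
        self.value = value

    def dec(self):
        self.value -= 1
        return self.value

    def iszero(self):
        return self.value == 0      # read-only: concurrent iszero() calls do not conflict


class PerCoreCounter:
    """dec() returns only "ok", so each core can update a private slot."""

    def __init__(self, value, ncores):
        self.initial = value
        self.deltas = [0] * ncores  # one slot per core (assumed to be on separate cache lines)

    def dec(self, core):
        self.deltas[core] -= 1      # touches only this core's slot
        return "ok"

    def iszero(self):
        # Reads every slot, so iszero() does conflict with concurrent dec() calls.
        return self.initial + sum(self.deltas) == 0


shared = SharedCounter(3)
shared_results = [shared.dec() for _ in range(3)]

per_core = PerCoreCounter(3, ncores=3)
per_core_results = [per_core.dec(core) for core in (2, 0, 1)]
```

Because dec() on the shared counter reports the value, callers can observe the order in which they ran. Once dec() returns only "ok", the order is invisible to callers, and the per-core layout becomes a legal implementation.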

slide-53
SLIDE 53

Definitions

  • History
  • Reordering
  • Commutativity

Formal scalable commutativity rule

Formalizing the rule

slide-56
SLIDE 56

A history H is a sequence of invocations and responses on threads.

[Diagram: two example histories of inc() and iszero() calls with their responses on threads T1 and T2, one labeled "Legal history" and one labeled "Illegal history".]

A specification 𝒯 defines an interface. 𝒯 is the set of legal histories giving the allowed behavior of an interface. [Herlihy & Wing, '90]

Lets us talk about interfaces, arguments, and state without specifying an implementation or a state representation.

Histories capture state and arguments
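
One way to make "a specification is a set of legal histories" concrete is to write the specification as a predicate over histories. The sketch below is a simplification and not taken from the thesis: it assumes a counter starting at 0, and it only handles sequential histories in which each invocation is immediately followed by its response, so an operation is stored as a single record. The names Op and is_legal are hypothetical.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One completed operation: an invocation and its response on a thread."""
    thread: str
    call: str
    result: object


def is_legal(history, initial=0):
    """Counter specification, expressed as membership in the set of legal histories."""
    value = initial
    for op in history:
        if op.call == "inc":
            value += 1
            expected = "ok"
        elif op.call == "iszero":
            expected = (value == 0)
        else:
            return False            # not an operation of this interface
        if op.result != expected:
            return False
    return True


legal = [Op("T1", "inc", "ok"), Op("T2", "iszero", False)]
illegal = [Op("T1", "inc", "ok"), Op("T2", "iszero", True)]
```

Note that the predicate never exposes how the counter is stored. The caller sees only which sequences of calls and responses are allowed, which is the point of defining interfaces through histories.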

slide-58
SLIDE 58

A reordering H' is a permutation of H that maintains operation order for each individual thread (H|t = H'|t for all t).

[Diagram: an example history of inc() and iszero() on threads T1 and T2, shown alongside several reorderings of it.]

Reorderings
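
The definition can be turned into a small enumerator. This is a sketch that treats each event as an opaque (thread, label) pair, which is an assumption of the example; the slides distinguish invocations from responses. The helper names are hypothetical.

```python
def reorderings(history):
    """Yield every permutation of history that keeps each thread's own order."""
    # Split the history into per-thread subsequences (H|t in the notation above).
    per_thread = {}
    for event in history:
        per_thread.setdefault(event[0], []).append(event)

    def interleave(queues):
        if all(not queue for queue in queues.values()):
            yield []
            return
        for thread, queue in queues.items():
            if queue:
                rest = dict(queues)
                rest[thread] = queue[1:]      # consume this thread's next event
                for tail in interleave(rest):
                    yield [queue[0]] + tail

    yield from interleave(per_thread)


def restrict(history, thread):
    """H|t: the subsequence of history that belongs to one thread."""
    return [event for event in history if event[0] == thread]


H = [("T1", "inc"), ("T2", "iszero"), ("T1", "iszero")]
all_reorderings = list(reorderings(H))
```

Here T1's two events must stay in order, while T2's single event can be placed before, between, or after them.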

slide-64
SLIDE 64

A region Y of a legal history XY SIM-commutes if every reordering Y' of Y also yields a legal history and every legal extension Z of XY is also a legal extension of XY'. (And this must be true for every prefix of every reordering of Y.)

[Diagram: threads T1 and T2. Region X contains invocations I1(), I2() and responses R1, R2; region Y contains I3(), I4() and R3, R4; extension Z contains I5(), I6() and R5, R6.]

Commutativity
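
For a small state-machine specification, the definition can be approximated by brute force. The sketch below makes several assumptions that are not in the slides: the prefix X is summarized by the counter value it leaves behind, "has the same legal extensions" is approximated by "ends in the same counter value", and R3 is taken to be the two iszero() calls plus the three dec() calls that return ok. All function names are hypothetical.

```python
from itertools import permutations


def step(value, call):
    """Counter specification: return (new_value, response) for one call."""
    if call == "iszero":
        return value, value == 0
    if call == "dec":               # original interface: returns the new value
        return value - 1, value - 1
    if call == "dec_ok":            # R2' variant: returns only "ok"
        return value - 1, "ok"
    raise ValueError(call)


def replay(initial, ops):
    """Return the final value if every recorded response is legal, else None."""
    value = initial
    for _thread, call, response in ops:
        value, expected = step(value, call)
        if response != expected:
            return None
    return value


def reorderings(ops):
    """Permutations of ops that keep each thread's own operations in order."""
    threads = {op[0] for op in ops}
    original = {t: [o for o in ops if o[0] == t] for t in threads}
    seen = set()
    for perm in permutations(ops):
        keeps_order = all([o for o in perm if o[0] == t] == original[t] for t in threads)
        if keeps_order and perm not in seen:
            seen.add(perm)
            yield list(perm)


def sim_commutes(initial, region):
    """Every prefix of every reordering must be legal and insensitive to order."""
    for reordered in reorderings(region):
        for k in range(1, len(reordered) + 1):
            finals = {replay(initial, p) for p in reorderings(reordered[:k])}
            if None in finals or len(finals) != 1:
                return False
    return True


R1 = [("T1", "iszero", False), ("T2", "iszero", False)]
R2 = [("T3", "dec", 2), ("T4", "dec", 1), ("T5", "dec", 0)]
R2_prime = [(t, "dec_ok", "ok") for t in ("T3", "T4", "T5")]
R3 = R1 + R2_prime
```

Under these assumptions the checker agrees with the reference counter example: R1 and R2' commute, R2 does not, and R3 commutes only when the initial value is above 3. The enumeration is factorial in the region size, so it is only useful for tiny regions.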

slide-66
SLIDE 66

Let 𝒯 be a specification with a reference implementation M. Consider a history XY where Y commutes in XY and M can generate XY. There exists a correct implementation of 𝒯 whose execution of XY is conflict-free in the commutative region Y.

[Diagram: the constructed implementation M', labeled "Emulate M" for region X and "Emulate M" for region Y.]

The formal scalable commutativity rule

slide-71
SLIDE 71

Commuter

  • Input: Interface specification (e.g., POSIX)
  • Input: Implementation (e.g., Linux)
  • Output: All scalability bottlenecks

Applying the rule to real systems

slide-72
SLIDE 72

import errno

# The Sym* types and @symargs below come from Commuter's symbolic modeling library.
SymInode    = tstruct(data = tlist(SymByte), nlink = SymInt)
SymIMap     = tdict(SymInt, SymInode)
SymFilename = tuninterpreted('Filename')
SymDir      = tdict(SymFilename, SymInt)

class POSIX:
    def __init__(self):
        # Start from an arbitrary (symbolic) file system state.
        self.fname_to_inum = SymDir.any()
        self.inodes = SymIMap.any()

    @symargs(src=SymFilename, dst=SymFilename)
    def rename(self, src, dst):
        if src not in self.fname_to_inum:
            return (-1, errno.ENOENT)
        if src == dst:
            return 0
        if dst in self.fname_to_inum:
            # The name being overwritten loses one link to its inode.
            self.inodes[self.fname_to_inum[dst]].nlink -= 1
        self.fname_to_inum[dst] = self.fname_to_inum[src]
        del self.fname_to_inum[src]
        return 0

Symbolic model

Input: Symbolic model

slide-73
SLIDE 73

Important to have discriminating commutativity conditions

  • ∀states, rename almost never commutes
  • More commutative cases ⇒ more opportunities to scale
  • Captures more operations applications actually do

rename(a, b) and rename(c, d) commute if:

  • Both source files exist and all names are different
  • Neither source file exists
  • a xor c exists, and it is not the other rename's destination
  • Both calls are self-renames
  • One call is a self-rename of an existing file and a ≠ c
  • a and c are hard links to the same inode, a ≠ c, and b = d

[Code excerpt: the __init__ and rename methods of the symbolic POSIX model from slide 72.]

Pipeline: Symbolic model → Analyzer → Commutativity conditions

Commutativity conditions
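
Individual conditions in the list above can be spot-checked on concrete states. The sketch below is not how Commuter works, since Commuter derives the conditions symbolically. It is a concrete Python transcription of the slide's rename model, with a hypothetical helper renames_commute that runs two calls in both orders and compares results and final state. Equal final state stands in for "no later operation can tell the orders apart".

```python
import copy
import errno


def rename(state, src, dst):
    """Concrete version of the slide's rename model; mutates state, returns the result."""
    names, nlink = state["fname_to_inum"], state["nlink"]
    if src not in names:
        return (-1, errno.ENOENT)
    if src == dst:
        return 0
    if dst in names:
        nlink[names[dst]] -= 1      # the overwritten name drops a link
    names[dst] = names[src]
    del names[src]
    return 0


def renames_commute(state, call_a, call_b):
    """True if both orders give each call the same result and the same final state."""
    s1, s2 = copy.deepcopy(state), copy.deepcopy(state)
    a1 = rename(s1, *call_a)        # order 1: A then B
    b1 = rename(s1, *call_b)
    b2 = rename(s2, *call_b)        # order 2: B then A
    a2 = rename(s2, *call_a)
    return (a1, b1) == (a2, b2) and s1 == s2


# Two existing files with one link each (illustrative inode numbers).
state = {"fname_to_inum": {"f0": 1, "f2": 2}, "nlink": {1: 1, 2: 1}}

# Two names that are hard links to the same inode.
linked = {"fname_to_inum": {"a": 7, "c": 7}, "nlink": {7: 2}}
```

For example, renames_commute(state, ("f0", "f1"), ("f2", "f3")) exercises the first condition (both sources exist, all names different), while renames_commute(state, ("f0", "f1"), ("f1", "f2")) is a case where one call's source is the other's destination.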

slide-78
SLIDE 78

Pipeline: Symbolic model → Analyzer → Commutativity conditions → Testgen → Test cases

rename(a, b) and rename(c, d) commute if:

  • Both source files exist and all names are different
  • Neither source file exists
  • a xor c exists, and it is not the other rename's destination
  • Both calls are self-renames
  • One call is a self-rename of an existing file and a ≠ c
  • a and c are hard links to the same inode, a ≠ c, and b = d

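Generated test case:

#include <fcntl.h>   /* creat */
#include <stdio.h>   /* rename */
#include <unistd.h>  /* close */

/* Create the two source files that the renames operate on. */
void setup() {
  close(creat("f0", 0666));
  close(creat("f2", 0666));
}

/* The two operations that are expected to commute. */
void test_opA() { rename("f0", "f1"); }
void test_opB() { rename("f2", "f3"); }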

+ 26 more

Test cases

slide-79
SLIDE 79

Pipeline: Symbolic model → Analyzer → Commutativity conditions → Testgen → Test cases → Mtrace/QEMU (running Linux) → Conflicting cache lines

Test case: setup() creates f0 and f2; test_opA calls rename("f0", "f1"); test_opB calls rename("f2", "f3")

Conflicting cache lines for test_opA and test_opB: d_entry.d_lock, inode_cache, +17 more conflicts

Output: Conflicting cache lines
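
The conflict check itself is simple once memory accesses are logged: two operations conflict on a cache line if both touch it and at least one of them writes it. The sketch below assumes a made-up trace format of (operation, cache line, access kind) tuples. It is not Mtrace's actual output format, and the access kinds in the sample trace are invented for illustration; only the two line names are taken from the slide.

```python
from collections import defaultdict


def conflicting_lines(trace):
    """Cache lines touched by more than one operation where at least one access is a write."""
    readers, writers = defaultdict(set), defaultdict(set)
    for op, line, kind in trace:
        (writers if kind == "w" else readers)[line].add(op)

    conflicts = set()
    for line, writing_ops in writers.items():
        # A written line conflicts if any other operation also reads or writes it.
        if len(writing_ops | readers[line]) > 1:
            conflicts.add(line)
    return conflicts


# Illustrative trace, not measured data.
trace = [
    ("test_opA", "d_entry.d_lock", "w"),
    ("test_opB", "d_entry.d_lock", "w"),
    ("test_opA", "inode_cache", "r"),
    ("test_opB", "inode_cache", "w"),
    ("test_opA", "opA_private", "w"),   # written, but only by one operation
    ("test_opA", "shared_ro", "r"),     # read by both, written by neither
    ("test_opB", "shared_ro", "r"),
]
conflicts = conflicting_lines(trace)
```

Lines that are only read, or that are written by a single operation, do not count. A test case is conflict-free when the returned set is empty.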

slide-80
SLIDE 80

Does the rule help build scalable systems?

Evaluation

slide-82
SLIDE 82

(Linux 3.8, ramfs)

[Figure: matrix of system call pairs over open, link, unlink, rename, stat, fstat, lseek, close, pipe, read, write, pread, pwrite, mmap, munmap, mprotect, memread, memwrite. Each cell is shaded on a scale from "All tests conflict-free" to "All tests conflicted".]

13,664 total test cases; 68% are conflict-free

Sources of conflicts highlighted in the matrix:

  • Directory-wide locking
  • File descriptor reference counts
  • Address space-wide locking

Commuter finds non-scalable cases in Linux

slide-83
SLIDE 83

(Linux 3.8, ramfs)

[Figure: matrix of system call pairs over open, link, unlink, rename, stat, fstat, lseek, close, pipe, read, write, pread, pwrite, mmap, munmap, mprotect, memread, memwrite. Each cell is shaded on a scale from "All tests conflict-free" to "All tests conflicted".]

13,664 total test cases; 68% are conflict-free

Many potential future bottlenecks

Commuter finds non-scalable cases in Linux

slide-84
SLIDE 84

POSIX-like operating system File system and virtual memory system follow commutativity rule Implementation using standard parallel programming techniques, but guided by Commuter

sv6: A scalable OS

slide-86
SLIDE 86

[Figure: matrix of system call pairs over open, link, unlink, rename, stat, fstat, lseek, close, pipe, read, write, pread, pwrite, mmap, munmap, mprotect, memread, memwrite. Each cell is shaded on a scale from "All tests conflict-free" to "All tests conflicted".]

Zero cache lines shared

13,664 total test cases; 99% are conflict-free

Remaining 1% are mostly "idempotent updates":

  • Two pwrites of same data to same offset
  • Two lseeks of same FD to the same offset

Commutative operations can be made to scale

slide-87
SLIDE 87
  • Lowest FD versus any FD
  • stat versus xstat
  • Unordered sockets
  • Delayed munmap
  • fork+exec versus posix_spawn

Refining POSIX with the rule

slide-89
SLIDE 89

qmail-like multithreaded mail server

[Figure: total emails/sec (10k to 70k) versus number of cores (1 to 80), comparing the two API sets below.]

  • Non-commutative APIs: Lowest FD, Ordered sockets, fork+exec
  • Commutative APIs: Any FD, Unordered sockets, posix_spawn

Commutative operations matter to app scalability

slide-90
SLIDE 90

Commutativity and concurrency

  • [Bernstein '81]
  • [Weihl '88]
  • [Steele '90]
  • [Rinard '97]
  • [Shapiro '11]

Laws of Order [Attiya '11]

Disjoint-access parallelism [Israeli '94]

Scalable locks [MCS '91]

Scalable reference counting [Ellen '07, Corbet '10]

Related work

slide-92
SLIDE 92

Whenever interface operations commute, they can be implemented in a way that scales.

Design Implement Test

Check out the code at http://pdos.csail.mit.edu/commuter

Conclusion