Optimizing JavaScript, Filip Pizlo (Apple)

slide-1
SLIDE 1

Optimizing JavaScript

Filip Pizlo Apple

slide-2
SLIDE 2
  • Untyped
  • Objects are hashtables
  • Functions are objects
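
The three properties above can be seen in a few lines of JavaScript. This is only an illustrative sketch; the names `describe`, `point`, and `square` are made up for the example and do not come from the talk.

```javascript
// Untyped: the same function accepts any kind of value at run time.
function describe(v) {
    return typeof v;
}

// Objects behave like hashtables: properties are added and removed dynamically.
var point = { x: 1.5 };
point["y"] = 2.5;   // add a property by string key
delete point.x;     // remove a property

// Functions are objects: they can carry properties like any other object.
function square(n) { return n * n; }
square.calls = 0;
square.calls += 1;

console.log(describe(1), describe("1"), Object.keys(point), square.calls);
```

Each of these freedoms is something a static compiler could rule out up front, which is why the engine has to speculate instead.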
slide-3
SLIDE 3

var scale = 1.2;

function foo(o) {
    return scale * Math.sqrt(o.x * o.x + o.y * o.y);
}

for (var i = 0; i < 100; ++i)
    print(foo({x:1.5, y:2.5}));

slide-4
SLIDE 4

History

  • Smalltalk
      • Deutsch and Schiffman POPL’84
  • Self
      • Ungar and Smith OOPSLA’87
      • Hölzle, Chambers, Ungar ECOOP’91
  • widely used in JavaScript
  • many, many more recent papers
slide-5
SLIDE 5
slide-6
SLIDE 6
  • WebKit open source project
  • JavaScriptCore virtual machine
  • www.webkit.org
slide-7
SLIDE 7

Parser + Bytecode Generator + Cache

slide-8
SLIDE 8

Low Level Interpreter

“Instant on”

Parser + Bytecode Generator + Cache

slide-9
SLIDE 9

Low Level Interpreter: “Instant on”
Baseline JIT: Fast compile

Parser + Bytecode Generator + Cache

slide-10
SLIDE 10

Low Level Interpreter: “Instant on”
    OSR
Baseline JIT: Fast compile

Parser + Bytecode Generator + Cache

slide-11
SLIDE 11

Low Level Interpreter: “Instant on”
    OSR
Baseline JIT: Fast compile
Optimizing JIT: Throughput

Parser + Bytecode Generator + Cache

slide-12
SLIDE 12

Low Level Interpreter: “Instant on”
    OSR
Baseline JIT: Fast compile
    OSR
Optimizing JIT: Throughput

Parser + Bytecode Generator + Cache

slide-14
SLIDE 14

Bytecode Parser -> Prediction Propagation -> Type Check Hoisting -> CFA -> Simplify -> CSE -> Code Generation

slide-15
SLIDE 15
slide-16
SLIDE 16
  • Martin Richards’ PL benchmark
slide-17
SLIDE 17
  • Martin Richards’ PL benchmark
  • C & Java: 1.2ms
slide-18
SLIDE 18
  • Martin Richards’ PL benchmark
  • C & Java: 1.2ms
  • Simple JS interpreter: 129ms
slide-19
SLIDE 19
  • Martin Richards’ PL benchmark
  • C & Java: 1.2ms
  • Simple JS interpreter: 129ms
  • Low Level Interpreter: 58ms
slide-20
SLIDE 20
  • Martin Richards’ PL benchmark
  • C & Java: 1.2ms
  • Simple JS interpreter: 129ms
  • Low Level Interpreter: 58ms
  • Baseline JIT: 8.4ms
slide-21
SLIDE 21
  • Martin Richards’ PL benchmark
  • C & Java: 1.2ms
  • Simple JS interpreter: 129ms
  • Low Level Interpreter: 58ms
  • Baseline JIT: 8.4ms
  • Optimizing JIT: 2.1ms
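
To put the numbers above on one scale, the following sketch divides each time by the C & Java time. The timings are copied from the slide; the helper name `slowdownVersusNative` is made up for this example.

```javascript
// Times in milliseconds, copied from the slide above.
var timesMs = {
    "C & Java": 1.2,
    "Simple JS interpreter": 129,
    "Low Level Interpreter": 58,
    "Baseline JIT": 8.4,
    "Optimizing JIT": 2.1
};

// Slowdown of each configuration relative to the C & Java time.
function slowdownVersusNative(times) {
    var result = {};
    for (var name in times)
        result[name] = times[name] / times["C & Java"];
    return result;
}

var slowdown = slowdownVersusNative(timesMs);
for (var name in slowdown)
    console.log(name + ": " + slowdown[name].toFixed(2) + "x");
```

On these figures the optimizing JIT lands within 2x of the C & Java time, while the simple interpreter is more than 100x slower.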
slide-22
SLIDE 22
  • 1. Profile
  • 2. Predict
  • 3. Prove
slide-23
SLIDE 23

var scale = 1.2;

function foo(o) {
    return scale * Math.sqrt(o.x * o.x + o.y * o.y);
}

for (var i = 0; i < 100; ++i)
    print(foo({x:1.5, y:2.5}));

slide-24
SLIDE 24
o.x * o.x + o.y * o.y
slide-25
SLIDE 25
o.x * o.x + o.y * o.y

[Figure: expression tree in which the property loads .x, .x, .y, .y on o feed two * nodes, which feed one + node]
slide-26
SLIDE 26
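o.x * o.x + o.y * o.y

[Figure: expression tree in which the property loads .x, .x, .y, .y on o feed two * nodes, which feed one + node. The two * nodes and the + node are labeled "pure".]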

slide-27
SLIDE 27
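o.x * o.x + o.y * o.y

[Figure: expression tree in which the property loads .x, .x, .y, .y on o feed two * nodes, which feed one + node. The two * nodes and the + node are labeled "pure"; the property loads are labeled "heap".]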

slide-28
SLIDE 28

Profile

  • Heap
  • Arguments
  • Call returns
slide-29
SLIDE 29

void JIT::emit_op_get_by_id(Instruction* currentInstruction)
{
    unsigned resultVReg = currentInstruction[1].u.operand;
    unsigned baseVReg = currentInstruction[2].u.operand;
    Identifier* ident = &(m_codeBlock->identifier(currentInstruction[3].u.operand));

    emitGetVirtualRegister(baseVReg, regT0);
    compileGetByIdHotPath(baseVReg, ident);
    emitValueProfilingSite();
    emitPutVirtualRegister(resultVReg);
}

JITPropertyAccess.cpp

slide-30
SLIDE 30
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile
slide-32
SLIDE 32
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5

slide-33
SLIDE 33
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5 0.5

slide-34
SLIDE 34
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5 0.5 7

slide-35
SLIDE 35
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5 0.5 7 Int32

slide-36
SLIDE 36
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5 0.5 7 Int32 4.5

slide-37
SLIDE 37
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5 0.5 7 Int32 4.5 9.5

slide-38
SLIDE 38
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5 0.5 7 Int32 4.5 9.5 10.1

slide-39
SLIDE 39
  • Unpredictable values are profiled.
  • Every ~1000 executions of a function, a bounding type is computed for each profile.

var x = o.f;

  • ValueProfile

5 0.5 7 Int32 4.5 9.5 10.1 Int32 ∪ Double
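
The profiling scheme on these slides can be sketched as a small data structure that records values and periodically folds them into a bounding type. This is a minimal sketch and not JavaScriptCore's implementation: the bitmask tags, the `typeOfValue` classifier, and the sample values are assumptions made for the example.

```javascript
// Hypothetical type tags, combined as a bitmask so that union is bitwise OR.
var SpecNone = 0;
var SpecInt32 = 1;
var SpecDouble = 2;
var SpecOther = 4;

// Classify one observed value. Numbers with an exact int32 representation
// (excluding -0) count as Int32; all other numbers count as Double.
function typeOfValue(v) {
    if (typeof v !== "number")
        return SpecOther;
    if ((v | 0) === v && !(v === 0 && 1 / v < 0))
        return SpecInt32;
    return SpecDouble;
}

function ValueProfile() {
    this.bucket = [];            // values recorded since the last update
    this.prediction = SpecNone;  // bounding type of everything seen so far
}

ValueProfile.prototype.record = function (v) {
    this.bucket.push(v);
};

// Fold the recorded values into the bounding type, then clear the bucket.
ValueProfile.prototype.update = function () {
    for (var i = 0; i < this.bucket.length; ++i)
        this.prediction |= typeOfValue(this.bucket[i]);
    this.bucket.length = 0;
    return this.prediction;
};

var profile = new ValueProfile();
profile.record(5);
profile.record(7);
var first = profile.update();    // only integers so far
profile.record(4.5);
profile.record(9.5);
var second = profile.update();   // integers and doubles
console.log(first === SpecInt32, second === (SpecInt32 | SpecDouble));
```

Because the prediction only ever grows, it stays a bound on every value the site has produced, which is the property the later phases rely on.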

slide-40
SLIDE 40

Predict

  • Heap: type that bounds all values seen
  • Pure: abstract interpretation
slide-41
SLIDE 41

case ArithMul: {
    SpeculatedType left = node->child1()->prediction();
    SpeculatedType right = node->child2()->prediction();

    // Only predict once both operands have a prediction.
    if (left && right) {
        if (isInt32(left) && isInt32(right))
            changed |= mergePrediction(SpecInt32);
        else
            changed |= mergePrediction(SpecDouble);
    }
    break;
}

DFGPredictionPropagationPhase.cpp

(roughly)
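
The same rule can be written as a small fixpoint loop in JavaScript. This is a sketch under simplifying assumptions: the node shape (`child1`, `child2`, `prediction`), the bitmask tags, and the `propagate` driver are invented for illustration and only handle multiply nodes.

```javascript
// Hypothetical prediction tags (bitmask); 0 means "no prediction yet".
var SpecNone = 0, SpecInt32 = 1, SpecDouble = 2;

function isInt32(spec) { return spec === SpecInt32; }

// Merge a new prediction into a node and report whether anything changed.
function mergePrediction(node, spec) {
    var merged = node.prediction | spec;
    var changed = merged !== node.prediction;
    node.prediction = merged;
    return changed;
}

// One propagation step for a multiply node, mirroring the C++ case above.
function propagateArithMul(node) {
    var left = node.child1.prediction;
    var right = node.child2.prediction;
    if (!left || !right)
        return false;   // wait until both inputs have predictions
    if (isInt32(left) && isInt32(right))
        return mergePrediction(node, SpecInt32);
    return mergePrediction(node, SpecDouble);
}

// Repeat until no prediction changes (a fixpoint).
function propagate(mulNodes) {
    var changed;
    do {
        changed = false;
        for (var i = 0; i < mulNodes.length; ++i)
            changed = propagateArithMul(mulNodes[i]) || changed;
    } while (changed);
}

// Leaves carry predictions that would come from profiling.
var x = { prediction: SpecDouble };
var y = { prediction: SpecInt32 };
var xx = { child1: x, child2: x, prediction: SpecNone };
var yy = { child1: y, child2: y, prediction: SpecNone };
var product = { child1: xx, child2: yy, prediction: SpecNone };
propagate([product, xx, yy]);   // listed out of order on purpose
console.log(xx.prediction, yy.prediction, product.prediction);
```

Listing `product` before its inputs forces a second pass, which shows why the phase iterates until nothing changes.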

slide-42
SLIDE 42

Prove

slide-43
SLIDE 43
  • Code size reduction
  • Type propagation

ArithMul will spec-fail if its operands are not numbers.

slide-44
SLIDE 44

We know that an ArithMul that is predicted double will always produce a double.

    . . .
    c: ArithMul(@a, @b)
    . . .

slide-45
SLIDE 45

We know that an ArithMul that is predicted double will always produce a double.

    . . .                    <- know nothing about a, b
    c: ArithMul(@a, @b)
    . . .

slide-46
SLIDE 46

We know that an ArithMul that is predicted double will always produce a double.

    . . .                    <- know nothing about a, b
    c: ArithMul(@a, @b)
    . . .                    <- know that a, b, c must be double

slide-47
SLIDE 47

[ 61] mul r5, r5, r6
0x10b05169c: mov %rax, %rdx
0x10b05169f: mov 0x28(%r13), %rax
0x10b0516a3: cmp %r14, %rax
0x10b0516a6: jb 0x10b051b1b
0x10b0516ac: cmp %r14, %rdx
0x10b0516af: jb 0x10b051b47
0x10b0516b5: mov %rax, %rcx
0x10b0516b8: imul %edx, %ecx
0x10b0516bb: jo 0x10b051ada
0x10b0516c1: test %ecx, %ecx
0x10b0516c3: jnz 0x10b0516ee
0x10b0516c9: cmp $0x0, %eax
0x10b0516cc: jl 0x10b0516db
0x10b0516d2: cmp $0x0, %edx
0x10b0516d5: jge 0x10b0516ee
0x10b0516db: mov $0x10af99bfc, %r11
0x10b0516e5: add $0x1, (%r11)
0x10b0516e9: jmp 0x10b051ada
0x10b0516ee: mov %rcx, %rax
0x10b0516f1: or %r14, %rax
0x10b0516f4: mov %rax, 0x28(%r13)

slide-48
SLIDE 48

28: <!1:3> ArithMul(d@23<Double>, d@23<Double>, Number|MustGen|CanExit, bc#61)
0x10b051dff: cmp %r14, %rcx
0x10b051e02: jae 0x10b051e21
0x10b051e08: test %rcx, %r14
0x10b051e0b: jz 0x10b051f5c        <- spec fail
0x10b051e11: mov %rcx, %rax
0x10b051e14: add %r14, %rax
0x10b051e17: movd %rax, %xmm0
0x10b051e1c: jmp 0x10b051e25
0x10b051e21: cvtsi2sd %ecx, %xmm0
0x10b051e25: movsd %xmm0, %xmm2
0x10b051e29: mulsd %xmm0, %xmm2

slide-49
SLIDE 49

OSR exit

slide-50
SLIDE 50

OSR exit

Bytecode:
    op_add

slide-51
SLIDE 51

OSR exit

Bytecode:
    op_add

Baseline:
    mov 0x0(%r13), %rax
    mov -0x40(%r13), %rdx
    cmp %r14, %rax
    jb <slow path>
    cmp %r14, %rdx
    jb <slow path>
    add %edx, %eax
    jo <slow path>
    or %r14, %rax
    mov %rax, 0x8(%r13)

slide-52
SLIDE 52

OSR exit

Bytecode:
    op_add

Optimized:
    add %ecx, %edx
    jo <exit>

Baseline:
    mov 0x0(%r13), %rax
    mov -0x40(%r13), %rdx
    cmp %r14, %rax
    jb <slow path>
    cmp %r14, %rdx
    jb <slow path>
    add %edx, %eax
    jo <slow path>
    or %r14, %rax
    mov %rax, 0x8(%r13)

slide-54
SLIDE 54

OSR exit

Optimized:
    add %ecx, %edx
    jo <exit>

Baseline:
    mov 0x0(%r13), %rax
    mov -0x40(%r13), %rdx
    cmp %r14, %rax
    jb <slow path>
    cmp %r14, %rdx
    jb <slow path>
    add %edx, %eax
    jo <slow path>
    or %r14, %rax
    mov %rax, 0x8(%r13)

slide-55
SLIDE 55

Optimized:
    add %ecx, %edx
    jo <exit>

OSR exit

Baseline:
    mov 0x0(%r13), %rax
    mov -0x40(%r13), %rdx
    cmp %r14, %rax
    jb <slow path>
    cmp %r14, %rdx
    jb <slow path>
    add %edx, %eax
    jo <slow path>
    or %r14, %rax
    mov %rax, 0x8(%r13)

slide-56
SLIDE 56

Optimized:
    add %ecx, %edx
    jo <exit>

OSR exit:
    sub %ecx, %edx
    or %r14, %rdx
    mov %rdx, 0x0(%r13)
    mov $0xa, %rax
    mov %rax, 0x8(%r13)
    mov $0x109f5a800, %r11
    mov %r11, -0x8(%r13)
    mov 0x0(%r13), %rax
    mov $0x32fb420014b1, %rdx
    jmp %rdx

Baseline:
    mov 0x0(%r13), %rax
    mov -0x40(%r13), %rdx
    cmp %r14, %rax
    jb <slow path>
    cmp %r14, %rdx
    jb <slow path>
    add %edx, %eax
    jo <slow path>
    or %r14, %rax
    mov %rax, 0x8(%r13)

slide-58
SLIDE 58
  • Protect the main path
  • Record why we exited
  • Recompile with exponential backoff
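
The last two bullets can be sketched as a small bookkeeping object kept per function. This is a hedged illustration, not the engine's actual policy: the `OptimizationPolicy` type, its method names, the exit reason string, and the base threshold of 1000 are all assumptions for the example.

```javascript
// Hypothetical bookkeeping for one function's optimized code.
function OptimizationPolicy(baseThreshold) {
    this.baseThreshold = baseThreshold; // executions before the first optimizing compile
    this.recompiles = 0;                // times the optimized code was thrown away
    this.exitReasons = {};              // exit reason -> count
}

// Record why we exited, so the next compile can avoid the same speculation.
OptimizationPolicy.prototype.recordExit = function (reason) {
    this.exitReasons[reason] = (this.exitReasons[reason] || 0) + 1;
};

// Discard the optimized code; the next attempt must wait twice as long.
OptimizationPolicy.prototype.jettison = function () {
    this.recompiles += 1;
};

// Executions required in the baseline tier before the next optimizing compile.
OptimizationPolicy.prototype.threshold = function () {
    return this.baseThreshold * Math.pow(2, this.recompiles);
};

var policy = new OptimizationPolicy(1000);
var thresholds = [policy.threshold()];
for (var attempt = 0; attempt < 3; ++attempt) {
    policy.recordExit("BadType");
    policy.jettison();
    thresholds.push(policy.threshold());
}
console.log(thresholds, policy.exitReasons);
```

Doubling the wait after each failed attempt bounds the total time spent recompiling code whose speculations keep failing.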
slide-59
SLIDE 59
o.x * o.x + o.y * o.y

[Figure: expression tree in which the property loads .x, .x, .y, .y on o feed two * nodes, which feed one + node]
slide-60
SLIDE 60

[ 66] add r2, r4, r5
0x10b0516f8: mov %rax, %rdx
0x10b0516fb: mov 0x20(%r13), %rax
0x10b0516ff: cmp %r14, %rax
0x10b051702: jb 0x10b051bc0
0x10b051708: cmp %r14, %rdx
0x10b05170b: jb 0x10b051bda
0x10b051711: add %edx, %eax
0x10b051713: jo 0x10b051b7f
0x10b051719: or %r14, %rax
0x10b05171c: mov %rax, 0x10(%r13)

slide-61
SLIDE 61

30: <!1:3> ArithAdd(d@20<Double>, d@28<Double>, Number|MustGen, bc#66)
0x10b051e2d: addsd %xmm2, %xmm1

slide-62
SLIDE 62

Objects

slide-63
SLIDE 63

var o = new Object();
o.f = 1;
o.g = 2;
o.h = 3;
slide-64
SLIDE 64

var o = new Object();
o.f = 1;
o.g = 2;
o.h = 3;

Object: S | B | data
Structures: S0: { }

slide-65
SLIDE 65

var o = new Object();
o.f = 1;
o.g = 2;
o.h = 3;

Object: S | B | data: 1
Structures: S0: { } -> S1: { f }

slide-66
SLIDE 66

var o = new Object();
o.f = 1;
o.g = 2;
o.h = 3;

Object: S | B | data: 1 2
Structures: S0: { } -> S1: { f } -> S2: { f, g }

slide-67
SLIDE 67

var o = new Object();
o.f = 1;
o.g = 2;
o.h = 3;

Object: S | B | data: 1 2 3
Structures: S0: { } -> S1: { f } -> S2: { f, g } -> S3: { f, g, h }

slide-68
SLIDE 68

var o = new Object();
o.f = 1;
o.g = 2;
o.h = 3;

Object: S | B | data: 1 2 3
Structures: S0: { } -> S1: { f } -> S2: { f, g } -> S3: { f, g, h }
Other transitions: S1 -> { f, x }

slide-69
SLIDE 69

var o = new Object();
o.f = 1;
o.g = 2;
o.h = 3;

Object: S | B | data: 1 2 3
Structures: S0: { } -> S1: { f } -> S2: { f, g } -> S3: { f, g, h }
Other transitions: S1 -> { f, x }; S2 -> { f, g, y }
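
The structure chain built up over the preceding slides can be modeled as a transition tree. The sketch below is a simplified model and not JavaScriptCore's implementation: the `Structure` constructor, `addProperty`, `makeObject`, and `putNew` are hypothetical names, and `putNew` assumes the property does not already exist on the object.

```javascript
// A Structure maps property names to slot offsets and caches its transitions.
function Structure(offsets) {
    this.offsets = offsets;   // for example { f: 0, g: 1 }
    this.transitions = {};    // property name -> next Structure
}

// Return the structure reached by adding `name`, creating it on first use.
Structure.prototype.addProperty = function (name) {
    if (!this.transitions[name]) {
        var next = {};
        var count = 0;
        for (var key in this.offsets) {
            next[key] = this.offsets[key];
            count++;
        }
        next[name] = count;   // the new property takes the next free slot
        this.transitions[name] = new Structure(next);
    }
    return this.transitions[name];
};

var S0 = new Structure({});   // the empty structure shared by new objects

function makeObject() {
    return { structure: S0, data: [] };
}

// Add a property that the object does not have yet.
function putNew(object, name, value) {
    object.structure = object.structure.addProperty(name);
    object.data.push(value);
}

var a = makeObject();
putNew(a, "f", 1); putNew(a, "g", 2); putNew(a, "h", 3);
var b = makeObject();
putNew(b, "f", 10); putNew(b, "g", 20); putNew(b, "h", 30);
var c = makeObject();
putNew(c, "f", 1); putNew(c, "x", 4);
console.log(a.structure === b.structure, a.structure === c.structure);
```

Objects built by the same sequence of property additions end up sharing one structure, so a single pointer comparison identifies their whole layout.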

slide-70
SLIDE 70

var x = o.f;

    cmpq S3, (%rax)
    jne _slowPath
    movq 16(%rax), %rax

slide-71
SLIDE 71
o.f = x;

    cmpq S0, (%rax)
    jne _slowPath
    movq S1, (%rax)
    movq %rdx, 16(%rax)
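
The structure check followed by a fixed-offset load is the pattern of an inline cache. Here is a minimal JavaScript sketch of the load case, assuming a hypothetical object model of `{ structure, data }` where `structure.offsets` maps names to slots; `makeGetById` and its `stats` counter are invented for illustration.

```javascript
// Build a cached property getter for one access site.
function makeGetById(name) {
    var cachedStructure = null;   // structure seen on the last slow-path lookup
    var cachedOffset = -1;
    var stats = { hits: 0, misses: 0 };

    function get(object) {
        // Fast path: one pointer comparison and one indexed load.
        if (object.structure === cachedStructure) {
            stats.hits++;
            return object.data[cachedOffset];
        }
        // Slow path: full lookup, then remember the result for next time.
        stats.misses++;
        var offset = object.structure.offsets[name];
        if (offset === undefined)
            return undefined;
        cachedStructure = object.structure;
        cachedOffset = offset;
        return object.data[offset];
    }
    get.stats = stats;
    return get;
}

var S3 = { offsets: { f: 0, g: 1, h: 2 } };
var getF = makeGetById("f");
var o1 = { structure: S3, data: [1, 2, 3] };
var o2 = { structure: S3, data: [10, 20, 30] };
var r1 = getF(o1);   // slow path, fills the cache
var r2 = getF(o2);   // same structure, so the fast path is taken
console.log(r1, r2, getF.stats);
```

The fast path corresponds to the `cmpq`/`jne`/`movq` sequence on the slides: compare the structure pointer, bail out on mismatch, otherwise load from a known offset.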

slide-72
SLIDE 72

var scale = 1.2;

function foo(o) {
    return scale * Math.sqrt(o.x * o.x + o.y * o.y);
}

for (var i = 0; i < 100; ++i)
    print(foo({x:1.5, y:2.5}));

slide-73
SLIDE 73

CheckStructure(@41<Final>, struct(0x108aec560))
0x2ffd0ae01c78: mov $0x108aec560, %r11
0x2ffd0ae01c82: cmp %r11, (%rax)
0x2ffd0ae01c85: jnz 0x2ffd0ae01dee
15: GetByOffset(@41<Final>, JS, id3{x}, 2)
0x2ffd0ae01cc9: mov 0x10(%rax), %rbx
20: ArithMul(d@15<Double>, d@15<Double>)

slide-74
SLIDE 74
  • Untyped languages are cool
  • We optimized one of them
  • Now it runs faster
slide-75
SLIDE 75

function foo(o) {
    if (o.f && o.f() == 42)
        print("hello");
    if (o.g && o.g() == 63)
        print("bye");
}

function return42() { return 42; }
function return63() { return 63; }

for (var i = 0; i < 2000; ++i)
    foo({f:return42, g:return63});

slide-76
SLIDE 76

 3: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
11: CheckStructure(GetLocal(arg1))
    --> return42
16: JSConstant(Int32: 42)
20: Branch(CompareEq(@16, @16))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: CheckStructure(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
43: CheckStructure(GetLocal(arg1))
    --> return63
48: JSConstant(Int32: 63)
52: Branch(CompareEq(@16, @16))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After Bytecode Parsing

slide-77
SLIDE 77

 3: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
11: CheckStructure(GetLocal(arg1))
    --> return42
16: JSConstant(Int32: 42)
20: Branch(CompareEq(@16, @16))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: CheckStructure(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
43: CheckStructure(GetLocal(arg1))
    --> return63
48: JSConstant(Int32: 63)
52: Branch(CompareEq(@16, @16))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After Prediction Propagation

slide-78
SLIDE 78

71: CheckStructure(GetLocal(arg1))
 3: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
11: CheckStructure(GetLocal(arg1))
    --> return42
16: JSConstant(Int32: 42)
20: Branch(CompareEq(@16, @16))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: CheckStructure(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
43: CheckStructure(GetLocal(arg1))
    --> return63
48: JSConstant(Int32: 63)
52: Branch(CompareEq(@16, @16))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After Type Check Hoisting

slide-79
SLIDE 79

71: CheckStructure(GetLocal(arg1))
 3: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
11: CheckStructure(GetLocal(arg1))
    --> return42
16: JSConstant(Int32: 42)
20: Branch(CompareEq(@16, @16))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: CheckStructure(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
43: CheckStructure(GetLocal(arg1))
    --> return63
48: JSConstant(Int32: 63)
52: Branch(CompareEq(@16, @16))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After CFA & folding

slide-81
SLIDE 81

71: CheckStructure(GetLocal(arg1))
 3: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
11: CheckStructure(GetLocal(arg1))
    --> return42
16: JSConstant(Int32: 42)
20: Branch(CompareEq(@16, @16))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: Watchpoint(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
43: CheckStructure(GetLocal(arg1))
    --> return63
48: JSConstant(Int32: 63)
52: Branch(CompareEq(@16, @16))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After CFA & folding

slide-82
SLIDE 82

71: CheckStructure(GetLocal(arg1))
 3: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
11: CheckStructure(GetLocal(arg1))
    --> return42
16: JSConstant(Int32: 42)
20: Branch(JSConstant(True))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: Watchpoint(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
43: CheckStructure(GetLocal(arg1))
    --> return63
48: JSConstant(Int32: 63)
52: Branch(JSConstant(True))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After CFA & folding

slide-83
SLIDE 83

71: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
20: Branch(JSConstant(True))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: Watchpoint(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
52: Branch(JSConstant(True))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After CFA & folding

slide-84
SLIDE 84

71: CheckStructure(GetLocal(arg1))
 7: Branch(WeakJSConstant(return42))
20: Branch(JSConstant(True))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: Watchpoint(GetLocal(arg1))
39: Branch(WeakJSConstant(return42))
52: Branch(JSConstant(True))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After CFG simplify

slide-85
SLIDE 85

71: CheckStructure(GetLocal(arg1))
23: Watchpoint(WeakJSConstant(global))
31: Call(WeakJSConstant(print), ...)
35: Watchpoint(GetLocal(arg1))
55: Watchpoint(WeakJSConstant(global))
63: Call(WeakJSConstant(print), ...)
68: Return(JSConstant(Undefined))

After CFG simplify
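
The folding and simplification steps in this sequence can be approximated on a toy IR. This is a rough sketch under heavy assumptions: nodes are flat `{ id, op, args, value }` records, the node id 19 is invented (the slides nest CompareEq inside Branch), and there are no basic blocks, so removing a branch here only drops the node rather than pruning an untaken successor.

```javascript
// Hypothetical mini-IR pass: fold constant comparisons, then drop decided branches.
function foldAndSimplify(nodes) {
    var byId = {};
    nodes.forEach(function (n) { byId[n.id] = n; });

    // Folding: a CompareEq of two constants becomes a constant itself.
    nodes.forEach(function (n) {
        if (n.op !== "CompareEq")
            return;
        var a = byId[n.args[0]], b = byId[n.args[1]];
        if (a.op === "JSConstant" && b.op === "JSConstant") {
            n.op = "JSConstant";
            n.value = a.value === b.value;
            n.args = [];
        }
    });

    // Simplification: a Branch on a constant always goes one way, so drop it.
    var kept = nodes.filter(function (n) {
        return !(n.op === "Branch" && byId[n.args[0]].op === "JSConstant");
    });

    // Dead code: drop constants that no remaining node refers to.
    var used = {};
    kept.forEach(function (n) {
        n.args.forEach(function (id) { used[id] = true; });
    });
    return kept.filter(function (n) {
        return n.op !== "JSConstant" || used[n.id];
    });
}

var graph = [
    { id: 16, op: "JSConstant", args: [], value: 42 },
    { id: 19, op: "CompareEq", args: [16, 16] },
    { id: 20, op: "Branch", args: [19] },
    { id: 31, op: "Call", args: [] },
    { id: 68, op: "Return", args: [] }
];
var simplified = foldAndSimplify(graph);
console.log(simplified.map(function (n) { return n.id + ": " + n.op; }));
```

Once inlining has turned the call result into a constant, the comparison, the branch, and the constant itself all disappear, leaving only the call and the return, much as in the final slide.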