Nashorn War Stories (from a battle scarred veteran of invokedynamic ) - - PowerPoint PPT Presentation

nashorn war stories from a battle scarred veteran of
SMART_READER_LITE
LIVE PREVIEW

Nashorn War Stories (from a battle scarred veteran of invokedynamic ) - - PowerPoint PPT Presentation

THE QUEST FOR DYNAMIC LANGUAGE PERFORMANCE <Insert Picture Here> ON THE JVM [NASHORN WAR STORIES] (from a battle scarred veteran of invokedynamic ) Nashorn War Stories (from a battle scarred veteran of invokedynamic ) Marcus Lagergren


slide-1
SLIDE 1

<Insert Picture Here>

Nashorn War Stories (from a battle scarred veteran of invokedynamic)

Marcus Lagergren Oracle

Marcus Lagergren Oracle

THE QUEST FOR DYNAMIC LANGUAGE PERFORMANCE ON THE JVM

[NASHORN WAR STORIES]

(from a battle scarred veteran of invokedynamic)

slide-2
SLIDE 2

<Insert Picture Here>

Nashorn War Stories (from a battle scarred veteran of invokedynamic)

Marcus Lagergren Oracle

Marcus Lagergren Oracle [NASHORN RANTS]

slide-3
SLIDE 3

<Insert Picture Here>

Nashorn War Stories (from a battle scarred veteran of invokedynamic)

Marcus Lagergren Oracle

Marcus Lagergren Oracle [JAVASCRIPT RANTS]

slide-4
SLIDE 4

The Legal Slide

"THE FOLLOWING IS INTENDED TO OUTLINE OUR GENERAL PRODUCT DIRECTION. IT IS INTENDED FOR INFORMATION PURPOSES ONLY, AND MAY NOT BE INCORPORATED INTO ANY CONTRACT. IT IS NOT A COMMITMENT TO DELIVER ANY MATERIAL, CODE, OR FUNCTIONALITY, AND SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISION. THE DEVELOPMENT, RELEASE, AND TIMING OF ANY FEATURES OR FUNCTIONALITY DESCRIBED FOR ORACLE'S PRODUCTS REMAINS AT THE SOLE DISCRETION OF ORACLE."

slide-5
SLIDE 5

Who am I?

@lagergren

slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10

I am here to talk about…

slide-11
SLIDE 11

I am here to talk about…

What we’ve suffered through so far to implement a dynamic language on the JVM

slide-12
SLIDE 12

I am here to talk about…

What we’ve suffered through so far to implement a dynamic language on the JVM The Nashorn Project

slide-13
SLIDE 13

Also – a parade of JavaScript horrors

slide-14
SLIDE 14

Agenda

  • What is Nashorn and why?
  • The problem of compiling an alien language to

Java [sic] bytecode

  • Types
  • Optimistic assumptions
  • The JVM and its issues
slide-15
SLIDE 15

What is Nashorn and why?

slide-16
SLIDE 16

What is Nashorn?

  • Nashorn is a 100% pure Java runtime for JavaScript
  • Nashorn generates bytecode
  • Invokedynamics are everywhere
  • Nashorn currently performs somewhere on the order of

~2-10x better than Rhino

  • Nashorn is in JDK 8
  • Nashorn is 100% ECMAScript compliant
  • Nashorn has a well thought through security model
slide-17
SLIDE 17

Why Nashorn?

  • Started as an invokedynamic POC.
  • Rhino is still alive today after ~18 years. Why?
  • JSR-223
  • Nashorn is now mature and replaces Rhino for Java 8
slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20

Performance

rhino nashorn 1 2 3 4 5 6 7 8 rhino nashorn
slide-21
SLIDE 21

Performance

rhino nashorn 1 2 3 4 5 6 7 8 rhino nashorn
slide-22
SLIDE 22

When is Nashorn available?

  • Nashorn is part of OpenJDK8
  • Already available in JDK 8 builds.

> jjs jjs> var x = “hello”; jjs> print(x); hello jjs>

slide-23
SLIDE 23

Compiling an alien language to Java [sic] bytecode

slide-24
SLIDE 24

Compiling an alien (non-Java language) to bytecode

slide-25
SLIDE 25

Compiling an alien (non-Java language) to bytecode

  • Scala is fairly good fit
slide-26
SLIDE 26

Compiling an alien (non-Java language) to bytecode

  • Scala is fairly good fit
  • Yes I know: hard tail call optimization, interface

injection etc.

slide-27
SLIDE 27

Compiling an alien (non-Java language) to bytecode

  • Scala is fairly good fit
  • Yes I know: hard tail call optimization, interface

injection etc.

  • Ruby and JavaScript are pretty bad fits
slide-28
SLIDE 28

Compiling an alien (non-Java language) to bytecode

  • Scala is fairly good fit
  • Yes I know: hard tail call optimization, interface

injection etc.

  • Ruby and JavaScript are pretty bad fits
  • No types
  • Things change at runtime. A lot.
  • Invokedynamic certainly alleviates a lot of the pain,

but plenty of stuff remains to be solved

slide-29
SLIDE 29

JavaScript! Was it deliberately designed to make every efficient representation useless?

slide-30
SLIDE 30

Let’s talk about JavaScript

jjs> Array.prototype[1] = 17;

slide-31
SLIDE 31

Let’s talk about JavaScript

jjs> Array.prototype[1] = 17; 17 jjs>

slide-32
SLIDE 32

Let’s talk about JavaScript

jjs> Array.prototype[1] = 17; 17 jjs> print([,,,]);

slide-33
SLIDE 33

Let’s talk about JavaScript

jjs> Array.prototype[1] = 17; 17 jjs> print([,,,]); ,17, jjs>

slide-34
SLIDE 34

Let’s talk about JavaScript - Numbers

  • Numbers in JavaScript have no fixed ranges
  • “Intish”. “Doublish”.
  • Not very nice for strongly typed bytecode
  • Overflows must be handled
  • Conservative: At least they tend to fit in Java doubles.
slide-35
SLIDE 35

Let’s talk about JavaScript - Numbers

  • Double arithmetic is slower than integer arithmetic on

modern HW

  • But double arithmetic is sometimes faster than int

arithmetic with the necessary overflow checks.

  • WAT!
  • (getting back to that)
slide-36
SLIDE 36

Let’s talk about JavaScript – Types/Numbers

  • HotSpot itself was originally tested and developed

with bytecode that came from Java

  • Representing everything as Objects to get the

bytecode format type agnostic is nowhere near viable, performance wise.

  • Boxing
  • Go primitive
slide-37
SLIDE 37

We should

  • For bytecode performance we should
  • Use whatever static types we have
  • (mostly) done
  • Optimistically assume stuff about types
  • On it
slide-38
SLIDE 38

Let’s talk about JavaScript – Static type info

  • JavaScript type coercion semantics and literals – uses

and definitions

  • That’s all the static type info we’re going to get from the

compiler

  • Java int: statically enough for ~,&,|,^
  • Java double: statically enough for: *,/,-,%
  • Object: binary + and pretty much everything else
slide-39
SLIDE 39

Let’s talk about JavaScript – Static type info

  • Callsites, though. How do we deal with parameter types?

int square(int x) { return x * x; } iload_0 dup imul ireturn

slide-40
SLIDE 40

Let’s talk about JavaScript – Static type info

  • But…

function square(x) { return x * x; } jjs> square(2) 4 jjs> square(2.1) 4.41 jjs> square(“a”) NaN

slide-41
SLIDE 41

Let’s talk about JavaScript – Static type info

  • So conservatively…

square(Ljava/lang/Object;)D aload_0 // hopefully just unbox: invokestatic coerce2Double(Ljava/lang/Object;)D dup dmul // returns mul result, so always double dreturn

slide-42
SLIDE 42

Let’s talk about JavaScript – Static type info

  • Guess again

jjs> square({ valueOf: function() { global++; return 2 + global; }); ...

slide-43
SLIDE 43

Let’s talk about JavaScript – Static type info

  • So conservatively…

square(Ljava/lang/Object;)D aload_0 // hopefully just unbox: invokestatic coerce2Double(Ljava/lang/Object;)D dup dmul // returns mul result, so always double dreturn

slide-44
SLIDE 44

Let’s talk about JavaScript – Static type info

*sigh* - well at least the return value HAS to be double

square(Ljava/lang/Object;)D aload_0 invokestatic coerce2Double(Ljava/lang/Object;)D aload_0 invokestatic coerce2Double(Ljava/lang/Object;)D dmul // returns mul result, so always double dreturn

slide-45
SLIDE 45

JavaScript has a lot of magic in its number coercion

var dict = Object.create(null); var key = ‘valueOf’; //later dict[key] = formatHarddriveFunction; //much later dict++;

slide-46
SLIDE 46

… and this turns into “10”, of course

++[[]][+[]]+[+[]] === “10”

Brendan

slide-47
SLIDE 47

Fibbonacci calculator

function fib(_) { for(_=[+[],++[[]][+[]],+[],_],_[++[++[++[[]][+[]]] [+[]]][+[]]]=(((_[++[++[++[[]][+[]]][+[]]][+[]]]- (++[[]][+[]]))&(((--[[]][+[]])>>>(++[[]][+[]])))) ===(_[++[++[++[[]][+[]]][+[]]][+[]]]- (++[[]][+[]])))?(_[++[++[[]][+[]]][+[]]]= ++[[]][+[]],_[++[++[++[[]][+[]]][+[]]][+[]]]- (++[[]][+[]])):+[];_[++[++[++[[]][+[]]][+[]]] [+[]]]--;_[+[]]=(_[++[[]][+[]]]= _[++[++[[]][+[]]][+[]]]=_[+[]]+_[++[[]][+[]]])- _[+[]]); return _[++[++[[]][+[]]][+[]]]; }

slide-48
SLIDE 48

Callsite specialization

  • We can, and do, use static callsite types though.
  • (ignore int overflows for a bit)

// Even if square is replaced, callsite type is not // It always takes a number, always returns a number var a = b * square(17.0);

slide-49
SLIDE 49

Callsite specialization

  • We can, and do, use static callsite types though.
  • (ignore int overflows for a bit)

// Even if square is replaced, callsite type is not // It always takes a number, always returns a number var a = b * square(17.0); square(D)D dload 0 dup dmul dreturn

slide-50
SLIDE 50

Callsite specialization

  • We can, and do, use static callsite types though.
  • (ignore int overflows for a bit)

// Even if square is replaced, callsite type is not // It always takes a number, always returns a number var a = b * square(17.0); square = function(x) { return x + “string”; } square(D)D dload 0 dup dmul dreturn

slide-51
SLIDE 51

Callsite specialization

square(Ljava/lang/Object;)Ljava/lang/Object; aload 0 ldc “string” JS_ADD(Ljava/lang/Object;Ljava/lang/Object);Ljava/lang/Object; areturn

slide-52
SLIDE 52

Callsite specialization

square(Ljava/lang/Object;)Ljava/lang/Object; aload 0 ldc “string” JS_ADD(Ljava/lang/Object;Ljava/lang/Object);Ljava/lang/Object; areturn revert_square(D)D dload 0 coerceToJSObject(D)Ljava/lang/Object; # param filter invokedynamic square(Ljava/lang/Object;)Ljava/lang/Object; coerceToDouble(Ljava/lang/Object;)D dreturn

slide-53
SLIDE 53

Callsite specialization

square(Ljava/lang/Object;)Ljava/lang/Object; aload 0 ldc “string” JS_ADD(Ljava/lang/Object;Ljava/lang/Object);Ljava/lang/Object; areturn revert_square(D)D dload 0 coerceToJSObject(D)Ljava/lang/Object; # param filter invokedynamic square(Ljava/lang/Object;)Ljava/lang/Object; coerceToDouble(Ljava/lang/Object;)D dreturn

slide-54
SLIDE 54

Static compile time types bring us performance, [But they are too rare to take us all the way]

slide-55
SLIDE 55

Type Specialization

function am3(i,x,w,j,c,n) { var this_array = this.array; var w_array = w.array; var xl = x&0x3fff, xh = x>>14; while(--n >= 0) { var l = this_array[i]&0x3fff; var h = this_array[i++]>>14; var m = xh*l+h*xl; l = xl*l+((m&0x3fff)<<14)+w_array[j]+c; c = (l>>28)+(m>>14)+xh*h; w_array[j++] = l&0xfffffff; } return c; }

slide-56
SLIDE 56

Type Specialization – Prove ints

function am3(i,x,w,j,c,n) { var this_array = this.array; var w_array = w.array; var xl = x&0x3fff, xh = x>>14; while(--n >= 0) { var l = this_array[i]&0x3fff; var h = this_array[i++]>>14; var m = xh*l+h*xl; l = xl*l+((m&0x3fff)<<14)+w_array[j]+c; c = (l>>28)+(m>>14)+xh*h; w_array[j++] = l&0xfffffff; } return c; }

slide-57
SLIDE 57

Type Specialization – Prove doubles

function am3(i,x,w,j,c,n) { var this_array = this.array; var w_array = w.array; var xl = x&0x3fff, xh = x>>14; while(--n >= 0) { var l = this_array[i]&0x3fff; var h = this_array[i++]>>14; var m = xh*l+h*xl; l = xl*l+((m&0x3fff)<<14)+w_array[j]+c; c = (l>>28)+(m>>14)+xh*h; w_array[j++] = l&0xfffffff; } return c; }

slide-58
SLIDE 58

Static range analysis – fold doubles to ints

function am3(i,x,w,j,c,n) { var this_array = this.array; var w_array = w.array; var xl = x&0x3fff, xh = x>>14; // xl = max 32 bits, xh: 18 bits while(--n >= 0) { var l = this_array[i]&0x3fff; // l max 12 bits var h = this_array[i++]>>14; // h max (32-14) = 18 bits var m = xh*l+h*xl; // will never overflow l = xl*l+((m&0x3fff)<<14)+w_array[j]+c; c = (l>>28)+(m>>14)+xh*h; w_array[j++] = l&0xfffffff; } return c; }

slide-59
SLIDE 59

Static range analysis

function am3(i,x,w,j,c,n) { var this_array = this.array; var w_array = w.array; var xl = x&0x3fff, xh = x>>14; // xl = max 32 bits, xh: 18 bits while(--n >= 0) { var l = this_array[i]&0x3fff; // l max 12 bits var h = this_array[i++]>>14; // h max (32-14) = 18 bits var m = xh*l+h*xl; // will never overflow l = xl*l+((m&0x3fff)<<14)+w_array[j]+c; c = (l>>28)+(m>>14)+xh*h; w_array[j++] = l&0xfffffff; } return c; }

slide-60
SLIDE 60

Do we need our own inlining as well?

slide-61
SLIDE 61

Do we need our own inlining as well?

We can statically prove a few primitive numbers from callsites to am3. Not from all of them.

Runtime callsite is really: (Ljava/lang/Object;IILjava/lang/Object;III)I Statically unprovable, though

slide-62
SLIDE 62

Summary – Static analysis

  • Just ignore all primitive types – use boxing everywhere

and axxx instructions

  • Way too slow. The JVM is nowhere near being able to

cope with that amount of boxing, and probably never will

slide-63
SLIDE 63

Summary – Static analysis

  • Just ignore all primitive types – use boxing everywhere

and axxx instructions

  • Way too slow. The JVM is nowhere near being able to

cope with that amount of boxing, and probably never will

  • Use what primitives we can
  • Definitely gives us performance, depending on the

amount of statically provable primitives

slide-64
SLIDE 64

Summary – Static analysis

  • Just ignore all primitive types – use boxing everywhere

and axxx instructions

  • Way too slow. The JVM is nowhere near being able to

cope with that amount of boxing, and probably never will

  • Use what primitives we can
  • Definitely gives us performance, depending on the

amount of statically provable primitives

  • Add static range checking
  • Gives us another 30% or so
slide-65
SLIDE 65

Summary – Static analysis

  • Just ignore all primitive types – use boxing everywhere

and axxx instructions

  • Way too slow. The JVM is nowhere near being able to

cope with that amount of boxing, and probably never will

  • Use what primitives we can
  • Definitely gives us performance, depending on the

amount of statically provable primitives

  • Add static range checking
  • Gives us another 30% or so
  • Augment CFG with usedef chains to establish param

types

slide-66
SLIDE 66

But soon… static analysis won’t get us further unless we build our own native JavaScript runtime

slide-67
SLIDE 67

But soon… static analysis won’t get us further unless we build our own native JavaScript runtime Become adaptive/dynamic/optimistic

slide-68
SLIDE 68

Statically provable callsites for am3

  • (Object, int, Object, Object, double, int, Object)Object
  • (Object, Object, Object, Object, double, int, int)Object
  • (Object, Object, double, Object, double, Object, double)Object
  • (Object, Object, Object, Object, double, int, int)Object
  • (Object, int, int, Object, double, int, Object)Object
  • (Object, int, Object, Object, Object, int, Object)Object
slide-69
SLIDE 69

In fact they are…

  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
slide-70
SLIDE 70

In fact they are…

  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • We know this when linking at runtime
slide-71
SLIDE 71

In fact they are…

  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • We know this when linking at runtime
  • Use this signature to generate an optimistic version of am3, guard the types
  • Just because it’s int right now, doesn’t mean it’s not undefined later. Guard

required.

slide-72
SLIDE 72

In fact they are…

  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • (Object, int, int, Object, int, int, int)Object
  • We know this when linking at runtime
  • Use this signature to generate an optimistic version of am3, guard the types
  • Just because it’s int right now, doesn’t mean it’s not undefined later. Guard

required.

  • x2 Performance
slide-73
SLIDE 73

We really want to use ints where we can

  • x++ pessimistic: x is double (if no static range analysis can prove
  • therwise)
  • Having a double as a loop counter is slow
  • Loop unrolling doesn’t work for non integer strides
  • Factor ~50 in improvement if replacing with ints

function f() { var x = 0; while (x < y) { x++; } return x; }

slide-74
SLIDE 74

We really want to use ints where we can

  • All non-bitwise arithmetic can potentially overflow
  • The + operator is the worst, as it can take any object
  • Experiment: TypeScript frontend
  • A lot more performance with no further mods
  • Nashorn performs well with known primitive int types

function f() { var x = 0; while (x < y) { x++; // dadd? iadd with overflow check? } return x; }

slide-75
SLIDE 75

Using ints, problem 1 of 2 – Overflow check

  • verhead

static int addExact(int x, int y) { int result = x + y; if ((x ^ result) & (y ^ result) < 0) { throw new ArithmeticException(“int overflow”) } return result; } function f() { var x = 0; while (x < y) { x = addExact(x, 1); } return x; }

This is actually pretty much as slow as the dadd alone Not sometimes, but often.

slide-76
SLIDE 76

Solution: Intrinsify math operations

  • Java 8: addExact/subExact/mulExact
  • Intrinsify them
  • Basically and addExact is just

add eax, edx jo fail ret fail: //slow stuff

  • < 10-15% slower than just the iadd when it doesn’t fault
  • Twice the speed of the non-intrinsified version with xors
  • Only slightly faster than dadd, but enables everything
slide-77
SLIDE 77

Solution: Intrinsify math operations

slide-78
SLIDE 78

This is almost native-fast with add intrinsic and the int specialization.

function f() { var x = 0; while (x < y) { x = addExact(x, 1); } return x; } iconst_0 istore_0 while: iload_0 invokedynamic get y()I if_icmpge exit iload_0 iconst_1 invokestatic addExact //intrinsic goto while exit: istore_0 ireturn

slide-79
SLIDE 79

(One more optimization: is y loop invariant? It may be a getter with side effects or anything as this is JavaScript hell… Hotspot won’t be able to tell with the indy)

function f() { var x = 0; while (x < y) { x = addExact(x, 1); } return x; } iconst_0 istore_0 invokedynamic get y()I //check primitive istore_1 while: iload_0 iload_1 // y if_icmpge exit iload_0 iconst_1 invokestatic addExact //intrinsic goto while exit: istore_0 ireturn

slide-80
SLIDE 80

Native-fast

iconst_0 istore_0 invokedynamic get y()I //check primitive istore_1 while: iload_0 iload_1 // y if_icmpge exit iload_0 iconst_1 invokestatic addExact //intrinsic goto while exit: istore_0 ireturn

slide-81
SLIDE 81

We really want to use ints where we can

Very common instance of same problem.

function f() { return 17 + array[3]; } ... bipush 17 aload 2 //scope invokedynamic get:array(Ljava/lang/Object;)Ljava/lang/Object; aload 2 iconst_3 invokedynamic getElem(Ljava/lang/Object;I)Ljava/lang/Object; invokedynamic ADD:OIO_I(ILjava/lang/Object;)Ljava/lang/Object; areturn
slide-82
SLIDE 82

We really want to use ints where we can

Very common instance of same problem.

function f() { return 17 + array[3]; } ... bipush 17 aload 2 //scope invokedynamic get:array(Ljava/lang/Object;)Ljava/lang/Object; aload 2 iconst_3 invokedynamic getElem(Ljava/lang/Object;I)I invokestatic Math.addExact ireturn
slide-83
SLIDE 83

Using ints problem 2 of 2 – erroneous assumptions

  • So what do we do if we overflow or miss an assumption?
  • Bytecode is strongly typed, so we can’t reuse the same

code

  • Throw errors or add guards/version code
slide-84
SLIDE 84
  • So what do we do if we overflow or miss an assumption?
  • Bytecode is strongly typed, so we can’t reuse the same

code

  • Throw errors or add guards/version code

if (x < y) { x &= 1; if (x < 2) { x *= 2; if (k) { x += “string” //keep branching } } } return x; //hope this is an int

Using ints problem 2 of 2 – erroneous assumptions

slide-85
SLIDE 85

So add a catch block, take a continuation and jump to a less specialized version of the code

slide-86
SLIDE 86

So add a catch block, take a continuation and jump to a less specialized version of the code Uh-oh…

slide-87
SLIDE 87

Continuations, you say?

Start out with

... ALOAD w_array ILOAD j INVOKEDYNAMIC dyn:getElem(I)I ... IADD ...
slide-88
SLIDE 88

Continuations, you say?

Mark callsite optimistic, tag it with a program point

... ALOAD w_array ILOAD j INVOKEDYNAMIC dyn:getElem(I)I [optimistic | pp 17] ... IADD ...
slide-89
SLIDE 89

Continuations, you say?

Add a return value filter throwing an Exception if we return a non-int type

public class UnwarrantedOptimismException extends Exception { ... public int getProgramRestartPointId() { ... }; public Object getReturnedValue() { ... }; }
slide-90
SLIDE 90

Continuations, you say?

Send a message to the caller to regenerate the method

try { ... ALOAD w_array ILOAD j // make sure bc stack is written to locals INVOKEDYNAMIC dyn:getElem(I)I [optimistic | pp 17] ... IADD ... } catch (UnwarrantedOptimismException e) { // ask linker to regenerate method throw new RewriteException(e.getId(), e.getReturnValue(), locals); }
slide-91
SLIDE 91

Continuations, you say?

  • We know when we are relinking a rewritable method
  • Add a MethodHandles.catchException for

RewriteException

  • Catch triggers recompilation, with the failed callsite made

more pessimistic.

  • Also generates and invokes a “rest of” method
restOfMethod(RewriteException e) { // store to locals e.getLocals(); // ... // all code after invokedynamic that failed with // maximum pessimism // (can never throw UnwarrantedOptimismException) return pessimisticReturnValue; }
slide-92
SLIDE 92

The JVM situation

slide-93
SLIDE 93

JVM issues

  • Java 7
  • Pretty quickly started giving us the infamous

NoClassDefFoundError bug

  • Circumvented by running with everything in

bootclasspath (Eww… )

  • Java 8
  • A lot of C++ was reimplemented as LambdaForms
  • Initially, 10% of Java 7 performance. L
slide-94
SLIDE 94

print(Math.round(0.5)); WTF?

slide-95
SLIDE 95
slide-96
SLIDE 96

JVM issues

slide-97
SLIDE 97

JVM issues

  • Many inlining problems
  • Even, traditionally, for normal Java code – add a code

line, 50% of performance disappears

  • Seen that from time to time with HotSpot
  • Relevant in our quick paths in Nashorn too
  • LambdaForms & MethodHandles
  • Tremendous pressure on inlining, lambda form

classes also on metaspace

  • Discovered a few very old bugs in C2 inliner
  • E.g: dead nodes counted as size.
slide-98
SLIDE 98

JVM issues

slide-99
SLIDE 99

JVM issues

slide-100
SLIDE 100

JVM issues

  • LambdaForms compile a lot of code, generate a lot of

metaspace stress

  • If we have to have LambdaForms, they might not be able

to remain in bytecode land?

  • Inlining, despite tweaking has a lot of problems that

remain to be solved

  • Boxing removal boxing removal boxing removal
  • (probably enabled by local escape analysis)
slide-101
SLIDE 101

JVM issues

  • MethodHandle.invoke (not exact) is slow
public class Test { private final static MethodHandle CALC = MethodHandles.publicLookup().findStatic( Test.class, "calc", int.class, int.class, Object.class); static int test() throws Throwable { MethodHandle mh = CALC; Object aString = "A"; int a = mh.invoke(1, aString); int b = mh.invoke(2, "B"); Integer c = mh.invoke((Integer)3, 3); return a+b+c; } static int calc(int x, Object o) { return x + o.hashCode(); } }
slide-102
SLIDE 102

JVM issues

  • MethodHandle.invoke (not exact) is slow
public class Test { private final static MethodHandle CALC = MethodHandles.publicLookup().findStatic( Test.class, "calc", int.class, int.class, Object.class); static int test() throws Throwable { return 140; } static int calc(int x, Object o) { return x + o.hashCode(); } }
slide-103
SLIDE 103

JVM issues

  • Still artifacts here. We do ugly stuff in Java like
@Override public long getLong(final long key) { final int index = ArrayIndex.getArrayIndex(key); final ArrayData array = getArray(); if (array.has(index)) { return array.getLong(index); } return getLong(index, convertKey(key)); }
slide-104
SLIDE 104

JVM issues

  • Still artifacts here. We do ugly stuff in Java like
@Override public long getLong(final double key) { final int index = ArrayIndex.getArrayIndex(key); final ArrayData array = getArray(); if (array.has(index)) { return array.getLong(index); } return getLong(index, convertKey(key)); }
slide-105
SLIDE 105

JVM issues

  • Still artifacts here. We do ugly stuff in Java like
@Override public long getLong(final Object key) { final int index = ArrayIndex.getArrayIndex(key); final ArrayData array = getArray(); if (array.has(index)) { return array.getLong(index); } return getLong(index, convertKey(key)); }
slide-106
SLIDE 106

JVM issues

  • Still artifacts here. We do ugly stuff in Java like
@Override public long getLong(final int key) { final ArrayData array = getArray(); if (array.has(key)) { return array.getLong(key); } return getLong(key, convertKey(key)); }
slide-107
SLIDE 107

War story: warmup

  • Indy intrinsically needs bootstrapping
  • Every call site contributes to warmup
  • LambdaForms contribute to warmup
  • Tiered compilation has gone back and forth.
  • Peak performance is reached sooner, even without

C2 compiling all the methods

  • Added deviation has been very large
  • C2 is slow
slide-108
SLIDE 108

Another war story: Metaspace

  • Runtime didn’t know about anonymous classes
  • Build b58-b74 were broken L
  • Compressed klass pointers gave us a fixed size 100 MB

default klass pointer chunk L

  • Metaspace allocated from metaspace pool subject to
  • fragmentation. Chunks went 5% full to different

classloaders

  • HotSpot did not hand back dealloced Metaspace

memory to the OS

slide-109
SLIDE 109

Future work – Nashorn

  • Optimistic code everywhere
  • Static analysis/IR
  • Field representations
  • Objects only, dual fields, sun.misc.TaggedArray

(TaggedObject?)

  • Parallelism
slide-110
SLIDE 110

Future work - JVM

  • Boxing removal (probably requires Local EA)
  • sun.misc.TaggedArray?
  • Intrinsify Math.addExact and friends
  • Done!
  • MethodHandle.invoke must be fast
  • LambdaForms
  • Caching for footprint?
  • Replacing LambdaForms with something else?
  • Get them out of class/bytecode land
slide-111
SLIDE 111

Future work - JVM

  • Is bytecode even the correct format to do this entire in
  • Pluggable frontends?
  • More magic: I probably really need to talk to my

compiler

  • Or have my compiler talk to me
slide-112
SLIDE 112

Nashorn current performance status

  • (Very) initial POC after 2.5 weeks of work:
  • Broke out octane.crypto.am3 – the hotspot in

the Crypto benchmark in octane.

  • Turned it into microbenchmark
slide-113
SLIDE 113

Nashorn current performance status

  • Runtime
  • Rhino (with –opt 9):

34.6 s

  • Nashorn tip:

10.8 s

  • V8

1.3 s

slide-114
SLIDE 114

Nashorn with optimistic types

  • Runtime
  • Rhino (with –opt 9):

34.6 s

  • Nashorn tip:

5.8 s

  • V8

1.3 s

slide-115
SLIDE 115

Add JVM math intrinsics…

  • Runtime
  • Rhino (with –opt 9):

34.6 s

  • Nashorn tip:

4.4 s

  • V8

1.3 s

slide-116
SLIDE 116

Patch JVM to keep more type info while inlining…

  • Runtime
  • Rhino (with –opt 9):

34.6 s

  • Nashorn tip:

2.5 s

  • V8

1.3 s

slide-117
SLIDE 117

Talk to us

  • Tweet us: @lagergren, @wickund, @asz,

@hannesw, @sundararajan_a

  • http://blogs.oracle.com/nashorn
  • nashorn-dev@openjdk.java.net
  • mlvm-dev@openjdk.java.net
slide-118
SLIDE 118

Thank you!

Q&A?

@lagergren

slide-119
SLIDE 119