
Nashorn War Stories (from a battle scarred veteran of invokedynamic)

THE QUEST FOR DYNAMIC LANGUAGE PERFORMANCE ON THE JVM [NASHORN WAR STORIES] (from a battle scarred veteran of invokedynamic)
Marcus Lagergren


  1. Type Specialization
      function am3(i,x,w,j,c,n) {
        var this_array = this.array;
        var w_array = w.array;
        var xl = x&0x3fff, xh = x>>14;
        while(--n >= 0) {
          var l = this_array[i]&0x3fff;
          var h = this_array[i++]>>14;
          var m = xh*l+h*xl;
          l = xl*l+((m&0x3fff)<<14)+w_array[j]+c;
          c = (l>>28)+(m>>14)+xh*h;
          w_array[j++] = l&0xfffffff;
        }
        return c;
      }

  2. Type Specialization – Prove ints
      function am3(i,x,w,j,c,n) {
        var this_array = this.array;
        var w_array = w.array;
        var xl = x&0x3fff, xh = x>>14;
        while(--n >= 0) {
          var l = this_array[i]&0x3fff;
          var h = this_array[i++]>>14;
          var m = xh*l+h*xl;
          l = xl*l+((m&0x3fff)<<14)+w_array[j]+c;
          c = (l>>28)+(m>>14)+xh*h;
          w_array[j++] = l&0xfffffff;
        }
        return c;
      }

  3. Type Specialization – Prove doubles
      function am3(i,x,w,j,c,n) {
        var this_array = this.array;
        var w_array = w.array;
        var xl = x&0x3fff, xh = x>>14;
        while(--n >= 0) {
          var l = this_array[i]&0x3fff;
          var h = this_array[i++]>>14;
          var m = xh*l+h*xl;
          l = xl*l+((m&0x3fff)<<14)+w_array[j]+c;
          c = (l>>28)+(m>>14)+xh*h;
          w_array[j++] = l&0xfffffff;
        }
        return c;
      }

  4. Static range analysis – fold doubles to ints
      function am3(i,x,w,j,c,n) {
        var this_array = this.array;
        var w_array = w.array;
        var xl = x&0x3fff, xh = x>>14; // xl: max 14 bits, xh: max 18 bits
        while(--n >= 0) {
          var l = this_array[i]&0x3fff; // l max 14 bits
          var h = this_array[i++]>>14; // h max (32-14) = 18 bits
          var m = xh*l+h*xl; // will never overflow
          l = xl*l+((m&0x3fff)<<14)+w_array[j]+c;
          c = (l>>28)+(m>>14)+xh*h;
          w_array[j++] = l&0xfffffff;
        }
        return c;
      }

  5. Static range analysis
      function am3(i,x,w,j,c,n) {
        var this_array = this.array;
        var w_array = w.array;
        var xl = x&0x3fff, xh = x>>14; // xl: max 14 bits, xh: max 18 bits
        while(--n >= 0) {
          var l = this_array[i]&0x3fff; // l max 14 bits
          var h = this_array[i++]>>14; // h max (32-14) = 18 bits
          var m = xh*l+h*xl; // will never overflow
          l = xl*l+((m&0x3fff)<<14)+w_array[j]+c;
          c = (l>>28)+(m>>14)+xh*h;
          w_array[j++] = l&0xfffffff;
        }
        return c;
      }
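
The range reasoning in the comments above can be sketched as a tiny interval analysis. This is a hypothetical illustration in Java, not Nashorn's actual implementation: the class name `RangeSketch` and its methods are invented here, and it only models the operators that appear in `am3` (mask, arithmetic shift, add, multiply).

```java
// Hypothetical sketch of static range analysis: each value is tracked as a
// closed interval [lo, hi], and every operator maps input intervals to an
// output interval. Longs are used so that int-sized inputs cannot overflow
// the analysis itself after a single multiplication.
final class RangeSketch {
    final long lo;
    final long hi;

    RangeSketch(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Any value already known to be a 32-bit int.
    static RangeSketch int32() {
        return new RangeSketch(Integer.MIN_VALUE, Integer.MAX_VALUE);
    }

    // x & mask with a non-negative mask always lands in [0, mask].
    RangeSketch and(int mask) {
        if (mask < 0) {
            throw new IllegalArgumentException("sketch only handles non-negative masks");
        }
        return new RangeSketch(0, mask);
    }

    // Arithmetic shift right is monotonic, so shifting both bounds is enough.
    RangeSketch shr(int bits) {
        return new RangeSketch(lo >> bits, hi >> bits);
    }

    RangeSketch add(RangeSketch other) {
        return new RangeSketch(lo + other.lo, hi + other.hi);
    }

    // The extremes of a product are always among the four corner products.
    RangeSketch mul(RangeSketch other) {
        long a = lo * other.lo, b = lo * other.hi, c = hi * other.lo, d = hi * other.hi;
        return new RangeSketch(Math.min(Math.min(a, b), Math.min(c, d)),
                               Math.max(Math.max(a, b), Math.max(c, d)));
    }

    // If this holds, the expression can be compiled with int arithmetic.
    boolean fitsInInt() {
        return lo >= Integer.MIN_VALUE && hi <= Integer.MAX_VALUE;
    }
}
```

With only "x is an int" as input, `RangeSketch.int32().and(0x3fff)` gives the range for `xl` and `l`, and `RangeSketch.int32().shr(14)` gives the range for `xh` and `h`. Whether a product such as `xl*l` may stay an int is then a call to `fitsInInt()` on the multiplied ranges.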

  7. Do we need our own inlining as well?
      • We can statically prove a few primitive numbers from callsites to am3. Not from all of them.
      • Runtime callsite is really: (Ljava/lang/Object;IILjava/lang/Object;III)I
      • Statically unprovable, though

  11. Summary – Static analysis
      • Just ignore all primitive types – use boxing everywhere and axxx instructions
        • Way too slow. The JVM is nowhere near being able to cope with that amount of boxing, and probably never will
      • Use what primitives we can
        • Definitely gives us performance, depending on the amount of statically provable primitives
      • Add static range checking
        • Gives us another 30% or so
      • Augment CFG with use-def chains to establish param types

  13. But soon …
      • Static analysis won’t get us further unless we build our own native JavaScript runtime
      • Become adaptive/dynamic/optimistic

  14. Statically provable callsites for am3
      • (Object, int, Object, Object, double, int, Object)Object
      • (Object, Object, Object, Object, double, int, int)Object
      • (Object, Object, double, Object, double, Object, double)Object
      • (Object, Object, Object, Object, double, int, int)Object
      • (Object, int, int, Object, double, int, Object)Object
      • (Object, int, Object, Object, Object, int, Object)Object

  18. In fact they are …
      • (Object, int, int, Object, int, int, int)Object
      • (Object, int, int, Object, int, int, int)Object
      • (Object, int, int, Object, int, int, int)Object
      • (Object, int, int, Object, int, int, int)Object
      • (Object, int, int, Object, int, int, int)Object
      • (Object, int, int, Object, int, int, int)Object
      • We know this when linking at runtime
      • Use this signature to generate an optimistic version of am3, guard the types
      • Just because it’s int right now, doesn’t mean it’s not undefined later. Guard required.
      • x2 Performance
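
The "optimistic version plus guard" idea can be sketched with the standard `java.lang.invoke` API. This is a minimal illustration under assumptions, not Nashorn's linker: `GuardedLinkSketch`, `addInts`, `addGeneric` and `bothInts` are hypothetical names, and a two-argument add stands in for `am3`. Only `MethodHandles.guardWithTest`, `MethodHandle.asType` and `Lookup.findStatic` are real JDK calls.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class GuardedLinkSketch {
    // Optimistic version, specialised to the int shape observed at link time.
    static int addInts(int a, int b) {
        return a + b;
    }

    // Pessimistic version: any boxed numbers (non-numeric types are ignored in this sketch).
    static Object addGeneric(Object a, Object b) {
        return ((Number) a).doubleValue() + ((Number) b).doubleValue();
    }

    // Guard: the specialised code is only valid while both arguments are ints.
    static boolean bothInts(Object a, Object b) {
        return a instanceof Integer && b instanceof Integer;
    }

    // Builds a call site target of type (Object, Object)Object that runs the
    // int version when the guard passes and the generic version otherwise.
    static MethodHandle link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType generic = MethodType.methodType(Object.class, Object.class, Object.class);
            MethodHandle fast = lookup.findStatic(GuardedLinkSketch.class, "addInts",
                    MethodType.methodType(int.class, int.class, int.class))
                    .asType(generic); // unbox arguments, box the result
            MethodHandle slow = lookup.findStatic(GuardedLinkSketch.class, "addGeneric", generic);
            MethodHandle test = lookup.findStatic(GuardedLinkSketch.class, "bothInts",
                    MethodType.methodType(boolean.class, Object.class, Object.class));
            return MethodHandles.guardWithTest(test, fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Convenience wrapper; a real linker would link once and cache the handle.
    static Object call(Object a, Object b) {
        try {
            return link().invoke(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling `GuardedLinkSketch.call(2, 3)` takes the specialised int path, while `GuardedLinkSketch.call(2, 1.5)` fails the guard and falls back to the generic version.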

  19. We really want to use ints where we can
      • x++ pessimistic: x is double (if no static range analysis can prove otherwise)
      • Having a double as a loop counter is slow
      • Loop unrolling doesn’t work for non-integer strides
      • Factor ~50 in improvement if replacing with ints
      function f() {
        var x = 0;
        while (x < y) {
          x++;
        }
        return x;
      }

  20. We really want to use ints where we can
      • All non-bitwise arithmetic can potentially overflow
      • The + operator is the worst, as it can take any object
      • Experiment: TypeScript frontend
      • A lot more performance with no further mods
      • Nashorn performs well with known primitive int types
      function f() {
        var x = 0;
        while (x < y) {
          x++; // dadd? iadd with overflow check?
        }
        return x;
      }

  21. Using ints, problem 1 of 2 – Overflow check overhead
      static int addExact(int x, int y) {
        int result = x + y;
        // overflow iff both operands have the opposite sign of the result
        if (((x ^ result) & (y ^ result)) < 0) {
          throw new ArithmeticException("int overflow");
        }
        return result;
      }
      function f() {
        var x = 0;
        while (x < y) {
          x = addExact(x, 1);
        }
        return x;
      }
      This is actually pretty much as slow as the dadd alone. Not sometimes, but often.

  22. Solution: Intrinsify math operations
      • Java 8: addExact/subtractExact/multiplyExact
      • Intrinsify them
      • Basically an addExact is just
              add eax, edx
              jo fail
              ret
            fail:
              //slow stuff
      • < 10-15% slower than just the iadd when it doesn’t fault
      • Twice the speed of the non-intrinsified version with xors
      • Only slightly faster than dadd, but enables everything
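
From the Java side, the overflow-checked int loop looks like the sketch below. `Math.addExact` is the real Java 8 method. The class `ExactCounterSketch`, the use of a plain `int` parameter in place of the JavaScript variable `y`, and the widen-to-double fallback are assumptions made for illustration only.

```java
final class ExactCounterSketch {
    // The slide's f(), with y passed in as an int instead of read via invokedynamic.
    static int countInt(int y) {
        int x = 0;
        while (x < y) {
            x = Math.addExact(x, 1); // throws ArithmeticException on int overflow
        }
        return x;
    }

    // Optimistic add: stay in int while the result fits, otherwise widen to
    // double, which is what the JavaScript number semantics require.
    static Number addOptimistic(int a, int b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException overflow) {
            return (double) a + (double) b;
        }
    }
}
```

`addOptimistic(1, 2)` stays an `Integer`, while `addOptimistic(Integer.MAX_VALUE, 1)` takes the slow path and yields a `Double`. The fast path is the part the slide describes as a single `add` followed by `jo`.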

  24. Source:
      function f() {
        var x = 0;
        while (x < y) {
          x = addExact(x, 1);
        }
        return x;
      }
      Bytecode:
        iconst_0
        istore_0
      while:
        iload_0
        invokedynamic get y()I
        if_icmpge exit
        iload_0
        iconst_1
        invokestatic addExact //intrinsic
        istore_0
        goto while
      exit:
        iload_0
        ireturn
      This is almost native-fast with add intrinsic and the int specialization.

  25. Source:
      function f() {
        var x = 0;
        while (x < y) {
          x = addExact(x, 1);
        }
        return x;
      }
      Bytecode:
        iconst_0
        istore_0
        invokedynamic get y()I //check primitive
        istore_1
      while:
        iload_0
        iload_1 // y
        if_icmpge exit
        iload_0
        iconst_1
        invokestatic addExact //intrinsic
        istore_0
        goto while
      exit:
        iload_0
        ireturn
      (One more optimization: is y loop invariant? It may be a getter with side effects or anything, as this is JavaScript hell … HotSpot won’t be able to tell with the indy)

  26. Bytecode:
        iconst_0
        istore_0
        invokedynamic get y()I //check primitive
        istore_1
      while:
        iload_0
        iload_1 // y
        if_icmpge exit
        iload_0
        iconst_1
        invokestatic addExact //intrinsic
        istore_0
        goto while
      exit:
        iload_0
        ireturn
      Native-fast

  27. We really want to use ints where we can
      Very common instance of same problem.
      function f() {
        return 17 + array[3];
      }
      ...
      bipush 17
      aload 2 //scope
      invokedynamic get:array(Ljava/lang/Object;)Ljava/lang/Object;
      aload 2
      iconst_3
      invokedynamic getElem(Ljava/lang/Object;I)Ljava/lang/Object;
      invokedynamic ADD:OIO_I(ILjava/lang/Object;)Ljava/lang/Object;
      areturn

  28. We really want to use ints where we can
      Very common instance of same problem.
      function f() {
        return 17 + array[3];
      }
      ...
      bipush 17
      aload 2 //scope
      invokedynamic get:array(Ljava/lang/Object;)Ljava/lang/Object;
      aload 2
      iconst_3
      invokedynamic getElem(Ljava/lang/Object;I)I
      invokestatic Math.addExact
      ireturn

  30. Using ints problem 2 of 2 – erroneous assumptions
      • So what do we do if we overflow or miss an assumption?
      • Bytecode is strongly typed, so we can’t reuse the same code
      • Throw errors or add guards/version code
      if (x < y) {
        x &= 1;
        if (x < 2) {
          x *= 2;
          if (k) {
            x += "string"; //keep branching
          }
        }
      }
      return x; //hope this is an int

  32. So add a catch block, take a continuation and jump to a less specialized version of the code
      Uh-oh …

  33. Continuations, you say?
      Start out with
      ...
      ALOAD w_array
      ILOAD j
      INVOKEDYNAMIC dyn:getElem(I)I
      ...
      IADD
      ...

  34. Continuations, you say?
      Mark callsite optimistic, tag it with a program point
      ...
      ALOAD w_array
      ILOAD j
      INVOKEDYNAMIC dyn:getElem(I)I [optimistic | pp 17]
      ...
      IADD
      ...

  35. Continuations, you say?
      Add a return value filter throwing an Exception if we return a non-int type
      public class UnwarrantedOptimismException extends Exception {
        ...
        public int getProgramRestartPointId() { ... }
        public Object getReturnedValue() { ... }
      }
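
A return value filter of this kind can be sketched with `MethodHandles.filterReturnValue`. Everything apart from the JDK calls is hypothetical: `ReturnFilterSketch`, `OptimismFailed` (an unchecked stand-in for the slide's `UnwarrantedOptimismException`), `ensureInt` and the array-based `getElem` are invented for this example and are not Nashorn's real classes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ReturnFilterSketch {
    // Hypothetical stand-in for UnwarrantedOptimismException: carries the
    // offending value and the program point of the optimistic call site.
    static final class OptimismFailed extends RuntimeException {
        final Object returnedValue;
        final int programPoint;

        OptimismFailed(Object returnedValue, int programPoint) {
            super("non-int value at program point " + programPoint);
            this.returnedValue = returnedValue;
            this.programPoint = programPoint;
        }
    }

    // The filter: ints pass through unboxed, anything else aborts the optimistic code.
    static int ensureInt(Object value, int programPoint) {
        if (value instanceof Integer) {
            return (Integer) value;
        }
        throw new OptimismFailed(value, programPoint);
    }

    // Generic getter with an Object return type, like an unspecialised getElem.
    static Object getElem(Object[] array, int index) {
        return array[index];
    }

    // Wraps the generic getter so that its type becomes (Object[], int)int.
    static MethodHandle optimisticGetElem(int programPoint) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle getter = lookup.findStatic(ReturnFilterSketch.class, "getElem",
                    MethodType.methodType(Object.class, Object[].class, int.class));
            MethodHandle filter = lookup.findStatic(ReturnFilterSketch.class, "ensureInt",
                    MethodType.methodType(int.class, Object.class, int.class));
            // Bind the program point so the filter has type (Object)int.
            filter = MethodHandles.insertArguments(filter, 1, programPoint);
            return MethodHandles.filterReturnValue(getter, filter);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Convenience wrapper that lets OptimismFailed propagate to the caller.
    static int getInt(Object[] array, int index, int programPoint) {
        try {
            return (int) optimisticGetElem(programPoint).invoke(array, index);
        } catch (OptimismFailed e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Reading an `Integer` element through `getInt` returns a primitive int. Reading a `String` element throws `OptimismFailed` carrying the value and the program point, which is the information a caller needs to restart in less specialised code.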

  36. Continuations, you say?
      Send a message to the caller to regenerate the method
      try {
        ...
        ALOAD w_array
        ILOAD j
        // make sure bc stack is written to locals
        INVOKEDYNAMIC dyn:getElem(I)I [optimistic | pp 17]
        ...
        IADD
        ...
      } catch (UnwarrantedOptimismException e) {
        // ask linker to regenerate method
        throw new RewriteException(e.getProgramRestartPointId(), e.getReturnedValue(), locals);
      }

  37. Continuations, you say?
      • We know when we are relinking a rewritable method
      • Add a MethodHandles.catchException for RewriteException
      • Catch triggers recompilation, with the failed callsite made more pessimistic.
      • Also generates and invokes a “rest of” method
      restOfMethod(RewriteException e) {
        // store to locals
        e.getLocals();
        // ...
        // all code after invokedynamic that failed with
        // maximum pessimism
        // (can never throw UnwarrantedOptimismException)
        return pessimisticReturnValue;
      }
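
The catch-and-resume mechanism can be sketched with `MethodHandles.catchException`. This is a simplified illustration under assumptions: `RelinkSketch`, `Rewrite`, `sumOptimistic` and `restOf` are hypothetical names, the saved "locals" are reduced to a running sum and a loop index, and no recompilation happens. The handler simply runs a pessimistic "rest of" method from the saved state.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class RelinkSketch {
    // Hypothetical stand-in for RewriteException: carries the live locals
    // needed to continue where the optimistic code gave up.
    static final class Rewrite extends RuntimeException {
        final int sum;
        final int index;

        Rewrite(int sum, int index) {
            this.sum = sum;
            this.index = index;
        }
    }

    // Optimistic version: int arithmetic, bails out on the first overflow.
    static Object sumOptimistic(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            try {
                sum = Math.addExact(sum, values[i]);
            } catch (ArithmeticException overflow) {
                throw new Rewrite(sum, i); // sum still holds the value before index i
            }
        }
        return sum;
    }

    // "Rest of" method: restores the locals and finishes pessimistically
    // with doubles, so it can never fail the same way again.
    static Object restOf(Rewrite e, int[] values) {
        double sum = e.sum;
        for (int i = e.index; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Wraps the optimistic method so a Rewrite is routed to restOf.
    static MethodHandle link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle target = lookup.findStatic(RelinkSketch.class, "sumOptimistic",
                    MethodType.methodType(Object.class, int[].class));
            MethodHandle handler = lookup.findStatic(RelinkSketch.class, "restOf",
                    MethodType.methodType(Object.class, Rewrite.class, int[].class));
            return MethodHandles.catchException(target, Rewrite.class, handler);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Convenience wrapper around the linked handle.
    static Object run(int[] values) {
        try {
            return link().invoke(values);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

`RelinkSketch.run(new int[] {1, 2, 3})` completes in the int version. An input that overflows, such as `{Integer.MAX_VALUE, 1, 1}`, is finished by `restOf` and comes back as a `Double`.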

  38. The JVM situation

  39. JVM issues
      • Java 7
        • Pretty quickly started giving us the infamous NoClassDefFoundError bug
        • Circumvented by running with everything in bootclasspath (Eww …)
      • Java 8
        • A lot of C++ was reimplemented as LambdaForms
        • Initially, 10% of Java 7 performance. :-(

  40. print(Math.round(0.5)); WTF?

  42. JVM issues
      • Many inlining problems
        • Even, traditionally, for normal Java code – add a code line, 50% of performance disappears
        • Seen that from time to time with HotSpot
        • Relevant in our quick paths in Nashorn too
      • LambdaForms & MethodHandles
        • Tremendous pressure on inlining, lambda form classes also on metaspace
      • Discovered a few very old bugs in C2 inliner
        • E.g.: dead nodes counted as size.

  45. JVM issues
      • LambdaForms compile a lot of code, generate a lot of metaspace stress
      • If we have to have LambdaForms, they might not be able to remain in bytecode land?
      • Inlining, despite tweaking, has a lot of problems that remain to be solved
      • Boxing removal boxing removal boxing removal
        • (probably enabled by local escape analysis)
