Parsing Challenges in Java 8. Erik Hogeman, Jesper Öqvist, Görel Hedin. PowerPoint presentation.


SLIDE 1

Parsing Challenges in Java 8

Erik Hogeman, Jesper Öqvist, Görel Hedin

Department of Computer Science, Lund University

SLIDE 2

JastAddJ

- JastAddJ is a full source-to-bytecode, modular Java compiler
- Each Java version is a separate module
- Java 8 was implemented by Erik Hogeman for his Master's thesis
- This talk is about the parsing challenges encountered

SLIDE 3

Java 8

Noteworthy features:
- Lambdas
- Method references
- Default methods

SLIDE 4

Lambdas

Java finally has anonymous functions!

    (x, y) -> x + y
    () -> { action1(); action2(); }

SLIDE 5

Lambda Example

Action listeners the old way:

    button.addActionListener(new ActionListener() {
        public void actionPerformed(ActionEvent e) {
            print("hello");
        }
    });

The new way, using a lambda:

    button.addActionListener( (e) -> print("hello") );
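
The two listener styles above can be tried without a GUI. The following is a minimal sketch, not code from the talk: ClickListener and DemoButton are hypothetical stand-ins for ActionListener and a Swing button, introduced only so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ActionListener: an interface with one abstract method.
interface ClickListener {
    void clicked(String event);
}

// Hypothetical stand-in for a GUI button that stores and fires listeners.
final class DemoButton {
    private final List<ClickListener> listeners = new ArrayList<>();

    void addListener(ClickListener listener) {
        listeners.add(listener);
    }

    void click() {
        for (ClickListener listener : listeners) {
            listener.clicked("click");
        }
    }
}

final class LambdaListenerDemo {
    // Registers one listener in each style and returns what both recorded.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        DemoButton button = new DemoButton();

        // The old way: an anonymous inner class.
        button.addListener(new ClickListener() {
            @Override
            public void clicked(String event) {
                log.add("anonymous: " + event);
            }
        });

        // The new way: a lambda with the same behavior.
        button.addListener((e) -> log.add("lambda: " + e));

        button.click();
        return log;
    }
}
```

Both listeners fire once per click, in registration order.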

SLIDE 6

Method References

A way of using regular instance methods as lambdas:

    Greeter greeter = new MyGreeter();
    greetButton.addActionListener( greeter::greet );
    exitButton.addActionListener( greeter::exit );
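
As a runnable illustration of the same idea, the sketch below binds instance methods to java.util.function.Consumer instead of to button listeners. ConsoleGreeter is a hypothetical class invented for this example, and the string passed to accept stands in for the event object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical greeter that records its output instead of printing it.
final class ConsoleGreeter {
    final List<String> output = new ArrayList<>();

    void greet(String event) {
        output.add("greetings");
    }

    void exit(String event) {
        output.add("goodbye");
    }
}

final class MethodReferenceDemo {
    static List<String> run() {
        ConsoleGreeter greeter = new ConsoleGreeter();

        // Bound method references: the receiver (greeter) is fixed here.
        Consumer<String> onGreet = greeter::greet;
        Consumer<String> onExit = greeter::exit;

        // The equivalent lambda, written out by hand.
        Consumer<String> sameAsOnExit = e -> greeter.exit(e);

        onGreet.accept("click");
        onExit.accept("click");
        sameAsOnExit.accept("click");
        return greeter.output;
    }
}
```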

SLIDE 7

Default Methods

Interfaces can have non-abstract methods:

    interface Greeter {
        default void greet(ActionEvent e) { print("greetings"); }
        default void exit(ActionEvent e) { print("goodbye"); }
    }

    class MyGreeter implements Greeter {
        // use default implementations
    }
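
A small self-contained variant of the interface above shows both inheriting and overriding a default method. The names DemoGreeter, PlainGreeter and FormalGreeter are hypothetical, and the methods return strings rather than printing so the result can be checked.

```java
// Interface with two default (non-abstract) methods.
interface DemoGreeter {
    default String greet() {
        return "greetings";
    }

    default String exit() {
        return "goodbye";
    }
}

// Inherits both default implementations unchanged.
final class PlainGreeter implements DemoGreeter {
}

// Overrides one default method and inherits the other.
final class FormalGreeter implements DemoGreeter {
    @Override
    public String greet() {
        return "good day";
    }
}
```

An implementing class only needs to override the defaults it wants to change.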

SLIDE 8

Parsing

- We use an LALR parser for JastAddJ
- Generated with the Beaver parser generator
- Parser grammar is composed from parts in separate modules

SLIDE 9

Why an LR Parser Generator?

Advantages of a generated LR parser:
- Provably fast
- Generator certifies unambiguous grammar
- Decent tool support
- Bit more powerful than LL

SLIDE 10

Java 8 Parsing Challenges

- Ambiguous grammar specification
- Reduce-reduce conflicts between subexpressions
- Shift-reduce conflict
- Unlimited lookahead

SLIDE 11

Ambiguous Grammar Specification

Java spec (highly edited):

    Expression -> Lambda
    Expression -> ... -> Additive -> Multiplicative -> ... -> Cast
    Cast -> (Type) Lambda

Input:

    (T) (a, b) -> a * b;

Possible parse 1:

    ((T) (a, b) -> a) * b;

Possible parse 2:

    (T) ((a, b) -> a * b);

SLIDE 12

The second parse is the desired one. We achieved this by:
- changing the grammar to treat a lambda as a primary expression
- lowering its priority using precedence declarations
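
The desired parse can be observed in ordinary Java 8 code. The sketch below is an illustration only, with BinaryOperator<Integer> playing the role of T. It compiles only if the cast covers the whole lambda, because under the other parse the trailing b would be an undefined name.

```java
import java.util.function.BinaryOperator;

final class CastLambdaDemo {
    static int run() {
        // Parsed as (T) ((a, b) -> a * b): the cast applies to the entire lambda,
        // so the multiplication belongs to the lambda body.
        return ((BinaryOperator<Integer>) (a, b) -> a * b).apply(3, 4);
    }
}
```

Calling CastLambdaDemo.run() multiplies inside the lambda and returns 12.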

SLIDE 13

Lambda Reduce-Reduce Conflict

Lambda vs less-than expression:

    (T<A> s) -> { }   // lambda
    (T<A)             // less-than expression

This is a reduce-reduce conflict. A similar conflict exists in Java 5 with generic type casts:

    (T<A>) s          // generic type cast
    (T<A)             // less-than expression

In both cases the T terminal must be reduced to either RelationalExpression or ReferenceType.

SLIDE 14

Lambda Reduce-Reduce Conflict

We solved the reduce-reduce conflict by giving the related parsing productions explicit common prefixes:

    Relational -> Name < Shift
    Relational -> Relational < Shift
    ...
    ReferenceType -> Name < TypeArguments_1

This removed the need to reduce the Name token too early.

SLIDE 15

Unlimited Lookahead

    f(T<A, B>::m)   // method reference
    f(T<A, B> m)    // less-than expression (two arguments: T<A and B>m)

No reasonable fixed lookahead allows the parser to decide between a less-than expression and a method reference, because the type argument list can be arbitrarily long.
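
Both forms are legal Java 8 and share the same token prefix up to the closing angle bracket. The sketch below is an illustration, not code from the talk: LookaheadDemo and its describe overloads are hypothetical, and Map<String, Integer> plays the role of T<A, B>.

```java
import java.util.Map;
import java.util.function.Function;

final class LookaheadDemo {
    // Chosen when the argument is a single method reference.
    static String describe(Function<Map<String, Integer>, Integer> f) {
        return "method reference";
    }

    // Chosen when the argument list is two boolean comparisons.
    static String describe(boolean first, boolean second) {
        return "two comparisons";
    }

    static String viaMethodReference() {
        // Name < Name , Name > :: Name
        return describe(Map<String, Integer>::size);
    }

    static String viaComparisons() {
        int t = 1, a = 2, b = 3, m = 0;
        // Name < Name , Name > Name
        return describe(t < a, b > m);
    }
}
```

Only the token after the closing > distinguishes the two calls.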

SLIDE 16

Scanner Decorator

[Diagram: Scanner -> tokens -> Scanner Decorator (with Lookahead Buffer) -> tokens -> Parser]

The Scanner Decorator looks ahead in the token stream when certain tokens are encountered, then potentially modifies the token stream. In the previous case (the generic method reference) it inserts a synthetic LT_TYPE token.
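
The idea can be sketched as a wrapper around a token iterator. This is a simplified illustration under stated assumptions, not JastAddJ's implementation: tokens are plain strings, every > is a separate token, the synthetic token is placed directly before the < it marks, and the class name LtTypeDecorator and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sits between a scanner and a parser; marks "<" that opens type arguments before "::".
final class LtTypeDecorator implements Iterator<String> {
    static final String LT_TYPE = "LT_TYPE"; // synthetic token (name taken from the slide)
    private static final List<String> TYPE_PUNCTUATION = Arrays.asList(",", ".", "?", "[", "]");

    private final Iterator<String> scanner;
    private final List<String> lookahead = new ArrayList<>(); // tokens read but not yet delivered
    private String held; // the real "<" waiting behind an inserted LT_TYPE

    LtTypeDecorator(Iterator<String> scanner) {
        this.scanner = scanner;
    }

    @Override
    public boolean hasNext() {
        return held != null || !lookahead.isEmpty() || scanner.hasNext();
    }

    @Override
    public String next() {
        if (held != null) {
            String token = held;
            held = null;
            return token;
        }
        String token = lookahead.isEmpty() ? scanner.next() : lookahead.remove(0);
        if (token.equals("<") && opensTypeArgumentsBeforeColonColon()) {
            held = token;   // deliver the real "<" on the following call
            return LT_TYPE; // insert the synthetic token first
        }
        return token;
    }

    // Returns the i-th undelivered token, or null at end of input.
    private String peek(int i) {
        while (lookahead.size() <= i && scanner.hasNext()) {
            lookahead.add(scanner.next());
        }
        return i < lookahead.size() ? lookahead.get(i) : null;
    }

    // Scans to the matching ">" and checks that "::" follows it.
    private boolean opensTypeArgumentsBeforeColonColon() {
        int depth = 1;
        for (int i = 0; ; i++) {
            String token = peek(i);
            if (token == null) return false;
            if (token.equals("<")) {
                depth++;
            } else if (token.equals(">")) {
                if (--depth == 0) return "::".equals(peek(i + 1));
            } else if (!looksLikeTypeToken(token)) {
                return false;
            }
        }
    }

    private static boolean looksLikeTypeToken(String token) {
        return TYPE_PUNCTUATION.contains(token)
                || (!token.isEmpty() && Character.isJavaIdentifierStart(token.charAt(0)));
    }

    // Convenience helper for trying the decorator on a fixed token list.
    static List<String> decorate(String... tokens) {
        List<String> result = new ArrayList<>();
        new LtTypeDecorator(Arrays.asList(tokens).iterator()).forEachRemaining(result::add);
        return result;
    }
}
```

With this sketch the tokens of f(T<A, B>::m) gain an LT_TYPE before the <, while the tokens of f(T<A, B> m) pass through unchanged, so the parser can decide with a single token of lookahead.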

SLIDE 17

Conclusions

- Java is not LR, but with some modifications we can make it LR(1)
- So far we have implemented nearly all Java 8 features (parsing is complete)
- Techniques we used to solve the parsing challenges:
  - Duplicate grammar productions to avoid reduce-reduce conflicts
  - Introduce precedence declarations to fix the ambiguous grammar
  - Use a scanner decorator to enable unlimited lookahead

SLIDE 18

Questions!

SLIDE 19

Default Modifier Shift-Reduce

We parse all modifiers using the same production (for methods, interfaces, classes). This introduced a shift-reduce conflict in switch statements:

    switch (x) {
        case 0:
            default class A { };
        case 1:
            break;
        default:
    }

SLIDE 20

Intersection Type Cast

In Java 8 cast expressions can have the form:

    (A & B & C) x

This form conflicts with binary expressions:

    (A & B & C)

The conflict is very similar to the lambda versus less-than expression conflict.
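
Both forms occur in real Java 8 code. The following minimal sketch uses a hypothetical class name and shows an intersection type cast applied to a lambda next to a parenthesized binary expression with the same shape.

```java
import java.io.Serializable;

final class IntersectionCastDemo {
    // (A & B) x: an intersection type cast; the lambda object implements both interfaces.
    static boolean castFormIsSerializable() {
        Runnable task = (Runnable & Serializable) () -> { };
        return task instanceof Serializable;
    }

    // (a & b & c): a parenthesized binary expression over booleans.
    static boolean binaryForm() {
        boolean a = true, b = false, c = true;
        return (a & b & c);
    }
}
```

In both methods the parenthesized part has the token shape ( Name & Name ... ), and only what follows the closing parenthesis tells the two apart.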

SLIDE 21

Parsing Intersection Type Casts

We solve this conflict using the Scanner Decorator. Whenever a left parenthesis is encountered, the decorator looks ahead and inserts the synthetic INTERCAST token if it determines that the parenthesis begins an intersection type cast.