
Open Vocabulary Learning on Source Code with a Graph-Structured Cache - PowerPoint Presentation

  1. Open Vocabulary Learning on Source Code with a Graph-Structured Cache. Milan Cvitkovic (Caltech, Amazon Web Services), Badal Singh (Amazon Web Services), Anima Anandkumar (Caltech). ICML, 2019-6-12.

  2. Open Vocabulary Learning
     ● Goal: models that can reason over flexible sets of inputs and outputs.
     ● Standard, closed vocabulary model: 1 of 400k word embeddings → 1 of 400k words.
     ● Open vocabulary: any words → any words.

  3. Open Vocabulary Learning
     ● Motivation: tasks on source code.
     ● Example: variable naming.
       Input:
           int <NAME-ME> = assertArraysAreSameLength(expected, actuals, header);
           for (int i = 0; i < <NAME-ME>; i++) {
               Object expected = Array.get(expected, i);
       Output: 'expected_length'
     ● This task needs an open vocabulary: in our data, 28% of variable names contain an out-of-vocabulary word.

  4. Graph-Structured Cache
     ● Strategy: represent distinct words and their usages with graph structure, then process the graph with a GNN.
     ● Original input:
           def get_jupyter_addr():
               jupyter_addr = 'localhost' if is_serving() else None
               return jupyter_addr
     ● The same input, represented using a Graph-Structured Cache: the input tokens (drawn as <word> nodes) are chained by edges indicating the next word, and each distinct word (e.g., jupyter, get, addr, serving) gets a single cache node, linked by edges indicating word use to every token that contains it.

  5. Full Model for Tasks on Source Code
     ● Strategy from recent work [1].
     ● Input (SomeFile.java):
           /** ...
           public void addFoo(Foo foo){
               this.myBaz.add(foo);
           }
     ● Pipeline: parse code into AST → augment AST with semantic information.
     ● [Figure: the AST of addFoo (Method Declaration, Parameter, Block, Method Call, Field Access, Name Expr nodes), augmented with semantic edges such as Next Node and Last Use.]
     ● [1] Allamanis et al. "Learning to Represent Programs with Graphs." ICLR 2018.

  6. Full Model for Tasks on Source Code
     ● Input: the same SomeFile.java snippet (addFoo).
     ● Pipeline: parse code into AST → augment AST with semantic information → add Graph-Structured Cache (our main contribution to prior work).
     ● [Figure: the augmented AST plus cache nodes for words such as add, foo, my, and baz, connected by Word Use edges to the identifiers that contain them (e.g., myBaz).]

  7. Full Model for Tasks on Source Code
     ● Input: the same SomeFile.java snippet (addFoo).
     ● Pipeline: parse code into AST → augment AST with semantic information → add Graph-Structured Cache → convert all nodes to vectors and process with a GNN.
     ● Output: depends on the task.

  8. Experiment: Variable Naming Task
     ● Metric: full-name reproduction accuracy (and top-5 accuracy).
     ● For other tasks and experiments, see our poster or paper.

  9. Takeaways
     ● Graph-Structured Caches are an appealing strategy for open vocabulary learning.
       ○ Whatever your current embedding strategy, GSC + GNN can augment it.
       ○ No free lunch! About 30% training slowdown.
       ○ But it helped in all cases we tried, sometimes significantly.

  10. Acknowledgments
      ● Badal Singh, Anima Anandkumar
      ● Miltos Allamanis
      ● Hyokun Yun
      ● Haibin Lin
      ● Our code, for use on your code:
        ○ https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache--Code-Preprocessor
        ○ https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache
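
Slide 2 contrasts a closed vocabulary with an open one. A minimal Python sketch of the closed case, assuming a hypothetical five-word vocabulary in place of the 400k-entry table: every word outside the table collapses to the same <UNK> index, so the model can neither tell unseen words apart nor produce them.

```python
import numpy as np

# Hypothetical toy vocabulary standing in for a 400k-entry table.
VOCAB = ["<UNK>", "expected", "length", "array", "index"]
WORD_TO_ID = {w: i for i, w in enumerate(VOCAB)}

rng = np.random.default_rng(0)
# One embedding row per vocabulary entry (dimension 4, for illustration only).
EMBEDDINGS = rng.normal(size=(len(VOCAB), 4))


def closed_vocab_ids(words):
    """Map each word to its vocabulary index; unknown words all share <UNK>."""
    return [WORD_TO_ID.get(w, WORD_TO_ID["<UNK>"]) for w in words]


def embed(words):
    """Look up embedding rows; every out-of-vocabulary word gets the same vector."""
    return EMBEDDINGS[closed_vocab_ids(words)]


print(closed_vocab_ids(["expected", "actuals", "header"]))  # [1, 0, 0]
```

Here 'actuals' and 'header' end up indistinguishable, which is exactly the limitation an open vocabulary model is meant to remove.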
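
Slide 3's 28% figure counts variable names that contain at least one out-of-vocabulary word. One way to compute such a statistic is sketched below. It assumes names are split into words at underscores and camelCase boundaries and uses a made-up vocabulary, so the paper's actual tokenization and vocabulary may differ.

```python
import re

# Acronym run (e.g. 'HTTP'), capitalized or lowercase word, or digit run.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    return [w.lower() for w in WORD_RE.findall(name)]


def oov_fraction(names, vocab):
    """Fraction of names with at least one word missing from `vocab`."""
    if not names:
        return 0.0
    has_oov = [any(w not in vocab for w in split_identifier(n)) for n in names]
    return sum(has_oov) / len(names)


# Hypothetical vocabulary and names, purely for illustration.
vocab = {"expected", "length", "array", "index", "i"}
names = ["expected_length", "i", "myBaz", "headerLen"]
print(oov_fraction(names, vocab))  # 0.5
```

Splitting into words first matters: 'expected_length' is unseen as a whole name, but both of its words are in the vocabulary.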
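
The Graph-Structured Cache on slide 4 can be sketched as follows. This is a simplified illustration, not the authors' preprocessor: it assumes a flat token sequence from Python's tokenize module rather than an AST, caches words only from non-keyword identifiers, and uses the hypothetical edge labels NEXT_WORD and WORD_USE for the two edge types drawn on the slide.

```python
import io
import keyword
import re
import tokenize

WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_words(token):
    """Split an identifier such as 'get_jupyter_addr' into lowercase words."""
    return [w.lower() for w in WORD_RE.findall(token)]


def build_gsc(tokens):
    """Return (nodes, edges) for a token chain plus a Graph-Structured Cache.

    nodes: list of (kind, label) pairs, kind being "token" or "word".
    edges: list of (src, dst, edge_type) triples over node indices.
    """
    nodes = [("token", t) for t in tokens]
    # Chain consecutive tokens with NEXT_WORD edges.
    edges = [(i, i + 1, "NEXT_WORD") for i in range(len(tokens) - 1)]
    word_node = {}  # word -> index of its single cache node
    for i, tok in enumerate(tokens):
        # Only non-keyword identifiers contribute cache words in this sketch.
        if not tok.isidentifier() or keyword.iskeyword(tok):
            continue
        for w in split_words(tok):
            if w not in word_node:
                word_node[w] = len(nodes)
                nodes.append(("word", w))
            edges.append((word_node[w], i, "WORD_USE"))
    return nodes, edges


src = (
    "def get_jupyter_addr():\n"
    "    jupyter_addr = 'localhost' if is_serving() else None\n"
    "    return jupyter_addr\n"
)
tokens = [
    t.string
    for t in tokenize.generate_tokens(io.StringIO(src).readline)
    if t.type in (tokenize.NAME, tokenize.OP, tokenize.STRING)
]
nodes, edges = build_gsc(tokens)
print([label for kind, label in nodes if kind == "word"])
# ['get', 'jupyter', 'addr', 'is', 'serving']
```

Because each word has exactly one cache node, the single 'jupyter' node ties together the function name and both occurrences of jupyter_addr, which is how the cache lets information flow between distinct identifiers that share a word.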
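
Slides 5 to 7 begin by parsing code into an AST and augmenting it with semantic information, following [1]. The sketch below does this with Python's standard ast module on a Python analogue of addFoo. It is an illustrative stand-in for the Java pipeline with assumed edge names: it adds only CHILD edges and a simplified LAST_USE edge linking each variable occurrence to the previous occurrence of the same name in source order, ignoring control flow.

```python
import ast

# These node kinds can be shared singleton objects in CPython's AST, which
# would break a node-to-index mapping, so the sketch leaves them out.
SHARED_KINDS = (ast.expr_context, ast.operator, ast.boolop, ast.cmpop, ast.unaryop)


def ast_graph(source):
    """Parse Python `source`; return (labels, edges) for an augmented AST."""
    tree = ast.parse(source)
    nodes = [n for n in ast.walk(tree) if not isinstance(n, SHARED_KINDS)]
    index = {node: i for i, node in enumerate(nodes)}
    labels = [type(node).__name__ for node in nodes]

    # Syntactic backbone: one CHILD edge per parent/child pair.
    edges = [
        (index[node], index[child], "CHILD")
        for node in nodes
        for child in ast.iter_child_nodes(node)
        if child in index
    ]

    # Simplified semantic edge: each ast.Name points back to the previous
    # occurrence of the same name in source order (no control-flow analysis).
    names = sorted(
        (n for n in nodes if isinstance(n, ast.Name)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    last_seen = {}
    for n in names:
        if n.id in last_seen:
            edges.append((index[n], index[last_seen[n.id]], "LAST_USE"))
        last_seen[n.id] = n
    return labels, edges


src = (
    "def add_foo(self, foo):\n"
    "    self.my_baz.add(foo)\n"
    "    return foo\n"
)
labels, edges = ast_graph(src)
print([e for e in edges if e[2] == "LAST_USE"])
```

The single LAST_USE edge connects the foo in the return statement to the foo passed to add; a production pipeline would compute such edges with proper data-flow analysis.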
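
Slide 6 adds the Graph-Structured Cache to the augmented AST. As a minimal sketch, assuming the graph is a plain (labels, edges) pair in which identifier nodes are labeled with their identifier text, the cache can be attached in one pass. The toy graph, the WORD_USE label, and the word-splitting rule are all hypothetical; on this toy version of addFoo the pass produces cache nodes for add, foo, my, and baz.

```python
import re

WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_words(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    return [w.lower() for w in WORD_RE.findall(identifier)]


def add_graph_structured_cache(labels, edges, identifier_nodes):
    """Return new (labels, edges) extended with one cache node per distinct word.

    labels: node labels; edges: (src, dst, edge_type) triples;
    identifier_nodes: indices of nodes whose label is an identifier.
    The input lists are left unmodified.
    """
    labels, edges = list(labels), list(edges)
    cache = {}  # word -> index of its cache node
    for i in identifier_nodes:
        for word in split_words(labels[i]):
            if word not in cache:
                cache[word] = len(labels)
                labels.append(word)
            edges.append((cache[word], i, "WORD_USE"))
    return labels, edges


# Toy graph loosely modeled on addFoo (hypothetical labels and edges).
labels = ["MethodDeclaration", "addFoo", "Parameter", "foo", "MethodCall",
          "add", "FieldAccess", "myBaz", "NameExpr", "foo"]
edges = [(0, 1, "CHILD"), (0, 2, "CHILD"), (2, 3, "CHILD"), (0, 4, "CHILD"),
         (4, 5, "CHILD"), (4, 6, "CHILD"), (6, 7, "CHILD"), (4, 8, "CHILD"),
         (8, 9, "CHILD"), (9, 3, "LAST_USE")]
new_labels, new_edges = add_graph_structured_cache(labels, edges, [1, 3, 5, 7, 9])
print(new_labels[len(labels):])  # ['add', 'foo', 'my', 'baz']
```

Since the pass only appends nodes and edges, it can sit on top of whatever AST augmentation is already in place, in line with the slide's framing of the cache as an addition to prior work.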
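
Slide 7 converts every node to a vector and processes the graph with a GNN. The NumPy sketch below is a toy, untrained forward pass, not the paper's architecture. It assumes AST nodes are embedded by node type, cache words by averaging character embeddings (one simple open-vocabulary choice, so unseen words still get vectors), and message passing uses one weight matrix per edge type with a tanh update. All parameters are random placeholders.

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(0)

# Hypothetical, randomly initialized parameters; a real model would learn them.
NODE_TYPES = ["<UNK>", "MethodDeclaration", "Parameter", "MethodCall",
              "FieldAccess", "NameExpr"]
TYPE_EMB = rng.normal(scale=0.1, size=(len(NODE_TYPES), DIM))
CHAR_EMB = rng.normal(scale=0.1, size=(128, DIM))  # one row per ASCII code
EDGE_TYPES = ["CHILD", "LAST_USE", "WORD_USE"]
W_EDGE = {t: rng.normal(scale=0.1, size=(DIM, DIM)) for t in EDGE_TYPES}
W_SELF = rng.normal(scale=0.1, size=(DIM, DIM))


def embed_node(kind, label):
    """Embed AST nodes by type; embed cache words by averaging character
    vectors, so even a word never seen in training gets a vector."""
    if kind == "word":
        codes = [min(ord(c), 127) for c in label]  # non-ASCII shares row 127
        return CHAR_EMB[codes].mean(axis=0)
    return TYPE_EMB[NODE_TYPES.index(label) if label in NODE_TYPES else 0]


def gnn_forward(nodes, edges, steps=3):
    """Run `steps` rounds of message passing and return the final node states."""
    h = np.stack([embed_node(kind, label) for kind, label in nodes])
    for _ in range(steps):
        msg = h @ W_SELF
        for src, dst, etype in edges:
            # Messages flow both ways along every edge (a simplifying choice).
            msg[dst] += h[src] @ W_EDGE[etype]
            msg[src] += h[dst] @ W_EDGE[etype]
        h = np.tanh(msg)
    return h


nodes = [("ast", "MethodDeclaration"), ("ast", "Parameter"),
         ("ast", "NameExpr"), ("word", "foo"), ("word", "add")]
edges = [(0, 1, "CHILD"), (0, 2, "CHILD"), (2, 1, "LAST_USE"),
         (3, 1, "WORD_USE"), (3, 2, "WORD_USE"), (4, 0, "WORD_USE")]
states = gnn_forward(nodes, edges)
print(states.shape)  # (5, 8)
```

A task-specific output layer, for example a name decoder for variable naming, would then read from these final node states; that part depends on the task, as the slide notes.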
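
Slide 8 reports full-name reproduction accuracy and top-5 accuracy. As a sketch of how such metrics are commonly computed, assuming each model output is a ranked list of candidate full names and a hit requires an exact match of the whole name, one function covers both; the two examples are invented solely to exercise it and are not results from the paper.

```python
def top_k_accuracy(ranked_predictions, targets, k=1):
    """Fraction of examples whose exact target name is among the top-k candidates."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets:
        return 0.0
    hits = sum(t in preds[:k] for preds, t in zip(ranked_predictions, targets))
    return hits / len(targets)


# Invented toy data, only to show the calls.
preds = [
    ["expected_length", "length", "len", "n", "size"],
    ["count", "num_items", "n", "total", "size"],
]
targets = ["expected_length", "total"]
print(top_k_accuracy(preds, targets, k=1))  # 0.5
print(top_k_accuracy(preds, targets, k=5))  # 1.0
```

With k=1 this is full-name reproduction accuracy; partial matches such as 'length' for 'expected_length' earn no credit under this exact-match assumption.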
