Open Vocabulary Learning on Source Code with a Graph-Structured Cache
Milan Cvitkovic (Caltech, Amazon Web Services), Badal Singh (Amazon Web Services), Anima Anandkumar (Caltech)
ICML, 2019-6-12

Open Vocabulary Learning
Goal: Models that can reason over flexible sets of inputs and outputs
Standard, closed vocabulary model: 1 of 400k word embeddings → 1 of 400k words
Open vocabulary: Any words → Any words

Open Vocabulary Learning
Motivation: Tasks on source code
Example: Variable naming
Input:
    int <NAME-ME> = assertArraysAreSameLength(expected, actuals, header);
    for (int i = 0; i < <NAME-ME>; i++) {
      Object expected = Array.get(expected, i);
Output: ‘expected_length’
Needs an open vocabulary: In our data, 28% of variable names contain an out-of-vocabulary word
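To make the out-of-vocabulary figure concrete, here is a minimal sketch of how such a fraction could be measured. The subword splitter, the toy vocabulary, and the sample names are all assumptions made for illustration; they are not the authors' tokenizer or data.

```python
import re

def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase subwords."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]

def oov_fraction(names, vocabulary):
    """Fraction of names with at least one subword outside the vocabulary."""
    if not names:
        return 0.0
    hits = sum(
        any(w not in vocabulary for w in split_identifier(n)) for n in names
    )
    return hits / len(names)

# Hypothetical closed vocabulary and variable names.
vocabulary = {"expected", "length", "get", "addr", "index"}
names = ["expected_length", "jupyter_addr", "getIndex", "myBaz"]
print(oov_fraction(names, vocabulary))
```

With this toy data, "jupyter_addr" and "myBaz" each contain a subword that the vocabulary lacks, so a closed vocabulary model could not reproduce them exactly.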

Graph-Structured Cache
Strategy: Represent distinct words and usages with graph structure, process with GNN
Original input:
    def get_jupyter_addr():
        jupyter_addr = 'localhost' if is_serving() else None
        return jupyter_addr
Same input, represented using a Graph-Structured Cache:
[Figure: <word> nodes joined by edges indicating next word; cache nodes "jupyter", "get", "addr", "serving" joined to them by edges indicating word use]
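The idea on this slide can be sketched as a small graph builder. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `build_graph_structured_cache`, the edge labels, and the subword splitter are hypothetical, and the sketch keeps every subword (including "is") as a cache node.

```python
import re

def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase subwords."""
    words = []
    for part in re.split(r"[_\W]+", name):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]

def build_graph_structured_cache(tokens):
    """Build (nodes, edges) with one cache node per distinct subword.

    nodes: list of (kind, label), where kind is "token" or "word"
    edges: list of (source_index, target_index, edge_type)
    """
    nodes = [("token", t) for t in tokens]
    # Chain the tokens in their original order.
    edges = [(i, i + 1, "NEXT_WORD") for i in range(len(tokens) - 1)]
    word_node = {}
    for i, tok in enumerate(tokens):
        # dict.fromkeys removes repeated subwords within one token.
        for w in dict.fromkeys(split_identifier(tok)):
            if w not in word_node:
                word_node[w] = len(nodes)
                nodes.append(("word", w))
            # Connect the shared word node to every token that uses it.
            edges.append((word_node[w], i, "WORD_USE"))
    return nodes, edges

# Identifier tokens from the slide's example, in order of appearance.
tokens = ["get_jupyter_addr", "jupyter_addr", "is_serving", "jupyter_addr"]
nodes, edges = build_graph_structured_cache(tokens)
for src, dst, kind in edges:
    print(nodes[src][1], "->", nodes[dst][1], kind)
```

Because each distinct word gets exactly one node, a word never seen in training still has a place in the graph, and all of its uses are linked through that node.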

Full Model for Tasks on Source Code
Strategy from recent work [1]
Input (SomeFile.java):
    public void addFoo(Foo foo){
      this.myBaz.add(foo);
    }
Steps: Parse code into AST → Augment AST with semantic information
[Figure: AST of addFoo. Labels visible in the figure include Method Declaration, Parameter, Code Block, Method Call, Field Access, Name Expr, Field Reference, Next Node, Last Use]
[1] Allamanis et al. “Learning to Represent Programs with Graphs.” ICLR 2018
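The two steps on this slide (parse into an AST, then add semantic edges) can be sketched with Python's standard `ast` module. This is a stand-in: the slides work on Java, and the edge set here (CHILD plus a simple LAST_USE) is a small assumed subset chosen for illustration, not the full set used in [1].

```python
import ast

# Shared singleton helper nodes (Load/Store, operators) are left out of the graph.
SKIP = (ast.expr_context, ast.operator, ast.unaryop, ast.boolop, ast.cmpop)

def ast_graph(source):
    """Parse source and return (node_labels, edges) with CHILD and LAST_USE edges."""
    tree = ast.parse(source)
    kept = [n for n in ast.walk(tree) if not isinstance(n, SKIP)]
    index = {id(n): i for i, n in enumerate(kept)}
    labels = [type(n).__name__ for n in kept]

    edges = []
    # Step 1: syntactic structure.
    for parent in kept:
        for child in ast.iter_child_nodes(parent):
            if id(child) in index:
                edges.append((index[id(parent)], index[id(child)], "CHILD"))

    # Step 2: link each variable occurrence to the previous occurrence
    # of the same name in source order (a simple LAST_USE edge).
    names = [n for n in kept if isinstance(n, ast.Name)]
    names.sort(key=lambda n: (n.lineno, n.col_offset))
    last_seen = {}
    for n in names:
        if n.id in last_seen:
            edges.append((index[id(n)], last_seen[n.id], "LAST_USE"))
        last_seen[n.id] = index[id(n)]
    return labels, edges

source = (
    "def get_jupyter_addr():\n"
    "    jupyter_addr = 'localhost' if is_serving() else None\n"
    "    return jupyter_addr\n"
)
labels, edges = ast_graph(source)
print(labels)
print([e for e in edges if e[2] == "LAST_USE"])
```

Matching occurrences by name alone ignores scoping, which is acceptable for a single small function but would need real name resolution on larger code.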

Full Model for Tasks on Source Code
Input (SomeFile.java):
    public void addFoo(Foo foo){
      this.myBaz.add(foo);
    }
Steps: Parse code into AST → Augment AST with semantic information → Add Graph-Structured Cache (our main contribution to prior work)
[Figure: the augmented AST of addFoo, plus cache nodes "add", "foo", "my", "baz" attached by Word Use edges]

Full Model for Tasks on Source Code
Input (SomeFile.java):
    public void addFoo(Foo foo){
      this.myBaz.add(foo);
    }
Steps: Parse code into AST → Augment AST with semantic information → Add Graph-Structured Cache → Convert all nodes to vectors, process with GNN
Output: (Depends on task)
[Figure: the augmented AST of addFoo, plus cache nodes "add", "foo", "my", "baz" attached by Word Use edges]
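The last step, "convert all nodes to vectors, process with GNN", can be illustrated with one round of message passing. The slides do not say which GNN is used, so this sketch assumes the simplest possible variant: unweighted mean aggregation with no learned parameters. The function name and the toy vectors are hypothetical.

```python
def message_passing_round(vectors, edges):
    """One round of mean aggregation over an undirected graph.

    Each node's new vector is the average of its own vector and
    the vectors of its neighbors.
    """
    neighbors = {i: [i] for i in range(len(vectors))}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)  # treat every edge as undirected
    dim = len(vectors[0])
    return [
        [
            sum(vectors[j][d] for j in neighbors[i]) / len(neighbors[i])
            for d in range(dim)
        ]
        for i in range(len(vectors))
    ]

# Three nodes in a chain: 0 - 1 - 2, with toy 2-dimensional vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
edges = [(0, 1), (1, 2)]
updated = message_passing_round(vectors, edges)
print(updated)
```

A trained GNN would replace the plain average with learned, edge-type-specific transformations, but the flow of information along Word Use and AST edges follows the same pattern.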

Experiment: Variable Naming Task
● Full-name reproduction accuracy (and top 5 accuracy): [Table: results not preserved]
For other tasks and experiments, see our poster or paper

Takeaways
Graph-Structured Caches are an appealing strategy for open vocabulary learning
○ Whatever your current embedding strategy, GSC + GNN can augment it
○ No free lunch! About 30% training slowdown.
○ But helps in all cases we tried, sometimes significantly

Acknowledgments
● Badal Singh, Anima Anandkumar
● Miltos Allamanis
● Hyokun Yun
● Haibin Lin
Our code, for use on your code:
https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache--Code-Preprocessor
https://github.com/mwcvitkovic/Open-Vocabulary-Learning-on-Source-Code-with-a-Graph-Structured-Cache
