Program Synthesis for Character Level Language Modelling
Pavol Bielik, Veselin Raychev, Martin Vechev
Department of Computer Science, ETH Zurich, Switzerland
CSC2547, Winter 2018

Motivation
Neural networks are not as effective on structured tasks (e.g., program synthesis).
Neural network weights are difficult to interpret.
It is difficult to define sub-models for different circumstances.

TChar
TChar is a domain-specific language (DSL) for writing programs that define probabilistic n-gram models and variants.
Variants include models trained on subsets of data, queried only when certain conditions are met, used to make certain classes of predictions, etc.
Sub-models can be composed into a larger model using if-then statements.

Example
Let $f$ be a function (program) from TChar that takes a prediction position $t$ in a text $x$ and returns a context to predict with.
Say $x$ = "Dogs are th", with $t$ marking the position right after "th".
For example, say $f(t, x) = x_s$ if $x_{t-1}$ is whitespace, else $x_{t-2} x_{t-1}$, where $x_s$ is the first character of the previous word.
Then predict $x_t$ using the distribution $P(x_t \mid f(t, x))$.
This is just a trigram language model with special behavior for starting characters!
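The context function from this example can be sketched in Python. This is only an illustration under stated assumptions: positions are zero-indexed, "whitespace" means whatever `str.isspace` accepts, and the name `context` is hypothetical rather than part of TChar.

```python
def context(t: int, x: str) -> str:
    """Return the conditioning context f(t, x) for predicting x[t].

    Assumption: zero-indexed positions; only characters before t are read.
    """
    if t >= 1 and x[t - 1].isspace():
        # Skip back over the whitespace run that precedes position t.
        end = t - 1
        while end > 0 and x[end - 1].isspace():
            end -= 1
        # Walk back to the first character of the previous word.
        start = end
        while start > 0 and not x[start - 1].isspace():
            start -= 1
        return x[start] if start < end else ""
    # Otherwise use the two preceding characters (trigram-style context).
    return x[max(0, t - 2):t]


x = "Dogs are th"
print(repr(context(len(x), x)))  # context for the character after "th"
print(repr(context(9, x)))       # context for the first letter of "th"
```

For the text above, the position after "th" is conditioned on the two preceding characters, while the first letter of a word is conditioned on the first letter of the previous word.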

Building Blocks
SimpleProgram: Use Move and Write instructions to condition the prediction (1), update the program state (2), or determine which branch to choose (3) (e.g., LEFT WRITE_CHAR LEFT WRITE_CHAR provides the context for a trigram language model).
SwitchProgram: Use switch statements to conditionally select appropriate subprograms (e.g., use switch LEFT WRITE_CHAR to separately handle newlines, tabs, special characters, and upper-case characters).
StateProgram: Update the current state and determine which program to execute next based on the current state (e.g., use LEFT WRITE_CHAR LEFT WRITE_CHAR, which updates the state on */, to handle comments separately).
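A toy interpreter may help make the first two building blocks concrete. It is a simplified sketch, not the authors' implementation: only the LEFT and WRITE_CHAR instructions named on the slide are supported, programs are plain Python lists, and the function names `run_simple` and `run_switch` as well as the dictionary-based switch are assumptions made for illustration. StateProgram is not modelled here.

```python
def run_simple(program, t, x):
    """Run Move/Write instructions, starting at prediction position t."""
    pos, written = t, []
    for instr in program:
        if instr == "LEFT":
            pos -= 1
        elif instr == "WRITE_CHAR":
            # Assumption: only characters strictly before t are visible.
            written.append(x[pos] if 0 <= pos < t else "")
        else:
            raise ValueError(f"unsupported instruction: {instr}")
    return tuple(written)


def run_switch(guard, cases, default, t, x):
    """Select a subprogram by what the guard program writes, then run it."""
    chosen = cases.get(run_simple(guard, t, x), default)
    return run_simple(chosen, t, x)


# LEFT WRITE_CHAR LEFT WRITE_CHAR collects the two preceding characters.
TRIGRAM = ["LEFT", "WRITE_CHAR", "LEFT", "WRITE_CHAR"]
print(run_simple(TRIGRAM, 11, "Dogs are th"))

# Hypothetical switch: use an empty context right after a newline.
cases = {("\n",): []}
print(run_switch(["LEFT", "WRITE_CHAR"], cases, TRIGRAM, 2, "a\nb"))
```

The written characters come out nearest-first, so the trigram program yields the previous character followed by the one before it.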

Learning
Given a validation set $D$ and regularization penalty $\Omega$, the learning process is to find a program $p^* \in$ TChar:
$p^* = \arg\min_{p} \left[ -\log P(p \mid D) + \lambda \cdot \Omega(p) \right]$
TChar consists of branches and SimplePrograms.
Branches are synthesized using the ID3+ algorithm.
SimplePrograms are synthesized with a combination of brute-force search (for programs of up to 5 instructions), genetic programming, and MCMC methods.
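The brute-force part of this search can be illustrated with a small sketch. Several pieces are assumptions rather than details from the slides: the instruction set is restricted to LEFT and WRITE_CHAR, the data-fit term is approximated by average bits per character on the validation text with add-one smoothing, and $\Omega(p)$ is taken to be the number of instructions. Branch synthesis, genetic programming, and MCMC are not shown.

```python
import itertools
import math
from collections import Counter, defaultdict

INSTRUCTIONS = ("LEFT", "WRITE_CHAR")  # assumption: a two-instruction subset


def run(program, t, x):
    """Return the context a program writes for prediction position t."""
    pos, out = t, []
    for instr in program:
        if instr == "LEFT":
            pos -= 1
        else:  # WRITE_CHAR; only characters before t are visible
            out.append(x[pos] if 0 <= pos < t else "")
    return tuple(out)


def fit(program, text):
    """Count which character follows each context in the training text."""
    counts = defaultdict(Counter)
    for t in range(len(text)):
        counts[run(program, t, text)][text[t]] += 1
    return counts


def cost(program, counts, text, alphabet_size):
    """Average bits per character on text, with add-one smoothing."""
    total = 0.0
    for t in range(len(text)):
        c = counts.get(run(program, t, text), Counter())
        p = (c[text[t]] + 1) / (sum(c.values()) + alphabet_size)
        total -= math.log2(p)
    return total / len(text)


def synthesize(train, valid, max_len=5, lam=0.01):
    """Enumerate all programs up to max_len and keep the lowest score."""
    alphabet_size = len(set(train) | set(valid))
    best = None
    for n in range(max_len + 1):
        for program in itertools.product(INSTRUCTIONS, repeat=n):
            counts = fit(program, train)
            score = cost(program, counts, valid, alphabet_size) + lam * n
            if best is None or score < best[0]:
                best = (score, program)
    return best


score, program = synthesize("ab" * 50, "ab" * 10)
print(program, round(score, 3))
```

On this alternating toy text the search settles on the shortest program that exposes the previous character, since longer programs pay a larger penalty without fitting the data better.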

Experiments
The Linux Kernel and Hutter Prize Wikipedia datasets are used for evaluation.
The metrics used are bits-per-character (entropy of $p(x_t \mid x_{<t})$) and error rate (number of mistakes).
The TChar model is compared to various n-gram models (4-, 7-, 10-, and 15-gram) and LSTMs of various sizes.
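The two metrics can be written down directly. The sketch below assumes the model's outputs are already available as the probability assigned to each true character and as the single most likely character per position; the function names are illustrative, and the error rate is reported as a fraction of positions.

```python
import math


def bits_per_character(probs_of_truth):
    """Mean negative log2 probability the model gave to the true characters."""
    return -sum(math.log2(p) for p in probs_of_truth) / len(probs_of_truth)


def error_rate(predictions, truth):
    """Fraction of positions where the top prediction is not the true character."""
    mistakes = sum(p != y for p, y in zip(predictions, truth))
    return mistakes / len(truth)


print(bits_per_character([0.5, 0.25]))
print(error_rate("abc", "abd"))
```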

Experiments
For Linux Kernel, the TChar model reduces the error rate of the best baseline (15-gram model) by 35%, reduces BPC by 25%, and is several times faster to train and query than an LSTM!

Experiments
The TChar model is not as good on unstructured data: on Wikipedia, its error rate is roughly the same as for the Linux Kernel dataset, but it is outperformed here by LSTMs.

Advantages
+ A program $f$ drawn from TChar can be read by humans; it is much more interpretable than the weights of a neural network.
+ Calculating $P(x_t \mid f(t, x))$ is efficient: use a hashtable to look up how frequently $x_t$ appears in the context $f(t, x)$.
+ The TChar model outperforms LSTMs and n-gram models on structured data.
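The hashtable lookup can be sketched as a dictionary from contexts to character counts. This is a minimal illustration, assuming a plain relative-frequency estimate without smoothing; `build_table`, `probability`, and the `previous_char` context function are hypothetical names, and any context function with the same signature could be substituted.

```python
from collections import Counter, defaultdict


def build_table(text, context_fn):
    """Map each context f(t, x) to counts of the character that followed it."""
    table = defaultdict(Counter)
    for t in range(len(text)):
        table[context_fn(t, text)][text[t]] += 1
    return table


def probability(table, ctx, ch):
    """Relative frequency of ch in context ctx (0.0 for unseen contexts)."""
    counts = table.get(ctx)
    if not counts:
        return 0.0
    return counts[ch] / sum(counts.values())


def previous_char(t, x):
    """Hypothetical context function: the single preceding character."""
    return x[max(0, t - 1):t]


table = build_table("abab", previous_char)
print(probability(table, "a", "b"))
```

Each query costs one dictionary lookup plus a sum over the characters seen in that context, which is why evaluating the model is cheap compared with a forward pass through a neural network.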

Disadvantages & Future Work
– The TChar model is outperformed by LSTMs on unstructured data.
– TChar has limited expressiveness, unlike DNNs.
– However, increasing the expressiveness of TChar can in theory make the synthesis problem intractable or even undecidable.
