Neural Symbolic Machines Semantic Parsing on Freebase with Weak - PowerPoint PPT Presentation

Neural Symbolic Machines: Semantic Parsing on Freebase with Weak Supervision
Chen Liang, Jonathan Berant, Quoc Le, Kenneth Forbus, Ni Lao

Overview
● Motivation: Semantic Parsing and Program Induction
● Neural Symbolic Machines
  ○ Key-Variable Memory
  ○ Code Assistance
  ○ Augmented REINFORCE
● Experiments and analysis

Semantic Parsing: Language to Programs
Figure: a natural language question/instruction is mapped to a program (logical form), which is executed to reach the goal (the answer).
● Full supervision: annotated programs (hard to collect)
● Weak supervision: only answers, so the program is latent (easy to collect)
[Berant et al., 2013; Liang, 2013]

Question Answering with Knowledge Base
● Knowledge bases: Freebase, DBpedia, YAGO, NELL
● Example: "Largest city in US?" → GO (Hop V1 CityIn) (Argmax V2 Population) RETURN → NYC
● Challenges: 1. Compositionality 2. Large search space (Freebase: 23K predicates, 82M entities, 417M triplets)
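To make the two operators concrete, here is a minimal Python sketch that executes the example program over a hypothetical in-memory triple store. The triple list, the population numbers, and the set-based semantics of `hop` and `argmax` are assumptions for illustration; the `!` prefix for following a predicate backwards borrows the notation used on later slides (`!CityIn`).

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
# Entity ids follow the slides' m.* style; population figures are illustrative only.
TRIPLES: List[Tuple[str, str, object]] = [
    ("m.SF", "CityIn", "m.USA"),
    ("m.NYC", "CityIn", "m.USA"),
    ("m.LA", "CityIn", "m.USA"),
    ("m.SF", "Population", 870_000),
    ("m.NYC", "Population", 8_300_000),
    ("m.LA", "Population", 3_900_000),
]


def hop(entities: Iterable[str], predicate: str) -> Set[object]:
    """Follow `predicate` from every entity; a leading '!' follows it backwards."""
    entities = set(entities)
    if predicate.startswith("!"):
        relation = predicate[1:]
        return {s for s, r, o in TRIPLES if r == relation and o in entities}
    return {o for s, r, o in TRIPLES if r == predicate and s in entities}


def argmax(entities: Iterable[str], predicate: str) -> Set[str]:
    """Keep the entities whose numeric `predicate` value is the largest."""
    entities = set(entities)
    values = {s: o for s, r, o in TRIPLES if r == predicate and s in entities}
    if not values:
        return set()
    best = max(values.values())
    return {s for s, v in values.items() if v == best}


# "Largest city in US?": first hop from m.USA backwards along CityIn, then take the argmax.
v1 = {"m.USA"}
v2 = hop(v1, "!CityIn")
v3 = argmax(v2, "Population")
print(v3)  # {'m.NYC'}
```

Even this toy shows the compositionality challenge: the answer only emerges from chaining two operations, with the intermediate set of cities passed between them.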

WebQuestionsSP Dataset
1. 5,810 questions collected with the Google Suggest API & Amazon MTurk
2. Invalid QA pairs removed: 3,098 training examples and 1,639 testing examples remain
● Open-domain, and contains grammatical errors
● Multiple entities as answer => macro-averaged F1
Examples (multiple entities, grammatical errors):
• What do Michelle Obama do for a living? → writer, lawyer
• What character did Natalie Portman play in Star Wars? → Padme Amidala
• What currency do you use in Costa Rica? → Costa Rican colon
• What did Obama study in school? → political science
• What killed Sammy Davis Jr? → throat cancer
[Berant et al., 2013; Yih et al., 2016]
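Because a question can have several answer entities, each question is scored with set-based F1 and the scores are averaged over questions. A minimal sketch, assuming precision and recall are computed on the sets of predicted and gold entities; scoring an empty prediction against an empty gold set as 1.0 is an assumed convention, not taken from the slide.

```python
from typing import List, Set


def f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between the predicted and gold answer entities of one question."""
    if not predicted and not gold:
        return 1.0  # assumed convention for an empty gold set
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(predictions: List[Set[str]], golds: List[Set[str]]) -> float:
    """Average the per-question F1 scores, so each question counts equally."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    return sum(f1(p, g) for p, g in zip(predictions, golds)) / len(golds)


# Partial credit: "writer" alone for a two-entity answer gets F1 = 2/3.
predictions = [{"writer"}, {"Padme Amidala"}]
golds = [{"writer", "lawyer"}, {"Padme Amidala"}]
print(round(macro_f1(predictions, golds), 4))  # 0.8333
```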

(Scalable) Neural Program Induction
● Impressive works show that neural networks can learn addition and sorting, but... [Reed & de Freitas 2015; Zaremba & Sutskever 2016]
● ...the learned operations are not as scalable and precise.
● Why not use existing modules that are scalable, precise and interpretable?

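Neural Symbolic Machines
Figure: the Manager gives the question (with weak supervision) to a neural Programmer, which writes a program; a symbolic Computer executes it against the knowledge base using predefined functions and returns the output (the answer).
● Neural Programmer: abstract
● Symbolic Computer: scalable, precise, non-differentiable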

Simple Seq2Seq model is not enough
Figure: a seq2seq model reads "Largest city in US" and decodes GO ( Hop R0 !CityIn ) ( Argmax R1 Population ) Return token by token.
Challenges: 1. Compositionality 2. Large search space (23K predicates, 82M entities, 417M triplets)
Solutions: 1. Key-Variable Memory 2. Code Assistance 3. Augmented REINFORCE

Key-Variable Memory for Compositionality
Figure: the entity resolver saves m.USA in memory (key v1, variable R1). The decoder writes ( Hop R1 !CityIn ), which is executed and saved as v2/R2 (list of US cities); it then writes ( Argmax R2 Population ), which is executed and saved as v3/R3 (m.NYC); Return outputs m.NYC.
● A linearised bottom-up derivation of the recursive program.

Key-Variable Memory: Save Intermediate Value
Key (embedding) | Variable (symbol) | Value (data in computer)
V0 | R0 | m.USA
V1 | R1 | [m.SF, m.NYC, ...]
Decoded: GO ( Hop R0 !CityIn ). Once the expression is finished, the computer executes it and the result is saved as R1.
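The save step can be sketched as a small data structure with three parallel columns, mirroring the slide's key / variable / value table. This is a hypothetical illustration: the class and function names, the two-dimensional key vectors (standing in for the decoder's hidden state when `)` finishes an expression), and the toy `Hop` implementation are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

# Hypothetical toy data: the cities that are CityIn m.USA (for the "!CityIn" hop).
CITIES_IN = {"m.USA": {"m.SF", "m.NYC"}}


def hop(entity: str, predicate: str) -> Set[str]:
    """Toy Hop that only knows the reversed CityIn relation."""
    if predicate != "!CityIn":
        raise KeyError(f"toy KB has no predicate {predicate!r}")
    return set(CITIES_IN.get(entity, set()))


FUNCTIONS = {"Hop": hop}


@dataclass
class KeyVariableMemory:
    keys: List[List[float]] = field(default_factory=list)  # embeddings (neural side)
    variables: List[str] = field(default_factory=list)  # symbols R0, R1, ...
    values: Dict[str, Any] = field(default_factory=dict)  # data held by the computer

    def save(self, key: List[float], value: Any) -> str:
        """Store a value under the next fresh variable name and return that name."""
        name = f"R{len(self.variables)}"
        self.keys.append(key)
        self.variables.append(name)
        self.values[name] = value
        return name


def finish_expression(memory: KeyVariableMemory, tokens: List[str],
                      hidden_state: List[float]) -> str:
    """Execute a completed expression such as ['Hop', 'R0', '!CityIn'] and save its result."""
    function_name, *args = tokens
    # Variable symbols are replaced by their stored values; other tokens pass through.
    resolved = [memory.values.get(arg, arg) for arg in args]
    result = FUNCTIONS[function_name](*resolved)
    return memory.save(hidden_state, result)


memory = KeyVariableMemory()
memory.save([0.1, 0.3], "m.USA")  # the entity resolver writes R0
r1 = finish_expression(memory, ["Hop", "R0", "!CityIn"], [0.7, -0.2])
print(r1, sorted(memory.values[r1]))  # R1 ['m.NYC', 'm.SF']
```

The decoder only ever sees the symbol `R1`; the list of cities stays on the computer side, which is what keeps large intermediate results out of the neural network.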

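Key-Variable Memory: Reuse Intermediate Value
Key (embedding) | Variable (symbol) | Value (data in computer)
V0 | R0 | m.USA
V1 | R1 | [m.SF, m.NYC, ...]
Decoded: ... !CityIn ) ( Argmax. To reuse a saved value, the neural decoder takes a softmax over the keys and outputs the selected variable, which the symbolic computer resolves to its value.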
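Reusing a saved value means the decoder emits a variable token by scoring the memory keys and taking a softmax over them. A minimal sketch, assuming dot-product scoring between the current decoder state and each key; the slide only shows the softmax, and the scoring function, vectors, and names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_variable(query: Sequence[float], keys: List[Sequence[float]],
                    variables: List[str]) -> Tuple[str, List[float]]:
    """Score each memory key against the decoder state and return the most likely variable."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return variables[best], probs


keys = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical key embeddings saved with R0 and R1
variables = ["R0", "R1"]
decoder_state = [0.2, 2.0]  # hypothetical state after "( Argmax"
choice, probs = choose_variable(decoder_state, keys, variables)
print(choice)  # R1
```

During training the model would sample from `probs` (or run beam search); taking the argmax here just keeps the example deterministic.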

Code Assistance: Prune Search Space
Figure: writing code with pen and paper vs. with an IDE.

Code Assistance: Syntactic Constraint
Decoder vocabulary, scored with a softmax:
● Variables (<10): V0 → R0, V1 → R1, ...
● Functions (<10): E0 → Hop, E1 → Argmax, ...
● Predicates (23K): P0 → CityIn, P1 → BornIn, ...
Decoded so far: GO (

Code Assistance: Syntactic Constraint (continued)
Decoded so far: GO (. The last token is '(', so the decoder has to output a function name (Hop, Argmax, ...) next.
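The syntactic constraint amounts to masking the decoder softmax with the set of tokens the grammar allows after the last token. The sketch below uses a deliberately simplified, hypothetical grammar for the `( Function Variable Predicate )` expressions on these slides and a toy vocabulary; a real interpreter's grammar would be richer (for example, functions with different arities).

```python
import math
from typing import Dict, List, Set

# Hypothetical miniature vocabulary mirroring the slide's three token groups.
VARIABLES = ["R0", "R1"]
FUNCTIONS = ["Hop", "Argmax"]
PREDICATES = ["CityIn", "BornIn", "Population"]
VOCAB = ["(", ")", "Return"] + VARIABLES + FUNCTIONS + PREDICATES


def allowed_next(last_token: str) -> Set[str]:
    """Simplified grammar: GO or ')' -> '(' or Return; '(' -> function;
    function -> variable; variable -> predicate; predicate -> ')'."""
    if last_token in ("GO", ")"):
        return {"(", "Return"}
    if last_token == "(":
        return set(FUNCTIONS)
    if last_token in FUNCTIONS:
        return set(VARIABLES)
    if last_token in VARIABLES:
        return set(PREDICATES)
    if last_token in PREDICATES:
        return {")"}
    return set()


def masked_distribution(logits: List[float], last_token: str) -> Dict[str, float]:
    """Softmax over VOCAB in which grammatically invalid tokens get probability 0."""
    allowed = allowed_next(last_token)
    if not allowed:
        raise ValueError(f"no valid continuation after {last_token!r}")
    m = max(score for tok, score in zip(VOCAB, logits) if tok in allowed)
    weights = [math.exp(score - m) if tok in allowed else 0.0
               for tok, score in zip(VOCAB, logits)]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(VOCAB, weights)}


# Decoded so far: GO ( ; every token gets the same raw score.
dist = masked_distribution([0.0] * len(VOCAB), "(")
print(dist["Hop"], dist["Argmax"], dist["CityIn"])  # 0.5 0.5 0.0
```

Masking before the softmax (rather than after) keeps the remaining probabilities normalized, so sampling and beam search never spend effort on syntactically invalid programs.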

Code Assistance: Semantic Constraint
Same decoder vocabulary (variables <10, functions <10, predicates 23K), scored with a softmax.
Decoded so far: GO ( Hop R0

Code Assistance: Semantic Constraint (continued)
Decoded so far: GO ( Hop R0. Given the definition of Hop, the decoder needs to output a predicate that is connected to R0 (m.USA): fewer than 100 valid predicates, out of 23K.
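The semantic constraint needs the knowledge base itself: after `( Hop R0`, only predicates that actually touch the entity stored in R0 are kept. A minimal sketch over a hypothetical triple list, assuming (as in the `!CityIn` notation on earlier slides) that a `!` prefix denotes traversing a predicate backwards; the function names are illustrative.

```python
from typing import List, Set, Tuple

# Hypothetical toy triples; real Freebase has about 23K predicates.
TRIPLES: List[Tuple[str, str, str]] = [
    ("m.SF", "CityIn", "m.USA"),
    ("m.NYC", "CityIn", "m.USA"),
    ("m.USA", "Capital", "m.DC"),
    ("m.Obama", "BornIn", "m.Honolulu"),
]


def valid_hop_predicates(entity: str) -> Set[str]:
    """Predicates leaving `entity`, plus '!'-prefixed predicates arriving at it."""
    outgoing = {p for s, p, o in TRIPLES if s == entity}
    incoming = {"!" + p for s, p, o in TRIPLES if o == entity}
    return outgoing | incoming


def mask_predicates(candidates: List[str], entity: str) -> List[str]:
    """Keep only the candidate predicate tokens that are valid after ( Hop <entity>."""
    valid = valid_hop_predicates(entity)
    return [p for p in candidates if p in valid]


memory = {"R0": "m.USA"}
print(sorted(valid_hop_predicates(memory["R0"])))  # ['!CityIn', 'Capital']
print(mask_predicates(["CityIn", "!CityIn", "BornIn", "Capital"], memory["R0"]))  # ['!CityIn', 'Capital']
```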

REINFORCE Training
Figure: programs are sampled from the model, and a policy gradient update on the samples produces the updated model.
1. High variance: requires a lot of (expensive) samples.
2. Cold start problem: without supervised pretraining, the gradients at the beginning are small.
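Both problems show up even in a toy setting. The sketch below treats the policy as a single softmax over three candidate programs, of which only one earns reward, and compares a sampled REINFORCE gradient with the exact gradient of expected reward. This is an illustrative simplification (a real programmer network decodes programs token by token); the logits, rewards, and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits: Sequence[float], rewards: Sequence[float],
                       num_samples: int, rng: random.Random) -> List[float]:
    """Monte Carlo estimate of dE[R]/dlogits: average of R(a) * (onehot(a) - pi)."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        for j in range(len(logits)):
            indicator = 1.0 if j == a else 0.0
            grad[j] += rewards[a] * (indicator - probs[j])
    return [g / num_samples for g in grad]


def exact_gradient(logits: Sequence[float], rewards: Sequence[float]) -> List[float]:
    """Closed form for a categorical policy: pi_j * (R_j - E[R])."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


rewards = [0.0, 1.0, 0.0]  # only program 1 returns the correct answer
rng = random.Random(0)
estimate = reinforce_gradient([0.0, 0.0, 0.0], rewards, num_samples=20_000, rng=rng)
exact = exact_gradient([0.0, 0.0, 0.0], rewards)
print([round(g, 3) for g in exact])  # [-0.111, 0.222, -0.111]
# Cold start: the correct program is almost never sampled, so its gradient is tiny.
cold = exact_gradient([5.0, -5.0, 5.0], rewards)
print(f"{cold[1]:.1e}")
```

With only a handful of samples the estimate is frequently all zeros, because no sampled program earns reward; thousands of samples are needed before it settles near the exact value.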

Iterative Maximum Likelihood Training (Hard EM)
Figure: beam search finds approximate gold programs, and a maximum likelihood update on them produces the updated model.
1. Spurious programs: mistake PlaceOfBirth for PlaceOfDeath.
2. Lack of negative examples: mistake SiblingsOf for ParentsOf.
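The hard-EM loop keeps, per question, the best-scoring program that beam search has found so far and then maximizes its likelihood. A minimal sketch with hypothetical programs, answers, and a 0/1 reward, showing how a spurious program can become the "gold" target: for a person who was born and died in the same city, PlaceOfBirth and PlaceOfDeath return the same answer.

```python
import math
from typing import Callable, Dict, Iterable, Optional, Set

# Hypothetical execution results for one question: "where did person X die?"
# X was born and died in the same city, so two programs return the gold answer.
ANSWERS: Dict[str, Set[str]] = {
    "( Hop R0 PlaceOfBirth )": {"m.Chicago"},
    "( Hop R0 PlaceOfDeath )": {"m.Chicago"},
    "( Hop R0 Nationality )": {"m.USA"},
}
GOLD_ANSWER: Set[str] = {"m.Chicago"}


def reward(program: str) -> float:
    """0/1 reward: does executing the program give exactly the gold answer?"""
    return 1.0 if ANSWERS.get(program) == GOLD_ANSWER else 0.0


def update_gold(best: Optional[str], beam: Iterable[str],
                reward_fn: Callable[[str], float]) -> Optional[str]:
    """Keep the highest-reward program seen so far; ties keep the earlier program."""
    candidates = ([best] if best is not None else []) + list(beam)
    if not candidates:
        return None
    top = max(candidates, key=reward_fn)
    return top if reward_fn(top) > 0 else best


def ml_loss(log_probs: Dict[str, float], gold: str) -> float:
    """Maximum likelihood objective for one question: NLL of the approximate gold program."""
    return -log_probs[gold]


beam = ["( Hop R0 PlaceOfBirth )", "( Hop R0 PlaceOfDeath )", "( Hop R0 Nationality )"]
gold = update_gold(None, beam, reward)
print(gold)  # ( Hop R0 PlaceOfBirth )
# The spurious program wins because it appears first and also earns reward 1.
log_probs = {p: math.log(1 / 3) for p in beam}
print(round(ml_loss(log_probs, gold), 4))  # 1.0986
```

Note that `ml_loss` only raises the probability of the chosen program; nothing pushes down the other programs, which is the lack-of-negative-examples problem.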

Augmented REINFORCE
Figure: beam search yields the top k programs in the beam (total weight 1 − α) and the approximate gold programs (weight α); a policy gradient update on both produces the updated model.
1. Reduce variance, at the cost of bias.
2. Mix in approximate gold programs to bootstrap and stabilize training.
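The mixing on this slide can be written as a weighting over programs: the top k programs in the beam share a total weight of (1 − α) in proportion to their model probabilities, and the approximate gold program receives weight α. A minimal sketch of that weighting and of a weighted policy-gradient surrogate; the function names, probabilities, rewards, and α value are hypothetical, and the surrogate is one possible way to express the update, not necessarily the authors' exact formulation.

```python
import math
from typing import Dict, Optional


def augmented_weights(beam_probs: Dict[str, float], gold: Optional[str],
                      alpha: float) -> Dict[str, float]:
    """Top-k beam programs share (1 - alpha) by renormalized model probability;
    the approximate gold program (if any) receives an extra alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mix = alpha if gold is not None else 0.0
    total = sum(beam_probs.values())
    weights = {p: (1.0 - mix) * q / total for p, q in beam_probs.items()}
    if gold is not None:
        weights[gold] = weights.get(gold, 0.0) + mix
    return weights


def surrogate_objective(weights: Dict[str, float], rewards: Dict[str, float],
                        log_probs: Dict[str, float]) -> float:
    """sum_p w_p * R_p * log pi(p); with the weights held fixed, its gradient
    is the weighted policy-gradient update."""
    return sum(w * rewards[p] * log_probs[p] for p, w in weights.items())


beam_probs = {"p1": 0.6, "p2": 0.2}  # model probabilities of the top-k beam programs
weights = augmented_weights(beam_probs, gold="g", alpha=0.1)
print({p: round(w, 3) for p, w in weights.items()})  # {'p1': 0.675, 'p2': 0.225, 'g': 0.1}

rewards = {"p1": 0.0, "p2": 1.0, "g": 1.0}
log_probs = {"p1": math.log(0.6), "p2": math.log(0.2), "g": math.log(0.05)}
objective = surrogate_objective(weights, rewards, log_probs)
```

Because the beam programs are enumerated rather than sampled, the estimate has lower variance but is biased toward what the model already favors; the α term guarantees that a known rewarded program always contributes gradient.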

Distributed Architecture
● 200 actors, 1 learner, 50 Knowledge Graph servers
Figure: actors 1..n each take a shard of QA pairs, query KG servers 1..m, and send their solutions to the learner, which distributes model checkpoints back to the actors.
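The actor/learner split can be sketched with Python threads and a queue: each actor works through its own shard of QA pairs, asks a knowledge-graph server to execute candidate programs, and sends solutions to a single learner, which then publishes a new checkpoint. Everything here is hypothetical scaffolding (stub decoding, a stub KG server, in-process threads instead of separate machines); it only illustrates the data flow on the slide.

```python
import queue
import threading
from typing import Dict, List, Tuple

QAPair = Tuple[str, str]
Solution = Tuple[int, str, str, str]


def kg_server(program: str) -> str:
    """Stub knowledge-graph server: pretends to execute a program."""
    return f"result of {program}"


def actor(actor_id: int, shard: List[QAPair], checkpoint: Dict[str, int],
          solutions: "queue.Queue[Solution]") -> None:
    """Decode (stubbed) programs for each question and send solutions to the learner."""
    for question, _answer in shard:
        program = f"program for {question!r} from checkpoint v{checkpoint['version']}"
        solutions.put((actor_id, question, program, kg_server(program)))


def run_round(qa_pairs: List[QAPair], num_actors: int,
              checkpoint: Dict[str, int]) -> List[Solution]:
    """One round: actors fill the queue in parallel, then the learner drains it."""
    solutions: "queue.Queue[Solution]" = queue.Queue()
    shards = [qa_pairs[i::num_actors] for i in range(num_actors)]
    threads = [threading.Thread(target=actor, args=(i, shard, checkpoint, solutions))
               for i, shard in enumerate(shards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    batch = []
    while not solutions.empty():
        batch.append(solutions.get())
    # A real learner would compute gradients on `batch` here before publishing.
    checkpoint["version"] += 1
    return batch


checkpoint = {"version": 0}
qa = [(f"question {i}", f"answer {i}") for i in range(10)]
batch = run_round(qa, num_actors=3, checkpoint=checkpoint)
print(len(batch), checkpoint["version"])  # 10 1
```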

Generated Programs
● Question: “what college did russell wilson go to?”
● Generated program: (hop v1 /people/person/education) (hop v2 /education/education/institution) (filter v3 v0 /common/topic/notable_types) <EOP>
  in which v0 = “College/University” (m.01y2hnl) and v1 = “Russell Wilson” (m.05c10yf)
● Distribution of the length of generated programs
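Executing this program step by step shows how each expression's result becomes the next variable. A minimal sketch over a hypothetical handful of triples: the two mids for v0 and v1 come from the slide, but the education and institution entities (`m.edu_1`, `m.college_x`, and so on) are invented placeholders, and `filter_` is an assumed reading of `filter` as "keep the entities in the first argument whose predicate value is in the second".

```python
from typing import Set

# Hypothetical triples. m.05c10yf (Russell Wilson) and m.01y2hnl (College/University)
# come from the slide; every other id is an invented placeholder.
TRIPLES = {
    ("m.05c10yf", "/people/person/education", "m.edu_1"),
    ("m.05c10yf", "/people/person/education", "m.edu_2"),
    ("m.edu_1", "/education/education/institution", "m.college_x"),
    ("m.edu_2", "/education/education/institution", "m.high_school_y"),
    ("m.college_x", "/common/topic/notable_types", "m.01y2hnl"),
    ("m.high_school_y", "/common/topic/notable_types", "m.school_type"),
}


def hop(entities: Set[str], predicate: str) -> Set[str]:
    """All objects reachable from `entities` through `predicate`."""
    return {o for s, p, o in TRIPLES if p == predicate and s in entities}


def filter_(entities: Set[str], allowed: Set[str], predicate: str) -> Set[str]:
    """Keep the entities whose `predicate` value is one of `allowed` (assumed semantics)."""
    return {e for e in entities if any((e, predicate, a) in TRIPLES for a in allowed)}


memory = {"v0": {"m.01y2hnl"}, "v1": {"m.05c10yf"}}  # filled by the entity resolver
memory["v2"] = hop(memory["v1"], "/people/person/education")
memory["v3"] = hop(memory["v2"], "/education/education/institution")
memory["v4"] = filter_(memory["v3"], memory["v0"], "/common/topic/notable_types")
print(memory["v4"])  # {'m.college_x'}
```

The final `filter` is what turns "all institutions attended" into "colleges attended", matching the word "college" in the question.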

New State-of-the-Art on WebQuestionsSP
● First end-to-end neural network to achieve state-of-the-art results on semantic parsing with weak supervision over a large knowledge base
● Its performance approaches the state of the art obtained with full supervision

Augmented REINFORCE
● REINFORCE gets stuck at local maxima
● Iterative ML training does not directly optimize the F1 score
● Augmented REINFORCE obtains the best performance

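Figure (recap): the Manager gives the question (with weak supervision) to a neural Programmer, which writes programs; a symbolic Computer executes them against the knowledge base with predefined functions and returns the outputs as the answer.
Key techniques: Key-Variable Memory, Code Assistance, Augmented REINFORCE.
Thanks!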

Backup Slides

Semantic Parsing as Program Induction
Figure: learning classifiers vs. learning programs.
Semantic parsing: learning to write programs (given natural language instructions/questions)
[Graves et al., 2016; Silicon Valley, Season 4]

Related Topic: Neural Program Induction
Figure: learning classifiers vs. learning programs.
Semantic parsing: learning to write programs (given natural language instructions/questions)
[Graves et al., 2016; Silicon Valley, Season 4]

Iterative Maximum Likelihood Training
Figure: reward-augmented beam search finds approximate gold programs, which are used for maximum likelihood training of the model.
1. Spurious programs: mistake PlaceOfBirth for PlaceOfDeath.
2. Lack of negative examples: mistake SiblingsOf for ParentsOf.



REINFORCE
Figure: an actor repeatedly samples programs, and the learner applies a policy gradient update on the samples.
1. High variance: requires a lot of (expensive) samples.
2. Bootstrap problem: small gradients at the beginning.

Iterative Maximum Likelihood Training
Figure: an actor repeatedly runs reward-augmented beam search to find approximate gold programs, and the learner trains on them with maximum likelihood.
1. Spurious programs: mistake PlaceOfBirth for PlaceOfDeath.
2. Lack of negative examples: mistake SiblingsOf for ParentsOf.
