  1. Towards a New Synthesis of Reasoning and Learning Guy Van den Broeck UC Berkeley EECS Feb 11, 2019

  2. Outline: Reasoning ∩ Learning 1. Deep Learning with Symbolic Knowledge 2. Probabilistic and Logistic Circuits 3. High-Level Probabilistic Reasoning

  3. Deep Learning with Symbolic Knowledge R L

  4. Motivation: Vision [Lu, W. L., Ting, J. A., Little, J. J., & Murphy, K. P. (2013). Learning to track and identify players from broadcast sports videos.]

  5. Motivation: Robotics    [Wong, L. L., Kaelbling, L. P., & Lozano-Perez, T., Collision-free state estimation. ICRA 2012]

  6. Motivation: Language • Non-local dependencies: “At least one verb in each sentence” • Sentence compression “If a modifier is kept, its subject is also kept” … and many more! [Chang, M., Ratinov, L., & Roth, D. (2008). Constraints as prior knowledge], [Ganchev, K., Gillenwater, J., & Taskar, B. (2010). Posterior regularization for structured latent variable models]

  7. Motivation: Deep Learning [Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476.]

  8. Motivation: Deep Learning … but … [Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476.]

  9. Learning with Symbolic Knowledge Data + Constraints (Background Knowledge) (Physics) 1. Must take at least one of Probability (P) or Logic (L). 2. Probability (P) is a prerequisite for AI (A). 3. The prerequisite for KR (K) is either AI (A) or Logic (L).

  10. Learning with Symbolic Knowledge Data + Constraints (Background Knowledge) (Physics) → Learn ML Model. Today's machine learning tools don't take knowledge as input!

  11. Deep Learning with Symbolic Knowledge Data + Constraints → Learn Deep Neural Network. The logical constraint is imposed on the network's output, but that output is a probability vector p, not Boolean logic!

  12. Semantic Loss Q: How close is output p to satisfying constraint α? Answer: Semantic loss function L(α, p) • Axioms, for example: – If p is Boolean then L(p, p) = 0 – If α implies β then L(α, p) ≥ L(β, p) (α is more strict) • Implied properties (SEMANTIC loss!): – If α is equivalent to β then L(α, p) = L(β, p) – If p is Boolean and satisfies α then L(α, p) = 0

  13. Semantic Loss: Definition Theorem: The axioms imply a unique semantic loss (up to a constant factor). It is built from two quantities: the probability of getting state x after flipping coins with probabilities p, and the probability of satisfying α after flipping coins with probabilities p.
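
The equation itself did not survive the text export. A reconstruction consistent with the two captions above (the loss is, up to a constant, the negative log probability of satisfying α under independent coin flips with probabilities p) is:

```latex
% Reconstruction from the slide's captions; X_i are the Boolean output variables,
% p_i their predicted probabilities, and x ranges over complete states.
L(\alpha, p) \;\propto\; -\log \sum_{x \,\models\, \alpha}
    \prod_{i \,:\, x \,\models\, X_i} p_i
    \prod_{i \,:\, x \,\models\, \lnot X_i} (1 - p_i)
```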

  14. Simple Example: Exactly-One • Data must have some label; we agree it must be one of the 10 digits. • Exactly-one constraint, e.g. for 3 classes: (y1 ∨ y2 ∨ y3) ∧ (¬y1 ∨ ¬y2) ∧ (¬y1 ∨ ¬y3) ∧ (¬y2 ∨ ¬y3) • Semantic loss: the probability that exactly one y is true after flipping coins with probabilities p (each satisfying state contributes the probability that only its y_i comes up true).
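
To make the exactly-one case concrete, here is a minimal sketch (not the speaker's code) that evaluates the semantic loss directly from the definition: the satisfying states set exactly one y_i to true, so the weighted model count is Σ_i p_i Π_{j≠i} (1 − p_j).

```python
import math

def exactly_one_semantic_loss(p):
    """Semantic loss of the exactly-one constraint for class probabilities p."""
    wmc = 0.0                       # probability that exactly one y is true
    for i in range(len(p)):
        term = p[i]                 # y_i comes up true ...
        for j in range(len(p)):
            if j != i:
                term *= 1.0 - p[j]  # ... and every other y_j comes up false
        wmc += term
    return -math.log(wmc)

# Confident predictions incur a small loss; ambiguous ones a larger loss.
print(exactly_one_semantic_loss([0.9, 0.05, 0.05]))  # ~0.20
print(exactly_one_semantic_loss([0.5, 0.5, 0.0]))    # ~0.69
```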

  15. Semi-Supervised Learning • Intuition: Unlabeled data must have some label (cf. entropy minimization, manifold learning) • Minimize the exactly-one semantic loss on unlabeled data • Train with: existing loss + w · semantic loss
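
A hedged sketch of this objective, reusing exactly_one_semantic_loss from the block above; the weight w is a hyperparameter whose value the transcript does not give.

```python
def total_loss(existing_loss, unlabeled_probs, w):
    """existing loss + w * semantic loss, the training objective on this slide."""
    return existing_loss + w * sum(exactly_one_semantic_loss(p)
                                   for p in unlabeled_probs)
```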

  16. MNIST Experiment Competitive with state of the art in semi-supervised deep learning

  17. FASHION Experiment Outperforms Ladder Nets! Same conclusion on CIFAR10

  18. But what about real constraints? • Path constraint (cf. the Nature paper) • Example: 4×4 grids: 2^24 = 16,777,216 assignments = 184 paths + 16,777,032 non-paths • Easily encoded as logical constraints [Nishino et al., Choi et al.]
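
The grid numbers can be checked mechanically. The snippet below is an illustration only (networkx is not part of the talk) and assumes the 184 paths are the simple paths between opposite corners of the 4×4 grid graph, whose 24 edges yield 2^24 possible edge labelings.

```python
import networkx as nx

grid = nx.grid_2d_graph(4, 4)                          # 16 nodes, 24 edges
paths = list(nx.all_simple_paths(grid, (0, 0), (3, 3)))
print(grid.number_of_edges())                          # 24
print(len(paths))                                      # 184 corner-to-corner simple paths
print(2 ** grid.number_of_edges() - len(paths))        # 16777032 non-paths
```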

  19. How to Compute Semantic Loss? • In general: #P-hard

  20. Reasoning Tool: Logical Circuits Representation of logical sentences: (C ∧ ¬D) ∨ (¬C ∧ D), i.e. C XOR D

  21. Reasoning Tool: Logical Circuits Representation of logical sentences: given an input assignment to the variables, every gate is evaluated bottom-up to 0 or 1.

  22. Tractable for Logical Inference • Is there a solution? (SAT) – SAT(α ∨ β) iff SAT(α) or SAT(β) (always) – SAT(α ∧ β) iff ???

  23. Decomposable Circuits Decomposable: the children of every AND gate mention disjoint sets of variables (here, one child over A and the other over B, C, D).

  24. Tractable for Logical Inference • Is there a solution? (SAT) ✓ – SAT(α ∨ β) iff SAT(α) or SAT(β) (always) – SAT(α ∧ β) iff SAT(α) and SAT(β) (decomposable) • How many solutions are there? (#SAT) • Complexity linear in circuit size

  25. Deterministic Circuits Deterministic: for any input, the children of every OR gate are mutually exclusive (example circuit: C XOR D).

  26. Deterministic Circuits Deterministic: in the example, the two OR branches encode C XOR D and C ⇔ D, which can never be true at the same time.

  27. How many solutions are there? (#SAT) Count models bottom-up: each leaf counts 1, AND gates multiply (×) their children's counts, and OR gates add (+) them.

  28. Tractable for Logical Inference • Is there a solution? (SAT) ✓ • How many solutions are there? (#SAT) ✓ … and much more … • Complexity linear in circuit size • Compilation into a circuit by – ↓ exhaustive SAT solver – ↑ conjoin/disjoin/negate [Darwiche and Marquis, JAIR 2002]
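
A minimal sketch of the bottom-up #SAT pass described on slides 22-28, under the stated assumptions (decomposable AND gates, deterministic OR gates, siblings counting over the same variables); the class names are illustrative, not from the talk.

```python
class Lit:
    """Literal leaf: exactly one satisfying assignment of its own variable."""
    def count(self):
        return 1

class And:
    """Decomposable AND: children mention disjoint variables, so counts multiply."""
    def __init__(self, *children):
        self.children = children
    def count(self):
        result = 1
        for child in self.children:
            result *= child.count()
        return result

class Or:
    """Deterministic OR: children are mutually exclusive, so counts simply add."""
    def __init__(self, *children):
        self.children = children
    def count(self):
        return sum(child.count() for child in self.children)

# (C AND NOT D) OR (NOT C AND D), i.e. C XOR D, has 2 models over {C, D}.
xor_circuit = Or(And(Lit(), Lit()), And(Lit(), Lit()))
print(xor_circuit.count())   # 2
```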

  29. How to Compute Semantic Loss? • In general: #P-hard • With a logical circuit for α: linear • Example, the exactly-one constraint over three classes: L(α, p) = −log( p1(1−p2)(1−p3) + (1−p1)p2(1−p3) + (1−p1)(1−p2)p3 ) • Why? Decomposability and determinism!
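
The same bottom-up pass gives the semantic loss once leaf counts are replaced by probabilities. A hedged sketch (the tuple encoding of circuit nodes is hypothetical): multiply at AND gates (decomposability), add at OR gates (determinism), and take the negative log at the root.

```python
import math

def weighted_model_count(node, leaf_probs):
    """node is ("lit", leaf_id), ("and", children) or ("or", children);
    leaf_probs[leaf_id] is p_i for a positive literal and 1 - p_i for a negative one."""
    kind, payload = node
    if kind == "lit":
        return leaf_probs[payload]
    if kind == "and":   # decomposable: probabilities of disjoint parts multiply
        value = 1.0
        for child in payload:
            value *= weighted_model_count(child, leaf_probs)
        return value
    # "or": deterministic, so mutually exclusive children's probabilities add
    return sum(weighted_model_count(child, leaf_probs) for child in payload)

def semantic_loss(circuit, leaf_probs):
    return -math.log(weighted_model_count(circuit, leaf_probs))
```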

  30. Predict Shortest Paths Add semantic loss for the path constraint. Three evaluation questions: Are individual edge predictions correct? Is the output a path? Is the prediction the shortest path? The last one is the real task. (Same conclusion for predicting sushi preferences, see paper.)

  31. Conclusions 1 • Knowledge is (hidden) everywhere in ML • Semantic loss makes logic differentiable • Performs well semi-supervised • Requires hard reasoning in general – Reasoning can be encapsulated in a circuit – No overhead during learning • Performs well on structured prediction • A little bit of reasoning goes a long way!

  32. Probabilistic and Logistic Circuits R L

  33. A False Dilemma? Classical AI Methods (rules like Hungry? $25? Restaurant? Sleep? …): clear modeling assumptions, well understood. Neural Networks: “black box”, strong empirical performance.

  34. Inspiration: Probabilistic Circuits Can we turn logic circuits into a statistical model ?

  35. Probabilistic Circuits Pr(A, B, C, D) = 0.096 Probabilities on edges; bottom-up evaluation: leaves evaluate to 0 or 1 on the given input, product nodes multiply, and sum nodes take the probability-weighted sum of their children (e.g. 0.1 × 1 + 0.9 × 0).
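
A minimal sketch of that bottom-up evaluation (the dictionary encoding is illustrative, not the slide's actual circuit): leaves test the input, product nodes multiply, and sum nodes take the probability-weighted sum of their children, exactly as in the 0.1 × 1 + 0.9 × 0 step above.

```python
def evaluate(node, x):
    """Evaluate a probabilistic circuit bottom-up on a complete assignment x."""
    kind = node["type"]
    if kind == "leaf":
        # Literal leaf: 1.0 if the input agrees with the literal, else 0.0.
        return 1.0 if x[node["var"]] == node["positive"] else 0.0
    if kind == "product":
        value = 1.0
        for child in node["children"]:
            value *= evaluate(child, x)
        return value
    # Sum node: edge probabilities weight the children (weights sum to 1).
    return sum(w * evaluate(child, x)
               for w, child in zip(node["weights"], node["children"]))
```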

  36. Properties, Properties, Properties! • Read conditional independencies from the structure • Interpretable parameters (XAI) (conditional probabilities of logical sentences) • Closed-form parameter learning • Efficient reasoning – MAP inference: most-likely assignment to x given y (otherwise NP-hard) – Computing conditional probabilities Pr(x|y) (otherwise #P-hard) – Algorithms linear in circuit size

  37. Discrete Density Estimation Q: “Help! I need to learn a discrete probability distribution…” A: Learn probabilistic circuits! Strongly outperforms • Bayesian network learners • Markov network learners (the state of the art for approximate inference in discrete factor graphs) Competitive with SPN learners. LearnPSDD is state of the art on 6 datasets!

  38. But what if I only want to classify Y? Pr(Y | A, B, C, D) vs. Pr(Y, A, B, C, D)

  39. Logistic Circuits Pr(Y = 1 | A, B, C, D) = 1 / (1 + exp(−1.9)) ≈ 0.869 Weights on edges; logistic function on the output weight; bottom-up evaluation for the given input.

  40. Alternative Semantics Represents Pr(Y | A, B, C, D) • Take all “hot” wires • Sum their weights • Push through the logistic function
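
A small sketch of those three steps, assuming the 0/1 indicator of each wire being “hot” on the current input has already been computed (these indicators are exactly the circuit-flow features of slide 42); the example weights are made up, chosen so the hot-wire sum is 1.9 as in the slide-39 example.

```python
import math

def logistic_circuit_predict(hot_wires, wire_weights):
    """Pr(Y = 1 | x): sum the weights of hot wires, push through the logistic function."""
    z = sum(h * w for h, w in zip(hot_wires, wire_weights))
    return 1.0 / (1.0 + math.exp(-z))

# Made-up indicators and weights; the hot wires contribute 0.4 + 0.8 + 0.7 = 1.9.
print(logistic_circuit_predict([1, 0, 1, 1, 0], [0.4, -1.2, 0.8, 0.7, 0.3]))  # ~0.87
```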

  41. Special Case: Logistic Regression Pr(Y = 1 | A, B, C, D) = 1 / (1 + exp(−A·θ_A − ¬A·θ_¬A − B·θ_B − ⋯)) What about logistic circuits in more general forms?

  42. Parameter Learning Reduce to logistic regression: features associated with each wire (“global circuit flow” features). Learning the parameters θ is convex optimization!
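
A hedged sketch of this reduction: treat each wire's flow indicator as a feature of the example and fit an ordinary (convex) logistic regression over those features. The random matrices below stand in for real circuit flows; scikit-learn is used purely for illustration and is not part of the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_flow = rng.integers(0, 2, size=(200, 30))  # stand-in: X_flow[i, j] = 1 iff wire j is hot on example i
y = rng.integers(0, 2, size=200)             # stand-in class labels

model = LogisticRegression(max_iter=1000)
model.fit(X_flow, y)                         # learns one weight per wire
print(model.coef_.shape)                     # (1, 30)
```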

  43. Logistic Circuit Structure Learning Repeat: generate candidate operations → calculate gradient variance → execute the best operation.

  44. Comparable Accuracy with Neural Nets

  45. Significantly Smaller in Size

  46. Better Data Efficiency

  47. Logistic vs. Probabilistic Circuits Pr(Y | A, B, C, D) vs. Pr(Y, A, B, C, D): probabilities become log-odds.

  48. Interpretable?

  49. Conclusions 2 Logistic circuits combine statistical ML (“Probability”), connectionism (“Deep”), and symbolic AI (“Logic”).

  50. High-Level Probabilistic Inference R L

  51. Simple Reasoning Problem Probability that Card1 is Hearts? 1/4 [Van den Broeck; AAAI-KRR'15]
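
A trivial check of the 1/4 by enumeration, assuming the standard 52-card deck the slide depicts:

```python
from fractions import Fraction

suits = ["hearts", "spades", "diamonds", "clubs"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]
# After shuffling, each of the 52 cards is equally likely to be Card1.
print(Fraction(sum(card[1] == "hearts" for card in deck), len(deck)))  # 1/4
```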

  52. Automated Reasoning Let us automate this: 1. Probabilistic graphical model (e.g., factor graph) 2. Probabilistic inference algorithm (e.g., variable elimination or junction tree) [Van den Broeck; AAAI-KRR'15]
