SLIDE 1

Probabilistic and Logistic Circuits: A New Synthesis of Logic and Machine Learning

Guy Van den Broeck

HRL/ACTIONS @ KR Oct 28, 2018

SLIDE 2

Foundation: Logical Circuit Languages

SLIDE 3

Negation Normal Form Circuits

[Darwiche 2002]

Δ = (sun ∧ rain ⇒ rainbow)

SLIDE 4

Decomposable Circuits

Decomposable

[Darwiche 2002]

SLIDE 5

Tractable for Logical Inference

  • Is there a solution? (SAT)

– SAT(𝛽 ∨ 𝛾) iff SAT(𝛽) or SAT(𝛾) (always)
– SAT(𝛽 ∧ 𝛾) iff SAT(𝛽) and SAT(𝛾) (decomposable)

  • How many solutions are there? (#SAT)
  • Complexity linear in circuit size (see the counting sketch below)
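Why those two properties make counting linear: a decomposable AND multiplies its children's model counts, and a deterministic OR adds them (padding each child up to the gate's variable set). A minimal sketch on the running constraint; the circuit encoding below is my own illustration, not a canonical compilation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                      # "lit", "and", or "or"
    var: str = ""                  # variable name, for literals
    neg: bool = False              # negated literal?
    children: List["Node"] = field(default_factory=list)
    vars: frozenset = frozenset()  # variables mentioned at or below this node

def lit(v, neg=False):
    return Node("lit", var=v, neg=neg, vars=frozenset([v]))

def gate(kind, *children):
    return Node(kind, children=list(children),
                vars=frozenset().union(*(c.vars for c in children)))

def model_count(n):
    """#SAT over n.vars, assuming decomposability and determinism."""
    if n.kind == "lit":
        return 1                   # one satisfying assignment of {var}
    if n.kind == "and":            # decomposable: children share no variables
        count = 1
        for child in n.children:
            count *= model_count(child)
        return count
    # deterministic OR: children are mutually exclusive, so counts add;
    # pad each child's count up to the OR gate's full variable set
    return sum(model_count(c) * 2 ** len(n.vars - c.vars) for c in n.children)

# Δ = (sun ∧ rain ⇒ rainbow), written as three mutually exclusive cases:
# ¬sun  ∨  (sun ∧ ¬rain)  ∨  (sun ∧ rain ∧ rainbow)
delta = gate("or",
             lit("sun", neg=True),
             gate("and", lit("sun"), lit("rain", neg=True)),
             gate("and", lit("sun"), lit("rain"), lit("rainbow")))
print(model_count(delta))  # 7 of the 2^3 = 8 assignments satisfy Δ
```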

SLIDE 6

Deterministic Circuits

Deterministic

[Darwiche 2002]

SLIDE 7

How many solutions are there? (#SAT)

SLIDE 8

How many solutions are there? (#SAT)

Arithmetic Circuit

SLIDE 9

Tractable for Logical Inference

  • Is there a solution? (SAT)
  • How many solutions are there? (#SAT)
  • Stricter languages (e.g., BDD, SDD):

– Equivalence checking
– Conjoin/disjoin/negate circuits

  • Complexity linear in circuit size
  • Compilation into circuit language by either

– top-down (↓): exhaustive SAT solver
– bottom-up (↑): conjoin/disjoin/negate


SLIDE 10

Learning with Logical Constraints

SLIDE 11

Motivation: Video

[Lu, W. L., Ting, J. A., Little, J. J., & Murphy, K. P. (2013). Learning to track and identify players from broadcast sports videos.]

SLIDE 12

Motivation: Robotics

[Wong, L. L., Kaelbling, L. P., & Lozano-Perez, T., Collision-free state estimation. ICRA 2012]

SLIDE 13

Motivation: Language

  • Non-local dependencies:

At least one verb in each sentence

  • Sentence compression

If a modifier is kept, its subject is also kept

  • Information extraction
  • Semantic role labeling

… and many more!

[Chang, M., Ratinov, L., & Roth, D. (2008). Constraints as prior knowledge],…, [Chang, M. W., Ratinov, L., & Roth, D. (2012). Structured learning with constrained conditional models.], [https://en.wikipedia.org/wiki/Constrained_conditional_model]

SLIDE 14

Motivation: Deep Learning

[Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al.. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476.]

SLIDE 15

Courses:

  • Logic (L)
  • Knowledge Representation (K)
  • Probability (P)
  • Artificial Intelligence (A)

Data

  • Must take at least one of Probability or Logic.

  • Probability is a prerequisite for AI.
  • The prerequisite for KR is either AI or Logic.

Constraints

Running Example

SLIDE 16

[Figure: the 16 instantiations of L, K, P, A as a truth table; the unstructured space allows all of them, the structured space crosses out the impossible ones]

Structured Space

7 out of 16 instantiations are impossible

  • Must take at least one of Probability (P) or Logic (L).
  • Probability is a prerequisite for AI (A).
  • The prerequisite for KR (K) is either AI or Logic.

SLIDE 17

[Figure: the same truth table; the Boolean constraints rule out the 7 impossible instantiations]

Boolean Constraints

7 out of 16 instantiations are impossible

SLIDE 18

Learning in Structured Spaces

Data + Constraints (background knowledge, physics) → Learn → ML Model

Today's machine learning tools don't take knowledge as input!

SLIDE 19

Deep Learning with Logical Constraints

SLIDE 20

Deep Learning with Logical Knowledge

Data + Constraints → Learn → Deep Neural Network

Pipeline: Input → Neural Network → Output, subject to a Logical Constraint

Output is a probability vector p, not Boolean logic!

SLIDE 21

Semantic Loss

Q: How close is output p to satisfying the constraint?
A: Semantic loss function L(α,p)

  • Axioms, for example:

– If p is Boolean then L(p,p) = 0
– If α implies β then L(α,p) ≥ L(β,p) (α more strict)

  • Properties:

– If α is equivalent to β then L(α,p) = L(β,p)
– If p is Boolean and satisfies α then L(α,p) = 0

SEMANTIC Loss!

SLIDE 22

Semantic Loss: Definition

Theorem: Axioms imply unique semantic loss:

L(α, p) ∝ −log Σ_{x ⊨ α} Π_{i: x ⊨ Xᵢ} pᵢ · Π_{i: x ⊨ ¬Xᵢ} (1 − pᵢ)

(Each product is the probability of getting state x after flipping coins with probabilities p; the sum over all x ⊨ α is the probability of satisfying α after flipping those coins.)

SLIDE 23

Example: Exactly-One

  • Data must have some label

We agree this must be one of the 10 digits:

  • Exactly-one constraint

→ For 3 classes:

  • Semantic loss:

y₂ ∨ y₃ ∨ y₄,  ¬y₂ ∨ ¬y₃,  ¬y₃ ∨ ¬y₄,  ¬y₂ ∨ ¬y₄

The loss is −log Pr(exactly one y true after flipping coins with probabilities p); each summand covers the case where only one yⱼ comes up true. (A brute-force sketch follows.)
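A brute-force sketch of the definition (my own illustration, not the paper's implementation): enumerate all states, keep those that satisfy the constraint, and sum their probabilities under independent coin flips with probabilities p:

```python
import itertools
import math

def semantic_loss(p, satisfies):
    """L(α, p) = -log Σ_{x ⊨ α} Π_i p_i^{x_i} (1 - p_i)^{1 - x_i}.
    Exponential enumeration: fine for illustration, #P-hard in general."""
    total = 0.0
    for x in itertools.product([0, 1], repeat=len(p)):
        if satisfies(x):
            prob = 1.0
            for pi, xi in zip(p, x):
                prob *= pi if xi else (1.0 - pi)
            total += prob
    return -math.log(total)

exactly_one = lambda x: sum(x) == 1

p = [0.9, 0.3, 0.2]                                  # network output, 3 classes
print(semantic_loss(p, exactly_one))                 # ~0.61: p is nearly one-hot
print(semantic_loss([0.5, 0.5, 0.5], exactly_one))   # ~0.98: higher loss
```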

SLIDE 24

Semi-Supervised Learning

  • Intuition: Unlabeled data must have some label
  • Minimize exactly-one semantic loss on unlabeled data

Train with: existing loss + w ∙ semantic loss
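For the exactly-one constraint the sum collapses to a closed form, so the slide's objective can be sketched directly. A hedged PyTorch sketch; the weight value and the stand-in network outputs are placeholders, not the paper's settings:

```python
import torch

def exactly_one_semantic_loss(p, eps=1e-12):
    """Closed form for exactly-one: L = -log Σ_j p_j Π_{k≠j} (1 - p_k)."""
    q = (1.0 - p).clamp_min(eps)
    prod_all = q.prod(dim=-1, keepdim=True)   # Π_k (1 - p_k)
    sat = (prod_all * p / q).sum(dim=-1)      # Σ_j p_j Π_{k≠j} (1 - p_k)
    return -torch.log(sat.clamp_min(eps))

# objective from the slide: existing loss + w * semantic loss, with the
# semantic term on the unlabeled batch; w is a small hyperparameter
w = 0.0005
p_unlabeled = torch.sigmoid(torch.randn(32, 10))   # stand-in network outputs
semantic = exactly_one_semantic_loss(p_unlabeled).mean()
# total_loss = supervised_loss + w * semantic
```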

SLIDE 25

MNIST Experiment

Competitive with state of the art in semi-supervised deep learning

SLIDE 26

FASHION Experiment

Outperforms Ladder Nets!

Same conclusion on CIFAR10

SLIDE 27

What about real constraints? Paths, cf. the Nature paper [Graves et al. 2016]

Good variable assignments (represent a route): 184
Bad variable assignments (do not represent a route): 16,777,032

Unstructured probability space: 184 + 16,777,032 = 2²⁴
The structured space is easily encoded in logical constraints [Nishino et al.]

SLIDE 28

How to Compute Semantic Loss?

  • In general: #P-hard
  • With a logical circuit for α: Linear!
  • Example: exactly-one constraint:
  • Why? Decomposability and determinism!

L(α, p) = L(circuit for α, p) = −log(the arithmetic circuit's evaluation at p)

SLIDE 29

Predict Shortest Paths

Add semantic loss for path constraint

– Is the output a path?
– Are individual edge predictions correct?
– Is the prediction the shortest path? This is the real task!
(Same conclusion for predicting sushi preferences; see the paper.)

SLIDE 30

Probabilistic Circuits

SLIDE 31
[Figure: a logical circuit over L, K, P, A: literals at the leaves, alternating AND and OR gates]

Logical Circuits

Can we represent a distribution over the solutions to the constraint?
SLIDE 32
[Figure: the same logical circuit over L, K, P, A]

Recall: Decomposability

AND gates have disjoint input circuits

SLIDE 33

Recall: Determinism

[Figure: the circuit evaluated on input L, K, P, A = true; wires carrying value 1 are highlighted]

Input: L, K, P, A are true and ¬L, ¬K, ¬P, ¬A are false
Property: OR gates have at most one true input wire

SLIDE 34

[Figure: the circuit with a normalized probability on each OR-gate input wire, e.g. 0.6/0.4, 0.8/0.2, 0.25/0.75, 0.9/0.1, and 0.1/0.6/0.3 at the root]

PSDD: Probabilistic SDD

Syntax: assign a normalized probability to each OR gate input

SLIDE 35

[Figure: the PSDD evaluated on input L, K, P, A = true; the parameters on the true wires are multiplied]

Input: L, K, P, A are true

Pr(L,K,P,A) = 0.3 × 1 × 0.8 × 0.4 × 0.25 = 0.024

PSDD: Probabilistic SDD
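A minimal evaluator sketch showing that multiplication pattern. The circuit below is a hand-made stand-in carrying the slide's parameters on the relevant branch; it is not the exact circuit from the figure:

```python
# Minimal PSDD evaluation on a complete assignment: literals evaluate to 0/1,
# AND multiplies its children, OR sums theta_i * child_i (determinism makes
# at most one term nonzero).
def psdd_pr(node, assignment):
    kind = node[0]
    if kind == "lit":                      # ("lit", var, is_positive)
        _, var, pos = node
        return 1.0 if assignment[var] == pos else 0.0
    if kind == "and":                      # ("and", child, child, ...)
        p = 1.0
        for child in node[1:]:
            p *= psdd_pr(child, assignment)
        return p
    # ("or", (theta, child), (theta, child), ...)
    return sum(theta * psdd_pr(child, assignment) for theta, child in node[1:])

circuit = ("or",
    (0.3, ("and",
        ("lit", "L", True),
        ("or", (0.8, ("lit", "K", True)), (0.2, ("lit", "K", False))),
        ("or",
            (0.4, ("and",
                ("lit", "P", True),
                ("or", (0.25, ("lit", "A", True)), (0.75, ("lit", "A", False))))),
            (0.6, ("and", ("lit", "P", False), ("lit", "A", False)))))),
    (0.7, ("and",
        ("lit", "L", False), ("lit", "K", False),
        ("lit", "P", True), ("lit", "A", True))))

x = {"L": True, "K": True, "P": True, "A": True}
print(psdd_pr(circuit, x))   # 0.3 * 0.8 * 0.4 * 0.25 = 0.024, as on the slide
```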

SLIDE 36
[Figure: the full PSDD with all parameters shown]

Can read probabilistic independences off the circuit structure

Each node represents a normalized distribution!

SLIDE 37

Tractable for Probabilistic Inference

  • MAP inference:

Find most-likely assignment to x given y

(otherwise NP-hard)

  • Computing conditional probabilities Pr(x|y)

(otherwise #P-hard)

  • Sample from Pr(x|y)
  • Algorithms linear in circuit size

(pass up, pass down, similar to backprop)

SLIDE 38
[Figure: the PSDD with individual parameters highlighted: the probability that a student takes course L, that a student takes course P, and the probability of course P given L]

Parameters are Interpretable

Explainable AI DARPA Program

SLIDE 39

Learning Probabilistic Circuit Parameters

SLIDE 40

Learning Algorithms

  • Closed form: max likelihood from complete data

  • One pass over data to estimate Pr(x|y)
  • Where does the structure come from?

For now: simply compiled from constraint…

Not a lot to say: very easy! (A toy sketch follows.)
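A toy sketch of why this is easy: each OR-gate parameter is a relative frequency, estimable in a single pass over the data. The conditioning event here ("records where L holds") is a simplified stand-in for the circuit's contexts:

```python
# complete data over the running example's variables (made-up records)
data = [
    {"L": 1, "K": 1, "P": 1, "A": 1},
    {"L": 1, "K": 0, "P": 1, "A": 0},
    {"L": 0, "K": 0, "P": 1, "A": 1},
    {"L": 1, "K": 1, "P": 0, "A": 0},
]

# parameter for the wire "K is true" inside the context "L is true":
# count(L and K) / count(L) -- closed form, one pass over the data
ctx = [r for r in data if r["L"] == 1]
theta_K_given_L = sum(r["K"] for r in ctx) / len(ctx)
print(theta_K_given_L)   # 2/3
```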

SLIDE 41

Combinatorial Objects: Rankings

10 items: 3,628,800 rankings
20 items: 2,432,902,008,176,640,000 rankings

Two example rankings (rank: sushi):
1: fatty tuna, 2: sea urchin, 3: salmon roe, 4: shrimp, 5: tuna, 6: squid, 7: tuna roll, 8: sea eel, 9: egg, 10: cucumber roll
1: shrimp, 2: sea urchin, 3: salmon roe, 4: fatty tuna, 5: tuna, 6: squid, 7: tuna roll, 8: sea eel, 9: egg, 10: cucumber roll

SLIDE 42

Combinatorial Objects: Rankings

Example ranking (rank: sushi): 1: fatty tuna, 2: sea urchin, 3: salmon roe, 4: shrimp, 5: tuna, 6: squid, 7: tuna roll, 8: sea eel, 9: egg, 10: cucumber roll

  • Predict Boolean Variables:

Aij - item i at position j

  • Constraints:

– each item i is assigned a unique position (n constraints)
– each position j is assigned a unique item (n constraints)
(a CNF sketch follows)
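One way those 2n exactly-one constraints could be emitted as CNF clauses over indicator variables A[i][j] (pairwise encoding; the variable numbering is mine):

```python
def exactly_one(vs):
    """CNF clauses forcing exactly one of the given literals to be true."""
    clauses = [list(vs)]                               # at least one
    clauses += [[-a, -b] for i, a in enumerate(vs)     # at most one (pairwise)
                for b in vs[i + 1:]]
    return clauses

n = 3
A = [[i * n + j + 1 for j in range(n)] for i in range(n)]  # DIMACS-style ids
cnf = []
for i in range(n):
    cnf += exactly_one(A[i])                         # item i: unique position
for j in range(n):
    cnf += exactly_one([A[i][j] for i in range(n)])  # position j: unique item
print(len(cnf))   # 6 "at least one" + 18 pairwise "at most one" = 24 for n=3
```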

SLIDE 43

Learning Preference Distributions

Special-purpose distribution: Mixture-of-Mallows
– number of components from 1 to 20
– EM with 10 random seeds
– implementation of Lu & Boutilier

PSDD: circuit structure does not even depend on the data!

SLIDE 44

Learning Probabilistic Circuit Structure

SLIDE 45

Structure Learning Primitive

SLIDE 46

Structure Learning Primitive

The primitives maintain the PSDD properties and the constraint of the root!

SLIDE 47

LearnPSDD Algorithm

1. (Vtree learning)*
2. Construct the most naïve PSDD
3. LearnPSDD (search for better structure), repeatedly:
– Generate candidate operations
– Simulate operations
– Execute the best operation

Works with or without logical constraint.

SLIDE 48

PSDDs …are Sum-Product Networks …are Arithmetic Circuits

[Figure: a PSDD OR gate with inputs (p1, s1), …, (pn, sn) and parameters θ1, …, θn maps to an AC '+' node over '×' nodes θi · pi · si]

SLIDE 49

Experiments on 20 datasets

Compare with O-SPN: smaller size in 14, better LL in 11, win on both in 6
Compare with L-SPN: smaller size in 14, better LL in 6, win on both in 2

Compared to SPN learners, LearnPSDD gives comparable performance yet smaller size

SLIDE 50

Learn Mixtures of PSDDs

Q: “Help! I need to learn a discrete probability distribution…”
A: Learn a mixture of PSDDs! State of the art on 6 datasets!

Strongly outperforms

  • Bayesian network learners
  • Markov network learners

Competitive with

  • SPN learners
  • Cutset network learners
SLIDE 51

Logistic Circuits

SLIDE 52

What if I only want to classify Y?

Pr(Y, B, C, D, E)

SLIDE 53

Logistic Circuits

Represents Pr(Y | B, C, D, E)

  • Take all “hot” wires
  • Sum their weights
  • Push through the logistic function (sketch below)
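The three bullets translate almost directly into code. A hedged sketch: which wires are "hot" for an input is determined by evaluating the logical circuit, which is omitted here, so the hot-wire set and weights are toy placeholders:

```python
import math

def predict(wire_weights, hot_wires):
    """Pr(Y | x) = sigmoid( sum of weights of wires that are 'hot' for x )."""
    z = sum(wire_weights[w] for w in hot_wires)
    return 1.0 / (1.0 + math.exp(-z))

# toy numbers: weights are learned log-odds, one per wire
wire_weights = {"w1": 1.2, "w7": -0.4, "w9": 2.1}
print(predict(wire_weights, ["w1", "w9"]))  # sigmoid(3.3) ≈ 0.96
```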
SLIDE 54

Logistic vs. Probabilistic Circuits

Probabilities become log-odds.
Probabilistic circuit: Pr(Y, B, C, D, E). Logistic circuit: Pr(Y | B, C, D, E).

SLIDE 55

Parameter Learning

Reduce to logistic regression:

Features associated with each wire: “global circuit flow” features

Learning parameters θ is convex optimization!
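Since every example induces a flow value on each wire, the wire weights can be fit with any off-the-shelf logistic regression solver. A sketch with scikit-learn; random placeholders stand in for the real circuit-flow feature extractor:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X_flows[i, w] = flow of wire w for example i (computed from the circuit;
# random placeholders here stand in for the real feature extractor)
rng = np.random.default_rng(0)
X_flows = rng.integers(0, 2, size=(200, 30)).astype(float)
y = rng.integers(0, 2, size=200)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_flows, y)
wire_weights = clf.coef_[0]   # one learned log-odds weight per wire
```

The convexity claim carries over directly: with the flows fixed, the model is linear in θ.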

SLIDE 56

Logistic Circuit Structure Learning

Generate candidate operations → Calculate gradient variance → Execute the best operation

Similar to LearnPSDD structure learning

SLIDE 57

Comparable Accuracy with Neural Nets

SLIDE 58

Significantly Smaller in Size, Better Data Efficiency

SLIDE 59

Reasoning with Probabilistic Circuits

SLIDE 60

Compilation target for probabilistic reasoning

Bayesian networks, factor graphs, probabilistic databases, relational Bayesian networks, probabilistic programs, Markov Logic → Probabilistic Circuits

SLIDE 61

Compilation for Prob. Inference

SLIDE 62

Collapsed Compilation

To sample a circuit:

1. Compile bottom-up until you reach the size limit
2. Pick a variable you want to sample
3. Sample it according to its marginal distribution in the current circuit
4. Condition on the sampled value
5. (Repeat)

Asymptotically unbiased importance sampler. (A toy sketch follows.)
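Setting the circuit machinery aside, the estimator underneath is collapsed importance sampling: sample a few variables, sum out ("collapse") the rest exactly, and correct with an importance weight. A runnable toy version on an explicit distribution; in the real algorithm the proposal is the current circuit's marginal rather than uniform, and the exact sum is done by the compiled circuit:

```python
import itertools
import random

# toy unnormalized distribution over 6 binary variables
def score(x):
    return 1.0 + sum(x) + 2.0 * (x[0] == x[3])

def collapsed_estimate(n_samples=2000, n_sampled_vars=3, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # sample a few variables from a proposal (uniform here),
        # keeping the importance weight 1/q = 2^k
        xs = [rng.random() < 0.5 for _ in range(n_sampled_vars)]
        weight = 2.0 ** n_sampled_vars
        # "collapse" the rest: sum them out exactly (the circuit's job)
        rest = itertools.product([0, 1], repeat=6 - n_sampled_vars)
        total += weight * sum(score(list(xs) + list(r)) for r in rest)
    return total / n_samples   # unbiased estimate of the partition function

exact = sum(score(list(x)) for x in itertools.product([0, 1], repeat=6))
print(exact, collapsed_estimate())   # 320.0 vs. an estimate close to it
```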

SLIDE 63

Circuits + importance weights approximate any query

SLIDE 64

Experiments

Competitive with state-of-the-art approximate inference in graphical models. Outperforms it on several benchmarks!

SLIDE 65

Reasoning About Classifiers

SLIDE 66

Classifier Trimming

[Figure: classifier β uses features F1, F2, F3, F4 with threshold U; its trimming, classifier γ, uses only F2, F3 with adjusted threshold U′]

Trim features while maintaining classification behavior

SLIDE 67

How to measure Similarity?

What is the expected probability that a classifier α will agree with its trimming β?

“Expected Classification Agreement”
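Brute force over a small feature space makes the quantity concrete. The distribution and classifiers below are toys of my own; the paper computes this expectation with circuits rather than enumeration:

```python
import itertools

def agreement(classify_full, classify_trimmed, pr):
    """E over features of [full classifier agrees with its trimming]."""
    return sum(pr(f) for f in itertools.product([0, 1], repeat=4)
               if classify_full(f) == classify_trimmed(f))

pr = lambda f: 0.5 ** 4                            # uniform over F1..F4
full = lambda f: f[0] + f[1] + f[2] + f[3] >= 2    # threshold U, all features
trimmed = lambda f: f[1] + f[2] >= 1               # threshold U', F2 and F3 only
print(agreement(full, trimmed, pr))                # 13/16 = 0.8125
```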

SLIDE 68

Solving PP^PP problems with constrained SDDs

[Figure: a constrained SDD over L, K, P, A; a function f is evaluated under conditioning, e.g. g | (M ∧ L)]

SLIDE 69

SDD method faster than traditional jointree inference

SLIDE 70

Classification agreement and accuracy

Higher agreement tends to yield higher accuracy
An additional dimension for feature selection

SLIDE 71

Conclusions

Statistical ML (“Probability”) + Symbolic AI (“Logic”) + Connectionism (“Deep”) → Circuits

SLIDE 72

Questions?

PSDD with 15,000 nodes

SLIDE 73

References

  • Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. Probabilistic Sentential Decision Diagrams. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2014.
  • Arthur Choi, Guy Van den Broeck and Adnan Darwiche. Tractable Learning for Structured Probability Spaces: A Case Study in Learning Preference Distributions. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  • Arthur Choi, Guy Van den Broeck and Adnan Darwiche. Probability Distributions over Structured Spaces. In Proceedings of the AAAI Spring Symposium on KRR, 2015.
  • Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche and Guy Van den Broeck. Tractable Learning for Complex Probability Queries. In Advances in Neural Information Processing Systems 28 (NIPS), 2015.
  • YooJung Choi, Adnan Darwiche and Guy Van den Broeck. Optimal Feature Selection for Decision Robustness in Bayesian Networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017.

SLIDE 74

References

  • Yitao Liang, Jessa Bekker and Guy Van den Broeck. Learning the Structure of Probabilistic Sentential Decision Diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
  • Yitao Liang and Guy Van den Broeck. Towards Compact Interpretable Models: Shrinking of Learned Probabilistic Sentential Decision Diagrams. In IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 2017.
  • YooJung Choi and Guy Van den Broeck. On Robust Trimming of Bayesian Network Classifiers. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), 2018.
  • Jingyi Xu, Zilu Zhang, Tal Friedman, Yitao Liang and Guy Van den Broeck. A Semantic Loss Function for Deep Learning with Symbolic Knowledge. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.
  • Yitao Liang and Guy Van den Broeck. Learning Logistic Circuits. In Proceedings of the UAI 2018 Workshop: Uncertainty in Deep Learning, 2018.
  • Tal Friedman and Guy Van den Broeck. Approximate Knowledge Compilation by Online Collapsed Importance Sampling. In Advances in Neural Information Processing Systems 31 (NIPS), 2018.