SLIDE 1

Probabilistic Circuits: A New Synthesis of Logic and Machine Learning

Guy Van den Broeck

UCSD May 14, 2018

SLIDE 2

Overview

Statistical ML (“Probability”) · Symbolic AI (“Logic”) · Connectionism (“Deep”)

Probabilistic Circuits

SLIDE 3

References

  • Jingyi Xu, Zilu Zhang, Tal Friedman, Yitao Liang and Guy Van den Broeck. A Semantic Loss Function for Deep Learning with Symbolic Knowledge. In Proceedings of the International Conference on Machine Learning (ICML), 2018.
  • YooJung Choi and Guy Van den Broeck. On Robust Trimming of Bayesian Network Classifiers. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), 2018.
  • Jingyi Xu, Zilu Zhang, Tal Friedman, Yitao Liang and Guy Van den Broeck. A Semantic Loss Function for Deep Learning Under Weak Supervision. In NIPS 2017 Workshop on Learning with Limited Labeled Data: Weak Supervision and Beyond, 2017.
  • Yitao Liang and Guy Van den Broeck. Towards Compact Interpretable Models: Shrinking of Learned Probabilistic Sentential Decision Diagrams. In IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 2017.
  • YooJung Choi, Adnan Darwiche and Guy Van den Broeck. Optimal Feature Selection for Decision Robustness in Bayesian Networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017.
  • Yitao Liang, Jessa Bekker and Guy Van den Broeck. Learning the Structure of Probabilistic Sentential Decision Diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
  • Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche and Guy Van den Broeck. Tractable Learning for Complex Probability Queries. In Advances in Neural Information Processing Systems 28 (NIPS), 2015.
  • Arthur Choi, Guy Van den Broeck and Adnan Darwiche. Probability Distributions over Structured Spaces. In Proceedings of the AAAI Spring Symposium on KRR, 2015.
  • Arthur Choi, Guy Van den Broeck and Adnan Darwiche. Tractable Learning for Structured Probability Spaces: A Case Study in Learning Preference Distributions. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  • Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. Probabilistic Sentential Decision Diagrams: Learning with Massive Logical Constraints. In ICML Workshop on Learning Tractable Probabilistic Models (LTPM), 2014.
  • Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. Probabilistic Sentential Decision Diagrams. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2014.
  • (… and ongoing work by Tal Friedman, YooJung Choi, and Yitao Liang)

SLIDE 4

Structured Spaces

SLIDE 5–6

Running Example

Courses:

  • Logic (L)
  • Knowledge Representation (K)
  • Probability (P)
  • Artificial Intelligence (A)

Data

Constraints:

  • Must take at least one of Probability or Logic.
  • Probability is a prerequisite for AI.
  • The prerequisite for KR is either AI or Logic.

SLIDE 7–8

Structured Space

[Table: all 16 truth assignments over L, K, P, A (the unstructured space) next to the 9 assignments that survive the constraints (the structured space)]

7 out of 16 instantiations are impossible (a brute-force check follows):

  • Must take at least one of Probability (P) or Logic (L).
  • Probability is a prerequisite for AI (A).
  • The prerequisite for KR (K) is either AI or Logic.
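A minimal brute-force check of that count, with the three constraints written as Boolean tests (the encoding is ours, but it follows the slide directly):

```python
from itertools import product

# The three constraints of the running example as Boolean tests:
#   take P or L;  A implies P;  K implies (A or L)
def valid(l, k, p, a):
    return (p or l) and ((not a) or p) and ((not k) or a or l)

worlds = list(product([False, True], repeat=4))   # all 16 instantiations of L, K, P, A
impossible = [w for w in worlds if not valid(*w)]
print(len(impossible), "of", len(worlds))         # 7 of 16
```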

SLIDE 9–10

Example: Video

[Lu, W. L., Ting, J. A., Little, J. J., & Murphy, K. P. (2013). Learning to track and identify players from broadcast sports videos.]

SLIDE 11–12

Example: Robotics

[Wong, L. L., Kaelbling, L. P., & Lozano-Perez, T., Collision-free state estimation. ICRA 2012]

SLIDE 13–16

Example: Language

  • Non-local dependencies: at least one verb in each sentence
  • Sentence compression: if a modifier is kept, its subject is also kept
  • Information extraction
  • Semantic role labeling
  • … and many more!

[Chang, M., Ratinov, L., & Roth, D. (2008). Constraints as prior knowledge], …, [Chang, M. W., Ratinov, L., & Roth, D. (2012). Structured learning with constrained conditional models.], [https://en.wikipedia.org/wiki/Constrained_conditional_model]

SLIDE 17–20

Example: Deep Learning

[Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476.]

SLIDE 21–24

Learning in Structured Spaces

Data + Constraints (background knowledge, physics) → Learn → ML Model (a distribution, a neural network)

Statistical ML tools don’t take constraints as input! ☹

SLIDE 25

Specification Language: Logic

SLIDE 26

Structured Probability Space

[Table: the unstructured space of 16 assignments over L, K, P, A next to the structured space of 9 valid assignments]

7 out of 16 instantiations are impossible:

  • Must take at least one of Probability or Logic.
  • Probability is a prerequisite for AI.
  • The prerequisite for KR is either AI or Logic.

SLIDE 27

Boolean Constraints

[Table: the same unstructured vs. structured assignment tables]

7 out of 16 instantiations are impossible

SLIDE 28–30

Combinatorial Objects: Rankings

10 items: 3,628,800 rankings
20 items: 2,432,902,008,176,640,000 rankings

Two example rankings:

  1 fatty tuna, 2 sea urchin, 3 salmon roe, 4 shrimp, 5 tuna, 6 squid, 7 tuna roll, 8 sea eel, 9 egg, 10 cucumber roll
  1 shrimp, 2 sea urchin, 3 salmon roe, 4 fatty tuna, 5 tuna, 6 squid, 7 tuna roll, 8 sea eel, 9 egg, 10 cucumber roll

Aij: item i at position j (n items require n² Boolean variables)

Naively, though: an item may be assigned to more than one position, and a position may contain more than one item.

SLIDE 31–34

Encoding Rankings in Logic

Aij: item i at position j

          pos 1  pos 2  pos 3  pos 4
  item 1  A11    A12    A13    A14
  item 2  A21    A22    A23    A24
  item 3  A31    A32    A33    A34
  item 4  A41    A42    A43    A44

Constraint: each item i is assigned to a unique position (n constraints).
Constraint: each position j is assigned a unique item (n constraints).
(A CNF sketch of this encoding follows.)
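A sketch of these constraints generated as CNF clauses (DIMACS-style integer literals; the variable numbering is our own illustration):

```python
from itertools import combinations

def ranking_cnf(n):
    # Variable A_ij ("item i at position j") encoded as integer i*n + j + 1
    var = lambda i, j: i * n + j + 1
    clauses = []
    for i in range(n):
        clauses.append([var(i, j) for j in range(n)])    # item i somewhere
        for j, k in combinations(range(n), 2):           # ...and only once
            clauses.append([-var(i, j), -var(i, k)])
    for j in range(n):
        clauses.append([var(i, j) for i in range(n)])    # position j filled
        for i, k in combinations(range(n), 2):           # ...by one item only
            clauses.append([-var(i, j), -var(k, j)])
    return clauses

print(len(ranking_cnf(4)))   # 56 clauses for n = 4
```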

SLIDE 35–39

Structured Space for Paths

  • cf. Nature paper

Good variable assignments (represent a route): 184
Bad variable assignments (do not represent a route): 16,777,032
Unstructured probability space: 184 + 16,777,032 = 2²⁴

Space easily encoded in logical constraints  [Nishino et al.]
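The counts check out if the map is taken to be the 4×4 grid graph with corner-to-corner routes (an assumption on our part; the slide's figure is not recoverable):

```python
# Count simple corner-to-corner paths on the 4x4 grid graph (16 nodes,
# 24 edges, hence 2^24 assignments of the edge variables).
def neighbors(v):
    r, c = v
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < 4 and 0 <= c + dc < 4:
            yield (r + dr, c + dc)

def count_paths(v, goal, visited):
    if v == goal:
        return 1
    visited.add(v)
    total = sum(count_paths(w, goal, visited)
                for w in neighbors(v) if w not in visited)
    visited.remove(v)
    return total

print(count_paths((0, 0), (3, 3), set()))   # 184 good assignments
print(2 ** 24 - 184)                        # 16777032 bad assignments
```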

SLIDE 40

Logical Circuits

SLIDE 41

Logical Circuits

[Circuit diagram: a logical circuit of AND/OR gates over the literals of L, K, P, A, encoding the course constraints]

SLIDE 42–44

Property: Decomposability

[Circuit diagram: the same logical circuit; the variables are split between each AND gate's children]

Property: AND gates have disjoint input circuits

SLIDE 45–47

Property: Determinism

[Circuit diagram: the circuit evaluated on the input below; at most one child of each OR gate is true]

Input: L, K, P, A are true and ¬L, ¬K, ¬P, ¬A are false

Property: OR gates have at most one true input wire

SLIDE 48–50

Tractable for Logical Inference

  • Is the structured space empty? (SAT)
  • Count the size of the structured space (#SAT; see the counting sketch below)
  • Check equivalence of spaces
  • Algorithms linear in circuit size  (pass up, pass down, similar to backprop)
  • Compilation by exhaustive SAT solvers
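For example, model counting is a single upward pass once the circuit is smooth, decomposable, and deterministic; a toy sketch over a node format of our own:

```python
# Count models of a smooth, decomposable, deterministic circuit:
# AND gates multiply (disjoint variables), OR gates add (disjoint models).
def model_count(node):
    kind = node[0]
    if kind == 'lit':                    # ('lit', +var) or ('lit', -var)
        return 1
    if kind == 'and':
        result = 1
        for child in node[1]:
            result *= model_count(child)
        return result
    return sum(model_count(child) for child in node[1])   # 'or'

# XOR of two variables: (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2) has exactly 2 models
xor = ('or', [('and', [('lit', 1), ('lit', -2)]),
              ('and', [('lit', -1), ('lit', 2)])])
print(model_count(xor))   # 2
```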
SLIDE 51

Semantic Loss for Deep Learning

SLIDE 52–54

Deep Structured Output Prediction

Data + Constraints (background knowledge, physics) → Learn → Deep Neural Network

Pipeline: Input → Neural Network → Output, with the output subject to a logical constraint.

SLIDE 55–64

Semantic Loss

  • Output is a probability vector p, not logic!
    How close is the output to satisfying the constraint?
  • Answer: semantic loss function L(α,p)
  • Axioms, for example:
    – If p is Boolean then L(p,p) = 0
    – If α implies β then L(α,p) ≥ L(β,p)
  • Properties:
    – If α is equivalent to β then L(α,p) = L(β,p)
    – If p is Boolean and satisfies α then L(α,p) = 0

SEMANTIC Loss!

SLIDE 65

Semantic Loss: Definition

Theorem: The axioms imply a unique semantic loss (up to a multiplicative constant). The inner product below is the probability of getting state x after flipping coins with probabilities p; the sum over x ⊨ α is the probability of satisfying α.
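Written out (this is the definition from the ICML 2018 paper in the references, up to a multiplicative constant):

```latex
L^{s}(\alpha, p) \;\propto\; -\log \sum_{\mathbf{x} \models \alpha}
    \prod_{i:\; \mathbf{x} \models X_i} p_i
    \prod_{i:\; \mathbf{x} \models \lnot X_i} (1 - p_i)
```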

SLIDE 66–72

How to Compute Semantic Loss?

  • In general: #P-hard 
  • With a logical circuit for α: linear!
  • Example: the exactly-one constraint
    (y₂ ∨ y₃ ∨ y₄) ∧ (¬y₂ ∨ ¬y₃) ∧ (¬y₃ ∨ ¬y₄) ∧ (¬y₂ ∨ ¬y₄)
    L(α,p) = L(circuit for α, p) = −log(probability mass the circuit assigns under p)
  • Why does the circuit make it easy? Decomposability and determinism!
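A brute-force rendering of that definition (exponential in the number of variables; the circuit replaces the explicit sum over states):

```python
import math
from itertools import product

def semantic_loss(constraint, p):
    """Brute-force semantic loss: -log of the probability that independently
    sampling bit i with probability p[i] yields a state satisfying the
    constraint. Exponential in len(p); the circuit makes this linear."""
    mass = 0.0
    for x in product([0, 1], repeat=len(p)):
        if constraint(x):
            prob = 1.0
            for xi, pi in zip(x, p):
                prob *= pi if xi else 1 - pi
            mass += prob
    return -math.log(mass)

exactly_one = lambda x: sum(x) == 1
print(semantic_loss(exactly_one, [0.1, 0.7, 0.2]))   # ~0.541
```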

SLIDE 73–76

Supervised Learning

  • Predict shortest paths
  • Add semantic loss to the objective, asking:
    – Is the output a path?
    – Does the output have the true edges?
    – Is the output the true path?

SLIDE 77–80

Supervised Learning

  • Predict sushi preferences (the example ranking from Slide 28)
  • Add semantic loss to the objective, asking:
    – Is the output a ranking?
    – Does the output correctly rank individual sushis?
    – Is the output the true ranking?

SLIDE 81–85

Semi-Supervised Learning

  • Unlabeled data must have some label
  • Low semantic loss with the exactly-one constraint (a training sketch follows)
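A sketch of how this can look in PyTorch, assuming one sigmoid output per class; the exactly-one semantic loss then has the closed form −log Σᵢ pᵢ Πⱼ≠ᵢ (1 − pⱼ), and the weight w below is an illustrative knob, not a value from the talk:

```python
import torch
import torch.nn.functional as F

def exactly_one_semantic_loss(logits):
    """Closed form for the exactly-one constraint:
    -log sum_i p_i * prod_{j != i} (1 - p_j), with p = sigmoid(logits)."""
    p = torch.sigmoid(logits)                  # per-class probabilities, shape (batch, k)
    log_keep = torch.log(p) - torch.log1p(-p)  # log p_i - log(1 - p_i)
    log_none = torch.log1p(-p).sum(dim=1)      # log prod_j (1 - p_j)
    log_sat = torch.logsumexp(log_none.unsqueeze(1) + log_keep, dim=1)
    return -log_sat.mean()

def total_loss(logits_labeled, labels, logits_unlabeled, w=0.05):
    # supervised cross-entropy plus weighted semantic loss on unlabeled data
    return (F.cross_entropy(logits_labeled, labels)
            + w * exactly_one_semantic_loss(logits_unlabeled))
```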
SLIDE 86–88

Experiments: MNIST, FASHION, CIFAR10

SLIDE 89–91

Semantic Loss Conclusions

  • Cares about meaning, not syntax
  • Elegant axiomatic approach
  • If you have complex output constraints: use logical circuits to enforce them
  • If you have unlabeled data (no constraints): get a lot of signal by minimizing the semantic loss of exactly-one

SLIDE 92

Probabilistic Circuits

SLIDE 93

Logical Circuits

[Circuit diagram: the logical circuit for the course constraints, revisited]

SLIDE 94

PSDD: Probabilistic SDD

[Circuit diagram: the logical circuit with a local distribution on each OR gate's wires; the parameters are 0.1 / 0.6 / 0.3 at the root and 0.6/0.4, 0.8/0.2, 0.25/0.75, 0.9/0.1 at inner gates]

SLIDE 95–96

PSDD: Probabilistic SDD

[Circuit diagram: the PSDD evaluated bottom-up on the input below]

Input: L, K, P, A are true

Pr(L,K,P,A) = 0.3 × 1 × 0.8 × 0.4 × 0.25 = 0.024
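The same kind of computation as code: a toy bottom-up evaluator; the node format and the two-variable fragment are our own illustration, not the PSDD file format:

```python
# Toy bottom-up evaluation: a literal reports whether it agrees with the
# input; an AND multiplies its children; a parameterized OR returns the
# weighted sum of its branches.
def evaluate(node, world):
    kind = node[0]
    if kind == 'lit':                    # ('lit', 'P', True)
        _, var, phase = node
        return 1.0 if world[var] == phase else 0.0
    if kind == 'and':
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, world)
        return result
    return sum(theta * evaluate(child, world) for theta, child in node[1])

# Two-variable fragment of the example: Pr(P) = 0.8; given P, Pr(A) = 0.25.
frag = ('or', [(0.8, ('and', [('lit', 'P', True),
                              ('or', [(0.25, ('lit', 'A', True)),
                                      (0.75, ('lit', 'A', False))])])),
               (0.2, ('and', [('lit', 'P', False), ('lit', 'A', False)]))])
print(evaluate(frag, {'P': True, 'A': True}))   # 0.8 * 0.25 = 0.2
```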

SLIDE 97–99

[Circuit diagram: the PSDD with its parameters]

PSDD nodes induce a normalized distribution!

Can read probabilistic independences off the circuit structure

SLIDE 100–101

Tractable for Probabilistic Inference

  • MAP inference: find the most likely assignment (otherwise NP-complete)
  • Computing conditional probabilities Pr(x|y) (otherwise PP-complete)
  • Sampling from Pr(x|y)

Algorithms linear in circuit size  (pass up, pass down, similar to backprop)
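Conditionals use the same upward pass: a variable left out of the evidence makes both of its literals report 1.0, which marginalizes it (correct when the circuit is smooth). A self-contained variant of the toy evaluator above:

```python
# Evidence may omit variables; their literals then report 1.0 (marginalization).
def evaluate(node, evidence):
    kind = node[0]
    if kind == 'lit':
        _, var, phase = node
        if var not in evidence:
            return 1.0                    # marginalize this variable
        return 1.0 if evidence[var] == phase else 0.0
    if kind == 'and':
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, evidence)
        return result
    return sum(theta * evaluate(child, evidence) for theta, child in node[1])

# Fragment: Pr(P) = 0.8; given P, Pr(A) = 0.25; without P, A is false.
frag = ('or', [(0.8, ('and', [('lit', 'P', True),
                              ('or', [(0.25, ('lit', 'A', True)),
                                      (0.75, ('lit', 'A', False))])])),
               (0.2, ('and', [('lit', 'P', False), ('lit', 'A', False)]))])

pr_ap = evaluate(frag, {'A': True, 'P': True})   # Pr(A, P) = 0.2
pr_p = evaluate(frag, {'P': True})               # Pr(P)    = 0.8
print(pr_ap / pr_p)                              # Pr(A | P) = 0.25
```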

SLIDE 102

Learning Probabilistic Circuits

SLIDE 103–106

Parameters are Interpretable

[Circuit diagram: the example PSDD with individual parameters highlighted in turn, corresponding to: the student takes course L; the student takes course P; the probability of P given L]

Explainable AI DARPA Program

SLIDE 107–111

Learning Algorithms

  • Parameter learning:
    – Closed-form maximum likelihood from complete data
    – One pass over the data to estimate Pr(x|y)
    – Not a lot to say: very easy! (A counting sketch follows.)
  • Structure learning:
    ○ Compile a logical constraint for the structured space (use SAT solver technology)
    ○ Learn structure from data by search/optimization
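A minimal sketch of the closed form, relying on the fact that a PSDD is deterministic, so each complete example is routed to exactly one wire of each OR gate; the routing function below is illustrative:

```python
from collections import Counter

# Maximum likelihood for one PSDD OR gate: its parameters are just the
# empirical fractions of complete examples routed to each branch.
def learn_or_gate(examples, branch_of):
    counts = Counter(branch_of(x) for x in examples)
    return {branch: c / len(examples) for branch, c in counts.items()}

# Toy data over (L, K, P, A); the root of the running example distinguishes
# three cases on (L, P), mirroring its three wires.
data = [(1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 1, 1), (1, 1, 0, 0)]
root_branch = lambda x: ('L and P' if x[0] and x[2] else
                         'P only' if x[2] else 'L only')
print(learn_or_gate(data, root_branch))
# {'L and P': 0.5, 'P only': 0.25, 'L only': 0.25}
```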

SLIDE 112–113

Learning Preference Distributions

Special-purpose baseline: a mixture of Mallows models
  – number of components from 1 to 20
  – EM with 10 random seeds
  – implementation of Lu & Boutilier

versus a PSDD, the naive approach here: the circuit is compiled from the constraints and does not depend on the data!

SLIDE 114

Learning from Incomplete Data

  • MovieLens dataset:
    – 3,900 movies, 6,040 users, 1M ratings
    – take ratings from the 64 most-rated movies
    – ratings 1–5 converted to pairwise preferences
  • PSDD for partial rankings:
    – 4 tiers
    – 18,711 parameters

Movies by expected tier: 1 The Godfather, 2 The Usual Suspects, 3 Casablanca, 4 The Shawshank Redemption, 5 Schindler’s List, 6 One Flew Over the Cuckoo’s Nest, 7 The Godfather: Part II, 8 Monty Python and the Holy Grail, 9 Raiders of the Lost Ark, 10 Star Wars IV: A New Hope

SLIDE 115–118

Probabilistic-Logical Queries

Given top-5: 1 Star Wars V: The Empire Strikes Back, 2 Star Wars IV: A New Hope, 3 The Godfather, 4 The Shawshank Redemption, 5 The Usual Suspects

Add logical constraints:
  • no other Star Wars movie in the top-5
  • at least one comedy in the top-5

Resulting top-5: 1 Star Wars V: The Empire Strikes Back, 2 American Beauty, 3 The Godfather, 4 The Usual Suspects, 5 The Shawshank Redemption

Diversified recommendations via logical constraints

SLIDE 119

Learning Probabilistic Circuit Structure

SLIDE 120–121

Tractable Learning

Bayesian networks and Markov networks do not support linear-time exact inference.

SLIDE 122

Tractable Learning

SPNs and cutset networks (historically: polytrees, Chow-Liu trees, etc.). Both are Arithmetic Circuits (ACs).

[Darwiche, JACM 2003]

SLIDE 123

PSDDs are Arithmetic Circuits

A PSDD decision node with parameters θ₁, …, θₙ and prime–sub pairs (p₁,s₁), …, (pₙ,sₙ) corresponds to the AC node θ₁·p₁·s₁ + θ₂·p₂·s₂ + … + θₙ·pₙ·sₙ.

SLIDE 124–128

Tractable Learning

[Diagram: a trade-off between strong properties and representational freedom; DNNs, SPNs, and cutset networks are placed along it]

SLIDE 129

Variable Trees (vtrees)

[Diagram: correspondence between a PSDD and its vtree]

SLIDE 130–131

Learning Variable Trees

  • How much do variables depend on each other?
  • Learn the vtree by hierarchical clustering (see the sketch below)
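A greedy sketch of the idea, assuming empirical pairwise mutual information as the dependence measure and single-linkage merging; the published LearnVtree procedure may differ in its linkage and optimization details:

```python
import numpy as np

def leaves(cluster):
    # variables under a nested-tuple cluster
    return [cluster] if isinstance(cluster, int) else leaves(cluster[0]) + leaves(cluster[1])

def mutual_info(x, y):
    """Empirical mutual information between two binary columns."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b)) + 1e-9
            mi += pxy * np.log(pxy / ((np.mean(x == a) + 1e-9) * (np.mean(y == b) + 1e-9)))
    return mi

def learn_vtree(data):
    """Greedily merge the two clusters whose variables are most dependent;
    the merge tree (a nested tuple of column indices) is the vtree."""
    d = data.shape[1]
    mi = np.array([[mutual_info(data[:, i], data[:, j]) for j in range(d)]
                   for i in range(d)])
    clusters = list(range(d))
    while len(clusters) > 1:
        s, t = max(((s, t) for s in range(len(clusters))
                    for t in range(s + 1, len(clusters))),
                   key=lambda st: max(mi[i, j]
                                      for i in leaves(clusters[st[0]])
                                      for j in leaves(clusters[st[1]])))
        merged = (clusters[s], clusters[t])
        clusters = [c for k, c in enumerate(clusters) if k not in (s, t)] + [merged]
    return clusters[0]

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)
b = a.copy(); b[:50] ^= 1                        # b is strongly dependent on a
c = rng.integers(0, 2, 1000)                     # c is independent of both
print(learn_vtree(np.column_stack([a, b, c])))   # (2, (0, 1)): a and b pair up first
```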
SLIDE 132–134

Learning Primitives

[Diagrams: the local structure-learning primitives]

Primitives maintain the PSDD properties and the structured space!

SLIDE 135–136

LearnPSDD

  1. Vtree learning
  2. Construct the most naïve PSDD
  3. LearnPSDD: search for better structure by repeatedly generating candidate operations, simulating the operations, and executing the best one
SLIDE 137–139

Experiments on 20 datasets

  • Compared with O-SPN: smaller size on 14, better log-likelihood on 11, win on both on 6
  • Compared with L-SPN: smaller size on 14, better log-likelihood on 6, win on both on 2

Comparable in performance and smaller in size.

SLIDE 140–142

Ensembles of PSDDs

EM / Bagging

SLIDE 143–144

State-of-the-Art Performance

State of the art on 6 datasets

SLIDE 145–148

What happens if you have a structured space?

Multi-valued data = exactly-one constraint:
(y₂ ∨ y₃ ∨ y₄) ∧ (¬y₂ ∨ ¬y₃) ∧ (¬y₃ ∨ ¬y₄) ∧ (¬y₂ ∨ ¬y₄)

Never omit domain constraints!

SLIDE 149

Circuit-Based Probabilistic Reasoning

SLIDE 150–151

Compilation for Inference

SLIDE 152

Ongoing Work

  • Probabilistic program inference, by compilation
  • Approximate inference, by collapsed compilation
  • Robust feature selection, by compilation [IJCAI18]
  • A powerful reasoning toolbox!
SLIDE 153

Conclusions

  • Logic is everywhere in machine learning 
  • Probabilistic circuits build on logical circuits:
    1. Tractability
    2. Semantics
    3. Natural encoding of structured spaces
  • Learning is effective:
    1. Enforcing neural network output constraints: state of the art in semi-supervised learning and complex output prediction
    2. Density estimation from constraints encoding a structured space: state of the art in learning preference distributions
    3. Density estimation from standard unstructured datasets: state of the art on standard tractable learning benchmarks

SLIDE 154

Conclusions

Statistical ML (“Probability”) + Symbolic AI (“Logic”) + Connectionism (“Deep”) → PSDD

SLIDE 155

References (the same list as Slide 3)

SLIDE 156

Questions?

[Image: a PSDD with 15,000 nodes]

LearnPSDD code: https://github.com/UCLA-StarAI/LearnPSDD
Other code online soon.