End-to-End Differentiable Proving Tim Rockt aschel Whiteson - PowerPoint PPT Presentation

Outline 1 Link prediction & symbolic vs. neural representations 2 Regularize neural representations using logical rules Model-agnostic but slow (Rockt¨ aschel et al., 2015) Fast but restricted (Demeester et al., 2016) Tim Rockt¨ aschel End-to-End Differentiable Proving 6/30

Outline 1 Link prediction & symbolic vs. neural representations 2 Regularize neural representations using logical rules Model-agnostic but slow (Rockt¨ aschel et al., 2015) Fast but restricted (Demeester et al., 2016) Model-agnostic and fast (Minervini et al., 2017) Tim Rockt¨ aschel End-to-End Differentiable Proving 6/30

Outline 1 Link prediction & symbolic vs. neural representations 2 Regularize neural representations using logical rules Model-agnostic but slow (Rockt¨ aschel et al., 2015) Fast but restricted (Demeester et al., 2016) Model-agnostic and fast (Minervini et al., 2017) 3 End-to-end differentiable proving (Rockt¨ aschel and Riedel, 2017) Tim Rockt¨ aschel End-to-End Differentiable Proving 6/30

Outline 1 Link prediction & symbolic vs. neural representations 2 Regularize neural representations using logical rules Model-agnostic but slow (Rockt¨ aschel et al., 2015) Fast but restricted (Demeester et al., 2016) Model-agnostic and fast (Minervini et al., 2017) 3 End-to-end differentiable proving (Rockt¨ aschel and Riedel, 2017) Explicit multi-hop reasoning using neural networks Tim Rockt¨ aschel End-to-End Differentiable Proving 6/30

Outline 1 Link prediction & symbolic vs. neural representations 2 Regularize neural representations using logical rules Model-agnostic but slow (Rockt¨ aschel et al., 2015) Fast but restricted (Demeester et al., 2016) Model-agnostic and fast (Minervini et al., 2017) 3 End-to-end differentiable proving (Rockt¨ aschel and Riedel, 2017) Explicit multi-hop reasoning using neural networks Inducing rules using gradient descent Tim Rockt¨ aschel End-to-End Differentiable Proving 6/30

Outline 1 Link prediction & symbolic vs. neural representations 2 Regularize neural representations using logical rules Model-agnostic but slow (Rockt¨ aschel et al., 2015) Fast but restricted (Demeester et al., 2016) Model-agnostic and fast (Minervini et al., 2017) 3 End-to-end differentiable proving (Rockt¨ aschel and Riedel, 2017) Explicit multi-hop reasoning using neural networks Inducing rules using gradient descent 4 Outlook & Summary Tim Rockt¨ aschel End-to-End Differentiable Proving 6/30

Notation Constant : homer , bart , lisa etc. (lowercase) Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Notation Constant : homer , bart , lisa etc. (lowercase) Variable : X , Y etc. (uppercase, universally quantified) Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Notation Constant : homer , bart , lisa etc. (lowercase) Variable : X , Y etc. (uppercase, universally quantified) Term : constant or variable Restricted to function-free terms in this talk Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Notation Constant : homer , bart , lisa etc. (lowercase) Variable : X , Y etc. (uppercase, universally quantified) Term : constant or variable Restricted to function-free terms in this talk Predicate : fatherOf , parentOf etc. function from terms to a Boolean Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Notation Constant : homer , bart , lisa etc. (lowercase) Variable : X , Y etc. (uppercase, universally quantified) Term : constant or variable Restricted to function-free terms in this talk Predicate : fatherOf , parentOf etc. function from terms to a Boolean Atom : predicate and terms, e.g., parentOf ( X , bart ) Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Notation Constant : homer , bart , lisa etc. (lowercase) Variable : X , Y etc. (uppercase, universally quantified) Term : constant or variable Restricted to function-free terms in this talk Predicate : fatherOf , parentOf etc. function from terms to a Boolean Atom : predicate and terms, e.g., parentOf ( X , bart ) Literal : atom or negated or atom, e.g., not parentOf ( bart , lisa ) Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Notation Constant : homer , bart , lisa etc. (lowercase) Variable : X , Y etc. (uppercase, universally quantified) Term : constant or variable Restricted to function-free terms in this talk Predicate : fatherOf , parentOf etc. function from terms to a Boolean Atom : predicate and terms, e.g., parentOf ( X , bart ) Literal : atom or negated or atom, e.g., not parentOf ( bart , lisa ) Rule : head :– body . head : atom body : (possibly empty) list of literals representing conjunction Restricted to Horn clauses in this talk Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Notation Constant : homer , bart , lisa etc. (lowercase) Variable : X , Y etc. (uppercase, universally quantified) Term : constant or variable Restricted to function-free terms in this talk Predicate : fatherOf , parentOf etc. function from terms to a Boolean Atom : predicate and terms, e.g., parentOf ( X , bart ) Literal : atom or negated or atom, e.g., not parentOf ( bart , lisa ) Rule : head :– body . head : atom body : (possibly empty) list of literals representing conjunction Restricted to Horn clauses in this talk Fact : ground rule (no free variables) with empty body, e.g., parentOf ( homer , bart ) . Tim Rockt¨ aschel End-to-End Differentiable Proving 7/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! Das et al. (2017) 8/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! placeOfBirth attribute is missing for 71% of people! Das et al. (2017) 8/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! placeOfBirth attribute is missing for 71% of people! Commonsense knowledge often not stated explicitly Das et al. (2017) 8/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! placeOfBirth attribute is missing for 71% of people! Commonsense knowledge often not stated explicitly Weak logical relationships that can be used for inferring facts Das et al. (2017) 8/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! placeOfBirth attribute is missing for 71% of people! Commonsense knowledge often not stated explicitly Weak logical relationships that can be used for inferring facts melinda livesIn? seattle Das et al. (2017) 8/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! placeOfBirth attribute is missing for 71% of people! Commonsense knowledge often not stated explicitly Weak logical relationships that can be used for inferring facts spouseOf melinda bill livesIn? seattle Das et al. (2017) 8/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! placeOfBirth attribute is missing for 71% of people! Commonsense knowledge often not stated explicitly Weak logical relationships that can be used for inferring facts spouseOf melinda bill chairmanOf microsoft livesIn? seattle Das et al. (2017) 8/30

Link Prediction Real world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete! placeOfBirth attribute is missing for 71% of people! Commonsense knowledge often not stated explicitly Weak logical relationships that can be used for inferring facts spouseOf melinda bill chairmanOf microsoft headquarteredIn livesIn? seattle Das et al. (2017) 8/30

Symbolic Representations Symbols (constants and predicates) do not share any information: grandpaOf � = grandfatherOf Tim Rockt¨ aschel End-to-End Differentiable Proving 9/30

Symbolic Representations Symbols (constants and predicates) do not share any information: grandpaOf � = grandfatherOf No notion of similarity: apple ∼ orange , professorAt ∼ lecturerAt Tim Rockt¨ aschel End-to-End Differentiable Proving 9/30

Symbolic Representations Symbols (constants and predicates) do not share any information: grandpaOf � = grandfatherOf No notion of similarity: apple ∼ orange , professorAt ∼ lecturerAt No generalization beyond what can be symbolically inferred: isFruit ( apple ), apple ∼ organge , isFruit ( orange )? Tim Rockt¨ aschel End-to-End Differentiable Proving 9/30

Symbolic Representations Symbols (constants and predicates) do not share any information: grandpaOf � = grandfatherOf No notion of similarity: apple ∼ orange , professorAt ∼ lecturerAt No generalization beyond what can be symbolically inferred: isFruit ( apple ), apple ∼ organge , isFruit ( orange )? Hard to work with language, vision and other modalities ‘‘is a film based on the novel of the same name by’’ ( X , Y ) Tim Rockt¨ aschel End-to-End Differentiable Proving 9/30

Symbolic Representations Symbols (constants and predicates) do not share any information: grandpaOf � = grandfatherOf No notion of similarity: apple ∼ orange , professorAt ∼ lecturerAt No generalization beyond what can be symbolically inferred: isFruit ( apple ), apple ∼ organge , isFruit ( orange )? Hard to work with language, vision and other modalities ‘‘is a film based on the novel of the same name by’’ ( X , Y ) But... leads to powerful inference mechanisms and proofs for predictions: fatherOf ( abe , homer ) . parentOf ( homer , lisa ) . parentOf ( homer , bart ) . grandfatherOf ( X , Y ) :– fatherOf ( X , Z ) , parentOf ( Z , Y ) . grandfatherOf ( abe , Q )? { Q / lisa } , { Q / bart } Tim Rockt¨ aschel End-to-End Differentiable Proving 9/30

Symbolic Representations Symbols (constants and predicates) do not share any information: grandpaOf � = grandfatherOf No notion of similarity: apple ∼ orange , professorAt ∼ lecturerAt No generalization beyond what can be symbolically inferred: isFruit ( apple ), apple ∼ organge , isFruit ( orange )? Hard to work with language, vision and other modalities ‘‘is a film based on the novel of the same name by’’ ( X , Y ) But... leads to powerful inference mechanisms and proofs for predictions: fatherOf ( abe , homer ) . parentOf ( homer , lisa ) . parentOf ( homer , bart ) . grandfatherOf ( X , Y ) :– fatherOf ( X , Z ) , parentOf ( Z , Y ) . grandfatherOf ( abe , Q )? { Q / lisa } , { Q / bart } Fairly easy to debug and trivial to incorporate domain knowledge: Show to domain expert and let her change/add rules and facts Tim Rockt¨ aschel End-to-End Differentiable Proving 9/30

Neural Representations Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v apple , v orange , v fatherOf , . . . ∈ R k Tim Rockt¨ aschel End-to-End Differentiable Proving 10/30

Neural Representations Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v apple , v orange , v fatherOf , . . . ∈ R k Can capture similarity and even semantic hierarchy of symbols: v grandpaOf = v grandfatherOf , v apple ∼ v orange , v apple < v fruit Tim Rockt¨ aschel End-to-End Differentiable Proving 10/30

Neural Representations Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v apple , v orange , v fatherOf , . . . ∈ R k Can capture similarity and even semantic hierarchy of symbols: v grandpaOf = v grandfatherOf , v apple ∼ v orange , v apple < v fruit Can be trained from raw task data (e.g. facts in a knowledge base) Tim Rockt¨ aschel End-to-End Differentiable Proving 10/30

Neural Representations Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v apple , v orange , v fatherOf , . . . ∈ R k Can capture similarity and even semantic hierarchy of symbols: v grandpaOf = v grandfatherOf , v apple ∼ v orange , v apple < v fruit Can be trained from raw task data (e.g. facts in a knowledge base) Can be compositional v ‘‘is the father of’’ = RNN θ ( v is , v the , v father , v of ) Tim Rockt¨ aschel End-to-End Differentiable Proving 10/30

Neural Representations Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v apple , v orange , v fatherOf , . . . ∈ R k Can capture similarity and even semantic hierarchy of symbols: v grandpaOf = v grandfatherOf , v apple ∼ v orange , v apple < v fruit Can be trained from raw task data (e.g. facts in a knowledge base) Can be compositional v ‘‘is the father of’’ = RNN θ ( v is , v the , v father , v of ) But... need large amount of training data Tim Rockt¨ aschel End-to-End Differentiable Proving 10/30

Neural Representations Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v apple , v orange , v fatherOf , . . . ∈ R k Can capture similarity and even semantic hierarchy of symbols: v grandpaOf = v grandfatherOf , v apple ∼ v orange , v apple < v fruit Can be trained from raw task data (e.g. facts in a knowledge base) Can be compositional v ‘‘is the father of’’ = RNN θ ( v is , v the , v father , v of ) But... need large amount of training data No direct way of incorporating prior knowledge v grandfatherOf ( X , Y ) :– v fatherOf ( X , Z ) , v parentOf ( Z , Y ) . Tim Rockt¨ aschel End-to-End Differentiable Proving 10/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) DistMult (Yang et al., 2015) v s , v i , v j ∈ R k Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) DistMult (Yang et al., 2015) v s , v i , v j ∈ R k f θ ( v s , v i , v j ) = v ⊤ s ( v i ⊙ v j ) � = v sk v ik v jk k Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) DistMult (Yang et al., 2015) ComplEx (Trouillon et al., 2016) v s , v i , v j ∈ R k v s , v i , v j ∈ C k f θ ( v s , v i , v j ) = v ⊤ s ( v i ⊙ v j ) � = v sk v ik v jk k Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) DistMult (Yang et al., 2015) ComplEx (Trouillon et al., 2016) v s , v i , v j ∈ R k v s , v i , v j ∈ C k f θ ( v s , v i , v j ) = v ⊤ f θ ( v s , v i , v j ) = s ( v i ⊙ v j ) real( v s ) ⊤ (real( v i ) ⊙ real( v j )) � = v sk v ik v jk + real( v s ) ⊤ (imag( v i ) ⊙ imag( v j )) k + imag( v s ) ⊤ (real( v i ) ⊙ imag( v j )) − imag( v s ) ⊤ (imag( v i ) ⊙ real( v j )) Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) DistMult (Yang et al., 2015) ComplEx (Trouillon et al., 2016) v s , v i , v j ∈ R k v s , v i , v j ∈ C k f θ ( v s , v i , v j ) = v ⊤ f θ ( v s , v i , v j ) = s ( v i ⊙ v j ) real( v s ) ⊤ (real( v i ) ⊙ real( v j )) � = v sk v ik v jk + real( v s ) ⊤ (imag( v i ) ⊙ imag( v j )) k + imag( v s ) ⊤ (real( v i ) ⊙ imag( v j )) − imag( v s ) ⊤ (imag( v i ) ⊙ real( v j )) Training Loss � L = − y log ( σ ( f θ ( v s , v i , v j ))) − (1 − y ) log (1 − σ ( f θ ( v s , v i , v j ))) r s ( e i , e j ) , y ∈ T Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) DistMult (Yang et al., 2015) ComplEx (Trouillon et al., 2016) v s , v i , v j ∈ R k v s , v i , v j ∈ C k f θ ( v s , v i , v j ) = v ⊤ f θ ( v s , v i , v j ) = s ( v i ⊙ v j ) real( v s ) ⊤ (real( v i ) ⊙ real( v j )) � = v sk v ik v jk + real( v s ) ⊤ (imag( v i ) ⊙ imag( v j )) k + imag( v s ) ⊤ (real( v i ) ⊙ imag( v j )) − imag( v s ) ⊤ (imag( v i ) ⊙ real( v j )) Training Loss � L = − y log ( σ ( f θ ( v s , v i , v j ))) − (1 − y ) log (1 − σ ( f θ ( v s , v i , v j ))) r s ( e i , e j ) , y ∈ T Learn v s , v i , v j from data Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

State-of-the-art Neural Link Prediction livesIn ( melinda , seattle )? = f θ ( v livesIn , v melinda , v seattle ) DistMult (Yang et al., 2015) ComplEx (Trouillon et al., 2016) v s , v i , v j ∈ R k v s , v i , v j ∈ C k f θ ( v s , v i , v j ) = v ⊤ f θ ( v s , v i , v j ) = s ( v i ⊙ v j ) real( v s ) ⊤ (real( v i ) ⊙ real( v j )) � = v sk v ik v jk + real( v s ) ⊤ (imag( v i ) ⊙ imag( v j )) k + imag( v s ) ⊤ (real( v i ) ⊙ imag( v j )) − imag( v s ) ⊤ (imag( v i ) ⊙ real( v j )) Training Loss � L = − y log ( σ ( f θ ( v s , v i , v j ))) − (1 − y ) log (1 − σ ( f θ ( v s , v i , v j ))) r s ( e i , e j ) , y ∈ T Learn v s , v i , v j from data Obtain gradients ∇ v s L , ∇ v i L , ∇ v j L by backprop Tim Rockt¨ aschel End-to-End Differentiable Proving 11/30

Regularization by Propositional Logic u 4 u 5 u 6 sigm sigm sigm Link Predictor u 1 u 2 u 3 dot dot dot � parentOf � � homer , bart � � motherOf � � fatherOf � Rockt¨ aschel et al. (2015), NAACL 12/30

Regularization by Propositional Logic fatherOf ( X , Y ) :– parentOf ( X , Y ) , ¬ motherOf ( X , Y ) u 4 u 5 u 6 sigm sigm sigm Link Predictor u 1 u 2 u 3 dot dot dot � parentOf � � homer , bart � � motherOf � � fatherOf � Rockt¨ aschel et al. (2015), NAACL 12/30

Regularization by Propositional Logic fatherOf ( X , Y ) :– parentOf ( X , Y ) , ¬ motherOf ( X , Y )  f θ ( s , i , j ) if F = s ( i , j )    1 − � A � if F = ¬ A    p ( F ) = � F � = � A � � B � if F = A ∧ B  � A � + � B � − � A � � B � if F = A ∨ B     � B � ( � A � − 1) + 1 if F = A :– B  u 4 u 5 u 6 sigm sigm sigm Link Predictor u 1 u 2 u 3 dot dot dot � parentOf � � homer , bart � � motherOf � � fatherOf � Rockt¨ aschel et al. (2015), NAACL 12/30

Regularization by Propositional Logic u 11 • + 1 fatherOf ( X , Y ) :– parentOf ( X , Y ) , ¬ motherOf ( X , Y ) Differentiable Rule u 10  ∗ f θ ( s , i , j ) if F = s ( i , j )  u 9   1 − � A � if F = ¬ A   ∗  p ( F ) = � F � = � A � � B � if F = A ∧ B u 7 u 8 • − 1 1 − •  � A � + � B � − � A � � B � if F = A ∨ B     � B � ( � A � − 1) + 1 if F = A :– B  u 4 u 5 u 6 sigm sigm sigm Link Predictor u 1 u 2 u 3 dot dot dot � parentOf � � homer , bart � � motherOf � � fatherOf � Rockt¨ aschel et al. (2015), NAACL 12/30

Regularization by Propositional Logic loss Loss − log u 11 • + 1 fatherOf ( X , Y ) :– parentOf ( X , Y ) , ¬ motherOf ( X , Y ) Differentiable Rule u 10  ∗ f θ ( s , i , j ) if F = s ( i , j )  u 9   1 − � A � if F = ¬ A   ∗  p ( F ) = � F � = � A � � B � if F = A ∧ B u 7 u 8 • − 1 1 − •  � A � + � B � − � A � � B � if F = A ∨ B     � B � ( � A � − 1) + 1 if F = A :– B  u 4 u 5 u 6 sigm sigm sigm Link Predictor L � � fatherOf ( homer , bart ) :– u 1 u 2 u 3 parentOf ( homer , bart ) ∧ dot dot dot ¬ motherOf ( homer , bart ) � � � parentOf � � homer , bart � � motherOf � � fatherOf � Rockt¨ aschel et al. (2015), NAACL 12/30

Regularization by Propositional Logic L ( f ) = − log ( � ∀ X , Y : f ( X , Y ) � ) = − � ( e i , e j ) ∈C 2 log � f ( e i , e j ) � loss Loss − log u 11 • + 1 fatherOf ( X , Y ) :– parentOf ( X , Y ) , ¬ motherOf ( X , Y ) Differentiable Rule u 10  ∗ f θ ( s , i , j ) if F = s ( i , j )  u 9   1 − � A � if F = ¬ A   ∗  p ( F ) = � F � = � A � � B � if F = A ∧ B u 7 u 8 • − 1 1 − •  � A � + � B � − � A � � B � if F = A ∨ B     � B � ( � A � − 1) + 1 if F = A :– B  u 4 u 5 u 6 sigm sigm sigm Link Predictor L � � fatherOf ( homer , bart ) :– u 1 u 2 u 3 parentOf ( homer , bart ) ∧ dot dot dot ¬ motherOf ( homer , bart ) � � � parentOf � � homer , bart � � motherOf � � fatherOf � Rockt¨ aschel et al. (2015), NAACL 12/30

Zero-shot Learning Results Neural Link Prediction (LP) weighted Mean Average Precision 38 40 33 21 20 10 3 0 Tim Rockt¨ aschel End-to-End Differentiable Proving 13/30

Zero-shot Learning Results Neural Link Prediction (LP) Deduction weighted Mean Average Precision 38 40 33 21 20 10 3 0 Tim Rockt¨ aschel End-to-End Differentiable Proving 13/30

Zero-shot Learning Results Neural Link Prediction (LP) Deduction Deduction after LP weighted Mean Average Precision 38 40 33 21 20 10 3 0 Tim Rockt¨ aschel End-to-End Differentiable Proving 13/30

Zero-shot Learning Results Neural Link Prediction (LP) Deduction Deduction after LP Deduction before LP weighted Mean Average Precision 38 40 33 21 20 10 3 0 Tim Rockt¨ aschel End-to-End Differentiable Proving 13/30

Zero-shot Learning Results Neural Link Prediction (LP) Deduction Deduction after LP Deduction before LP Regularization weighted Mean Average Precision 38 40 33 21 20 10 3 0 Tim Rockt¨ aschel End-to-End Differentiable Proving 13/30

Lifted Regularization by Implications Every father is a parent Every mother is a parent mother of father of parent of 0 Demeester et al. (2016), EMNLP 14/30

Lifted Regularization by Implications Every father is a parent Every mother is a parent implied by father of mother of father of parent of 0 Demeester et al. (2016), EMNLP 14/30

Lifted Regularization by Implications Every father is a parent Every mother is a parent After Before implied by father of mother of parent of father of father of parent of 0 0 Demeester et al. (2016), EMNLP 14/30

Lifted Regularization by Implications Every father is a parent Every mother is a parent After Before implied by father of mother of parent of father of father of mother of parent of 0 0 Demeester et al. (2016), EMNLP 14/30

Lifted Regularization by Implications Every father is a parent Generalises to similar relations ( e.g. dad) Every mother is a parent Generalises to similar relations ( e.g. mum) After Before implied by father of mother of parent of father of father of dad of mother of parent of mum of 0 0 Demeester et al. (2016), EMNLP 14/30

Lifted Regularization by Implications Every father is a parent Generalises to similar relations ( e.g. dad) Every mother is a parent Generalises to similar relations ( e.g. mum) Every parent is a relative No training facts needed! After Before implied by father of relative of mother of parent of father of father of dad of mother of parent of mum of 0 0 Demeester et al. (2016), EMNLP 14/30

Lifted Regularization by Implications ∀ X , Y : h ( X , Y ) :– b ( X , Y ) Every father is a parent Generalises to similar relations ( e.g. dad) ∀ ( e i , e j ) ∈ C 2 : � h � ⊤ � e i , e j � ≥ � b � ⊤ � e i , e j � Every mother is a parent Generalises to similar relations ( e.g. mum) ∀ ( e i , e j ) ∈ C 2 : � e i , e j � ∈ R k Every parent is a relative No training facts needed! � h � ≥ � b � , + After Before implied by father of relative of mother of parent of father of father of dad of mother of parent of mum of 0 0 Demeester et al. (2016), EMNLP 14/30

Adversarial Regularization Clause A : h ( X , Y ) :– b 1 ( X , Z ) ∧ b 2 ( Z , Y ) Regularization by propositional rules needs grounding – does not scale to large domains! y y x x z z Link Predictor Link Predictor Link Predictor φ h ( x , y ) φ b 1 ( x , z ) φ b 2 ( z , y ) J I [ φ h ( x , y ) :– φ b 1 ( x , z ) ∧ φ b 2 ( z , y )] Inconsistency Loss Minervini et al. (2017), UAI 14/30

Adversarial Regularization Clause A : h ( X , Y ) :– b 1 ( X , Z ) ∧ b 2 ( Z , Y ) Regularization by propositional rules needs grounding – does not scale to large domains! Lifted regularization only supports direct implications y y x x z z Link Predictor Link Predictor Link Predictor φ h ( x , y ) φ b 1 ( x , z ) φ b 2 ( z , y ) J I [ φ h ( x , y ) :– φ b 1 ( x , z ) ∧ φ b 2 ( z , y )] Inconsistency Loss Minervini et al. (2017), UAI 14/30

Adversarial Regularization Clause A : h ( X , Y ) :– b 1 ( X , Z ) ∧ b 2 ( Z , Y ) Regularization by propositional rules needs Adversary grounding – does not scale to large domains! y x z Lifted regularization only supports direct Adversarial Set S implications Idea: let grounding be generated by an y y x x z z adversary and optimize minimax game... Link Predictor Link Predictor Link Predictor φ h ( x , y ) φ b 1 ( x , z ) φ b 2 ( z , y ) J I [ φ h ( x , y ) :– φ b 1 ( x , z ) ∧ φ b 2 ( z , y )] Inconsistency Loss Minervini et al. (2017), UAI 14/30

Adversarial Regularization Clause A : h ( X , Y ) :– b 1 ( X , Z ) ∧ b 2 ( Z , Y ) Regularization by propositional rules needs Adversary grounding – does not scale to large domains! y x z Lifted regularization only supports direct Adversarial Set S implications Idea: let grounding be generated by an y y x x z z adversary and optimize minimax game... Adversary finds maximally violating Link Predictor Link Predictor Link Predictor grounding for a given rule φ h ( x , y ) φ b 1 ( x , z ) φ b 2 ( z , y ) J I [ φ h ( x , y ) :– φ b 1 ( x , z ) ∧ φ b 2 ( z , y )] Inconsistency Loss Minervini et al. (2017), UAI 14/30

End-to-End Differentiable Proving Tim Rockt aschel Whiteson - PowerPoint PPT Presentation

End-to-End Differentiable Proving Tim Rockt aschel Whiteson Research Lab, University of Oxford Twitter: @ rockt tim.rocktaschel@cs.ox.ac.uk http://rockt.github.com Logic and Learning Workshop at The Alan Turing Institute January 12, 2018

Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases Tim Rockt aschel University

An Enriched Perspective on Differentiable Stacks Benjamin MacAdam Joint work with Jonathan

Proving that Artinian implies Noetherian without proving that Artinian implies finite length

Symbolic Computation and Theorem Proving in Program Analysis Laura Kov acs Chalmers

Visual theorem proving with the Incredible Proof Machine The idea Theorem Proving without

4.5: Proving the Correctness of Grammars In this section, we consider techniques for proving the

Learning theorem proving through self-play Stanisaw Purga Overview AlphaZero Proving

Learning with Differentiable Perturbed Optimizers Quentin Berthet Youth in High-dimensions -

Learning with Differentiable Perturbed Optimizers Quentin Berthet Optimization for ML - CIRM -

The Differentiable Curry Martin Abadi, Dan Belov, Gordon Plotkin, Richard Wei, Dimitrios Vytiniotis

Differentiable Rendering for Mesh and Implicit Surface Weikai Chen Tencent America GAMES

Learning to map between ferns with differentiable binary embedding networks Maximilian Blendowski

Relay : a high level differentiable IR Jared Roesch TVMConf December 12th, 2018 1 This

Reparameterization Gradient for Non-differentiable Models Wonyeol Lee Hangyeol Yu Hongseok

Differentiable Cloth Simulation for Inverse Problems Junbang Liang 1 Content Motivation

GPU-accelerated End-to-end Differentiable Planning and Reasoning Tim Rockt aschel Whiteson

E-CAT QX DEMO Stockholm 24 November 2017 MatsLewan.se PROGRAM - Introduction and description of

Title of the Invention Medical Diagnostic Slide with Sensors and Internet of Things (IOT)

Patents 102(a) Novelty A person shall be entitled to a patent unless - (a) the

Incentives and Behavior Prof. Dr. Heiner Schumacher KU Leuven 4. Risk Preferences II Prof. Dr.

Mitigation Working Group Webinar GHG Inventory and Modeling for GGRA MWG Meeting April 2, 2020

CO pollution episodes observed at Rishiri Island explored by global CTM simulations and AIRS

RESPONSE TO INTERVENTION Data Warehouse Audit: A big picture look at K-12 instruction,

The LNG Inventory Routing Problem with Pick-Up Contracts Henrik Andersson Marielle Christiansen