
End-to-End Differentiable Proving
Tim Rocktäschel
Whiteson Research Lab, University of Oxford
Twitter: @rockt · tim.rocktaschel@cs.ox.ac.uk · http://rockt.github.com
Logic and Learning Workshop at The Alan Turing Institute, January 12, 2018


Outline
1. Link prediction & symbolic vs. neural representations
2. Regularize neural representations using logical rules
   - Model-agnostic but slow (Rocktäschel et al., 2015)
   - Fast but restricted (Demeester et al., 2016)
   - Model-agnostic and fast (Minervini et al., 2017)
3. End-to-end differentiable proving (Rocktäschel and Riedel, 2017)
   - Explicit multi-hop reasoning using neural networks
   - Inducing rules using gradient descent
4. Outlook & Summary

Notation
- Constant: homer, bart, lisa, etc. (lowercase)
- Variable: X, Y, etc. (uppercase, universally quantified)
- Term: a constant or a variable; restricted to function-free terms in this talk
- Predicate: fatherOf, parentOf, etc.; a function from terms to a Boolean
- Atom: a predicate applied to terms, e.g. parentOf(X, bart)
- Literal: an atom or a negated atom, e.g. not parentOf(bart, lisa)
- Rule: head :– body., where the head is an atom and the body is a (possibly empty) list of literals representing a conjunction; restricted to Horn clauses in this talk
- Fact: a ground rule (no free variables) with an empty body, e.g. parentOf(homer, bart).
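
To make the notation concrete, here is a minimal sketch of how these objects could be encoded in Python; the tuple-based encoding and helper names (Atom, Rule, is_variable) are illustrative choices, not something defined in the talk.

```python
from typing import List, Tuple

# An atom is a predicate name applied to terms; a rule pairs a head atom with
# a (possibly empty) body, i.e. a Horn clause. Constants are written
# lowercase, variables uppercase.
Atom = Tuple[str, Tuple[str, ...]]   # e.g. ("parentOf", ("X", "bart"))
Rule = Tuple[Atom, List[Atom]]       # head :- body

def is_variable(term: str) -> bool:
    """Variables start with an uppercase letter, constants with a lowercase one."""
    return term[0].isupper()

# Facts are ground rules with an empty body.
facts: List[Rule] = [
    (("fatherOf", ("abe", "homer")), []),
    (("parentOf", ("homer", "bart")), []),
]

# A rule with variables: grandfatherOf(X, Y) :- fatherOf(X, Z), parentOf(Z, Y).
grandfather_rule: Rule = (
    ("grandfatherOf", ("X", "Y")),
    [("fatherOf", ("X", "Z")), ("parentOf", ("Z", "Y"))],
)
```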

Link Prediction
- Real-world knowledge bases (e.g. Freebase, DBpedia, YAGO) are incomplete: the placeOfBirth attribute is missing for 71% of people!
- Commonsense knowledge is often not stated explicitly.
- Weak logical relationships can be used for inferring facts, e.g. the query livesIn(melinda, seattle)? is supported by the path spouseOf(melinda, bill), chairmanOf(bill, microsoft), headquarteredIn(microsoft, seattle).

Das et al. (2017)

Symbolic Representations
- Symbols (constants and predicates) do not share any information: grandpaOf ≠ grandfatherOf
- No notion of similarity: apple ∼ orange, professorAt ∼ lecturerAt
- No generalization beyond what can be symbolically inferred: given isFruit(apple) and apple ∼ orange, does isFruit(orange) hold?
- Hard to work with language, vision and other modalities: "is a film based on the novel of the same name by"(X, Y)
- But... symbols lead to powerful inference mechanisms and proofs for predictions:
      fatherOf(abe, homer).
      parentOf(homer, lisa).
      parentOf(homer, bart).
      grandfatherOf(X, Y) :– fatherOf(X, Z), parentOf(Z, Y).
      grandfatherOf(abe, Q)?   →   {Q/lisa}, {Q/bart}
- Fairly easy to debug, and trivial to incorporate domain knowledge: show the model to a domain expert and let her change or add rules and facts.
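
For illustration, the proof for the query above amounts to a join over the facts; the toy Python fragment below (names such as father_of and grandfather_of are made up here) sketches this, and is not part of the talk.

```python
# Facts from the slide, stored as sets of (subject, object) pairs.
father_of = {("abe", "homer")}
parent_of = {("homer", "lisa"), ("homer", "bart")}

def grandfather_of(x):
    """All Q such that grandfatherOf(x, Q) follows from
    grandfatherOf(X, Y) :- fatherOf(X, Z), parentOf(Z, Y)."""
    return {y for (x_, z) in father_of if x_ == x
              for (z_, y) in parent_of if z_ == z}

print(grandfather_of("abe"))  # {'lisa', 'bart'}, i.e. substitutions {Q/lisa}, {Q/bart}
```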

Neural Representations
- Lower-dimensional, fixed-length vector representations of symbols (predicates and constants): v_apple, v_orange, v_fatherOf, ... ∈ R^k
- Can capture similarity and even semantic hierarchy of symbols: v_grandpaOf = v_grandfatherOf, v_apple ∼ v_orange, v_apple < v_fruit
- Can be trained from raw task data (e.g. facts in a knowledge base)
- Can be compositional: v_"is the father of" = RNN_θ(v_is, v_the, v_father, v_of)
- But... they need a large amount of training data
- No direct way of incorporating prior knowledge: v_grandfatherOf(X, Y) :– v_fatherOf(X, Z), v_parentOf(Z, Y).
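
As a toy illustration of the similarity point above (purely assumed values, not from the talk), symbol embeddings can be compared with cosine similarity; after training, near-synonymous relations such as grandpaOf and grandfatherOf should end up with similar vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8  # embedding dimension
emb = {s: rng.normal(size=k) for s in ["grandpaOf", "grandfatherOf", "apple", "orange"]}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# With untrained random vectors this is near 0; training on KB facts should push it towards 1.
print(cosine(emb["grandpaOf"], emb["grandfatherOf"]))
```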

State-of-the-art Neural Link Prediction

livesIn(melinda, seattle)? = f_θ(v_livesIn, v_melinda, v_seattle)

DistMult (Yang et al., 2015): v_s, v_i, v_j ∈ R^k
    f_θ(v_s, v_i, v_j) = v_s^⊤ (v_i ⊙ v_j) = Σ_k v_sk · v_ik · v_jk

ComplEx (Trouillon et al., 2016): v_s, v_i, v_j ∈ C^k
    f_θ(v_s, v_i, v_j) =   real(v_s)^⊤ (real(v_i) ⊙ real(v_j))
                         + real(v_s)^⊤ (imag(v_i) ⊙ imag(v_j))
                         + imag(v_s)^⊤ (real(v_i) ⊙ imag(v_j))
                         − imag(v_s)^⊤ (imag(v_i) ⊙ real(v_j))

Training loss (binary cross-entropy over labelled training triples T):
    L = Σ_{(r_s(e_i, e_j), y) ∈ T} [ −y log σ(f_θ(v_s, v_i, v_j)) − (1 − y) log(1 − σ(f_θ(v_s, v_i, v_j))) ]

Learn v_s, v_i, v_j from data; obtain gradients ∇_{v_s} L, ∇_{v_i} L, ∇_{v_j} L by backpropagation.
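
The two scoring functions and the loss above are straightforward to write down in numpy; the sketch below is an assumed implementation for illustration (the variable names and random toy embeddings are mine), not the authors' code.

```python
import numpy as np

def score_distmult(v_s, v_i, v_j):
    """f(v_s, v_i, v_j) = v_s^T (v_i * v_j) = sum_k v_sk * v_ik * v_jk."""
    return float(np.sum(v_s * v_i * v_j))

def score_complex(v_s, v_i, v_j):
    """ComplEx score for complex-valued embeddings in C^k."""
    return float(np.real(v_s) @ (np.real(v_i) * np.real(v_j))
                 + np.real(v_s) @ (np.imag(v_i) * np.imag(v_j))
                 + np.imag(v_s) @ (np.real(v_i) * np.imag(v_j))
                 - np.imag(v_s) @ (np.imag(v_i) * np.real(v_j)))

def bce_loss(score, y):
    """-y log sigma(score) - (1 - y) log(1 - sigma(score))."""
    p = 1.0 / (1.0 + np.exp(-score))
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

rng = np.random.default_rng(0)
k = 4
v_livesIn, v_melinda, v_seattle = (rng.normal(size=k) for _ in range(3))
s = score_distmult(v_livesIn, v_melinda, v_seattle)
print(s, bce_loss(s, y=1.0))
# In practice the embeddings are learned by backpropagating through this loss.
```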

Regularization by Propositional Logic

Example rule: fatherOf(X, Y) :– parentOf(X, Y), ¬motherOf(X, Y)

Differentiable rule: map a formula F to a truth score p(F) = ⟦F⟧, defined recursively:
    ⟦F⟧ = f_θ(s, i, j)                  if F = s(i, j)
    ⟦F⟧ = 1 − ⟦A⟧                       if F = ¬A
    ⟦F⟧ = ⟦A⟧ · ⟦B⟧                     if F = A ∧ B
    ⟦F⟧ = ⟦A⟧ + ⟦B⟧ − ⟦A⟧ · ⟦B⟧         if F = A ∨ B
    ⟦F⟧ = ⟦B⟧ · (⟦A⟧ − 1) + 1           if F = A :– B

Loss: L(f) = −log ⟦∀X, Y: f(X, Y)⟧ = −Σ_{(e_i, e_j) ∈ C²} log ⟦f(e_i, e_j)⟧

[Figure: computation graph in which a link predictor (dot products of relation and entity-pair embeddings followed by a sigmoid) scores parentOf(homer, bart), motherOf(homer, bart) and fatherOf(homer, bart); the differentiable rule combines these scores into ⟦fatherOf(homer, bart) :– parentOf(homer, bart) ∧ ¬motherOf(homer, bart)⟧, which feeds a −log loss.]

Rocktäschel et al. (2015), NAACL
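
A compact sketch of the differentiable rule scoring above; the link predictor and helper names here are illustrative stand-ins (a DistMult-style scorer), not the exact model from the paper.

```python
import numpy as np

def atom_score(v_rel, v_subj, v_obj):
    """Stand-in link predictor: sigmoid of a DistMult-style trilinear score."""
    return 1.0 / (1.0 + np.exp(-np.sum(v_rel * v_subj * v_obj)))

# Differentiable logical connectives from the slide.
def neg(a):            return 1.0 - a                # [[not A]]
def conj(a, b):        return a * b                  # [[A and B]]
def disj(a, b):        return a + b - a * b          # [[A or B]]
def implied_by(a, b):  return b * (a - 1.0) + 1.0    # [[A :- B]]

rng = np.random.default_rng(0)
emb = {s: rng.normal(size=4) for s in ["fatherOf", "parentOf", "motherOf", "homer", "bart"]}

# Ground rule: fatherOf(homer, bart) :- parentOf(homer, bart), not motherOf(homer, bart)
head = atom_score(emb["fatherOf"], emb["homer"], emb["bart"])
body = conj(atom_score(emb["parentOf"], emb["homer"], emb["bart"]),
            neg(atom_score(emb["motherOf"], emb["homer"], emb["bart"])))
loss = -np.log(implied_by(head, body))  # minimizing this pushes the rule towards truth
print(loss)
```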

Zero-shot Learning Results

[Figure: bar chart of weighted Mean Average Precision comparing Neural Link Prediction (LP), Deduction, Deduction after LP, Deduction before LP, and Regularization.]

Lifted Regularization by Implications

- Every father is a parent: generalises to similar relations (e.g. dad)
- Every mother is a parent: generalises to similar relations (e.g. mum)
- Every parent is a relative: no training facts needed!

Lifting the rule ∀X, Y: h(X, Y) :– b(X, Y):
    ∀(e_i, e_j) ∈ C²: ⟦h⟧^⊤ ⟦e_i, e_j⟧ ≥ ⟦b⟧^⊤ ⟦e_i, e_j⟧
    With non-negative entity-pair embeddings ⟦e_i, e_j⟧ ∈ R^k_+, the component-wise order ⟦h⟧ ≥ ⟦b⟧ is sufficient, so the implication can be enforced once per rule, independently of the entity pairs.

[Figure: relation embedding components before and after rule injection; after injection, the components of fatherOf, dadOf, motherOf and mumOf are bounded by those of parentOf, which are in turn bounded by those of relativeOf.]

Demeester et al. (2016), EMNLP
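
A minimal numpy sketch of how the lifted constraint above could be turned into a penalty term; the hinge-style form and the toy vectors are my own illustrative choices under the assumption of non-negative embeddings.

```python
import numpy as np

def lifted_implication_penalty(v_head, v_body):
    """sum_k max(0, v_body[k] - v_head[k]); zero iff v_head >= v_body componentwise."""
    return float(np.sum(np.maximum(0.0, v_body - v_head)))

# Toy non-negative relation embeddings for parentOf (head) and fatherOf (body).
v_parentOf = np.array([0.9, 0.7, 0.8])
v_fatherOf = np.array([0.5, 0.9, 0.2])
print(lifted_implication_penalty(v_parentOf, v_fatherOf))  # > 0: dimension 1 violates the order
```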

Adversarial Regularization

Clause A: h(X, Y) :– b1(X, Z) ∧ b2(Z, Y)

- Regularization by propositional rules needs grounding, so it does not scale to large domains.
- Lifted regularization only supports direct implications.
- Idea: let the groundings be generated by an adversary and optimize a minimax game.
- The adversary finds maximally violating groundings for a given rule.

[Figure: an adversary generates representations (x, y, z) forming an adversarial set S; link predictors score φ_h(x, y), φ_b1(x, z) and φ_b2(z, y), which are combined into an inconsistency loss J_I[φ_h(x, y) :– φ_b1(x, z) ∧ φ_b2(z, y)].]

Minervini et al. (2017), UAI
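
A rough sketch of the minimax idea above (an assumed formulation for illustration, not the authors' code): the inner loop searches for embeddings that maximally violate the clause, and the outer loop would add that worst-case inconsistency to the training loss.

```python
import numpy as np

def atom_score(v_rel, a, b):
    """Stand-in link predictor: sigmoid of a DistMult-style score."""
    return 1.0 / (1.0 + np.exp(-np.sum(v_rel * a * b)))

def inconsistency(v_h, v_b1, v_b2, x, y, z):
    """How strongly (x, y, z) violates h(X, Y) :- b1(X, Z) and b2(Z, Y):
    the body is true but the head is not."""
    body = atom_score(v_b1, x, z) * atom_score(v_b2, z, y)
    return max(0.0, body - atom_score(v_h, x, y))

rng = np.random.default_rng(0)
k = 4
v_h, v_b1, v_b2 = (rng.normal(size=k) for _ in range(3))

# Inner maximization: a crude adversary samples candidate groundings and keeps the worst one.
worst = max(inconsistency(v_h, v_b1, v_b2, *rng.normal(size=(3, k))) for _ in range(100))
# Outer minimization would add `worst` to the link-prediction loss and update the embeddings.
print(worst)
```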
