SLIDE 1
Monte Carlo Semantics
Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantics
Richard Bergmair
University of Cambridge Computer Laboratory Natural Language Information Processing
Flatlands Meeting, 6 June 2008
SLIDE 2
Why Inference?
◮ open-domain applications (QA, IE, IR, SUM)
◮ given two pieces of text T and H, find degree of logical
  ◮ entailment BK → (T → H)
  ◮ or similarity BK → (T ≡ H)
Example
(BK) ∀x : tall(x) ≡ high(x)
(T) Including the 24m antenna, the Eiffel Tower is 325m high.
∴ (H) How tall is the Eiffel Tower?
SLIDE 3
Why Inference?
◮ open-domain applications (QA, IE, IR, SUM)
◮ given two pieces of text T and H, find degree of logical
  ◮ entailment BK → (T → H)
  ◮ or similarity BK → (T ≡ H)
Example
(BK) ∀x, y : acquire(x, y) → owns(x, y)
(T) Yamaha had acquired the guitar brand Ibanez, through its takeover of Hoshino Gakki Group, earlier this week.
∴ (H) Yamaha owns the guitar brand Ibanez.
SLIDE 4
What is Robust Inference?
...in an ideal world, we would have either
◮ (YES)
  ⊢ (BK1 → (BK2 → (. . . → (BKN → (T → H))))),
  ⊬ (BK1 → (BK2 → (. . . → (BKN → (T → ¬H)))));
◮ or (NO)
  ⊬ (BK1 → (BK2 → (. . . → (BKN → (T → H))))),
  ⊢ (BK1 → (BK2 → (. . . → (BKN → (T → ¬H))))).
But if relevant knowledge is missing, say BK1, we could have
◮ (DON'T KNOW)
  ⊬ (BK2 → (. . . → (BKN → (T → H)))),
  ⊬ (BK2 → (. . . → (BKN → (T → ¬H)))).
SLIDE 5
What is Robust Inference?
In the DON'T KNOW situation, where ⊬ ϕ → ψ and ⊬ ϕ → ¬ψ, we want to know whether or not ⟦ϕ → ψ⟧ > ⟦ϕ → ¬ψ⟧, and, more generally, we want to know whether, for two candidate entailments ϕ1 → ψ1 and ϕ2 → ψ2, we have ⟦ϕ1 → ψ1⟧ > ⟦ϕ2 → ψ2⟧.
SLIDE 6
Outline
Problem Statement
Propositional Model Theory & Graded Validity
Shallow Inference: Bag-of-Words Encoding
Deep Inference: Syllogistic Encoding
Computation via the Monte Carlo Method
SLIDE 7
Outline
Problem Statement
Propositional Model Theory & Graded Validity
Shallow Inference: Bag-of-Words Encoding
Deep Inference: Syllogistic Encoding
Computation via the Monte Carlo Method
SLIDE 8
Model Theory: Classical Bivalent Logic
Definition
◮ Let Λ = {p1, p2, . . . , pN} be a propositional language.
◮ Let w = [w1, w2, . . . , wN] be a model.
The truth value ⟦·⟧_w (relative to Λ) is given by:
⟦⊥⟧_w = 0;
⟦pi⟧_w = wi, for all i;
⟦ϕ → ψ⟧_w = 1 if ⟦ϕ⟧_w = 1 and ⟦ψ⟧_w = 1,
            0 if ⟦ϕ⟧_w = 1 and ⟦ψ⟧_w = 0,
            1 if ⟦ϕ⟧_w = 0 and ⟦ψ⟧_w = 1,
            1 if ⟦ϕ⟧_w = 0 and ⟦ψ⟧_w = 0;
for all formulae ϕ and ψ over Λ.
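As a concrete illustration (not from the original slides), here is a minimal Python sketch of this truth-value definition; formulas are nested tuples, and all names are my own:

```python
# A minimal sketch of the bivalent truth-value definition above.
# Formulas over Lambda are nested tuples: ('bot',) for falsum,
# ('atom', i) for p_i (0-indexed), ('implies', phi, psi) for phi -> psi.

def truth(phi, w):
    """[[phi]]_w: evaluate formula phi in model w, a sequence of 0/1."""
    op = phi[0]
    if op == 'bot':
        return 0
    if op == 'atom':
        return w[phi[1]]
    if op == 'implies':
        # material implication: false only when the antecedent is
        # true and the consequent is false
        return 0 if truth(phi[1], w) == 1 and truth(phi[2], w) == 0 else 1
    raise ValueError('unknown connective: %r' % (op,))

def neg(phi):
    """Negation is definable as phi -> bot."""
    return ('implies', phi, ('bot',))
```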
SLIDE 9
Model Theory: Satisfiability, Validity
Definition
◮ ϕ is valid iff ⟦ϕ⟧_w = 1 for all w ∈ W.
◮ ϕ is satisfiable iff ⟦ϕ⟧_w = 1 for some w ∈ W.
Definition
⟦ϕ⟧_W = (1/|W|) · Σ_{w∈W} ⟦ϕ⟧_w.
Corollary
◮ ϕ is valid iff ⟦ϕ⟧_W = 1.
◮ ϕ is satisfiable iff ⟦ϕ⟧_W > 0.
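Continuing the sketch above, the graded validity ⟦ϕ⟧_W can be computed exactly by enumerating all 2^N models; this is feasible only for small N, which is what the Monte Carlo method later addresses:

```python
from itertools import product

def graded_validity(phi, n):
    """[[phi]]_W: the mean of [[phi]]_w over all 2**n models w in W."""
    models = list(product([0, 1], repeat=n))
    return sum(truth(phi, w) for w in models) / len(models)

# p1 -> p1 is valid; p1 -> p2 holds in 3 of the 4 models over {p1, p2}:
p1, p2 = ('atom', 0), ('atom', 1)
print(graded_validity(('implies', p1, p1), 2))  # 1.0
print(graded_validity(('implies', p1, p2), 2))  # 0.75
```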
SLIDE 10
Outline
Problem Statement
Propositional Model Theory & Graded Validity
Shallow Inference: Bag-of-Words Encoding
Deep Inference: Syllogistic Encoding
Computation via the Monte Carlo Method
SLIDE 11
Bag-of-Words Inference (1)
assume strictly bivalent models;
Λ = {socrates, is, a, man, every}, |W| = 2^5;
(T) socrates ∧ is ∧ a ∧ man
∴ (H) every ∧ man ∧ is ∧ socrates;
Λ_T = {a} (words occurring only in T), |W_T| = 2^1;
Λ_O = {socrates, is, man} (words shared by T and H), |W_O| = 2^3;
Λ_H = {every} (words occurring only in H), |W_H| = 2^1;
2^1 · 2^3 · 2^1 = 2^5.
SLIDE 12
Bag-of-Words Inference (2)
How to make this implication false?
◮ Choose the 1 out of 2^4 models from W_T × W_O which makes the antecedent true.
◮ Choose any of the 2^1 − 1 models from W_H which make the consequent false.
...now compute an expected value. Count zero for the 1 · (2^1 − 1) = 1 model that makes this implication false. Count one for the other 2^5 − 1. Now
⟦T → H⟧_W = 1 − 1/2^5 = 0.96875,
and, in general,
⟦T → H⟧_W = 1 − (2^|Λ_H| − 1) / 2^(|Λ_T| + |Λ_H| + |Λ_O|).
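The closed form can be checked by brute force; here is a small Python sketch of my own construction, enumerating all 2^5 models of the example above:

```python
from itertools import product

# Brute-force check of the bag-of-words example: T and H are read as
# conjunctions of the word-atoms they contain.
words = ['socrates', 'is', 'a', 'man', 'every']
T = ['socrates', 'is', 'a', 'man']
H = ['every', 'man', 'is', 'socrates']

total = 0
for w in product([0, 1], repeat=len(words)):
    model = dict(zip(words, w))
    t = all(model[x] for x in T)        # [[T]]_w
    h = all(model[x] for x in H)        # [[H]]_w
    total += 0 if (t and not h) else 1  # [[T -> H]]_w
print(total / 2 ** len(words))          # 0.96875 = 1 - 1/2**5
```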
SLIDE 13
Outline
Problem Statement
Propositional Model Theory & Graded Validity
Shallow Inference: Bag-of-Words Encoding
Deep Inference: Syllogistic Encoding
Computation via the Monte Carlo Method
SLIDE 14
Language: Syllogistic Syntax
Let Λ = {x1, x2, x3, y1, y2, y3};
All X are Y = (x1 →G y1) ∧ (x2 →G y2) ∧ (x3 →G y3),
Some X are Y = (x1 ∧ y1) ∨ (x2 ∧ y2) ∨ (x3 ∧ y3),
All X are not Y = ¬ Some X are Y,
Some X are not Y = ¬ All X are Y,
where ⟦ϕ →G ψ⟧_w = 1 if ⟦ϕ⟧_w ≤ ⟦ψ⟧_w, and ⟦ψ⟧_w otherwise (the Gödel implication).
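To make the encoding concrete, here is a short Python sketch of my own, using the bivalent special case (on {0, 1}, →G coincides with material implication):

```python
from itertools import product

# x and y are the atom vectors (x1, x2, x3) and (y1, y2, y3) of a model.
def all_xy(x, y):   # All X are Y: (x1 -> y1) & (x2 -> y2) & (x3 -> y3)
    return all(xi <= yi for xi, yi in zip(x, y))

def some_xy(x, y):  # Some X are Y: (x1 & y1) | (x2 & y2) | (x3 & y3)
    return any(xi and yi for xi, yi in zip(x, y))

# Graded validity of "All X are Y, therefore Some X are Y": classically
# invalid (X may be empty), but falsified by only 8 of the 64 models,
# namely those in which X is empty:
models = list(product([0, 1], repeat=6))
ok = sum(0 if (all_xy(w[:3], w[3:]) and not some_xy(w[:3], w[3:])) else 1
         for w in models)
print(ok / len(models))  # 0.875
```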
SLIDE 15
Proof theory: A Modern Syllogism
(S1) ∴ All X are X,
(S2) Some X are Y ∴ Some X are X,
(S3) All Y are Z, All X are Y ∴ All X are Z,
(S4) All Y are Z, Some Y are X ∴ Some X are Z,
(S5) Some X are Y ∴ Some Y are X.
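As a sanity check (my own sketch, reusing all_xy and some_xy from above), each of S1–S5 should hold in every model over x1..x3, y1..y3, z1..z3, i.e. have graded validity 1 under the encoding:

```python
from itertools import product

def valid(rule):
    """True iff the rule holds in all 2**9 models over x, y, z."""
    return all(rule(w[:3], w[3:6], w[6:]) for w in product([0, 1], repeat=9))

print(valid(lambda x, y, z: all_xy(x, x)))                        # S1
print(valid(lambda x, y, z: not some_xy(x, y) or some_xy(x, x)))  # S2
print(valid(lambda x, y, z:                                       # S3
            not (all_xy(y, z) and all_xy(x, y)) or all_xy(x, z)))
print(valid(lambda x, y, z:                                       # S4
            not (all_xy(y, z) and some_xy(y, x)) or some_xy(x, z)))
print(valid(lambda x, y, z: not some_xy(x, y) or some_xy(y, x)))  # S5
```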
SLIDE 16
Proof theory: “Natural Logic”
(NL1) restriction by a modifier:
∴ All (red X) are X
Some X are (red Y) ∴ Some X are Y
Some (red X) are Y ∴ Some X are Y
All X are (red Y) ∴ All X are Y
All X are Y ∴ All (red X) are Y

(NL2) lexical hypernymy:
∴ All cats are animals
Some X are cats ∴ Some X are animals
Some cats are Y ∴ Some animals are Y
All X are cats ∴ All X are animals
All animals are Y ∴ All cats are Y
SLIDE 17
Natural Logic Robustness Properties
⟦Some X are Y ∴ Some X are (red Y)⟧ > ⟦Some X are Y ∴ Some X are (big (red Y))⟧,
⟦Some X are Y ∴ Some (red X) are Y⟧ > ⟦Some X are Y ∴ Some (big (red X)) are Y⟧,
⟦All X are Y ∴ All X are (red Y)⟧ > ⟦All X are Y ∴ All X are (big (red Y))⟧,
⟦All (red X) are Y ∴ All X are Y⟧ > ⟦All (big (red X)) are Y ∴ All X are Y⟧.
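The first of these inequalities can be checked numerically; below is a sketch of my own, treating red and big as additional atom vectors r and b over the same three individuals:

```python
from itertools import product

def graded(conseq):
    """[[Some X are Y -> conseq]]_W over atoms x, y, red, big (3 each)."""
    models = list(product([0, 1], repeat=12))
    total = 0
    for w in models:
        x, y, r, b = w[:3], w[3:6], w[6:9], w[9:]
        t = any(xi and yi for xi, yi in zip(x, y))       # Some X are Y
        total += 0 if (t and not conseq(x, y, r, b)) else 1
    return total / len(models)

# Some X are (red Y) vs. Some X are (big (red Y)):
red_y = lambda x, y, r, b: any(xi and yi and ri
                               for xi, yi, ri in zip(x, y, r))
big_red_y = lambda x, y, r, b: any(xi and yi and ri and bi
                                   for xi, yi, ri, bi in zip(x, y, r, b))
print(graded(red_y), graded(big_red_y))  # the first is strictly larger
```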
SLIDE 18
Outline
Problem Statement
Propositional Model Theory & Graded Validity
Shallow Inference: Bag-of-Words Encoding
Deep Inference: Syllogistic Encoding
Computation via the Monte Carlo Method
SLIDE 19
Model Theory: Satisfiability, Validity, Expectation
Definition
⟦ϕ⟧_W = (1/|W|) · Σ_{w∈W} ⟦ϕ⟧_w. How do we compute this in general?
Observation
◮ Draw w randomly from a uniform distribution over W. Now ⟦ϕ⟧_W is the probability that ϕ is true in w.
◮ If Ŵ ⊆ W is a random sample over the population W, the sample mean ⟦ϕ⟧_Ŵ approaches the population mean ⟦ϕ⟧_W as |Ŵ| approaches |W|.
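A minimal Monte Carlo sketch in Python (names and sample size are my own choices): draw models uniformly and use the sample mean of ⟦ϕ⟧_w as an estimate of ⟦ϕ⟧_W:

```python
import random

def mc_graded_validity(phi_truth, n, samples=10000):
    """Estimate [[phi]]_W by sampling models w uniformly from {0,1}**n."""
    hits = 0
    for _ in range(samples):
        w = [random.randint(0, 1) for _ in range(n)]
        hits += phi_truth(w)  # [[phi]]_w, either 0 or 1
    return hits / samples

# estimate [[p1 -> p2]]_W over {p1, p2}; the exact value is 0.75:
print(mc_graded_validity(lambda w: 0 if (w[0] and not w[1]) else 1, n=2))
```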
SLIDE 20
Summary
This is work in progress, but it could develop into a rich theoretical framework for robust textual inference and logical pattern processing.
◮ robust and practicable...
◮ ...in the worst case: does bag-of-words;
◮ justifiable from epistemology, logic, and linguistics;
◮ model theory enables inference via the Monte Carlo method;
◮ proof theory is intuitive and well-understood: it is entailed by classical logic and entails natural logic.