Monte Carlo Semantics: Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantics


Richard Bergmair, University of Cambridge Computer Laboratory, Natural Language Information Processing. Flatlands Meeting, Jun-06.


SLIDE 1

Monte Carlo Semantics

Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantics

Richard Bergmair

University of Cambridge Computer Laboratory, Natural Language Information Processing

Flatlands Meeting, Jun-06 2008

SLIDE 2

Why Inference?

◮ open-domain applications (QA, IE, IR, SUM)
◮ given two pieces of text T and H, find degree of logical
  ◮ entailment BK → (T → H)
  ◮ or similarity BK → (T ≡ H)

Example

(BK) ∀x : tall(x) ≡ high(x)
(T) Including the 24m antenna, the Eiffel Tower is 325m high.
∴ (H) How tall is the Eiffel Tower?

SLIDE 3

Why Inference?

◮ open-domain applications (QA, IE, IR, SUM)
◮ given two pieces of text T and H, find degree of logical
  ◮ entailment BK → (T → H)
  ◮ or similarity BK → (T ≡ H)

Example

(BK) ∀x, y : acquire(x, y) → owns(x, y)
(T) Yamaha had acquired the guitar brand Ibanez, through its takeover of Hoshino Gakki Group, earlier this week.
∴ (H) owns(Yamaha, Ibanez)
SLIDE 4

What is Robust Inference?

...in an ideal world, we would have either

◮ (YES)
  ⊢ (BK1 → (BK2 → (. . . → (BKN → (T → H))))),
  ⊬ (BK1 → (BK2 → (. . . → (BKN → (T → ¬H)))));

◮ or (NO)
  ⊬ (BK1 → (BK2 → (. . . → (BKN → (T → H))))),
  ⊢ (BK1 → (BK2 → (. . . → (BKN → (T → ¬H))))).

But if relevant knowledge is missing, say BK1, we could have

◮ (DON’T KNOW)
  ⊬ (BK2 → (. . . → (BKN → (T → H)))),
  ⊬ (BK2 → (. . . → (BKN → (T → ¬H)))).

SLIDE 5

What is Robust Inference?

In the DON’T KNOW situation, where ⊬ ϕ → ψ and ⊬ ϕ → ¬ψ, we want to know whether or not ⟦ϕ → ψ⟧ > ⟦ϕ → ¬ψ⟧, and, more generally, we want to know whether, for two candidate entailments ϕ1 → ψ1 and ϕ2 → ψ2, we have ⟦ϕ1 → ψ1⟧ > ⟦ϕ2 → ψ2⟧.

SLIDE 6

Outline

Problem Statement Propositional Model Theory & Graded Validity Shallow Inference: Bag-of-Words Encoding Deep Inference: Syllogistic Encoding Computation via the Monte Carlo Method

SLIDE 7

Outline

Problem Statement Propositional Model Theory & Graded Validity Shallow Inference: Bag-of-Words Encoding Deep Inference: Syllogistic Encoding Computation via the Monte Carlo Method

SLIDE 8

Model Theory: Classical Bivalent Logic

Definition

◮ Let Λ = {p1, p2, . . . , pN} be a propositional language.
◮ Let w = [w1, w2, . . . , wN] be a model.

The truth value ⟦·⟧^Λ_w is:

  ⟦⊥⟧^Λ_w = 0;
  ⟦pi⟧^Λ_w = wi for all i;
  ⟦ϕ → ψ⟧^Λ_w = 1 if ⟦ϕ⟧^Λ_w = 1 and ⟦ψ⟧^Λ_w = 1,
                0 if ⟦ϕ⟧^Λ_w = 1 and ⟦ψ⟧^Λ_w = 0,
                1 if ⟦ϕ⟧^Λ_w = 0 and ⟦ψ⟧^Λ_w = 1,
                1 if ⟦ϕ⟧^Λ_w = 0 and ⟦ψ⟧^Λ_w = 0;

for all formulae ϕ and ψ over Λ.
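As a sketch, the clauses above can be transcribed into Python (a hypothetical illustration, not part of the talk); a formula is a nested tuple over the connectives shown, and a model w is a list of 0/1 values:

```python
# Bivalent evaluation of [[phi]]_w for the language above.
# Formulas are nested tuples: ('bot',) for ⊥, ('p', i) for atom p_i,
# and ('imp', phi, psi) for phi -> psi.

def tv(phi, w):
    """Truth value [[phi]]_w of formula phi in model w (a list of 0/1)."""
    if phi[0] == 'bot':
        return 0
    if phi[0] == 'p':
        return w[phi[1]]                       # [[p_i]]_w = w_i
    if phi[0] == 'imp':
        a, b = tv(phi[1], w), tv(phi[2], w)
        return 0 if (a, b) == (1, 0) else 1    # false only for 1 -> 0
    raise ValueError('unknown connective: %r' % (phi,))

# p1 -> p2 in the model w = [1, 0]: antecedent true, consequent false.
print(tv(('imp', ('p', 0), ('p', 1)), [1, 0]))  # -> 0
```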

SLIDE 9

Model Theory: Satisfiability, Validity

Definition

◮ ϕ is valid iff ⟦ϕ⟧_w = 1 for all w ∈ W.
◮ ϕ is satisfiable iff ⟦ϕ⟧_w = 1 for some w ∈ W.

Definition

  ⟦ϕ⟧_W = (1 / |W|) Σ_{w ∈ W} ⟦ϕ⟧_w.

Corollary

◮ ϕ is valid iff ⟦ϕ⟧_W = 1.
◮ ϕ is satisfiable iff ⟦ϕ⟧_W > 0.
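For small N, the degree ⟦ϕ⟧_W can be computed exactly by enumerating all 2^N models. A self-contained Python sketch (hypothetical illustration, using a tuple encoding of formulas):

```python
from itertools import product

def tv(phi, w):
    # [[phi]]_w: bivalent evaluation; ('bot',), ('p', i), ('imp', a, b)
    if phi[0] == 'bot':
        return 0
    if phi[0] == 'p':
        return w[phi[1]]
    a, b = tv(phi[1], w), tv(phi[2], w)
    return 0 if (a, b) == (1, 0) else 1

def degree(phi, n):
    """[[phi]]_W = (1/|W|) * sum of [[phi]]_w over all w in W = {0,1}^n."""
    return sum(tv(phi, w) for w in product([0, 1], repeat=n)) / 2 ** n

# p1 -> p1 is valid, so its degree is 1; a bare atom is satisfiable
# but not valid, so its degree lies strictly between 0 and 1.
print(degree(('imp', ('p', 0), ('p', 0)), 1))  # -> 1.0
print(degree(('p', 0), 2))                     # -> 0.5
```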

SLIDE 10

Outline

Problem Statement Propositional Model Theory & Graded Validity Shallow Inference: Bag-of-Words Encoding Deep Inference: Syllogistic Encoding Computation via the Monte Carlo Method

SLIDE 11

Bag-of-Words Inference (1)

assume strictly bivalent models;
Λ = {socrates, is, a, man, every}, |W| = 2^5;
(T) socrates ∧ is ∧ a ∧ man ∴ (H) every ∧ man ∧ is ∧ socrates;
ΛT = {a}, |WT| = 2^1;
ΛO = {socrates, is, man}, |WO| = 2^3;
ΛH = {every}, |WH| = 2^1;
2^1 · 2^3 · 2^1 = 2^5;

SLIDE 12

Bag-of-Words Inference (2)

How to make this implication false?

◮ Choose the 1 out of 2^4 models from WT × WO which makes the antecedent true.
◮ Choose any of the 2^1 − 1 models from WH which make the consequent false.

...now compute an expected value. Count zero for the 1 · (2^1 − 1) = 1 model that makes this implication false. Count one for the other 2^5 − 1. Now

  ⟦T → H⟧_W = 1 − 1/2^5 = 0.96875,

or, more generally,

  ⟦T → H⟧_W = 1 − (2^|ΛH| − 1) / 2^(|ΛT| + |ΛH| + |ΛO|).
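The closed form above can be checked with a small script (hypothetical; `bow_entailment` is not from the talk), which partitions the two bags of words into ΛT, ΛH, and ΛO and applies the formula:

```python
def bow_entailment(t_words, h_words):
    """[[T -> H]]_W = 1 - (2^|LH| - 1) / 2^(|LT| + |LH| + |LO|)."""
    t, h = set(t_words), set(h_words)
    lam_t = len(t - h)   # words occurring only in T
    lam_h = len(h - t)   # words occurring only in H
    lam_o = len(t & h)   # words shared by T and H
    return 1 - (2 ** lam_h - 1) / 2 ** (lam_t + lam_h + lam_o)

score = bow_entailment('socrates is a man'.split(),
                       'every man is socrates'.split())
print(score)  # -> 0.96875, i.e. 1 - 1/2^5
```

Note that when every word of H also occurs in T (ΛH empty), the score is exactly 1, matching classical entailment for the bag-of-words encoding.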

SLIDE 13

Outline

Problem Statement Propositional Model Theory & Graded Validity Shallow Inference: Bag-of-Words Encoding Deep Inference: Syllogistic Encoding Computation via the Monte Carlo Method

SLIDE 14

Language: Syllogistic Syntax

Let Λ = {x1, x2, x3, y1, y2, y3};

All X are Y = (x1 →G y1) ∧ (x2 →G y2) ∧ (x3 →G y3),
Some X are Y = (x1 ∧ y1) ∨ (x2 ∧ y2) ∨ (x3 ∧ y3),
All X are not Y = ¬ Some X are Y,
Some X are not Y = ¬ All X are Y,

where ϕ →G ψ = 1 if ϕ ≤ ψ, ψ otherwise.
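Under a many-valued reading, these connectives can be sketched in Python. This is a hypothetical illustration assuming min for ∧ and max for ∨ (the standard choice alongside the Gödel implication, though the slide does not spell it out):

```python
def imp_g(a, b):
    """Goedel implication ->G: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b

def all_x_are_y(x, y):
    # All X are Y: conjunction (min, assumed) of x_i ->G y_i over the atoms
    return min(imp_g(a, b) for a, b in zip(x, y))

def some_x_are_y(x, y):
    # Some X are Y: disjunction (max, assumed) of conjunctions x_i ∧ y_i
    return max(min(a, b) for a, b in zip(x, y))

x, y = [0.2, 0.9, 0.4], [0.5, 1.0, 0.4]
print(all_x_are_y(x, y))   # -> 1.0, since x_i <= y_i for every i
print(some_x_are_y(x, y))  # -> 0.9
```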
SLIDE 15

Proof theory: A Modern Syllogism

∴ All X are X (S1),
Some X are Y ∴ Some X are X (S2),
All Y are Z, All X are Y ∴ All X are Z (S3),
All Y are Z, Some Y are X ∴ Some X are Z (S4),
Some X are Y ∴ Some Y are X (S5);

SLIDE 16

Proof theory: “Natural Logic”

∴ All (red X) are X (NL1),           ∴ All cats are animals (NL2),
Some X are (red Y) ∴ Some X are Y,   Some X are cats ∴ Some X are animals,
Some (red X) are Y ∴ Some X are Y,   Some cats are Y ∴ Some animals are Y,
All X are (red Y) ∴ All X are Y,     All X are cats ∴ All X are animals,
All X are Y ∴ All (red X) are Y,     All animals are Y ∴ All cats are Y;

SLIDE 17

Natural Logic Robustness Properties

Some X are Y ∴ Some X are (red Y)  >  Some X are Y ∴ Some X are (big (red Y)),
Some X are Y ∴ Some (red X) are Y  >  Some X are Y ∴ Some (big (red X)) are Y,
All X are Y ∴ All X are (red Y)  >  All X are Y ∴ All X are (big (red Y)),
All (red X) are Y ∴ All X are Y  >  All (big (red X)) are Y ∴ All X are Y.
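One way to see why such inequalities can hold: each extra modifier can only shrink the degree of the modified term, so the conclusion gets harder to satisfy. A hypothetical check in a single many-valued model, assuming (as the slides do not spell out) that modification like (red Y) is pointwise conjunction via min:

```python
def some_x_are_y(x, y):
    # Some X are Y: max over i of min(x_i, y_i)
    return max(min(a, b) for a, b in zip(x, y))

def modify(y, m):
    # interpret '(red Y)' as pointwise conjunction (min) of Y with red
    return [min(a, b) for a, b in zip(y, m)]

x   = [0.7, 0.3, 0.9]
y   = [0.8, 0.6, 0.5]
red = [0.6, 1.0, 0.4]
big = [0.5, 0.2, 0.9]

# Each extra modifier can only lower the degree of the conclusion:
d1 = some_x_are_y(x, modify(y, red))
d2 = some_x_are_y(x, modify(modify(y, red), big))
print(d1, d2)  # -> 0.6 0.5
```

Since min over more terms is never larger, d2 ≤ d1 holds for any choice of the vectors above.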

SLIDE 18

Outline

Problem Statement Propositional Model Theory & Graded Validity Shallow Inference: Bag-of-Words Encoding Deep Inference: Syllogistic Encoding Computation via the Monte Carlo Method

SLIDE 19

Model Theory: Satisfiability, Validity, Expectation

Definition

  ⟦ϕ⟧_W = (1 / |W|) Σ_{w ∈ W} ⟦ϕ⟧_w.

How do we compute this in general?

Observation

◮ Draw w randomly from a uniform distribution over W. Now ⟦ϕ⟧_W is the probability that ϕ is true in w.
◮ If Ŵ ⊆ W is a random sample over the population W, the sample mean ⟦ϕ⟧_Ŵ approaches the population mean ⟦ϕ⟧_W as |Ŵ| approaches |W|.
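The observation suggests a direct estimator: sample models uniformly and average the truth values. A self-contained Python sketch (hypothetical illustration, reusing the tuple encoding of formulas from the bivalent case):

```python
import random

def tv(phi, w):
    # [[phi]]_w: bivalent evaluation; ('bot',), ('p', i), ('imp', a, b)
    if phi[0] == 'bot':
        return 0
    if phi[0] == 'p':
        return w[phi[1]]
    a, b = tv(phi[1], w), tv(phi[2], w)
    return 0 if (a, b) == (1, 0) else 1

def mc_degree(phi, n, samples=10000, seed=0):
    """Monte Carlo estimate of [[phi]]_W: sample w uniformly from {0,1}^n
    and return the sample mean of [[phi]]_w."""
    rng = random.Random(seed)
    hits = sum(tv(phi, [rng.randint(0, 1) for _ in range(n)])
               for _ in range(samples))
    return hits / samples

# A valid formula estimates to exactly 1; an atom estimates to about 0.5.
print(mc_degree(('imp', ('p', 0), ('p', 0)), 1))  # -> 1.0
print(mc_degree(('p', 0), 1))
```

Unlike full enumeration, the cost depends on the sample size rather than on 2^n, which is what makes the model-theoretic view computationally usable.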

SLIDE 20

Summary

This is work in progress, but could develop into a rich theoretical framework for robust textual inference and logical pattern processing.

◮ robust and practicable...
◮ ...in the worst case: does bag-of-words;
◮ justifiable from epistemology, logic, and linguistics;
◮ model theory enables inference via the Monte Carlo method;
◮ proof theory is intuitive and well-understood;
◮ is entailed in classical logic and entails natural logic.