A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation (PowerPoint PPT Presentation)



slide-1
SLIDE 1

A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation using First-Order Logic

David Andrzejewski 1, Xiaojin Zhu 2, Mark Craven 3,2, Benjamin Recht 2

1 Center for Applied Scientific Computing, Lawrence Livermore National Laboratory (USA)

2 Department of Computer Sciences, University of Wisconsin–Madison (USA)

3 Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison (USA)

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 1 / 18

slide-4
SLIDE 4

Topic modeling with Latent Dirichlet Allocation (LDA)

Blei et al., JMLR 2003

Example documents: “Human embryonic stem cell research may benefit patients with genetic risk factors...” / “Patients at risk for drug-resistant infection...”

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 2 / 18

slide-5
SLIDE 5

Topic modeling applications

Research trends (Wang & McCallum, 2006)
Information retrieval (UMass) (also KDD 2011!)
Author/document profiling
Scientific impact/influence (Gerrish & Blei, 2009)
Matching papers to reviewers (Mimno & McCallum, 2007)

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 3 / 18


slide-8
SLIDE 8

Unsupervised LDA

Extend the model? Add domain knowledge:

“These words do (not) belong in the same topic”
“I want a topic about X”
“This topic is incompatible with this document”
“These topics are incompatible and should not co-occur”

First-Order Logic Latent Dirichlet Allocation (Fold·all)

Weighted knowledge base (KB) of first-order logic (FOL) rules (Markov Logic Networks; Richardson and Domingos, 2006). Learned topics φ are influenced by both:

Word-document statistics (as in LDA)
Domain knowledge rules (as in an MLN)

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 4 / 18


slide-21
SLIDE 21

Representing LDA with logical predicates

Value          Logical Predicate   Description
LDA:
zi = t         Z(i, t)             Latent topic
wi = v         W(i, v)             Observed word
di = j         D(i, j)             Observed document
Attributes:
(doc label)    HasLabel(j, ℓ)      Document label
(sentence)     S(i, k)             Observed sentence

A unified way to capture metadata / annotations.

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 5 / 18
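The predicate encoding above maps cleanly onto a small data structure. A minimal sketch (the token/atom representation is my own, not from the paper): each corpus position i yields observed atoms W(i, word) and D(i, doc), plus a latent atom Z(i, topic).

```python
def ground_atoms(words, docs, z):
    """Return the set of true ground atoms for a corpus under assignment z."""
    atoms = set()
    for i, (w, d, t) in enumerate(zip(words, docs, z)):
        atoms.add(("W", i, w))   # observed word at position i
        atoms.add(("D", i, d))   # document containing position i
        atoms.add(("Z", i, t))   # latent topic assignment
    return atoms

corpus_words = ["embryo", "cell", "neural"]
corpus_docs = [0, 0, 1]
assignment = [3, 3, 0]
atoms = ground_atoms(corpus_words, corpus_docs, assignment)
print(("Z", 0, 3) in atoms)  # True: position 0 is assigned topic 3
```

Rules over these predicates can then be evaluated by checking membership in this atom set.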

slide-22
SLIDE 22

Encoding domain knowledge in First-Order Logic

CNF knowledge base: KB = {(λ1, ψ1), . . . , (λL, ψL)}

Each rule ψk has a weight λk > 0 (the “strength” of the rule).

Example KB:

Rule        λk     ψk
Seed        5      ∀i: W(i, embryo) ⇒ Z(i, 3)
Doc label   500    ∀i, j: D(i, j) ∧ HasLabel(j, +) ⇒ ¬Z(i, 3)

Can specify “contradictory” domain knowledge!

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 6 / 18
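The two example rules can be sketched as ordinary Python predicates paired with their weights; the function signatures here are illustrative assumptions, not the paper's API.

```python
def seed_rule(word, topic):
    # forall i: W(i, "embryo") => Z(i, 3)
    return word != "embryo" or topic == 3

def doc_label_rule(doc_label, topic):
    # forall i, j: D(i, j) AND HasLabel(j, "+") => NOT Z(i, 3)
    return doc_label != "+" or topic != 3

# Weighted KB: the doc-label rule is 100x "stronger" than the seed rule.
KB = [(5.0, seed_rule), (500.0, doc_label_rule)]

# An "embryo" token inside a "+"-labeled document cannot satisfy both rules:
# the KB is deliberately allowed to be contradictory, and the weights decide.
sat_seed = seed_rule("embryo", 3)    # satisfied
sat_label = doc_label_rule("+", 3)   # violated
```

Inference then trades off the weighted satisfaction of these clauses against the LDA likelihood, rather than treating any rule as a hard constraint.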


slide-28
SLIDE 28

Propositionalization / Grounding

Example Cannot-Link rule ψCL (λ = 5):

∀i, j, t: W(i, neural) ∧ W(j, disorder) ⇒ ¬Z(i, t) ∨ ¬Z(j, t)

G(ψCL) = set of ground formulas g, one for EVERY (i, j, t) with i, j ∈ {1, 2, . . . , N} and t ∈ {1, 2, . . . , T}

𝟙g(z) = 1 if g is true under z, else 0

Each g ∈ G(ψCL) contributes a λ𝟙g(z) term (as in an MLN): combinatorial explosion!

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 7 / 18
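The combinatorial explosion is easy to quantify: the cannot-link rule has one grounding per (i, j, t) triple, so the count is N² · T. A tiny illustration (the corpus sizes are made up):

```python
def num_groundings(n_positions, n_topics):
    """Ground formulas of the cannot-link rule over all (i, j, t) triples."""
    return n_positions * n_positions * n_topics

# Even a modest corpus explodes: 10,000 tokens and 25 topics already
# yield 2.5 billion ground formulas.
print(num_groundings(10_000, 25))  # 2500000000
```

This is why the inference scheme later in the talk samples individual ground formulas instead of materializing G(ψ) in full.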


slide-35
SLIDE 35

LDA graphical model

P ∝ [ ∏_{t=1}^{T} p(φ_t | β) ] · [ ∏_{j=1}^{D} p(θ_j | α) ] · [ ∏_{i=1}^{N} φ_{z_i}(w_i) θ_{d_i}(z_i) ]

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 8 / 18

slide-37
SLIDE 37

LDA graphical model → Fold·all

P ∝ [ ∏_{t=1}^{T} p(φ_t | β) ] · [ ∏_{j=1}^{D} p(θ_j | α) ] · [ ∏_{i=1}^{N} φ_{z_i}(w_i) θ_{d_i}(z_i) ] × exp( ∑_{k=1}^{L} ∑_{g ∈ G(ψ_k)} λ_k 𝟙_g(z, w, d, o) )

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 8 / 18
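The unnormalized log of this joint can be sketched directly: an LDA term per token plus λk for every satisfied grounding. The data layout below is a simplifying assumption, and the Dirichlet prior terms p(φ|β), p(θ|α) are omitted.

```python
import math

def log_q(z, words, docs, phi, theta, kb):
    """Unnormalized log-objective: LDA term + weighted satisfied groundings.
    phi[t][v] and theta[j][t] are plain dicts; kb is [(weight, [grounding_fn])].
    Prior terms p(phi|beta), p(theta|alpha) are left out of this sketch."""
    lda = sum(math.log(phi[z[i]][words[i]] * theta[docs[i]][z[i]])
              for i in range(len(z)))
    logic = sum(w * sum(1 for g in gs if g(z)) for w, gs in kb)
    return lda + logic

phi = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.1, "b": 0.9}}
theta = {0: {0: 0.5, 1: 0.5}}
words = ["a", "b"]
docs = [0, 0]
z = [0, 1]
# one ground formula, weight 5, satisfied when position 0 takes topic 0
kb = [(5.0, [lambda z: z[0] == 0])]
val = log_q(z, words, docs, phi, theta, kb)
```

Raising a rule's weight shifts the maximizer toward assignments that satisfy its groundings, which is exactly the trade-off the exp(·) factor expresses.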

slide-38
SLIDE 38

MAP inference - Q(z, φ, θ)

Alternating Optimization with Mirror Descent

For each step:

1. (φ, θ) ← argmax over (φ, θ) of Q(z, φ, θ), with z fixed
2. z ← argmax over z of Q(z, φ, θ), with (φ, θ) fixed

In step 2: z \ zKB ← argmax given (φ, θ): TRIVIAL; zKB ← mirror descent: HARD

Scalable approach to optimize zKB:

1. Relax the discrete problem to a continuous one
2. Optimize the relaxed problem with stochastic gradient descent
3. Round the relaxed z to recover the final assignment

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 9 / 18
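With z fixed, the (φ, θ) half of the alternation has a closed form: smoothed count ratios. A rough sketch under assumed Dirichlet-style smoothing hyperparameters (the indexing conventions and default values are mine, not the paper's):

```python
from collections import Counter

def update_phi_theta(z, words, docs, T, V, D, alpha=0.5, beta=0.5):
    """Closed-form (phi, theta) update given fixed topic assignments z.
    Smoothed count ratios; alpha/beta are assumed smoothing hyperparameters."""
    wc = Counter(zip(z, words))   # (topic, word) counts
    dc = Counter(zip(docs, z))    # (doc, topic) counts
    phi = [[(wc[(t, v)] + beta) / (sum(wc[(t, u)] for u in range(V)) + V * beta)
            for v in range(V)] for t in range(T)]
    theta = [[(dc[(j, t)] + alpha) / (sum(dc[(j, s)] for s in range(T)) + T * alpha)
              for t in range(T)] for j in range(D)]
    return phi, theta

phi, theta = update_phi_theta([0, 1, 0], [0, 1, 0], [0, 0, 1], T=2, V=2, D=2)
```

Each row of phi and theta is a normalized distribution, which is why this step is labeled TRIVIAL: only the zKB block needs the relaxation machinery.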



slide-56
SLIDE 56

Scalable zKB inference

argmax over z of:

∑_{i=1}^{N} ∑_{t=1}^{T} z_it log φ_t(w_i) θ_{d_i}(t) + ∑_{k=1}^{L} ∑_{g ∈ G(ψ_k)} λ_k 𝟙_g(z)

1. Continuous relaxation: z_i = t → z_it ∈ {0, 1} → z_it ∈ [0, 1]. Represent the indicator function 𝟙_g(z) as a polynomial in z_it; can calculate ∇Q... but G(ψ_k) may be very large.

2. Stochastic gradient: sample a term from the objective function Q. Logic: a single ground formula g; LDA: a single corpus index i.

3. Entropic Mirror Descent (Beck & Teboulle, 2003):

z_it ← z_it exp(η ∇_{z_it} f) / ∑_{t′} z_{it′} exp(η ∇_{z_{it′}} f)

4. Recover discrete z: z_i = argmax_t z_it, for i = 1, . . . , N

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 10 / 18
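Step 3's update is an exponentiated-gradient step that renormalizes over topics, so each relaxed z_i stays on the probability simplex. A self-contained sketch (the step size η and the gradient values are placeholders):

```python
import math

def mirror_descent_step(z_row, grad_row, eta=0.1):
    """One entropic mirror descent update for a single position i:
    multiply each z_it by exp(eta * gradient), then renormalize over t."""
    scaled = [zi * math.exp(eta * g) for zi, g in zip(z_row, grad_row)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Uniform start over 4 topics; a positive gradient on topic 0 pulls mass there.
row = mirror_descent_step([0.25, 0.25, 0.25, 0.25], [2.0, 0.0, 0.0, 0.0])
print(abs(sum(row) - 1.0) < 1e-12)
```

The multiplicative form is what makes the simplex constraint free: no projection step is needed, unlike plain gradient ascent on z_it.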

slide-57
SLIDE 57

Experimental questions

Can logic KBs generalize to unseen documents?
Can alternating optimization with mirror descent scale?
Can we recover useful logic-influenced topics?

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 11 / 18


slide-65
SLIDE 65

Generalization and scalability

Example datasets and KBs (see paper) k-fold cross-validation

Training: do Fold·all MAP inference to estimate (ˆ φ, ˆ θ) Testing: use trainset ˆ φ only to estimate testset ˆ z Evaluation: testset objective function Q (unnormalized probability) Fold·all Baselines Mir M+L LDA Alchemy | ∪k G(ψk)| Synth 9.86 11.13 −2.18 −1.73 1.2 × 105 Comp 2.40 2.45 1.19 − 6.3 × 103 Con 2.51 2.56 1.09 − 2.9 × 103 Pol 5.67 − 5.67 − 9.6 × 108 HDG 10.66 − 3.59 − 2.3 × 108

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 12 / 18


slide-68
SLIDE 68

Biological concept expansion

Human Development Genes (HDG)

Given: “seed” terms for each concept. Do: discover other related terms.

  Concept        Provided terms
  Neural         neur, dendro(cyte), glia, synapse, neural crest
  Embryo         human embryonic stem cell, inner cell mass, pluripotent
  Blood          hematopoietic, blood, endothel(ium)
  Gastrulation   organizer, gastru(late)
  Cardiac        heart, ventricle, auricle, aorta
  Limb           limb, blastema, zeugopod, autopod, stylopod

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 13 / 18


slide-70
SLIDE 70

Seed and n-gram rules

Neural → “synapse” → Topic 0

W(i, synapse) ⇒ Z(i, 0)

Embryo → “inner cell mass” → Topic 1

W(i, inner) ∧ W(i + 1, cell) ∧ W(i + 2, mass) ⇒ Z(i, 1)
W(i − 1, inner) ∧ W(i, cell) ∧ W(i + 1, mass) ⇒ Z(i, 1)
W(i − 2, inner) ∧ W(i − 1, cell) ∧ W(i, mass) ⇒ Z(i, 1)

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 14 / 18
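The three shifted implications can be read operationally: wherever the n-gram matches, every token it covers is forced into the seed topic. A minimal Python sketch (hypothetical helper names, not the LogicLDA source) that enumerates these groundings over a toy token list:

```python
# Hypothetical sketch: expand an n-gram seed rule into grounded (position,
# topic) implications. Mirrors the shifted rules on the slide (n = 3 gives
# three shifts, one per token covered by the match).
def ngram_groundings(words, ngram, topic):
    """Yield (position, topic) pairs implied by the n-gram seed rule."""
    n = len(ngram)
    for start in range(len(words) - n + 1):
        if words[start:start + n] == ngram:
            # Every token inside the matched n-gram takes the seed topic.
            for offset in range(n):
                yield (start + offset, topic)

words = ["the", "inner", "cell", "mass", "forms"]
print(sorted(set(ngram_groundings(words, ["inner", "cell", "mass"], 1))))
# → [(1, 1), (2, 1), (3, 1)]
```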


slide-72
SLIDE 72

Sentence rules

Sentence inclusion

New development Topic 6: {differentiation, maturation, develops, formation, differentiates}
Development Topic 6 allows each seed Topic t in the sentence (shown for t = 0):

Sentence(i, i1, . . . , iSk) ∧ ¬Z(i1, 6) ∧ . . . ∧ ¬Z(iSk, 6) ⇒ ¬Z(i, 0)

Sentence exclusion

New disease Topic 7: {patient, disease, parasite, . . ., condition, disorder, symptom}
Disease Topic 7 prevents each seed Topic t in the sentence (shown for t = 0):

S(i, s) ∧ S(j, s) ∧ Z(i, 7) ⇒ ¬Z(j, 0)

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 15 / 18
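The sentence-exclusion rule can be checked directly against a concrete assignment. A sketch (illustrative, not the paper's inference code; function and variable names are hypothetical): if any token in a sentence takes the disease topic, no token in that sentence may take the seed topic.

```python
# Check a topic assignment z against the sentence-exclusion rule
#   S(i, s) ∧ S(j, s) ∧ Z(i, 7) ⇒ ¬Z(j, 0)
# z[i] is token i's topic; sentence_of[i] is token i's sentence id.
def violates_exclusion(z, sentence_of, disease=7, seed=0):
    disease_sents = {sentence_of[i] for i, t in enumerate(z) if t == disease}
    return any(t == seed and sentence_of[j] in disease_sents
               for j, t in enumerate(z))

sentence_of = [0, 0, 0, 1, 1]                            # token -> sentence
print(violates_exclusion([7, 0, 2, 0, 1], sentence_of))  # → True
print(violates_exclusion([7, 2, 2, 0, 1], sentence_of))  # → False
```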


slide-74
SLIDE 74

Accuracy at Top 50 threshold

(means over 10 randomized runs)

               Fold·all KBs
            ALL    INCL   EXCL   SEED     LDA
  Neural    0.59   0.57   0.54   0.54     0.31
  Embryo    0.24   0.24   0.23   0.23     0.07
  Blood     0.46   0.47   0.40   0.39     0.13
  Gast.     0.18   0.18   0.16   0.16     0.00
  Cardiac   0.36   0.37   0.34   0.35     0.08
  Limb      0.18   0.18   0.15   0.14     0.09

Novel terms discovered for neural

{dendritic, forebrain, hindbrain, microglial, motoneurons, neuroblasts, neurogenesis, retinal}

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 16 / 18


slide-78
SLIDE 78

Conclusion

Fold·all: topic modeling with domain knowledge

  user-specified constraints / side information
  scalable inference

Experimental results

  Logic KBs generalize to unseen documents
  Inference scales to realistic datasets and KBs
  Topics reflect domain knowledge in interesting ways

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 17 / 18

slide-79
SLIDE 79

Acknowledgements

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES-489752

Additional support:

NSF IIS-0953219, AFOSR FA9550-09-1-0313, NIH/NLM R01 LM07050

HDG experiments: Ron Stewart (Thomson Lab, UW–Madison)

Source code

https://github.com/davidandrzej/LogicLDA

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 18 / 18

slide-80
SLIDE 80

MAP inference

Find the most probable (z, φ, θ):

Q(z, φ, θ) = Σ_{t=1}^{T} log p(φ_t | β) + Σ_{j=1}^{D} log p(θ_j | α) + Σ_{i=1}^{N} log φ_{z_i}(w_i) θ_{d_i}(z_i)    (LDA terms)
           + Σ_{k=1}^{L} Σ_{g ∈ G(ψ_k)} λ_k 𝟙[g(z, w, d, o)]    (logic terms)

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 19 / 18
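The objective Q is just the LDA log-posterior (up to constants) plus a weighted count of satisfied rule groundings. A toy Python sketch, under assumed data layouts (row-stochastic `phi[t][w]` and `theta[d][t]`, symmetric hyperparameters, groundings as callables); this is not the paper's optimized implementation:

```python
import math

# Illustrative MAP objective Q(z, phi, theta). rules is a list of
# (lambda_k, groundings), where each grounding g(z, w, d) returns 0 or 1.
def map_objective(z, w, d, phi, theta, alpha, beta, rules):
    # LDA terms: unnormalized Dirichlet log-priors over topics and documents...
    lda = sum((beta - 1) * math.log(p) for row in phi for p in row)
    lda += sum((alpha - 1) * math.log(p) for row in theta for p in row)
    # ...plus the token log-likelihoods log phi_{z_i}(w_i) theta_{d_i}(z_i)
    lda += sum(math.log(phi[z[i]][w[i]] * theta[d[i]][z[i]])
               for i in range(len(z)))
    # Logic terms: lambda_k-weighted count of satisfied groundings
    logic = sum(lam * g(z, w, d) for lam, gs in rules for g in gs)
    return lda + logic
```

With alpha = beta = 1 the Dirichlet terms vanish and Q reduces to token log-likelihood plus the weighted satisfied-grounding count, which makes the trade-off between fitting the corpus and satisfying the KB explicit.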


slide-83
SLIDE 83

Ignore trivial rule groundings

Shavlik & Natarajan, 2009

λ_k = 5,   ψ_k: ∀i  W(i, apple) ⇒ Z(i, 3)

Groundings whose antecedent is false (positions i where the word is not "apple") are trivially satisfied, so they need not be materialized.

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 20 / 18
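The filtering step above is easy to sketch in Python (hypothetical helper, after Shavlik & Natarajan, 2009; not the LogicLDA source): only positions where the trigger word actually occurs produce non-trivial groundings.

```python
# For the rule  ∀i: W(i, apple) ⇒ Z(i, 3),  materialize only the groundings
# whose antecedent can be true, i.e. positions whose word matches the trigger.
def nontrivial_groundings(words, trigger_word, topic):
    return [(i, topic) for i, w in enumerate(words) if w == trigger_word]

words = ["apple", "pie", "apple", "cider"]
print(nontrivial_groundings(words, "apple", 3))  # → [(0, 3), (2, 3)]
```

For a corpus of N tokens this shrinks the grounding set for a seed rule from N to the number of trigger occurrences.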


slide-86
SLIDE 86

Represent ✶ as a polynomial

Example: g = Z(i, 1) ∨ ¬Z(j, 2), with topics t ∈ {1, 2, 3}

1. Take complement ¬g:           ¬Z(i, 1) ∧ Z(j, 2)
2. Remove negations (¬g)+:       (Z(i, 2) ∨ Z(i, 3)) ∧ Z(j, 2)
3. Numeric z_it ∈ {0, 1}:        (z_i2 + z_i3) z_j2
4. Polynomial 𝟙_g(z):            1 − (z_i2 + z_i3) z_j2
5. Relax discrete z_it:          z_it ∈ {0, 1} → z_it ∈ [0, 1]

In general:  𝟙_g(z) = 1 − Π_{g_i ≠ ∅} ( Σ_{Z(i,t) ∈ (¬g_i)+} z_it )

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 21 / 18
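The five steps can be carried out mechanically for any grounded clause. A sketch under an assumed data layout (literals as `(position, topic, positive)` tuples, relaxed assignments as a `z[(i, t)]` dict; not the LogicLDA source):

```python
# Polynomial form of the clause indicator 1_g. For each position i appearing
# in g, (¬g_i)+ is the set of topics consistent with the negated clause:
# a positive literal Z(i, t) in g rules topic t out; a negative literal
# ¬Z(i, t) in g forces topic t. Then
#   1_g(z) = 1 - prod_i sum_{t in (¬g_i)+} z[i, t].
def indicator_poly(clause, z, topics):
    allowed = {}
    for i, t, positive in clause:
        cur = allowed.setdefault(i, set(topics))
        if positive:
            cur.discard(t)           # ¬g contains ¬Z(i, t)
        else:
            allowed[i] = cur & {t}   # ¬g contains Z(i, t)
    prod = 1.0
    for i, ok in allowed.items():
        prod *= sum(z[i, t] for t in ok)
    return 1.0 - prod

# Slide example: g = Z(i, 1) ∨ ¬Z(j, 2), topics {1, 2, 3}, with i = 0, j = 1.
clause = [(0, 1, True), (1, 2, False)]
z = {(0, 1): 1.0, (0, 2): 0.0, (0, 3): 0.0,   # token 0 takes topic 1
     (1, 1): 0.0, (1, 2): 1.0, (1, 3): 0.0}   # token 1 takes topic 2
print(indicator_poly(clause, z, {1, 2, 3}))   # g satisfied → 1.0
```

On hard 0/1 assignments the polynomial agrees exactly with the logical indicator; the relaxation to z_it ∈ [0, 1] is what makes gradient-based MAP optimization possible.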


slide-93
SLIDE 93

Standard LDA: Neural concept

Do standard LDA, then find topics containing seed terms in the Top 50:

brain system nervous neurons neuronal central development ’s neural human gene disease function cortex spinal disorders developing motor cerebral glial peripheral cortical cord disorder astrocytes nerve neurological regions suggest schizophrenia including syndrome neurodegenerative mental involved retardation behavior cerebellum migration behavioral abnormal cerebellar found precursor results amyloid hippocampus sclerosis neurotrophic present

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 22 / 18


slide-96
SLIDE 96

Fold·all can encode existing LDA variants

Example: Hidden Topic Markov Model (HTMM), Gruber et al., 2007

Each sentence uses only one topic
Topic transitions possible between sentences with probability ε

FOL encoding of HTMM

  λ_k        ψ_k
  ∞          ∀i, j, s, t:  S(i, s) ∧ S(j, s) ∧ Z(i, t) ⇒ Z(j, t)
  −log ε     ∀i, s, t:     S(i, s) ∧ ¬S(i + 1, s) ∧ Z(i, t) ⇒ Z(i + 1, t)

Andrzejewski (LLNL) LDA with Logical Domain Knowledge IJCAI 2011 23 / 18
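The hard (∞-weight) HTMM rule says each sentence's tokens must share one topic. A toy Python check of an assignment against that constraint (hypothetical helper names, not the paper's code):

```python
# Check the hard HTMM rule  ∀i, j, s, t: S(i, s) ∧ S(j, s) ∧ Z(i, t) ⇒ Z(j, t)
# on a concrete assignment: every token in a sentence must share one topic.
def one_topic_per_sentence(z, sentence_of):
    topic_of_sent = {}
    for i, t in enumerate(z):
        s = sentence_of[i]
        if topic_of_sent.setdefault(s, t) != t:
            return False
    return True

print(one_topic_per_sentence([2, 2, 5, 5], [0, 0, 1, 1]))  # → True
print(one_topic_per_sentence([2, 3, 5, 5], [0, 0, 1, 1]))  # → False
```

Assignments failing this check get infinite penalty under the ∞-weight rule, while the −log ε rule only softly penalizes topic changes at sentence boundaries, recovering HTMM's transition behavior.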