
slide-1
SLIDE 1

Reified Context Models

Jacob Steinhardt Percy Liang

Stanford University

{jsteinhardt,pliang}@cs.stanford.edu

July 8, 2015

  • J. Steinhardt & P. Liang (Stanford)

Reified Context Models July 8, 2015 1 / 11

slide-2
SLIDE 2

Structured Prediction Task

input x:

output y:

v o l c a n i c

slide-6
SLIDE 6

Contexts Are Key

DP:

v o l c a
v *o **l ***c

beam search:

v o l c a
v vo vol volc

Key idea: contexts!

$$\ast o \;\stackrel{\text{def}}{=}\; \{ao,\ bo,\ co,\ \ldots\}$$
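A minimal sketch of the set notation on this slide, assuming each `*` stands for exactly one arbitrary symbol (as in `*o = {ao, bo, co, ...}`). The helper names `matches` and `expand` are illustrative and do not come from the talk.

```python
def matches(context: str, prefix: str) -> bool:
    """Return True if `prefix` lies in the set of strings denoted by `context`.

    Assumption: '*' in a context matches any single symbol.
    """
    if len(context) != len(prefix):
        return False
    return all(c == "*" or c == p for c, p in zip(context, prefix))


def expand(context: str, alphabet: str) -> set:
    """Enumerate every string covered by `context` over a given alphabet."""
    results = {""}
    for c in context:
        # A wildcard may be any symbol; a literal symbol only itself.
        choices = alphabet if c == "*" else c
        results = {r + s for r in results for s in choices}
    return results
```

Under this reading, the DP contexts (`*o`, `**l`, ...) each cover many prefixes, while the beam-search contexts (`vo`, `vol`, ...) each cover exactly one.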

slide-13
SLIDE 13

Desiderata

coverage (short contexts)

r *o **l ***c
v *a **i ***r

  • better uncertainty estimates (precision)
  • stabler partially supervised learning updates

expressivity (long contexts)

r ro rol rolc
v ra ral ralc

  • capture complex dependencies

best of both worlds:

r ro rol *olc
v ra ral ***c
y *o *ol ***r
* ** *** ****

slide-20
SLIDE 20

Reifying Contexts

input x:

output y:

v o l c a n i c

context c:

v *o *ol *olc ······

"context sets":

C1   C2   C3    C4
r    ro   rol   *olc
v    ra   ral   ***c
y    *o   *ol   ***r
*    **   ***   ****

Challenge: how to trade off contexts of different lengths?

⟹ Reify contexts as part of model!

slide-24
SLIDE 24

Reified Context Models

Given: context sets C1,...,CL

features φi(ci−1,yi)

Define the model

$$p_\theta(y_{1:L}, c_{1:L-1}) \propto \exp\Big(\sum_{i=1}^{L} \theta^\top \phi_i(c_{i-1}, y_i)\Big) \cdot \underbrace{\kappa(y, c)}_{\text{consistency}}$$

Graphical model structure:

variables: Y1 Y2 Y3 Y4 Y5, C1 C2 C3 C4
factors: κ κ κ κ κ (consistency), φ1 φ2 φ3 φ4 φ5 (features)

inference via forward-backward!
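A rough sketch of the forward half of forward-backward over contexts, under several assumptions that are not stated on the slides: contexts are written as wildcard patterns, every context set contains the all-wildcard pattern, and the consistency factor κ is realized by taking c_i to be the most specific pattern in C_i that covers c_{i−1} followed by y_i. The callable `score(i, prev, y)` is a hypothetical stand-in for θ⊤φ_i(c_{i−1}, y_i), with a zero-based position `i`.

```python
import math


def covers(pattern, s):
    """True if `pattern` covers `s`; '*' in the pattern matches anything."""
    return len(pattern) == len(s) and all(
        p == "*" or p == c for p, c in zip(pattern, s)
    )


def next_context(prev, y, context_set):
    """Most specific pattern in `context_set` covering prev + y.

    Assumes `context_set` contains the all-wildcard pattern, so at least
    one candidate always exists.
    """
    candidates = [c for c in context_set if covers(c, prev + y)]
    return min(candidates, key=lambda c: c.count("*"))


def forward_log_partition(context_sets, alphabet, score):
    """Log normalizer of the chain over contexts.

    context_sets[i] holds the patterns for the context after position i
    (zero-based); the sequence length is len(context_sets) + 1.
    """
    alpha = {"": 1.0}  # the context before the first symbol is empty
    for i, c_set in enumerate(context_sets):
        new_alpha = {}
        for prev, mass in alpha.items():
            for y in alphabet:
                nxt = next_context(prev, y, c_set)
                w = mass * math.exp(score(i, prev, y))
                new_alpha[nxt] = new_alpha.get(nxt, 0.0) + w
        alpha = new_alpha
    last = len(context_sets)  # final position needs no outgoing context
    z = sum(
        mass * math.exp(score(last, prev, y))
        for prev, mass in alpha.items()
        for y in alphabet
    )
    return math.log(z)
```

The cost per position grows with the number of contexts times the alphabet size rather than with the number of full prefixes, which is the point of summing over contexts instead of strings.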

slide-36
SLIDE 36

Adaptive Context Selection

  • Select context sets Ci during forward pass of inference
  • Greedily select contexts with largest mass

a b c d e ...  →  c e  →  C1 = {c, e, ⋆}

ca cb ... ea eb ... ⋆a ...  →  ca ⋆a  →  C2 = {ca, ⋆a, ⋆⋆}

etc.

Biases towards short contexts unless there is high confidence.
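One step of the greedy selection might look like the sketch below. It is an illustration under assumptions, not the authors' procedure: candidates are ranked by their own forward mass, a fixed `budget` of candidates is kept, the all-wildcard context is always added, and each unselected candidate is merged into the most specific selected context that covers it. `weight(prev, y)` is a hypothetical stand-in for the exponentiated factor score, and `*` plays the role of ⋆.

```python
def covers(pattern, s):
    """True if `pattern` covers `s`; '*' in the pattern matches anything."""
    return len(pattern) == len(s) and all(
        p == "*" or p == c for p, c in zip(pattern, s)
    )


def select_contexts(prev_masses, alphabet, weight, budget):
    """Extend every context by one symbol and keep the heaviest ones.

    prev_masses maps each context c_{i-1} to its forward mass.
    Returns a dict mapping each selected context c_i to its mass.
    """
    # Extend each previous context by each possible next symbol.
    candidates = {}
    for prev, mass in prev_masses.items():
        for y in alphabet:
            cand = prev + y
            candidates[cand] = candidates.get(cand, 0.0) + mass * weight(prev, y)

    # Keep the `budget` heaviest candidates plus the all-wildcard fallback.
    fallback = "*" * len(next(iter(candidates)))
    ranked = sorted(
        (c for c in candidates if c != fallback),
        key=lambda c: candidates[c],
        reverse=True,
    )
    selected = ranked[:budget] + [fallback]

    # Merge every candidate into its most specific selected cover.
    new_masses = {c: 0.0 for c in selected}
    for cand, mass in candidates.items():
        home = min(
            (c for c in selected if covers(c, cand)),
            key=lambda c: c.count("*"),
        )
        new_masses[home] += mass
    return new_masses
```

With a budget of 2, starting from `{"": 1.0}` and calling the function twice can reproduce the shape of the slide's example: first `{c, e, *}`, then `{ca, *a, **}`. Total mass is preserved at each step because unselected candidates are merged rather than dropped.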

slide-40
SLIDE 40

Precision

input x:

output y:

v o l c a n i c

  • Model assigns probability to each prediction, so can predict on most confident subset.
  • Measure precision (# of correct words) vs. recall (# of words predicted).
  • Comparison: beam search

[Figure: "Word Recognition". Precision (y-axis, 0.86 to 1.00) vs. recall (x-axis, 0.0 to 1.0); curves: Beam search, RCM]
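The precision/recall measurement described on this slide can be sketched as follows, assuming each predicted word comes with a model confidence and a correctness flag. The function name and input format are illustrative. Following the slide's wording, recall here is the fraction of words on which a prediction is made, and precision is the fraction of those predictions that are correct.

```python
def precision_recall_curve(confidences, correct):
    """Return (recall, precision) points, predicting on the most confident words first.

    confidences[i] is the model's probability for prediction i;
    correct[i] says whether that prediction was right.
    """
    order = sorted(
        range(len(confidences)), key=lambda i: confidences[i], reverse=True
    )
    points = []
    n_correct = 0
    for k, i in enumerate(order, start=1):
        n_correct += int(correct[i])
        recall = k / len(order)     # fraction of words predicted so far
        precision = n_correct / k   # fraction of those that are correct
        points.append((recall, precision))
    return points
```

A model with well-calibrated confidences should show precision close to 1.0 at low recall, since its most confident predictions are the ones most likely to be right.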

slide-48
SLIDE 48

Partially Supervised Learning

Decipherment task:

cipher     am → 5, I → 13, what → 54, ...
latent z   I am what I am
output y   13 5 54 13 5

Goal: determine cipher

  • Fit 2nd-order HMM with EM, using RCMs for approximate E-step.
  • Use learned emissions to determine cipher.
  • Again compare to beam search (Nuhn et al., 2013)

slide-49
SLIDE 49

Partially Supervised Learning

Fraction of correctly mapped words:

[Figure: "Decipherment". Mapping accuracy (y-axis, 0.0 to 0.8) vs. training passes (x-axis, 5 to 20); curves: RCM, beam]
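Reading a cipher off learned emissions and scoring it by the fraction of correctly mapped words could look like the sketch below. The data layout is an assumption: `emissions[word][symbol]` is taken to hold the learned probability that a plaintext word emits a cipher symbol, and the cipher is read off as the most probable symbol for each word. The numbers in the usage example are made up for illustration.

```python
def decode_cipher(emissions):
    """Map each plaintext word to its most probable cipher symbol."""
    return {word: max(probs, key=probs.get) for word, probs in emissions.items()}


def mapping_accuracy(predicted, truth):
    """Fraction of words in `truth` whose predicted symbol is correct."""
    hits = sum(predicted.get(word) == symbol for word, symbol in truth.items())
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical emission table; only the true cipher comes from the slide.
    emissions = {
        "am": {5: 0.7, 13: 0.3},
        "I": {13: 0.9, 54: 0.1},
        "what": {5: 0.6, 54: 0.4},
    }
    truth = {"am": 5, "I": 13, "what": 54}
    predicted = decode_cipher(emissions)
    print(predicted, mapping_accuracy(predicted, truth))
```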

slide-53
SLIDE 53

Contexts During Training

Context lengths increase smoothly during training:

[Figure: "Decipherment". Average context length (y-axis, 1.5 to 4.5) vs. number of passes (x-axis, 5 to 20)]

****** ↓ ***ing ↓ idding

  • Start of training: little information, short contexts.
  • End of training: lots of information, long contexts.

slide-60
SLIDE 60

Discussion

RCMs provide both expressivity and coverage, which enable:

  • More accurate uncertainty estimates (precision)
  • Better partially supervised learning updates

Related work:

  • Coarse-to-fine inference (Petrov et al., 2006; Weiss et al., 2010)
  • Certificates of optimality (Sontag, 2010)
  • Tractable models (Poon & Domingos, 2011; Niepert & Domingos, 2014; Li & Zemel, 2014; S. & Liang, 2015)

Reproducible experiments on Codalab: codalab.org/worksheets