Learning with Structured Output Spaces Keerthiram Murugesan - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Learning with Structured Output Spaces

Keerthiram Murugesan

slide-2
SLIDE 2
  • Find a function from input space X to output space Y such that the prediction error is low.

(typically Y is “simple”)

Microsoft announced today that they acquired Apple for the amount equal to the gross national product of Switzerland. Microsoft officials stated that they first wanted to buy Switzerland, but eventually were turned off by the mountains and the snowy winters…

x y 1

GATACAACCTATCCCCGTATATATATTCTA TGGGTATAGTATTAAATCAATACAACCTAT CCCCGTATATATATTCTATGGGTATAGTAT TAAATCAATACAACCTATCCCCGTATATAT ATTCTATGGGTATAGTATTAAATCAGATAC AACCTATCCCCGTATATATATTCTATGGGT ATAGTATTAAATCACATTTA

x y −1

x y 7.3

Standard Prediction

slide-3
SLIDE 3

Structured Prediction

[Figure: example pairs of input X and structured output Y]

  • Conservation Reservoir Corridors
  • The dog chased the cat. → parse tree with nodes S, NP, VP, Det, N, V
  • APPGEAYLQPGEAYLQV
  • [Obama] running in the [presidential election] has mobilized [many young voters]. [His] [position] on [climate change] was well received by [this group]. → Obama, presidential election, many young voters, His, position, climate change, this group

slide-4
SLIDE 4
slide-5
SLIDE 5

Talk Overview

  • Structured Prediction (Quick Review)

– Conventional Approach

  • Structured Prediction Cascades

– Ensemble Cascades

  • Ensemble learning for Structured Prediction

– Online algorithm
– Boosting-style algorithm


slide-6
SLIDE 6

Structured Prediction

slide-7
SLIDE 7

Structured Output Spaces

  • Input: x
  • Predict: y ∈ Y(x) (structured!)
  • Quality determined by utility function
  • Conventional Approach:

– Train: learn model U(x,y) of utility
– Test: predict via

$$h(x) = \arg\max_{y \in Y(x)} U(x, y)$$

Here U is the scoring function; computing the argmax can be challenging.
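
To make the conventional approach concrete, here is a minimal Python sketch that predicts by exhaustive search over Y(x). The tag set `TAGS`, the toy utility `U`, and the helper `predict_brute_force` are illustrative assumptions, not part of the talk. Because the number of candidates grows exponentially with the input length, this only works for tiny inputs, which is one way to see why the argmax can be challenging.

```python
from itertools import product

TAGS = ("Det", "N", "V")  # hypothetical tag set

def U(x, y):
    """Toy utility: number of positions whose tag agrees with a tiny lexicon."""
    lexicon = {"the": "Det", "rain": "N", "wet": "V", "cat": "N"}
    return sum(1 for word, tag in zip(x, y) if lexicon.get(word.lower()) == tag)

def predict_brute_force(x):
    """h(x) = argmax over all len(TAGS) ** len(x) candidate outputs."""
    candidates = product(TAGS, repeat=len(x))
    return max(candidates, key=lambda y: U(x, y))

x = ["The", "rain", "wet", "the", "cat"]
y_hat = predict_brute_force(x)
num_candidates = len(TAGS) ** len(x)  # size of Y(x) for this input
```

Even with three tags and five words the search already visits 243 candidates; structured models avoid this by exploiting decomposition of U.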

slide-8
SLIDE 8
Example: Sequence Prediction

  • Part-of-Speech Tagging

– Given a sequence of words x
– Predict sequence of tags y.

x: The rain wet the cat
y: Det N V Det N

Other candidate tag sequences y: Adj V V Det V; V V N Adv Det

$$h(x) = \arg\max_{y \in Y(x)} U(x, y)$$

slide-9
SLIDE 9

Example: Sequence Prediction

  • MAP inference in 1st-order Markov models

[Figure: chain-structured model over labels y1, y2, y3, y4, … with observations x1, x2, x3, x4, …; 1st-order dynamics]

Similar models include CRFs, Kalman Filters, Linear Dynamical Systems, etc.

slide-10
SLIDE 10

Example: Sequence Prediction

  • Utility function (sum over maximal cliques):

$$U(x, y) = \sum_{t=1}^{n} u(x_t, y_t, y_{t-1})$$

  • Prediction (by dynamic programming):

$$h(x) = \arg\max_{y} \sum_{t=1}^{n} u(x_t, y_t, y_{t-1})$$

[Figure: chain over labels y1, y2, y3, y4 with observations x1, x2, x3, x4]
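
The dynamic program for this argmax can be sketched as a Viterbi-style recursion. This is a minimal sketch under stated assumptions: the local utility `u`, the lexicon, the transition set, and the convention of passing `None` as the predecessor of the first tag are all hypothetical choices made for illustration.

```python
def viterbi(x, tags, u):
    """Return (best_score, best_y) maximizing sum_t u(x[t], y[t], y[t-1]).

    Assumes len(x) >= 1; the predecessor of y[0] is passed as None.
    """
    # best[t][tag]: best score of any prefix y[0..t] that ends in `tag`.
    best = [{tag: u(x[0], tag, None) for tag in tags}]
    back = []  # back[t-1][tag]: best predecessor of `tag` at position t
    for t in range(1, len(x)):
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] + u(x[t], tag, p))
            scores[tag] = best[t - 1][prev] + u(x[t], tag, prev)
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    y = [last]
    for pointers in reversed(back):
        y.append(pointers[y[-1]])
    y.reverse()
    return best[-1][last], y

# Hypothetical local utility: emission match plus a bonus for plausible transitions.
LEXICON = {"the": "Det", "rain": "N", "wet": "V", "cat": "N"}
GOOD_TRANSITIONS = {("Det", "N"), ("N", "V"), ("V", "Det")}

def u(x_t, y_t, y_prev):
    emission = 1.0 if LEXICON.get(x_t.lower()) == y_t else 0.0
    transition = 0.5 if (y_prev, y_t) in GOOD_TRANSITIONS else 0.0
    return emission + transition

score, y_hat = viterbi(["The", "rain", "wet", "the", "cat"], ["Det", "N", "V"], u)
```

The running time is linear in the sequence length and quadratic in the number of tags, in contrast to the exponential cost of enumerating every tag sequence.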

slide-11
SLIDE 11

Scoring function as Linear Models

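  • U/u is parameterized linearly:

$$U(x, y; \theta) = \sum_{t} u(x_t, y_t, y_{t-1}; \theta)$$

$$u(x, y_1, y_2; \theta) = \theta^T f(x, y_1, y_2)$$

where f is some feature representation.

$$h(x; \theta) = \arg\max_{y} \sum_{t} \theta^T f(x_t, y_t, y_{t-1})$$

The argmax is computed by dynamic programming.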

slide-12
SLIDE 12

Feature representation

slide-13
SLIDE 13

Generalizing to Other Structures

  • From last slide:

$$h(x; \theta) = \arg\max_{y} \sum_{t} \theta^T f(x_t, y_t, y_{t-1})$$

$$\Psi(x, y) = \sum_{t} f(x_t, y_t, y_{t-1})$$

  • General Formulation:

$$h(x; \theta) = \arg\max_{y} \theta^T \Psi(x, y)$$

  • Viterbi
  • CKY Parsing
  • Sorting
  • Belief Propagation
  • Integer Programming
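
The step from local features to the joint feature map Ψ can be checked numerically. In this minimal sketch the indicator features in `f`, the sparse dictionary representation, and the weights in `theta` are hypothetical; the point is only that the score of the summed features equals the sum of the local scores.

```python
from collections import Counter

def f(x_t, y_t, y_prev):
    """Hypothetical local features: one emission and one transition indicator."""
    return Counter({("emit", x_t.lower(), y_t): 1, ("trans", y_prev, y_t): 1})

def joint_features(x, y):
    """Psi(x, y) = sum_t f(x_t, y_t, y_{t-1}); the first predecessor is None."""
    psi = Counter()
    prev = None
    for x_t, y_t in zip(x, y):
        psi.update(f(x_t, y_t, prev))  # Counter.update adds counts
        prev = y_t
    return psi

def dot(theta, features):
    """theta^T features for sparse, dict-based vectors."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())

theta = {
    ("emit", "the", "Det"): 1.0,
    ("emit", "cat", "N"): 1.0,
    ("trans", "Det", "N"): 0.5,
}
x, y = ["the", "cat"], ["Det", "N"]

global_score = dot(theta, joint_features(x, y))
local_score = sum(
    dot(theta, f(x_t, y_t, y_prev))
    for x_t, y_t, y_prev in zip(x, y, [None] + y[:-1])
)
```

Both quantities agree by linearity, which is why any inference routine that maximizes the linear score over Ψ fits the general formulation.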
slide-14
SLIDE 14

Learning Setting

  • Generalization of Conventional Settings

– Hinge loss = Structural SVMs
– Log-loss = Conditional Random Fields
– Gradient Descent, Cutting Plane, etc…

  • Requires running inference during training

$$\arg\min_{\theta} \; \frac{\lambda}{2} \|\theta\|^2 + \sum_{(x,y)} \ell\big(y, h(x; \theta)\big)$$

$$h(x) = \arg\max_{y \in Y(x)} U(x, y)$$

The first term is the regularization; the second is the loss function.
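
To illustrate why training requires running inference, here is a minimal sketch of a structured perceptron. Note the substitution: this is a simpler, unregularized relative of the hinge-loss objective, not the structural SVM or CRF training named on the slide. The tag set, the features, and the single training pair are hypothetical, and inference is done by exhaustive search to keep the example short.

```python
from collections import Counter
from itertools import product

TAGS = ("Det", "N", "V")  # hypothetical tag set

def joint_features(x, y):
    """Psi(x, y): emission and transition indicator counts."""
    psi = Counter()
    prev = None
    for x_t, y_t in zip(x, y):
        psi[("emit", x_t, y_t)] += 1
        psi[("trans", prev, y_t)] += 1
        prev = y_t
    return psi

def score(theta, x, y):
    return sum(theta.get(k, 0.0) * v for k, v in joint_features(x, y).items())

def predict(theta, x):
    """Inference step: argmax_y theta^T Psi(x, y), by exhaustive search."""
    return max(product(TAGS, repeat=len(x)), key=lambda y: score(theta, x, y))

def train_perceptron(data, epochs=5):
    theta = {}
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(theta, x)  # inference runs inside the training loop
            if y_hat != tuple(y):
                # Move theta toward the true output and away from the prediction.
                for k, v in joint_features(x, y).items():
                    theta[k] = theta.get(k, 0.0) + v
                for k, v in joint_features(x, y_hat).items():
                    theta[k] = theta.get(k, 0.0) - v
    return theta

data = [(("the", "rain", "wet", "the", "cat"), ("Det", "N", "V", "Det", "N"))]
theta = train_perceptron(data)
```

Every update calls `predict`, so the cost of training scales with the cost of inference; replacing the exhaustive search with dynamic programming is what makes this practical for sequences.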

slide-15
SLIDE 15

Restriction: Increased Complexity

slide-16
SLIDE 16

Restriction: Pre-specified Structure

  • Learn a (linearly) parameterized U

– Such that h(x) gives good predictions

  • What if U is “wrong”?

– Known to not be consistent
– Infinite training data ≠ converging to best model

$$h(x; \theta) = \arg\max_{y} U(x, y; \theta)$$

The form of U is the pre-specified structure.

slide-17
SLIDE 17

Summary: Structured Prediction

  • Conventional Approach

– Specify structure & inference procedure
– Train parameters on training set {(x,y)}

  • Limitations:

– Runtime proportional to Model Complexity
– Structure Mismatch & Inconsistency

$$h(x; \theta) = \arg\max_{y} U(x, y; \theta)$$

The form of U is the pre-specified structure.

slide-18
SLIDE 18

Structured Prediction Cascades

slide-19
SLIDE 19

Classifier Cascades (Face Classifier)

slide-20
SLIDE 20

Classifier Cascades

slide-21
SLIDE 21

Tradeoffs in Cascaded Learning

  • Accuracy: Minimize the number of errors incurred by each level
  • Efficiency: Maximize the number of filtered assignments at each level

slide-22
SLIDE 22

Structured Prediction Cascades

slide-23
SLIDE 23

Clique Assignments

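  • Valid assignment for clique (Yk-1,Yk)
  • Invalid assignment (that will be eliminated/pruned)

[Figure: example assignments to the clique (Yk-1, Yk): (Adj, N) and (N, N)]

Remember Sum over Cliques?

$$U(x, y) = \sum_{c \in C} u(x_t, y_t, y_{t-1})$$

where each clique c ∈ C of the sequence model is a pair (y_{t-1}, y_t).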

slide-24
SLIDE 24

Clique Assignments

  • Valid assignment for clique (Yk-1,Yk)
  • Invalid assignment (that will be eliminated/pruned)

[Figure: example assignments to the clique (Yk-1, Yk): (Adj, N) and (N, N)]

How do we know this assignment is good or bad?

  1. Score
  2. Threshold
slide-25
SLIDE 25

Max-marginal score (sequence models)

slide-26
SLIDE 26

Threshold (t)

slide-27
SLIDE 27

Threshold (t)

slide-28
SLIDE 28

Threshold (t)
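
The score-then-threshold idea can be sketched for a sequence model as follows. The max-marginal of a clique assignment is the best total score of any full sequence containing it. The threshold used here, a convex combination of the best score and the mean max-marginal controlled by `alpha`, is an assumption borrowed from the structured prediction cascades literature, since the slide content is not available; the local utility `u` and the toy data are hypothetical as well.

```python
def clique_max_marginals(x, tags, u):
    """m[t][(a, b)] = max of U(x, y) over all y with y[t-1] == a and y[t] == b."""
    n = len(x)
    # fwd[t][a]: best score of a prefix y[0..t] ending in tag a.
    fwd = [{a: u(x[0], a, None) for a in tags}]
    for t in range(1, n):
        fwd.append({b: max(fwd[t - 1][a] + u(x[t], b, a) for a in tags) for b in tags})
    # bwd[t][a]: best score of the suffix y[t+1..n-1] given y[t] == a.
    bwd = [dict.fromkeys(tags, 0.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = {a: max(u(x[t + 1], b, a) + bwd[t + 1][b] for b in tags) for a in tags}
    return {
        t: {(a, b): fwd[t - 1][a] + u(x[t], b, a) + bwd[t][b] for a in tags for b in tags}
        for t in range(1, n)
    }

def prune(max_marginals, alpha):
    """Keep the clique assignments whose max-marginal reaches the threshold."""
    all_scores = [s for m in max_marginals.values() for s in m.values()]
    best = max(all_scores)
    mean = sum(all_scores) / len(all_scores)
    threshold = alpha * best + (1 - alpha) * mean  # assumed threshold form
    kept = {t: {c for c, s in m.items() if s >= threshold} for t, m in max_marginals.items()}
    return threshold, kept

# Hypothetical local utility, for illustration only.
LEXICON = {"the": "Det", "rain": "N", "wet": "V", "cat": "N"}
GOOD_TRANSITIONS = {("Det", "N"), ("N", "V"), ("V", "Det")}

def u(x_t, y_t, y_prev):
    emission = 1.0 if LEXICON.get(x_t.lower()) == y_t else 0.0
    transition = 0.5 if (y_prev, y_t) in GOOD_TRANSITIONS else 0.0
    return emission + transition

x = ["The", "rain", "wet", "the", "cat"]
tags = ["Det", "N", "V"]
mm = clique_max_marginals(x, tags, u)
threshold, kept = prune(mm, alpha=0.5)
```

With `alpha` between 0 and 1 the threshold never exceeds the best score, so every clique of the highest-scoring sequence survives pruning, while low-scoring assignments are filtered out before the next, more expensive level.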

slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35
slide-36
SLIDE 36

Learning θ at each cascade level

slide-37
SLIDE 37

Online learning

slide-38
SLIDE 38

Structured Prediction Ensembles

slide-39
SLIDE 39

Ensemble Learning

[Figure: hypotheses h1, h2, h3, …, hp, each predicting “face” or “no face”]

Goal: Combine the outputs from multiple models / hypotheses / experts:

1) Majority Voting
2) Linear combination of hypotheses/experts
3) Boosting, etc.

slide-40
SLIDE 40

Weighted Majority Algorithm
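
As a reminder of the classic (non-structured) setting, here is a minimal sketch of the deterministic weighted majority algorithm: predict by weighted vote, then multiply the weight of every expert that erred by a factor `beta`. The value `beta=0.5` and the toy rounds are illustrative assumptions.

```python
def weighted_majority(expert_predictions, labels, beta=0.5):
    """expert_predictions[t][j] is the label predicted by expert j in round t."""
    num_experts = len(expert_predictions[0])
    weights = [1.0] * num_experts
    mistakes = 0
    for preds, truth in zip(expert_predictions, labels):
        # Weighted vote over the labels proposed by the experts.
        votes = {}
        for w, p in zip(weights, preds):
            votes[p] = votes.get(p, 0.0) + w
        guess = max(votes, key=votes.get)
        mistakes += guess != truth
        # Penalize every expert that was wrong in this round.
        weights = [w * beta if p != truth else w for w, p in zip(weights, preds)]
    return weights, mistakes

rounds = [
    ("face", "face", "no face"),
    ("no face", "face", "no face"),
    ("face", "face", "no face"),
]
labels = ["face", "face", "face"]
weights, mistakes = weighted_majority(rounds, labels)
```

In this toy run the second expert is never wrong and keeps its full weight, while the third expert's weight is halved in every round.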

slide-41
SLIDE 41

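Ensemble learning for Structured Prediction

[Figure: each expert h1, h2, h3, …, hp decomposes into per-position predictors h1^1, h1^2, …, h1^l through hp^1, hp^2, …, hp^l; for example, h1 outputs the tag sequence V V N Adv Det]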
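
One simple way to use the per-position decomposition is a weighted vote at each position. The sketch below is a deliberately simplified illustration, not the algorithms presented in the talk: the function `positionwise_vote`, the outputs of `h2` and `h3`, and the weights are hypothetical; only the output of `h1` comes from the slide.

```python
def positionwise_vote(expert_outputs, weights=None):
    """Combine tag sequences by a weighted vote at each position k.

    weights[j][k] is the weight of expert j at position k (uniform by default).
    """
    length = len(expert_outputs[0])
    if weights is None:
        weights = [[1.0] * length for _ in expert_outputs]
    combined = []
    for k in range(length):
        votes = {}
        for j, output in enumerate(expert_outputs):
            votes[output[k]] = votes.get(output[k], 0.0) + weights[j][k]
        combined.append(max(votes, key=votes.get))
    return combined

outputs = [
    ["V", "V", "N", "Adv", "Det"],   # h1, as on the slide
    ["Det", "N", "V", "Det", "N"],   # hypothetical h2
    ["Det", "V", "V", "Det", "N"],   # hypothetical h3
]
y_uniform = positionwise_vote(outputs)

# Trust h2 more at position 1 (0-indexed) than the other experts.
weights = [[1.0] * 5 for _ in outputs]
weights[1][1] = 3.0
y_weighted = positionwise_vote(outputs, weights)
```

Because the weights are indexed by both expert and position, the combined sequence can mix substructures from different experts and need not equal the output of any single one.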

slide-42
SLIDE 42

Example: Sequence Model

slide-43
SLIDE 43

Weighted Majority Algorithm for Structured Prediction Ensembles

slide-44
SLIDE 44

Ensemble output from Weighted Majority Algorithm

  • Given W1, W2, … WT
slide-45
SLIDE 45

Boosting for Structured Prediction Ensembles

slide-46
SLIDE 46

Ensemble output from Boosting

  • Given the base learners h1, h2, … hT:
  • Note h1, h2, … hT are different from h1, h2, … hP
slide-47
SLIDE 47
  • THE END