Learning Where to Sample in Structured Prediction (Tianlin Shi, Jacob Steinhardt, Percy Liang)


slide-1
SLIDE 1

Learning Where to Sample in Structured Prediction

Tianlin Shi Jacob Steinhardt Percy Liang

AISTATS 2015

Tianlin S., J. Steinhardt, P. Liang () HeteroSampler AISTATS 2015 1 / 25

slide-2
SLIDE 2

Introduction

Outline

1. Introduction
2. Reinforcement Learning
3. Meta-Features
4. Experiments

slide-5
SLIDE 5

Introduction

Setting

◮ Have “stolen” a prediction model for structured outputs:

$p(y_1, y_2, \ldots, y_n \mid x)$, where $y_1, \ldots, y_n$ is the output and $x$ is the input.

◮ At run time, use Gibbs sampling to do stochastic search for

$\arg\max_{y_1, \ldots, y_n} p(y_1, \ldots, y_n \mid x)$

◮ Goal: optimize the running time of the Gibbs sampler. How?

slide-38
SLIDE 38

Introduction

Dissection: Gibbs Sampler

A small cost for a local move, but a large number of moves.

Example (Part-of-Speech Tagging)

x          I        think  now     is    the         right      time
pass 1: y  Pronoun  Verb   Adverb  Verb  Determiner  Noun       Noun
pass 2: y  Pronoun  Verb   Noun    Verb  Determiner  Adjective  Noun

Source of inefficiency. Some parts are harder, while some are easier.

Example (A Better Strategy)

x          I        think  now     is    the         right      time
pass 1: y  Pronoun  Verb   Adverb  Verb  Determiner  Noun       Noun
pass 2: y                  Noun                      Adjective

A HeteroSampler (“Heterogeneous Sampler”): focus computation where it is needed.

slide-40
SLIDE 40

Introduction

Framework

Definition. Action $A_j$ updates part $y_j$ based on $p(y_j \mid y_{-j}, x)$.

Example

Sentence: I think now is the right time

Input $x_j$: $x_3$
Output $y_j$: $y_3$
Action $A_j$: $A_3$

$A_3$ samples $y_3$ from $p(y_3 \mid y_{-3}, x)$.
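To make the action concrete, here is a minimal Python sketch of $A_j$ for a linear-chain model. The chain structure, the `unary` and `pairwise` score tables, and the function names are assumptions made for illustration; the slides do not say which model family is used.

```python
import math
import random

def conditional(y, j, unary, pairwise):
    """Return p(y_j = k | y_{-j}, x) for every label k in a chain model.

    unary[j][k] scores label k at position j (it stands in for the input x);
    pairwise[a][b] scores label a followed by label b. Both are hypothetical.
    """
    scores = []
    for k in range(len(unary[j])):
        s = unary[j][k]
        if j > 0:
            s += pairwise[y[j - 1]][k]
        if j + 1 < len(y):
            s += pairwise[k][y[j + 1]]
        scores.append(s)
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def action(y, j, unary, pairwise, rng):
    """Action A_j: resample y_j from its conditional; other parts stay fixed."""
    probs = conditional(y, j, unary, pairwise)
    new_y = list(y)
    new_y[j] = rng.choices(range(len(probs)), weights=probs)[0]
    return new_y
```

Only the factors that touch position j enter the computation, which is why a single move is cheap.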

slide-48
SLIDE 48

Introduction

Sampler Template

Our sampler: for a total of T rounds, do

  1. Pick index $j$ and the action $A_j$.
  2. Sample $y_j$ using $A_j$.

[Figure: at round $t$, position $j[t]$ is picked from $\ldots, y_{j-1}, y_j, y_{j+1}, \ldots$; applying $A_{j[t]}$ yields $y_j'$ at round $t+1$, with $y_{j-1}$ and $y_{j+1}$ unchanged.]

Example

◮ Cyclic Gibbs sampler. Pick $j$ round-robin.
◮ Random-scan Gibbs sampler. Pick $j$ uniformly at random.

How to choose $j$?
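The template can be sketched as a loop with a pluggable index picker. This is an illustrative sketch only: `run_sampler`, `cyclic_pick`, and `make_random_scan_pick` are hypothetical names, and `act` stands for any function that applies action $A_j$ to the current configuration.

```python
import random

def run_sampler(y, num_rounds, pick, act):
    """Template: for T rounds, pick an index j, then apply action A_j."""
    y = list(y)
    for t in range(num_rounds):
        j = pick(t, y)
        y = act(y, j)
    return y

def cyclic_pick(t, y):
    """Cyclic Gibbs: visit positions round-robin."""
    return t % len(y)

def make_random_scan_pick(rng):
    """Random-scan Gibbs: choose a position uniformly at random."""
    def pick(t, y):
        return rng.randrange(len(y))
    return pick
```

Both baselines ignore the current configuration `y` when choosing `j`; a learned picker would be a third `pick` function that looks at it.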

slide-53
SLIDE 53

Reinforcement Learning

Reinforcement Learning

◮ State = entire history with configurations $y[i]$ and choices $j[i]$:

$s_t = (y[0], \ldots, y[t], j[0], \ldots, j[t-1])$

◮ Action = pick a $j[t]$ to sample:

$a_t = A_{j[t]}$

◮ Transition = pre-trained model $p(y_j \mid y_{-j}, x)$

◮ Reward = improvement in log-probability:

$R(s_t, a_t, s_{t+1}) = \log p(y[t+1] \mid x) - \log p(y[t] \mid x)$
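As a rough illustration of the reward, the sketch below scores a configuration under a hypothetical linear-chain model and takes the difference before and after a move. The model form and the names `score` and `reward` are assumptions. Because $\log p(y \mid x)$ equals the unnormalized score minus a term that depends only on $x$, that term cancels in the difference and never has to be computed.

```python
def score(y, unary, pairwise):
    """Unnormalized log-probability of y under a hypothetical chain model."""
    total = sum(unary[i][y[i]] for i in range(len(y)))
    total += sum(pairwise[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return total

def reward(y_before, y_after, unary, pairwise):
    """R = log p(y[t+1] | x) - log p(y[t] | x); the normalizer cancels."""
    return score(y_after, unary, pairwise) - score(y_before, unary, pairwise)
```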

slide-56
SLIDE 56

Reinforcement Learning

◮ Policy $\pi$: how to pick.

$\pi : \text{states} \to \text{actions}$

◮ The expected cumulative reward $R_T$:

$\mathbb{E}[R_T] = \mathbb{E}\left[\sum_{t=0}^{T-1} R(s_t, a_t, s_{t+1})\right]$

Remark. Cumulative reward $= \log p(y[T] \mid x) - \log p(y[0] \mid x)$ (1)

Maximizing cumulative reward is equivalent to maximizing the probability of the final sample.
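The remark holds because the per-step rewards telescope. Written out with the reward defined as $R(s_t, a_t, s_{t+1}) = \log p(y[t+1] \mid x) - \log p(y[t] \mid x)$:

```latex
\begin{aligned}
\sum_{t=0}^{T-1} R(s_t, a_t, s_{t+1})
  &= \sum_{t=0}^{T-1} \bigl[\log p(y[t+1] \mid x) - \log p(y[t] \mid x)\bigr] \\
  &= \log p(y[T] \mid x) - \log p(y[0] \mid x).
\end{aligned}
```

The term $\log p(y[0] \mid x)$ does not depend on the choices $j[0], \ldots, j[T-1]$, so only the final sample matters to the policy.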

slide-59
SLIDE 59

Reinforcement Learning

Learning Algorithm

Inspired by standard RL (Q-learning [Watkins et al. 1992], SARSA [Rummery et al. 1994]),

Q(s, a) := how good it is to take action a in state s

Example

x        I    think  now  is   the  right  time
y[0]     P    V      Adv  V    DT   N      N
Q(s, a)  0.0  0.0    2.0  0.0  0.0  2.3    0.0
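One simple way to turn such scores into a choice is to rank positions by Q and sample the best one first. The greedy rule below is an assumption for illustration; the slides give the Q values but do not state the exact selection rule.

```python
words = ["I", "think", "now", "is", "the", "right", "time"]
q_values = [0.0, 0.0, 2.0, 0.0, 0.0, 2.3, 0.0]  # values from the example

def greedy_pick(q):
    """Return the index with the largest Q value (first one on ties)."""
    return max(range(len(q)), key=lambda j: q[j])

def rank_positions(q):
    """Return all indices ordered from highest to lowest Q value."""
    return sorted(range(len(q)), key=lambda j: -q[j])

best = greedy_pick(q_values)
top_two = [words[j] for j in rank_positions(q_values)[:2]]
```

With the example values, the two highest-ranked positions are the words whose tags differ between the two cyclic passes in the earlier tagging example.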

slide-61
SLIDE 61

Reinforcement Learning

Applying RL

Challenge: Efficiency

Q(s, a) should be cheap to compute, so it does not become the computational bottleneck.

[Figure: cost to compute Q(s, a) compared with the cost to sample.]

slide-69
SLIDE 69

Reinforcement Learning

Applying RL: Efficiency

  1. Exploit locality. Model Q(s, a) using local meta-features only.
  2. Catch. Local meta-features can’t predict cumulative reward.
  3. Credit assignment. Isolate the contribution of a: fit Q(s, a) to $R_{\text{taking action}} - R_{\text{no action}}$.

[Figure: steps 0, 1, 2 followed by a reward R, shown once with “take action” and once with “no action”.]
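The credit-assignment target can be sketched as a regression problem. The linear form of Q, the squared-error loss, and the gradient step below are assumptions chosen for brevity; the slides say only that Q(s, a) is fit to the difference between the reward with the action and the reward without it.

```python
def q_value(weights, features):
    """Hypothetical linear model: Q(s, a) = w . phi, phi = local meta-features."""
    return sum(w * f for w, f in zip(weights, features))

def sgd_step(weights, features, reward_with_action, reward_without_action, lr=0.1):
    """One squared-error gradient step toward R{taking action} - R{no action}."""
    target = reward_with_action - reward_without_action
    error = q_value(weights, features) - target
    return [w - lr * error * f for w, f in zip(weights, features)]
```

Subtracting the no-action reward removes whatever the rest of the rollout would have earned anyway, so local features only have to explain the contribution of the one action.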

slide-71
SLIDE 71

Meta-Features

List of Meta-Features

Meta-Feature Templates

Reason about  Name         Description
Uncertainty   cond-ent     conditional entropy
              unigram-ent  entropy by unigram model
              sp           number of times sampled
Staleness     nb-vary      number of neighbors changed
Discord       nb-discord   discord with neighbors

slide-79
SLIDE 79

Meta-Features

Uncertainty

Feature I. Entropy: the entropy of $q(y_j \mid y_{-j}, x)$.

Warning. Computing entropy is as expensive as sampling.

Principle (use cheap meta-features): complexity of meta-features ≪ complexity of sampling.

Very-lazy evaluation

[Figure: the chain $\ldots, y_{j-1}, y_j, y_{j+1}, \ldots$ with meta-features $\phi_{j-1}, \phi_j, \phi_{j+1}$ at round $t$; after $y_j$ is resampled to $y_j'$ at round $t+1$, the meta-feature at position $j$ becomes $\phi_j'$.]
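One reading of very-lazy evaluation is that the entropy at position j is refreshed only when j itself is sampled, since the conditional distribution has to be computed at that moment anyway. The sketch below follows that reading; the class name and the choice of initial value are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

class LazyEntropyCache:
    """Stores one entropy value per position and updates it only on sampling."""

    def __init__(self, num_positions, initial=0.0):
        self.values = [initial] * num_positions

    def on_sample(self, j, probs):
        # probs was already computed in order to sample y_j, so the
        # entropy comes almost for free here.
        self.values[j] = entropy(probs)
        return self.values[j]
```

Between visits the cached value may be out of date, which is the cost of keeping the meta-feature far cheaper than a sampling step.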

slide-87
SLIDE 87

Meta-Features

Uncertainty

Warning. Entropy alone can be dangerous.

Example

x     The         Duchess  was   entertaining  sp(y3)
y[0]  Determiner  Noun     Verb  Adjective
y[1]  Determiner  Noun     Verb  Verb          1
y[2]  Determiner  Noun     Verb  Adjective     2
y[3]  Determiner  Noun     Verb  Verb          3
y[4]  Determiner  Noun     Verb  Adjective     4

Feature II. Over-exploration: the number of times $y_j$ has been sampled thus far.

◮ Simplest measure of the progress in exploration.
◮ Usually has negative weight.
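A small sketch shows how a negative weight on the sample count counteracts a position whose entropy stays high. The linear scoring form and both weight values are made up for illustration; in the talk's setting the weights are learned.

```python
def q_score(cond_ent, times_sampled, w_ent=1.0, w_sp=-0.5):
    """Hypothetical linear score over two meta-features (weights are made up)."""
    return w_ent * cond_ent + w_sp * times_sampled

# An ambiguous word keeps roughly the same entropy however often it is
# resampled, so only the sample count can push its score down.
scores = [q_score(0.69, sp) for sp in range(5)]
```

With these illustrative weights, the score falls below zero after the second sample, so the position stops looking attractive even though its entropy has not changed.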

slide-93
SLIDE 93

Meta-Features

Staleness

  3. Change of Markov blanket: the number of variables in the Markov blanket that have changed.

◮ Identify outdated variables.
◮ Reason about very-lazy evaluation.

[Figure: the counter vary increases from 0 to 3 as the variables labeled 0, 1, and 2 change.]
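The staleness count can be tracked incrementally. The sketch below assumes a chain, where the Markov blanket of position j is its two neighbors, and assumes the count resets when j itself is resampled; both choices, and the function names, are illustrative.

```python
def update_staleness(changed_neighbors, j, value_changed):
    """Record the effect of sampling position j.

    changed_neighbors[i] is the set of blanket members of i whose value has
    changed since i was last sampled (chain neighbors are assumed here).
    """
    changed_neighbors[j] = set()  # j was just refreshed
    if value_changed:
        for nb in (j - 1, j + 1):
            if 0 <= nb < len(changed_neighbors):
                changed_neighbors[nb].add(j)

def nb_vary(changed_neighbors, j):
    """Number of blanket variables that changed since j was last sampled."""
    return len(changed_neighbors[j])
```

Each update touches only the neighbors of the sampled position, so maintaining the feature costs far less than a sampling step.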

slide-98
SLIDE 98

Experiments

Experiment Outline

Tasks

◮ Part-of-speech tagging and named-entity recognition.
◮ Handwriting recognition.
◮ Color inpainting.
◮ Scene decomposition.

Brief Setting

1. “Steal” a graphical model with transitions.
2. Train the policy using RL on a training set.
3. Evaluate the policy on a test set.
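Steps 2 and 3 presuppose a policy that decides, from cheap meta-features, which variable to sample next. The sketch below shows what applying such a policy at test time could look like on a toy chain model. Everything in it is an assumption made for illustration: the function names gibbs_step and run_policy, the greedy selection rule, the three features (a bias, the number of times a position has been sampled, and the number of changed neighbours), and the hand-set weights. The RL training of step 2 is omitted, and the sampler described in the talk may select positions differently.

```python
import math
import random


def gibbs_step(labels, i, unary, pairwise, rng):
    """Resample labels[i] from its conditional in a chain model.

    Returns True if the label changed."""
    num_labels = len(unary[i])
    scores = []
    for y in range(num_labels):
        s = unary[i][y]
        if i > 0:
            s += pairwise[labels[i - 1]][y]
        if i + 1 < len(labels):
            s += pairwise[y][labels[i + 1]]
        scores.append(s)
    top = max(scores)
    # Unnormalised conditional probabilities, shifted for numerical stability.
    probs = [math.exp(s - top) for s in scores]
    new = rng.choices(range(num_labels), weights=probs)[0]
    changed = new != labels[i]
    labels[i] = new
    return changed


def run_policy(unary, pairwise, w, budget, seed=0):
    """Hypothetical greedy policy: sample the highest-scoring position until
    no score is positive or the transition budget is spent."""
    rng = random.Random(seed)
    n = len(unary)
    # Initialise each label from its unary scores alone.
    labels = [max(range(len(u)), key=u.__getitem__) for u in unary]
    times_sampled = [0] * n
    changed_nbrs = [set() for _ in range(n)]

    def score(i):
        return (w["bias"]
                + w["sampled"] * times_sampled[i]
                + w["vary"] * len(changed_nbrs[i]))

    transitions = 0
    while transitions < budget:
        i = max(range(n), key=score)
        if score(i) <= 0:
            break  # the policy sees no position worth sampling
        changed = gibbs_step(labels, i, unary, pairwise, rng)
        transitions += 1
        times_sampled[i] += 1
        changed_nbrs[i].clear()
        if changed:
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    changed_nbrs[j].add(i)
    return labels, times_sampled, transitions
```

With weights such as {"bias": 1.0, "sampled": -1.0, "vary": 1.0} and a sufficient budget, every position is sampled at least once, and further transitions go only to positions whose neighbours have changed.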


slide-99
SLIDE 99

Experiments

Speedup on NER

[Figure: F1 score against average number of transitions on NER, comparing HeteroSampler with cyclic Gibbs.]

slide-100
SLIDE 100

Experiments

2-5X Speedup across tasks

[Figure: six panels comparing HeteroSampler with cyclic Gibbs; x-axis: average number of transitions.]

(a) NER (factor size 2), y-axis: F1 score
(b) NER (factor size 4), y-axis: F1 score
(c) POS (factor size 4), y-axis: accuracy
(d) OCR (factor size 4), y-axis: accuracy
(e) Color Inpainting, y-axis: log probability (×10^1); transitions in units of 10^5
(f) Scene Decomposition, y-axis: log probability

slide-101
SLIDE 101

Experiments

Overhead of meta-model is minimal

[Figure: six panels plotting wall-clock seconds against average number of transitions; legend: Policy, Overall.]

(a) NER (factor size 2)
(b) NER (factor size 4)
(c) POS (factor size 4)
(d) OCR (factor size 4)
(e) Color Inpainting
(f) Scene Decomposition

slide-103
SLIDE 103

Experiments

Qualitative Results

Visualization of computational resource allocation on NER:

Words  Japan  coach  Shu    Kamo   said  :  '  '  The  Syrian  own  goal  proved  lucky  for  us
Truth  B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O
1      I-ORG  O      B-PER  I-PER  O     O  O  O  O    B-LOC   O    O     O       O      O    O
2      B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O
3      B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O

(a) Cyclic Gibbs sampler

Words  Japan  coach  Shu    Kamo   said  :  '  '  The  Syrian  own  goal  proved  lucky  for  us
Truth  B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O
1      I-ORG  O      B-PER  I-PER  O     O  O  O  O    B-LOC   O    O     O       O      O    O
2      B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O
3      B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O

(b) HeteroSampler

slide-106
SLIDE 106

Experiments

Related Work

Heterogeneous Inference

◮ Schedules asynchronous belief propagation [Elidan et al., 2006]
◮ Query-aware belief propagation [Chechetka et al., 2010]

Use simple models to help computation

◮ Coarse-to-fine object detection [Viola and Jones, 2001]
◮ Structured prediction cascades [Weiss and Taskar, 2010]

RL for structured prediction

◮ SEARN [Daume et al., 2009]
◮ DAGGER [Ross et al., 2011a]
◮ RL for dependency parsing [Goldberg and Nivre, 2013]

slide-109
SLIDE 109

Experiments

Contribution

Take-away Message: Sample Heterogeneously!

Words  Japan  coach  Shu    Kamo   said  :  '  '  The  Syrian  own  goal  proved  lucky  for  us
Truth  B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O
1      I-ORG  O      B-PER  I-PER  O     O  O  O  O    B-LOC   O    O     O       O      O    O
2      B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O
3      B-LOC  O      B-PER  I-PER  O     O  O  O  O    B-MISC  O    O     O       O      O    O

Important Ideas

1. It is effective to reason about uncertainty, staleness, etc.
2. It is possible to predict where to sample using cheap meta-features (very-lazy evaluation).

Thanks!