

SLIDE 1

Learning how to Active Learn: A Deep Reinforcement Learning Approach

Meng Fang, Yuan Li, Trevor Cohn (The University of Melbourne). Presenter: Jialin Song. April 05, 2018


SLIDE 2

Overview

1 Introduction
2 Model
3 Algorithms
4 Numerical Experiments


SLIDE 3

Introduction: Active Learning

1 Annotation:

⋄ select a subset of data to annotate from a large unlabelled dataset (adding labels)
⋄ then train a supervised learning model φ (a classifier) on the labelled data
⋄ the goal is to maximize the accuracy of the classification model

2 Active learning:

⋄ annotating every sentence is costly
⋄ so: select which raw data to label so as to maximize the accuracy of the classification model
⋄ active learning then becomes a sequential decision problem: as each sentence arrives, annotate it or not (our action)

SLIDE 4

Introduction: MDP

1 Markov Decision Process (MDP):

⋄ a framework for modelling a sequential decision process
⋄ at each decision stage, the agent observes the state variables (s) and takes an action (a)
⋄ after the action is taken, a reward r(s, a) associated with the state and action is generated, and the current state transitions to the next state
⋄ the agent aims to maximize the expected sum of rewards over all stages

SLIDE 5

Introduction: Bellman Equation

1 The dynamics of an MDP can be modelled by the Bellman equations

⋄ Bellman equation 1, the value function:
  J(s) = \max_a \Big[ \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) \, J(s') \Big], \qquad a^*_s = \arg\max_a \Big[ \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) \, J(s') \Big]

⋄ Bellman equation 2 (more common!), the Q-function:
  Q(s, a) = \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) \max_u Q(s', u), \qquad a^*_s = \arg\max_a Q(s, a)

⋄ where \bar{r}(s, a) is the expected reward, P_{ss'}(a) is the probability of transitioning from state s to s' under action a, and \alpha is the reward discount factor
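Since the transition model appears explicitly in these equations, they can be solved by simple fixed-point iteration whenever P_{ss'}(a) is known. A minimal value-iteration sketch on a made-up toy MDP; the sizes, rewards, and transitions below are illustrative assumptions, not anything from the paper.

```python
import numpy as np

# Toy MDP: 3 states, 2 actions (illustrative values only).
n_states, n_actions, alpha = 3, 2, 0.9

rng = np.random.default_rng(0)
# P[a, s, t] = P_ss'(a): probability of moving from s to t under action a.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
# r[s, a] = expected reward for taking action a in state s.
r = rng.random((n_states, n_actions))

J = np.zeros(n_states)
for _ in range(1000):
    # Q[s, a] = r(s, a) + alpha * sum_s' P_ss'(a) J(s')
    Q = r + alpha * np.einsum('ast,t->sa', P, J)
    J_new = Q.max(axis=1)                # Bellman equation 1
    if np.abs(J_new - J).max() < 1e-8:   # converged to the fixed point
        break
    J = J_new

policy = Q.argmax(axis=1)                # a*_s = argmax_a Q(s, a)
```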

SLIDE 6

Q-Learning

1 If P_{ss'}(a) is known, solve the Bellman equations (via value or policy iteration) to get the optimal policy. There is no need to 'learn'!

2 If P_{ss'}(a) is not known, computing the Q-function becomes a learning problem

3 Q-learning:

⋄ Q_{t+1}(s_t, a_t) = (1 - \epsilon_t) \, Q_t(s_t, a_t) + \epsilon_t \Big[ \bar{r}(s_t, a_t) + \alpha \max_u Q_t(s_{t+1}, u) \Big]
⋄ where t is the iteration and \epsilon_t is the learning rate
⋄ in practice the tabular update above is unusable: |S| × |A| is huge
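A minimal sketch of the tabular update in item 3. The gym-style reset()/step() environment interface, the epsilon-greedy exploration, and all hyperparameters are assumptions added for illustration; they are not part of the slides.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.9, lr=0.1, explore=0.1):
    """Tabular Q-learning. `env` is assumed to expose reset() -> s and
    step(a) -> (s_next, reward, done); `lr` plays the role of epsilon_t."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy choice between exploring and exploiting
            a = rng.integers(n_actions) if rng.random() < explore else int(Q[s].argmax())
            s_next, reward, done = env.step(a)
            # Q_{t+1}(s,a) = (1 - lr) Q_t(s,a) + lr [ r + alpha max_u Q_t(s',u) ]
            Q[s, a] = (1 - lr) * Q[s, a] + lr * (reward + alpha * Q[s_next].max())
            s = s_next
    return Q
```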

SLIDE 7

Deep Q-Learning

1 Deep Q-learning:

⋄ use the output of a DNN parametrized by θ, i.e., f_θ(s, a), to approximate Q(s, a)
⋄ input: state s, action a, reward r(s, a), next state s'
⋄ output: the approximate Q-function f_θ(s, a)
⋄ minimize the loss \min_\theta \frac{1}{2} \Big[ f_{\theta_t}(s_t, a_t) - \bar{r}(s_t, a_t) - \alpha \max_u f_{\theta_t}(s_{t+1}, u) \Big]^2
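A minimal PyTorch sketch of one gradient step on this loss. The slides leave the DNN unspecified, so the small two-layer network, the SGD optimizer, and the batch shapes below are all illustrative assumptions.

```python
import torch
import torch.nn as nn

state_dim, n_actions, alpha = 16, 2, 0.99   # illustrative sizes

# f_theta: maps a batch of states to one Q-value per action.
q_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                      nn.Linear(32, n_actions))
opt = torch.optim.SGD(q_net.parameters(), lr=1e-2)

def dqn_step(s, a, r, s_next):
    """One step on 1/2 (f_theta(s,a) - r - alpha max_u f_theta(s',u))^2.
    Shapes: s, s_next: (batch, state_dim); a: (batch,) long; r: (batch,)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # f_theta(s, a)
    with torch.no_grad():                                  # fixed bootstrap target
        target = r + alpha * q_net(s_next).max(dim=1).values
    loss = 0.5 * (q_sa - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```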

SLIDE 8

Model Active Learning as MDP

1 sentences x_i from an unlabelled dataset arrive one by one
2 for each arriving sentence, the agent decides whether or not to annotate it (a binary action)
3 if the agent annotates it, the annotated (labelled) set is expanded, and the classifier φ is re-trained and updated
4 the updated classifier is evaluated on a separate held-out dataset; the resulting test accuracy is the reward
5 the next sentence arrives, and the process repeats (see the sketch below)
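Putting the five steps together, one episode might look like the following sketch. make_state, annotate, retrain, and accuracy are hypothetical helpers standing in for components the slides describe only at a high level.

```python
def active_learning_episode(unlabelled, heldout, policy, classifier):
    """One MDP episode over the unlabelled stream.
    `make_state`, `annotate`, `retrain`, `accuracy` are hypothetical helpers."""
    labelled = []
    for x in unlabelled:                        # sentences arrive one by one
        s = make_state(x, classifier)           # state: sentence + current model
        a = policy(s)                           # binary action: annotate or not
        if a == 1:
            labelled.append(annotate(x))        # expand the labelled set
            classifier = retrain(classifier, labelled)
        r = accuracy(classifier, heldout)       # reward: held-out test accuracy
        yield s, a, r                           # transition for the Q-learner
```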

SLIDE 9

Model Active Learning as MDP

1 State (s): comprised of two parts:

⋄ the input sentence x_i (encoded using a CNN, h_c)
⋄ the trained classification model φ (encoded using a CNN, h_e)

2 Action (a):

⋄ a_i = 1: annotate x_i
⋄ a_i = 0: do not annotate x_i

3 Reward (r):

⋄ after the action a is taken, evaluate the classification model on a held-out set; the test accuracy is the reward
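A rough sketch of how this state and reward could be assembled. The encoders h_c and h_e are passed in as black boxes (the earlier loop would close over fixed encoders), and the classifier's predict method is an assumed scikit-learn-style interface; none of these internals come from the slides.

```python
import numpy as np

def make_state(sentence, classifier, h_c, h_e):
    """State = concatenation of the sentence encoding h_c(x_i) and the
    classifier encoding h_e(phi), as defined on this slide."""
    return np.concatenate([h_c(sentence), h_e(classifier)])

def reward(classifier, heldout_x, heldout_y):
    """Reward = test accuracy of the current classifier on the held-out set."""
    predictions = classifier.predict(heldout_x)   # assumed sklearn-style API
    return float(np.mean(predictions == heldout_y))
```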

SLIDE 10

A Value Iteration Q-learning Algorithm

SLIDE 11

Important Step 1

SLIDE 12

Important Step 2

SLIDE 13

Important Step 3

1 Remarks on the Q-learning algorithm:

⋄ input: the unlabelled dataset D
⋄ output: a series of actions (a_i), i.e., the policy π (a sketch of applying it follows)
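Applying the learned policy then amounts to streaming D through it. A hedged sketch, reusing the hypothetical make_state helper from the earlier slides:

```python
def apply_policy(policy, D, classifier, h_c, h_e):
    """Run the learned policy pi over the unlabelled dataset D and
    return one annotate/skip decision per sentence."""
    actions = []
    for x in D:
        s = make_state(x, classifier, h_c, h_e)  # same state construction as before
        actions.append(policy(s))                # a_i in {0, 1}
    return actions
```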

SLIDE 14

Relaxation 1: Transfer Policy

1 train the annotation policy π in a source language (e.g., English) and transfer it to a low-resource target language

SLIDE 15

Relaxation 2: Transfer Model and Policy

1 train a classification model φ and an annotation policy π in a source language (e.g., English) and transfer both to a low-resource target language

2 this relaxation is more like a test-and-deployment procedure

SLIDE 16

Numerical Experiments

The numerical experiments show that the proposed deep Q-learning active learning approach outperforms existing active learning methods such as uncertainty sampling and random sampling.

SLIDE 17

Thank You! ... Questions?