

SLIDE 1

Multi-Task Active Learning

Yi Zhang

SLIDE 2

Outline

- Active Learning
- Multi-Task Active Learning
  - Linguistic Annotations (ACL '08)
  - Image Classification (CVPR '08)
- Current Work and Discussions
  - Constraint-Driven Active Learning Across Tasks
  - Cost-Sensitive Active Learning Across Tasks
  - Active Learning of Constraints and Categories

SLIDE 3

Outline

- Active Learning
- Multi-Task Active Learning
  - Linguistic Annotations (ACL '08)
  - Image Classification (CVPR '08)
- Current Work and Discussions
  - Constraint-Driven Active Learning Across Tasks
  - Cost-Sensitive Active Learning Across Tasks
  - Active Learning of Constraints and Categories

SLIDE 4

Active Learning

- Select samples for labeling
- Optimize model performance given the new label

SLIDE 5

Active Learning

- Uncertainty sampling
  - Maximize: the reduction of model entropy on x
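As a minimal sketch of this criterion (not code from any of the papers; `predict_proba` stands in for the current model's predictive distribution):

```python
import math

def entropy(probs):
    """Shannon entropy of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_sampling(unlabeled, predict_proba):
    """Pick the unlabeled sample whose predicted label distribution
    has the highest entropy, i.e., where the model is least certain."""
    return max(unlabeled, key=lambda x: entropy(predict_proba(x)))
```

On a pool where one sample gets a confident [0.9, 0.1] prediction and another a uniform [0.5, 0.5], the uniform one is queried.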

SLIDE 6

Active Learning

- Query by committee (e.g., vote entropy)
  - Maximize: the reduction of version space
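A toy sketch of vote entropy as the disagreement measure (a committee here is just any list of classifiers; all names are illustrative):

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Committee disagreement: entropy of the empirical vote distribution."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def qbc_select(unlabeled, committee):
    """Query-by-committee: query the sample the committee disagrees on
    most, which heuristically shrinks the version space fastest."""
    return max(unlabeled, key=lambda x: vote_entropy([h(x) for h in committee]))
```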

SLIDE 7

Active Learning

- Density-weighted entropy
  - Maximize: approx. entropy reduction over U
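A hedged sketch of density weighting (the similarity function `sim` is an assumption; the exact weighting in the literature varies):

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def density_weighted_select(unlabeled, predict_proba, sim):
    """Density-weighted uncertainty: scale each candidate's entropy by
    its average similarity to the pool U, so queries favor representative
    samples rather than isolated outliers."""
    def score(x):
        density = sum(sim(x, u) for u in unlabeled) / len(unlabeled)
        return entropy(predict_proba(x)) * density
    return max(unlabeled, key=score)
```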

SLIDE 8

Active Learning

- Estimated error (uncertainty) reduction
  - Maximize: reduction of uncertainty over U
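This criterion is the most expensive, since it retrains once per candidate-label pair. A self-contained sketch (the `train`/`predict_proba` interfaces are assumptions, not any paper's API):

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def expected_error_reduction(unlabeled, labeled, train, predict_proba, labels):
    """For each candidate x, hypothetically add each possible label y
    (weighted by the current model's P(y|x)), retrain, and measure the
    remaining uncertainty over the pool U; query the x that minimizes
    this expected remaining uncertainty."""
    current = train(labeled)
    best_x, best_score = None, float("inf")
    for x in unlabeled:
        expected = 0.0
        for y, p_y in zip(labels, predict_proba(current, x)):
            retrained = train(labeled + [(x, y)])
            expected += p_y * sum(entropy(predict_proba(retrained, u))
                                  for u in unlabeled if u != x)
        if expected < best_score:
            best_x, best_score = x, expected
    return best_x
```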

SLIDE 9

Outline

- Active Learning
- Multi-Task Active Learning
  - Linguistic Annotations (ACL '08)
  - Image Classification (CVPR '08)
- Current Work and Discussions
  - Constraint-Driven Active Learning Across Tasks
  - Cost-Sensitive Active Learning Across Tasks
  - Active Learning of Constraints and Categories

SLIDE 10

The Problem

- Select a sample → label all tasks for it

SLIDE 11

Methods

- Alternating selection
  - Iterate over tasks, sampling a few from each task

SLIDE 12

Methods

- Rank combination
  - Combine rankings/scores from all single-task ALs
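A sketch of rank combination (Borda-count style; the scorer interface and names are assumptions for illustration):

```python
def rank_combination(unlabeled, scorers):
    """Rank combination for multi-task AL: each single-task AL criterion
    ranks the pool (higher score = more valuable), per-task ranks are
    summed, and the sample with the lowest combined rank is queried."""
    combined = {x: 0 for x in unlabeled}
    for score in scorers:
        for rank, x in enumerate(sorted(unlabeled, key=score, reverse=True)):
            combined[x] += rank
    return min(unlabeled, key=lambda x: combined[x])
```

A sample that is merely decent for every task can beat one that is best for a single task, which is the point of combining the rankings.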

SLIDE 13

Experiments

- Learning two (dissimilar) tasks
  - Named entity recognition: CRFs
  - Parsing: Collins' parsing model
- Competing AL methods
  - Random selection
  - One-sided active learning: choose samples from one task, and require labels for all tasks
  - Alternating selection
  - Rank combination
- Separate AL for each task is not studied (!)

SLIDE 14

Unanswered Questions

- Why “choose one, label all”?
  - The authors note that annotators may prefer to annotate the same sample for all tasks
- Why learn two dissimilar tasks together?
  - Outputs of one task may be useful for the other
  - Not studied in the paper

SLIDE 15

Outline

- Active Learning
- Multi-Task Active Learning
  - Linguistic Annotations (ACL '08)
  - Image Classification (CVPR '08)
- Current Work and Discussions
  - Constraint-Driven Active Learning Across Tasks
  - Cost-Sensitive Active Learning Across Tasks
  - Active Learning of Constraints and Categories

SLIDE 16

The Problem: Multi-Label Image Classification

- Select any sample-label pair for labeling

SLIDE 17

Proposed Method

- D: the set of samples
- x: a sample in D
- U(x): unknown labels of x
- L(x): known labels of x
- m: number of tasks
- ys: a selected label from U(x)
- yi: the label of the ith task (for a sample x)

SLIDE 18

Proposed Method

- Why maximize mutual information?
  - It connects the Bayes (binary) classification error to entropy and MI (Hellman and Raviv, 1970)

SLIDE 19

Proposed Method

- Why maximize mutual information?
  - It connects the Bayes (binary) classification error to entropy and MI (Hellman and Raviv, 1970)

SLIDE 20

Proposed Method

- Compare: maximize the reduction of entropy
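As a toy illustration of the mutual-information idea with two binary labels per sample (this is not the paper's estimator, which models the full joint over m labels):

```python
import math

def mutual_information(joint):
    """I(Y1; Y2) computed from a joint table {(y1, y2): p}."""
    p1, p2 = {}, {}
    for (a, b), p in joint.items():
        p1[a] = p1.get(a, 0.0) + p
        p2[b] = p2.get(b, 0.0) + p
    return sum(p * math.log(p / (p1[a] * p2[b]))
               for (a, b), p in joint.items() if p > 0)

def select_most_informative(pool, joint_proba):
    """Pick the sample whose two labels are most strongly coupled:
    querying one of its labels then tells us the most about the other."""
    return max(pool, key=lambda x: mutual_information(joint_proba(x)))
```

Independent labels give MI 0, so a sample with perfectly coupled labels is preferred over one whose labels are independent.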

SLIDE 21

Modeling Joint Label Probability

- But how do we compute this?
  - We need the joint conditional probability of the labels

SLIDE 22

Modeling Joint Label Probability

- Linear maximum entropy model
- Kernelized version
- EM for incomplete labels

SLIDE 23

Experiments

- Data
  - Image scene classification
  - Gene function classification
- Two competing AL methods
  - Random selection of sample-label pairs
  - Choose one sample, label all tasks for it
- Separate AL for each task is not studied (!)

SLIDE 24

Discussion

- Maximizing the joint mutual information is reasonable
- Directly estimating the joint label probability
  - Recognizes the correlation between labels
  - Needs more labeled examples
  - What if the number of tasks is large?
  - Cannot use specialized models for each task
- Can we use external knowledge to couple tasks?

SLIDE 25

Outline

- Active Learning
- Multi-Task Active Learning
  - Linguistic Annotations (ACL '08)
  - Image Classification (CVPR '08)
- Current Work and Discussions
  - Constraint-Driven Active Learning Across Tasks
  - Cost-Sensitive Active Learning Across Tasks
  - Active Learning of Constraints and Categories

SLIDE 26

Constraint-Driven Multi-Task Active Learning

- Multiple tasks Y1, Y2, …, Ym
- Learners for each task
- A set of constraints C among the tasks
- May have new tasks to launch

SLIDE 27

Value of Information (VOI) for Active Learning

- Single-task AL
  - Value of information (VOI) for labeling a sample x

SLIDE 28

Value of Information (VOI) for Active Learning

- Single-task AL
  - Value of information (VOI) for labeling a sample x
  - Reward R(Y=y, x), e.g., how surprising the label is

SLIDE 29

Value of Information (VOI) for Active Learning

- Single-task AL
  - Value of information (VOI) for labeling a sample x
  - Reward R(Y=y, x), e.g., how surprising the label is
  - Finally, replace P(Y=y|x) with …

SLIDE 30

Constraint-Driven Active Learning

- Multiple tasks with constraints
- Probability estimates of outcomes

SLIDE 31

Constraint-Driven Active Learning

- Reward function R(y, x) in:

SLIDE 32

Constraint-Driven Active Learning

- Propagate rewards via constraints
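One hypothetical way such propagation could work: derive implied outcomes in other tasks from an observed outcome in one task, then credit rewards to those implied outcomes. Everything below (rule semantics, category names) is illustrative, mirroring the inheritance and mutual-exclusion constraint types used in the experiments:

```python
def propagate(task, label, inherits, excludes):
    """Derive labels for other tasks from one observed (task, label):
    inheritance pushes a positive up to the parent task, mutual
    exclusion pushes a negative across to incompatible tasks, and a
    negative on a parent pushes down to its children (contrapositive
    of inheritance)."""
    derived = {}
    if label:
        for parent in inherits.get(task, []):
            derived[parent] = True
        for other in excludes.get(task, []):
            derived[other] = False
    else:
        for child, parents in inherits.items():
            if task in parents:
                derived[child] = False
    return derived
```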

SLIDE 33

Constraint-Driven Active Learning

- Multi-task AL with constraints
  - Recognize inconsistency among tasks
  - Launch new tasks
  - Favor poorly performing tasks and “pivot” tasks
  - Density-weighted measure?
  - Use state-of-the-art learners for single tasks

SLIDE 34

Experiments

- Four named entity recognition tasks
  - “Animal”
  - “Mammal”
  - “Food”
  - “Celebrity”
- Constraints
  - 1 inheritance, 5 mutual exclusion
  - Lead to 12 propagation rules (plus 1 identity rule)

SLIDE 35

Experiments

- Competing methods for AL
  - VOI of sample-task pairs with constraints
  - VOI of sample-task pairs without constraints
  - Single-task AL

SLIDE 36

Experiments

- Results: MAP on animal, food, and celebrity

SLIDE 37

Experiments

- Results: MAP on all four tasks

SLIDE 38

Experiments

- Analysis
  - True labels come from the NELL system
    - 90% precision for “mammal”
  - So about 10% label noise on the task “mammal”
  - Tasks are generally “easy”
    - Positive examples are highly homogeneous

SLIDE 39

Outline

- Active Learning
- Multi-Task Active Learning
  - Linguistic Annotations (ACL '08)
  - Image Classification (CVPR '08)
- Current Work and Discussions
  - Constraint-Driven Active Learning Across Tasks
  - Cost-Sensitive Active Learning Across Tasks
  - Active Learning of Constraints and Categories

SLIDE 40

Cost-Sensitive Active Learning Across Tasks

- Which scenario is reasonable?
  - Choose one sample, label all tasks
  - Arbitrary sample-label pairs

SLIDE 41

Cost-Sensitive Active Learning Across Tasks

- Costs of labeling multiple tasks on a sample x
  - e.g., x is a long document

SLIDE 42

Cost-Sensitive Active Learning Across Tasks

- Costs of labeling multiple tasks on a sample x
  - e.g., x is a word or an image

SLIDE 43

Cost-Sensitive Active Learning Across Tasks

- Learn a more realistic cost function?
- Active learning aware of labeling costs?

SLIDE 44

Outline

- Active Learning
- Multi-Task Active Learning
  - Linguistic Annotations (ACL '08)
  - Image Classification (CVPR '08)
- Current Work and Discussions
  - Constraint-Driven Active Learning Across Tasks
  - Cost-Sensitive Active Learning Across Tasks
  - Active Learning of Constraints and Categories

SLIDE 45

Active Constraint Learning

- New constraints/rules are highly valuable
- Find significant rules and avoid false discovery
  - Oversearching (Quinlan et al., IJCAI '95)
  - Multiple comparisons (Jensen et al., MLJ '00)
  - Statistical tests (Webb, MLJ '06)
- Combining first-order logic with graphical models
  - Bayesian logic programs (logic + BN)
  - Markov logic networks (logic + MRF)
  - Structured sparsity on graphs?

SLIDE 46

Active Category Detection

- Automatically detect new categories
- Clustering
  - High-dimensional space
  - Co-clustering/bi-clustering
  - Local search vs. global partition
- Subgraph/community detection
  - A huge bipartite graph
  - Optimize modularity of the graph
  - Overlapping communities?

SLIDE 47

Thanks!

- Questions?