SLIDE 1 Aggregating and Predicting Sequence Labels from Crowd Annotations
An T. Nguyen (1)∗, Byron C. Wallace (2), Jessy Li (1, 3), Ani Nenkova (3), Matthew Lease (1)
(1) University of Texas at Austin, (2) Northeastern University, (3) University of Pennsylvania
ACL 2017
∗Presenter
SLIDE 9 Problem: Sequence Labeling with Crowd Labels
Example: Named Entity Recognition.
X    U.N.  official  Ekeus  heads  for  Baghdad
Y    Org   O         Per    O      O    Loc
W1   Org   O         Org    O      O    Loc
W2   Org   Per       Per    O      O    Loc
W3   Org   O         Per    O      O    Loc
Two tasks:
◮ Aggregation: given (X, W1, W2, W3), estimate Y.
◮ Prediction: given training data (X, W1, W2, W3), predict Ytest for Xtest.
SLIDE 14 Our work
Contribution: two joint models of sequences and crowd labels.
◮ Hidden Markov Models (HMMs) + crowd confusion matrices.
◮ Long Short-Term Memory (LSTM) + crowd embedding vectors.
Evaluation:
◮ News NER + Biomedical IE.
◮ A range of baselines.
Code + data on GitHub.
SLIDE 17
HMM-Crowd
(for task 1 - aggregation)
HMM (position i):
$h_{i+1} \mid h_i \sim \mathrm{Discrete}(\tau_{h_i})$
$v_i \mid h_i \sim \mathrm{Discrete}(\Omega_{h_i})$
Crowd model (worker j):
$l_{ij} \mid h_i \sim \mathrm{Discrete}(C^{(j)}_{h_i})$
$C^{(j)}$: confusion matrix for worker j
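The generative story of HMM-Crowd can be sketched as a small sampler. This is a minimal illustration, not the authors' code: the function names, the toy parameter values, and the uniform draw of the first tag are all assumptions (the slide does not give an initial-state distribution).

```python
import random

def sample_discrete(probs, rng):
    """Draw an index k with probability probs[k]."""
    r = rng.random()
    cum = 0.0
    for k, p in enumerate(probs):
        cum += p
        if r < cum:
            return k
    return len(probs) - 1  # guard against floating-point round-off

def sample_hmm_crowd(n, tau, omega, C, rng):
    """Sample hidden tags h, words v, and crowd labels l[i][j].

    tau[a]   : transition distribution out of tag a
    omega[a] : word (emission) distribution for tag a
    C[j][a]  : row a of worker j's confusion matrix
    The first tag is drawn uniformly; that is an assumption of this sketch.
    """
    h, v, l = [], [], []
    state = rng.randrange(len(tau))
    for _ in range(n):
        h.append(state)
        v.append(sample_discrete(omega[state], rng))
        # Every worker labels every position, conditioned on the true tag.
        l.append([sample_discrete(C_j[state], rng) for C_j in C])
        state = sample_discrete(tau[state], rng)
    return h, v, l

# Toy setup: 2 tags, 3 word types, 2 workers (all values illustrative).
tau = [[0.9, 0.1], [0.5, 0.5]]
omega = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
C = [
    [[1.0, 0.0], [0.0, 1.0]],  # a perfect worker: identity confusion matrix
    [[0.6, 0.4], [0.3, 0.7]],  # a noisy worker
]
h, v, l = sample_hmm_crowd(8, tau, omega, C, random.Random(0))
```

With an identity confusion matrix, the first worker's labels always equal the hidden tags; the second worker's labels deviate at the rates given by the off-diagonal entries.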
SLIDE 20
HMM-Crowd: Parameter Learning
Expectation Maximization (EM) algorithm:
E-step:
◮ Estimate the posterior p(h).
◮ Extend the Forward-Backward algorithm.
M-step:
◮ Estimate the parameters τ, Ω, C.
◮ Variational Bayes estimate.
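One way to read the E-step: the usual Forward-Backward recursion is kept, but the observation term at each position becomes the word emission multiplied by one confusion-matrix entry per crowd label. The sketch below follows that reading. It is hedged: the function name, the explicit initial distribution `pi`, and the dict-based crowd format are assumptions, and the Variational Bayes M-step is not shown.

```python
def crowd_forward_backward(words, crowd, pi, tau, omega, C):
    """Posterior p(h_i = a | words, crowd labels) for every position i.

    words[i] : observed word index at position i
    crowd[i] : dict {worker j: label that j gave at position i}
    pi[a]    : initial tag distribution (assumed; not on the slide)
    tau, omega, C : transitions, emissions, per-worker confusion matrices
    """
    n, K = len(words), len(pi)

    def obs(i, a):
        # Word emission times one confusion-matrix entry per worker label.
        p = omega[a][words[i]]
        for j, label in crowd[i].items():
            p *= C[j][a][label]
        return p

    # Forward pass, normalised at each position to avoid underflow.
    alpha, scale = [], []
    for i in range(n):
        row = []
        for a in range(K):
            if i == 0:
                prior = pi[a]
            else:
                prior = sum(alpha[i - 1][b] * tau[b][a] for b in range(K))
            row.append(prior * obs(i, a))
        z = sum(row)
        alpha.append([x / z for x in row])
        scale.append(z)

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * K for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(K):
            total = sum(tau[a][b] * obs(i + 1, b) * beta[i + 1][b] for b in range(K))
            beta[i][a] = total / scale[i + 1]

    # With this scaling, alpha * beta is already the normalised posterior.
    return [[alpha[i][a] * beta[i][a] for a in range(K)] for i in range(n)]
```

A position that no worker labeled simply has an empty dict, so only the word emission contributes there. The returned posteriors are what an M-step would turn into expected counts for τ, Ω, and C.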
SLIDE 25
LSTM for NER
(Lample et al. 2016)
LSTM: word representations → sentence representation.
Hidden layer: fully connected.
Tag scores: ∼ probability of each label for each word.
CRF: word-level predictions → sentence-level prediction.
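The last step, turning per-word tag scores into one prediction for the whole sentence, is typically decoded with the Viterbi algorithm. The sketch below assumes a linear-chain model whose sentence score is the sum of per-word tag scores and tag-to-tag transition scores; the function name and the toy numbers are illustrative, and the LSTM that would produce the scores is left out.

```python
def viterbi(tag_scores, trans):
    """Highest-scoring tag sequence under word scores plus transition scores.

    tag_scores[i][a] : score of tag a for word i (e.g. from the tag-score layer)
    trans[a][b]      : score of moving from tag a to tag b
    """
    n, K = len(tag_scores), len(trans)
    best = [list(tag_scores[0])]  # best[i][b]: best score of a path ending in b
    back = []                     # back[i-1][b]: previous tag on that path
    for i in range(1, n):
        row, ptr = [], []
        for b in range(K):
            a_star = max(range(K), key=lambda a: best[-1][a] + trans[a][b])
            row.append(best[-1][a_star] + trans[a_star][b] + tag_scores[i][b])
            ptr.append(a_star)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final tag.
    tag = max(range(K), key=lambda a: best[-1][a])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]

# Tags: 0 = O, 1 = B, 2 = I. The transition O -> I is heavily penalised.
scores = [[2.0, 1.0, 0.0], [0.0, 0.5, 1.0]]
trans = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
decoded = viterbi(scores, trans)
```

Picking the best tag for each word independently would give O followed by I here; the transition penalty makes the sentence-level decoder choose O followed by B instead.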
SLIDE 28
LSTM-Crowd
(for task 2 - prediction)
◮ Worker vectors represent the noise of each worker.
◮ v(good worker) ≈ 0
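A toy illustration of the worker-vector idea. It assumes the worker vector is simply added to each word's tag scores; the slide does not say where the vector enters the network, so treat that choice, the function names, and the numbers as hypothetical.

```python
def worker_tag_scores(base_scores, worker_vec):
    """Add a per-worker noise vector to each word's tag scores."""
    return [[s + w for s, w in zip(word_scores, worker_vec)]
            for word_scores in base_scores]

def argmax_tags(scores):
    """Index of the highest-scoring tag for each word."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

base = [[2.0, 0.5, 0.1], [0.2, 1.0, 1.5]]  # 2 words, 3 tags (illustrative)
good_worker = [0.0, 0.0, 0.0]              # v ≈ 0: labels follow the model
noisy_worker = [0.0, 2.0, 0.0]             # biased towards tag 1

good_labels = argmax_tags(worker_tag_scores(base, good_worker))
noisy_labels = argmax_tags(worker_tag_scores(base, noisy_worker))
```

Under this reading, a zero vector leaves the model's own scores untouched, which is also the natural choice when predicting on test data where no worker is involved.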
SLIDE 29
Data
Dataset    Application  Documents  Gold Labels  Crowd Labels
CoNLL’03   NER          1393       All          400
Medical    IE           5000       200          All
SLIDE 31 Evaluation: Task 1 - aggregation
Baselines:
◮ Majority Voting
◮ Dawid & Skene (1979)
◮ MACE (Hovy et al. 2013)
◮ CRF-MA (Rodrigues et al. 2014)
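The simplest baseline, majority voting, can be sketched at the token level using the three workers' labels from the NER example. The function name and the tie-breaking rule are assumptions of this sketch.

```python
from collections import Counter

def majority_vote(worker_labels):
    """Token-level majority vote over several workers' label sequences.

    Ties go to the label encountered first, i.e. the earliest worker's label.
    """
    return [Counter(column).most_common(1)[0][0]
            for column in zip(*worker_labels)]

W1 = ["Org", "O", "Org", "O", "O", "Loc"]
W2 = ["Org", "Per", "Per", "O", "O", "Loc"]
W3 = ["Org", "O", "Per", "O", "O", "Loc"]
aggregated = majority_vote([W1, W2, W3])
```

On this example the vote recovers the gold sequence Y, because each worker's mistake is outvoted by the other two. Voting treats every token and every worker alike, which is what the sequence-aware and worker-aware methods try to improve on.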
SLIDE 36
Results: NER task 1 - aggregation
Method                           F1
Majority Vote                    65.71
MACE (Hovy et al. 2013)          67.37
Dawid-Skene (DS)                 71.39
CRF-MA (Rodrigues et al. 2014)   62.53
HMM-Crowd                        74.76
SLIDE 38 Evaluation: Task 2 - prediction
Baselines:
1. Aggregate, then train:
◮ Majority Vote then CRF
◮ Dawid-Skene then LSTM
2. Train directly on crowd labels:
◮ CRF-MA (Rodrigues et al. 2014)
◮ LSTM (original, Lample et al. 2016)
SLIDE 45
Results: NER task 2 - prediction
Method                              F1
Majority Vote then CRF              58.20
CRF-MA (Rodrigues et al. 2014)      62.60
LSTM (Lample et al. 2016)           67.73
Dawid-Skene then LSTM               66.27
LSTM-Crowd                          70.82
HMM-Crowd then LSTM                 70.87
LSTM on Gold Labels (upper-bound)   84.22
SLIDE 51
Conclusion
◮ Joint models of sequences and crowd labels.
◮ HMMs good for aggregation, ...
◮ ... LSTMs good for prediction.
Paper:
◮ Alternative LSTM-Crowd model.
◮ Results for Biomedical IE.
Acknowledgment: Reviewers, Workers, NSF & NIH.
Questions?