Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Sebastian Ruder Barbara Plank
benchmark
extensive semi-supervised learning (SSL) literature
data?
as training examples. Repeat.
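A minimal sketch of the self-training loop this slide describes, assuming the usual formulation: train on labeled data, label the unlabeled pool, add confident predictions as training examples, repeat. The toy centroid classifier, the function names, and the 0.8 threshold are illustrative assumptions, not the authors' setup.

```python
import math

def fit_centroids(examples):
    """Toy stand-in model: one centroid per label for 1-D inputs."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_proba(centroids, x):
    """Normalised exp(-distance) scores, used here as a confidence estimate."""
    scores = {y: math.exp(-abs(x - c)) for y, c in centroids.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Repeatedly add confidently pseudo-labeled examples to the training set."""
    train, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(train)
        confident, rest = [], []
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:  # hypothetical confidence threshold
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:  # nothing new to add, stop early
            break
        train.extend(confident)
        pool = rest
    return fit_centroids(train), train, pool

model, train, pool = self_train([(0.0, "neg"), (4.0, "pos")], [0.5, 3.5, 2.0])
```

The loop stops once no remaining example clears the threshold, so ambiguous inputs such as the midpoint 2.0 stay unlabeled.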
calibrated.
confidence unlabeled examples works best.
unlabeled data works best.
Tri-training
Tri-training with disagreement
and prediction differs.
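A sketch of the selection rule, assuming the standard formulation: in tri-training, an unlabeled example is added to model i's training set when the other two models agree on its label; with disagreement, it is added only if model i's own prediction also differs. The function name and data layout are illustrative assumptions.

```python
def select_pseudo_labels(preds, i, require_disagreement=False):
    """Pick pseudo-labeled examples for model i.

    preds: three lists of predicted labels (one list per model),
           aligned on the same unlabeled examples.
    Returns (example_index, label) pairs to add to model i's training set.
    """
    j, k = [m for m in range(3) if m != i]
    selected = []
    for idx, (label_j, label_k) in enumerate(zip(preds[j], preds[k])):
        if label_j != label_k:
            continue  # the other two models must agree
        if require_disagreement and preds[i][idx] == label_j:
            continue  # disagreement variant: model i must predict something else
        selected.append((idx, label_j))
    return selected

preds = [
    ["A", "B", "A", "B"],  # model 0
    ["A", "A", "B", "B"],  # model 1
    ["A", "A", "A", "B"],  # model 2
]
plain = select_pseudo_labels(preds, 0)
disagr = select_pseudo_labels(preds, 0, require_disagreement=True)
```

In this toy input the disagreement variant keeps only the one example where model 0 was outvoted, which is why it adds fewer pseudo-labeled examples per round.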
expensive
Multi-task tri-training
use different representations.
function only on pseudo labeled to bridge domain shift.
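Figure: BiLSTM tagger with char BiLSTM inputs for words w1, w2, w3 (Plank et al., 2016); output layers m1, m2, m3 at each position.
Orthogonality constraint: \mathcal{L}_{orth} = \lVert W_{m_1}^{\top} W_{m_2} \rVert_F^2
Loss: \mathcal{L}(\theta) = -\sum_{i} \sum_{1,..,n} \log P_{m_i}(y \mid \vec{h}) + \gamma \mathcal{L}_{orth}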
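A small sketch of how the loss above could be assembled, assuming the orthogonality term is the squared Frobenius norm of the product of the transposed m1 weight matrix with the m2 weight matrix. The function names, the list-of-rows matrix layout, and the default gamma are illustrative assumptions.

```python
def orthogonality_penalty(w1, w2):
    """Squared Frobenius norm of w1^T w2, for matrices given as lists of rows."""
    rows = len(w1)
    assert rows == len(w2), "both matrices need the same number of rows"
    total = 0.0
    for a in range(len(w1[0])):
        for b in range(len(w2[0])):
            # Entry (a, b) of w1^T w2 is the dot product of two columns.
            dot = sum(w1[r][a] * w2[r][b] for r in range(rows))
            total += dot * dot
    return total

def combined_loss(nll_per_model, w1, w2, gamma=0.01):
    """Sum of each model's negative log-likelihood plus the weighted penalty.

    nll_per_model: one already-summed negative log-likelihood per output layer.
    gamma: weight of the orthogonality term (placeholder value).
    """
    return sum(nll_per_model) + gamma * orthogonality_penalty(w1, w2)

identity = [[1.0, 0.0], [0.0, 1.0]]
disjoint_a = [[1.0, 0.0], [0.0, 0.0]]
disjoint_b = [[0.0, 0.0], [0.0, 1.0]]
same = orthogonality_penalty(identity, identity)
different = orthogonality_penalty(disjoint_a, disjoint_b)
```

The penalty is zero when the two weight matrices span orthogonal directions and grows as they overlap, which pushes the two output layers towards using different representations.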
Sentiment analysis on Amazon reviews dataset (Blitzer et al., 2006)
POS tagging on SANCL 2012 dataset (Petrov and McDonald, 2012)
Figure: Sentiment analysis accuracy, averaged over 4 target domains. Methods: VFAE*, DANN*, Asym*, Source only, Self-training, Tri-training, Tri-training-Disagr., MT-Tri. (* result from Saito et al., 2017)
has higher variance.
Figure: POS tagging accuracy, averaged over 5 target domains, trained on 10% labeled data (WSJ). Methods: Source (+embeds), Self-training, Tri-training, Tri-training-Disagr., MT-Tri.
Figure: POS tagging accuracy, averaged over 5 target domains, trained on full labeled data (WSJ). Methods: TnT, Stanford*, Source (+embeds), Tri-training, Tri-training-Disagr., MT-Tri. (* result from Schnabel & Schütze, 2014)
Figure: Accuracy on out-of-vocabulary (OOV) tokens per target domain (Answers, Emails, Newsgroups, Reviews, Weblogs). Axes: accuracy on OOV tokens, % OOV tokens. Series: OOV tokens, Src, Tri, MT-Tri.
Figure: POS accuracy per binned log frequency. Axes: accuracy delta vs. src-only baseline, binned frequency (1 to 14). Series: MT-Tri, Tri.
bins).
Figure: Accuracy on unknown word-tag (UWT) tokens per target domain (Answers, Emails, Newsgroups, Reviews, Weblogs). Axes: accuracy on UWT tokens, % UWT tokens. Series: UWT rate, Src, Tri, MT-Tri, FLORS*. Annotation: very difficult cases. (* result from Schnabel & Schütze, 2014)
tag combinations.
Tri-training
state-of-the-art methods for sentiment analysis.
time complexity) via the proposed MT-Tri model
baselines)