Strong Baselines for Neural Semi-supervised Learning under Domain - - PowerPoint PPT Presentation



SLIDE 1

Strong Baselines for Neural Semi-supervised Learning under Domain Shift

Sebastian Ruder, Barbara Plank


SLIDE 7

Learning under Domain Shift

  • State-of-the-art domain adaptation approaches:
  • leverage task-specific features
  • evaluate on proprietary datasets or on a single benchmark
  • only compare against weak baselines
  • almost none evaluate against approaches from the extensive semi-supervised learning (SSL) literature


SLIDE 11

Revisiting Semi-Supervised Learning Classics in a Neural World

  • How do classics in SSL compare to recent advances?
  • Can we combine the best of both worlds?
  • How well do these approaches work on out-of-distribution data?


SLIDE 16

Bootstrapping algorithms

  • Self-training
  • (Co-training)
  • Tri-training
  • Tri-training with disagreement

SLIDE 21

Self-training

  • 1. Train model on labeled data.
  • 2. Use confident predictions on unlabeled data as training examples. Repeat.
  • Error amplification
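The two steps above can be sketched as a simple loop. This is an illustrative sketch, not the authors' implementation: `fit` and `predict_proba` are hypothetical model methods, and selection keeps the top-n most confident predictions per round.

```python
def self_train(model, labeled, unlabeled, n_new=100, rounds=5):
    """Self-training sketch: train on labeled data, then repeatedly
    add the model's most confident predictions on unlabeled data
    as pseudo-labeled training examples."""
    data, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model.fit(data)
        # Score the pool; predict_proba(x) -> (label, confidence).
        scored = sorted(((model.predict_proba(x), x) for x in pool),
                        key=lambda t: t[0][1], reverse=True)
        top, rest = scored[:n_new], scored[n_new:]
        data += [(x, y) for (y, conf), x in top]   # add pseudo-labels
        pool = [x for _, x in rest]
    model.fit(data)  # final fit on labeled + pseudo-labeled data
    return model
```

Because the model trains on its own predictions, any systematic mistake gets reinforced in later rounds, which is the error-amplification caveat the slide warns about.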

SLIDE 27

Self-training variants

  • Calibration
  • Output probabilities in neural networks are poorly calibrated.
  • Throttling (Abney, 2007), i.e. selecting the top n highest-confidence unlabeled examples, works best.
  • Online learning
  • Training until convergence on labeled data and then training on unlabeled data works best.
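Throttling amounts to a top-n selection rather than a probability threshold; a minimal sketch, with the `(confidence, example, pseudo_label)` triple format as an assumption:

```python
import heapq

def throttle(scored, n):
    """Throttling (Abney, 2007): instead of thresholding on output
    probability (unreliable, since neural network probabilities are
    poorly calibrated), keep only the n highest-confidence examples.
    `scored` holds (confidence, example, pseudo_label) triples."""
    return heapq.nlargest(n, scored, key=lambda t: t[0])
```

For example, `throttle([(0.2, 'a', 0), (0.9, 'b', 1), (0.5, 'c', 1)], 2)` keeps the examples scored 0.9 and 0.5.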


SLIDE 40

Tri-training

  • 1. Train three models on bootstrapped samples.
  • 2. Use predictions on unlabeled data for third if two agree.
  • 3. Final prediction: majority voting

(Figure: for an input x, the three models predict y = 1, y = 1, and y = 0; the majority vote yields y = 1.)
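The three steps above can be sketched as follows; `fit` and `predict` are hypothetical model methods, and the simplified agreement rule is an assumption rather than the exact original algorithm:

```python
import random

def tri_train(models, labeled, unlabeled, rounds=3):
    """Tri-training sketch (Zhou & Li, 2005): three models, each first
    trained on a bootstrap sample of the labeled data; an unlabeled
    example is pseudo-labeled for one model whenever the other two
    agree on it."""
    for m in models:  # 1. bootstrap samples (sampling with replacement)
        m.fit([random.choice(labeled) for _ in labeled])
    for _ in range(rounds):
        for i, m_i in enumerate(models):
            m_j, m_k = (m for j, m in enumerate(models) if j != i)
            # 2. where the other two agree, pseudo-label for the third
            pseudo = [(x, m_j.predict(x)) for x in unlabeled
                      if m_j.predict(x) == m_k.predict(x)]
            m_i.fit(list(labeled) + pseudo)
    return models

def majority_vote(models, x):
    """3. Final prediction: majority vote over the three models."""
    preds = [m.predict(x) for m in models]
    return max(set(preds), key=preds.count)
```

The bootstrap sampling in step 1 is what makes the three models diverse enough for their agreement to carry signal.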


SLIDE 49

Tri-training with disagreement

  • 1. Train three models on bootstrapped samples.
  • 2. Use predictions on unlabeled data for third if two agree and the third's prediction differs.
  • 3 independent models

(Figure: for an input x, two models predict y = 1 while the third predicts y = 0, so the agreed label y = 1 is added as training data for the third.)
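The disagreement condition only changes the pseudo-labeling step; a minimal sketch, assuming models expose a `predict` method:

```python
def disagreement_pseudo_labels(m_i, m_j, m_k, unlabeled):
    """Tri-training with disagreement sketch: an unlabeled example is
    pseudo-labeled for m_i only when m_j and m_k agree AND m_i itself
    currently predicts something different, keeping the added data
    informative and the procedure more conservative."""
    pseudo = []
    for x in unlabeled:
        y_j, y_k = m_j.predict(x), m_k.predict(x)
        if y_j == y_k and m_i.predict(x) != y_j:
            pseudo.append((x, y_j))
    return pseudo
```

Examples the third model already gets right are skipped, so each round adds only the cases it stands to learn from.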


SLIDE 55

Tri-training hyper-parameters

  • Sampling unlabeled data
  • Producing predictions for all unlabeled examples is expensive.
  • Sample a number of unlabeled examples instead.
  • Confidence thresholding
  • Not effective for the classic approaches, but essential for our method.
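The two hyper-parameters above can be combined in one candidate-selection step. This is a hedged sketch of the idea, not the authors' code; `predict_proba(x) -> (label, confidence)` is a hypothetical interface:

```python
import random

def candidate_pool(unlabeled, k, threshold, predict_proba, rng=random):
    """Sample k unlabeled examples per round (predicting on everything
    is expensive) and keep only predictions whose confidence clears
    the threshold."""
    sample = rng.sample(unlabeled, min(k, len(unlabeled)))
    pseudo = []
    for x in sample:
        y, conf = predict_proba(x)
        if conf >= threshold:
            pseudo.append((x, y))
    return pseudo
```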


SLIDE 64

Multi-task tri-training

  • 1. Train one model with 3 objective functions.
  • 2. Use predictions on unlabeled data for third if two agree.
  • 3. Restrict final layers to use different representations.
  • 4. Train the third objective function only on pseudo-labeled data to bridge the domain shift.

(Figure: for an input x, two output layers predict y = 1, so y = 1 is used as a pseudo-label for the third.)
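Step 2 works head-by-head over the shared model's three outputs; a minimal sketch, assuming each head is a callable `x -> label` (a hypothetical stand-in for a softmax output layer):

```python
def mt_tri_pseudo_labels(heads, unlabeled):
    """Multi-task tri-training sketch: a single shared model has three
    output heads m1, m2, m3. Where two heads agree on an unlabeled
    target example, the agreed label becomes pseudo-labeled training
    data for the third head."""
    pseudo = {i: [] for i in range(3)}
    for x in unlabeled:
        preds = [h(x) for h in heads]
        for i in range(3):
            j, k = [n for n in range(3) if n != i]
            if preds[j] == preds[k]:
                pseudo[i].append((x, preds[j]))
    return pseudo
```

Sharing one encoder across the three heads is what removes tri-training's space and time overhead of three independent models.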


SLIDE 76

Multi-task Tri-training

(Figure: the architecture of Plank et al. (2016): character BiLSTMs feed word-level BiLSTMs over words w1, w2, w3, with three output layers m1, m2, m3 on top.)

  • Orthogonality constraint (Bousmalis et al., 2016):

    L_orth = ‖W_m1^T W_m2‖²_F

  • Loss:

    L(θ) = −Σ_i Σ_{1,…,n} log P_mi(y | h⃗) + γ L_orth
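The orthogonality term is just the squared Frobenius norm of the product of two heads' weight matrices. A dependency-free sketch (real implementations would compute this on framework tensors):

```python
def orthogonality_penalty(W1, W2):
    """L_orth = ||W1^T W2||_F^2 (Bousmalis et al., 2016): penalizes
    overlap between two heads' softmax weight matrices, pushing them
    toward different representations. W1 and W2 are d x k matrices
    given as lists of rows."""
    rows = range(len(W1))
    penalty = 0.0
    for a in range(len(W1[0])):          # columns of W1
        for b in range(len(W2[0])):      # columns of W2
            m_ab = sum(W1[r][a] * W2[r][b] for r in rows)  # (W1^T W2)[a][b]
            penalty += m_ab ** 2
    return penalty
```

When the column spaces of the two matrices are orthogonal the penalty is zero, which is exactly the "use different representations" constraint from the previous slide.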


SLIDE 80

Data & Tasks

Two tasks and their domains:

  • Sentiment analysis on the Amazon reviews dataset (Blitzer et al., 2006)
  • POS tagging on the SANCL 2012 dataset (Petrov and McDonald, 2012)


SLIDE 85

Sentiment Analysis Results

(Chart: accuracy averaged over 4 target domains, axis 75–82, comparing VFAE*, DANN*, Asym*, Source only, Self-training, Tri-training, Tri-training-Disagr., and MT-Tri; * results from Saito et al. (2017).)

  • Multi-task tri-training slightly outperforms tri-training, but has higher variance.


SLIDE 90

POS Tagging Results

Trained on 10% labeled data (WSJ).

(Chart: accuracy averaged over 5 target domains, axis 88.7–89.8, comparing Source (+embeds), Self-training, Tri-training, Tri-training-Disagr., and MT-Tri.)

  • Tri-training with disagreement works best with little data.

SLIDE 94

POS Tagging Results

Trained on full labeled data (WSJ).

(Chart: accuracy averaged over 5 target domains, axis 89–92, comparing TnT, Stanford*, Source (+embeds), Tri-training, Tri-training-Disagr., and MT-Tri; * result from Schnabel & Schütze (2014).)

  • Tri-training works best in the full data setting.
SLIDE 99

POS Tagging Analysis

Accuracy on out-of-vocabulary (OOV) tokens.

(Chart: OOV accuracy, axis 50–80, and OOV rate, 2.75–11%, for the Answers, Emails, Newsgroups, Reviews, and Weblogs domains; methods Src, Tri, MT-Tri.)

  • Classic tri-training works best on OOV tokens.
  • MT-Tri does worse than the source-only baseline on OOV tokens.

SLIDE 102

POS Tagging Analysis

POS accuracy per binned log frequency.

(Chart: accuracy delta vs. the source-only baseline, roughly −0.005 to 0.018, over binned frequencies 1–14, for MT-Tri and Tri.)

  • Tri-training works best on low-frequency tokens (leftmost bins).


SLIDE 108

POS Tagging Analysis

Accuracy on unknown word-tag (UWT) tokens (very difficult cases).

(Chart: UWT accuracy, axis 8–26, and UWT rate, 1–4%, for the Answers, Emails, Newsgroups, Reviews, and Weblogs domains; methods Src, Tri, MT-Tri, FLORS*; * result from Schnabel & Schütze (2014).)

  • No bootstrapping method works well on unknown word-tag combinations.
  • The less lexicalized FLORS approach is superior.


SLIDE 112

Takeaways

  • Classic tri-training works best: it outperforms recent state-of-the-art methods for sentiment analysis.
  • We address the drawback of tri-training (space & time complexity) with the proposed MT-Tri model.
  • MT-Tri works best on sentiment, but not for POS tagging.
  • Importance of:
  • comparing neural methods to classics (strong baselines)
  • evaluating on multiple tasks & domains