SLIDE 1

A Two-Stage Approach to Domain Adaptation for Statistical Classifiers

Jing Jiang & ChengXiang Zhai

Department of Computer Science University of Illinois at Urbana-Champaign


SLIDE 2

What is domain adaptation?

  • Example: named entity recognition

  Standard supervised learning (ideal setting):
    train (labeled): New York Times  →  NER Classifier  →  test (unlabeled): New York Times
    entities: persons, locations, organizations, etc.
    performance: 85.5%

  Non-standard (realistic) setting:
    train (labeled): Reuters (New York Times labeled data not available)  →  NER Classifier  →  test (unlabeled): New York Times
    entities: persons, locations, organizations, etc.
    performance: 64.1%

SLIDE 3

  • Domain difference → performance drop

  NER Classifier:

    Setting             Train              Test              Performance
    ideal setting       New York Times     New York Times    85.5%
    realistic setting   Reuters            New York Times    64.1%

Another NER example

  Gene name recognizer:

    Setting             Train     Test      Performance
    ideal setting       mouse     mouse     54.1%
    realistic setting   fly       mouse     28.1%
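The gap above can be reproduced with any off-the-shelf classifier by training on one domain and scoring on another. The sketch below is a minimal, hypothetical illustration, not the setup used later in this deck: the corpus variables (nyt_*, reuters_*) are placeholders and document-level classification stands in for token-level NER.

# A minimal, hypothetical sketch (not the authors' setup): train a bag-of-words
# classifier on one domain and score it on another to expose the cross-domain drop.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def cross_domain_f1(train_texts, train_labels, test_texts, test_labels):
    vec = CountVectorizer(binary=True)               # binary features, as on the later slides
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_texts), train_labels)
    pred = clf.predict(vec.transform(test_texts))
    return f1_score(test_labels, pred, average="micro")

# ideal setting:     cross_domain_f1(nyt_train, nyt_y, nyt_test, nyt_test_y)
# realistic setting: cross_domain_f1(reuters_train, reuters_y, nyt_test, nyt_test_y)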

SLIDE 4

Other examples

  • Spam filtering: public email collection → personal inboxes
  • Sentiment analysis of product reviews: digital cameras → cell phones; movies → books

  • Can we do better than standard supervised learning?
  • Domain adaptation: to design learning methods that are aware of the difference between the training and test domains.

SLIDE 5

How do we solve the problem in general?
  • Observation 1: domain-specific features

    Fly gene names: wingless, daughterless, eyeless, apexless, …
      • describing phenotype
      • in fly gene nomenclature
      • feature “-less” weighted high

    Gene names from other organisms: CD38, PABPC5, …
      • Is the feature still useful for other organisms? No!

SLIDE 6

  • Observation 2: generalizable features

    • “… decapentaplegic and wingless are expressed in analogous patterns in each …”
    • “… that CD38 is expressed by both neurons and glial cells …”
    • “… that PABPC5 is expressed in fetal brain and in a range of adult tissues …”

    feature “X be expressed” generalizes across organisms

SLIDE 7

  • General idea: two-stage approach

  [Diagram: the feature space of the Source Domain and Target Domain, split into generalizable features and domain-specific features]

Goal

  [Diagram: Source Domain and Target Domain over the shared feature space]

SLIDE 8

  • Regular classification

  [Diagram: a classifier trained on the Source Domain over the full feature space]

Generalization (Stage 1): to emphasize generalizable features in the trained model

  [Diagram: the trained model concentrates on the generalizable features shared by the Source Domain and Target Domain]

SLIDE 9

Adaptation (Stage 2): to pick up domain-specific features for the target domain

  [Diagram: the model extends to the domain-specific features of the Target Domain]

Regular semi-supervised learning

  [Diagram: Source Domain and Target Domain over the shared feature space]

SLIDE 10

  • Comparison with related work
  • We explicitly model generalizable features.
    Previous work models them implicitly [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].
  • We do not need labeled target data, but we do need multiple source (training) domains.
    Some work requires labeled target data [Daumé III 2007].
  • We have a second stage of adaptation, which uses semi-supervised learning.
    Previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].

Implementation of the two-stage approach with logistic regression classifiers

SLIDE 11

Logistic regression classifiers

  An input such as “… and wingless are expressed in …” is represented as a vector x of p binary features (e.g. “-less”, “X be expressed”), and each class y has a weight vector w_y:

  $p(y \mid x; \mathbf{w}) = \frac{\exp(w_y^T x)}{\sum_{y'} \exp(w_{y'}^T x)}$

Learning a logistic regression classifier

  $\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \left[ -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(w_{y_i}^T x_i)}{\sum_{y'} \exp(w_{y'}^T x_i)} + \lambda \|\mathbf{w}\|^2 \right]$

  • first term: negative log likelihood of the training data
  • second term: regularization term: penalize large weights, control model complexity
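A minimal numpy sketch of this objective, assuming a binary feature matrix X of shape (N, p) and integer labels y already exist; it illustrates the formula above and is not the authors' implementation.

# Regularized multinomial logistic regression by plain gradient descent.
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def train_logreg(X, y, num_classes, lam=0.1, lr=0.1, iters=500):
    N, p = X.shape
    W = np.zeros((num_classes, p))              # one weight vector w_y per class
    for _ in range(iters):
        P = softmax(X @ W.T)                    # p(y | x; w) for every example
        Y = np.eye(num_classes)[y]              # one-hot labels
        grad = (P - Y).T @ X / N + 2 * lam * W  # gradient of -1/N sum log p + lam ||w||^2
        W -= lr * grad
    return W

In practice any off-the-shelf L2-regularized logistic regression solver computes the same ŵ.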

SLIDE 12

Generalizable features in weight vectors

  [Diagram: weight vectors w1, w2, …, wK learned separately from K source domains D1, D2, …, DK; the entries for generalizable features carry similar, large weights in every domain, while the entries for domain-specific features do not]

We want to decompose w in this way

  [Diagram: w is written as the sum of a vector whose non-zero entries cover only the h generalizable features and a vector carrying the remaining, domain-specific weights]

SLIDE 13

Feature selection matrix A

  z = Ax

  The h × p matrix A (each row contains a single 1) selects the h generalizable features of x.

Decomposition of w

  [Diagram: a concrete weight vector split into its generalizable part and its domain-specific part]

  w^T x = v^T z + u^T x

  v: weights for generalizable features; u: weights for domain-specific features

SLIDE 14

Decomposition of w

  $w^T x = v^T z + u^T x = v^T (A x) + u^T x = (A^T v)^T x + u^T x$

  $\Rightarrow \quad w = A^T v + u$

  A^T v: shared by all domains; u: domain-specific.
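A tiny numpy sketch of this decomposition, with made-up feature positions and weights, just to check the identity w^T x = v^T z + u^T x:

import numpy as np

p, h = 6, 2                                    # p features, h of them generalizable
generalizable_idx = [1, 4]                     # hypothetical positions of the generalizable features

A = np.zeros((h, p))
A[np.arange(h), generalizable_idx] = 1.0       # feature selection matrix: one 1 per row

x = np.array([1, 1, 0, 0, 1, 0], dtype=float)  # binary feature vector
z = A @ x                                      # the h generalizable features of x

v = np.array([2.0, 1.5])                       # weights shared by all domains
u = np.array([0.0, 0.0, 0.3, -0.2, 0.0, 0.9])  # domain-specific weights

w = A.T @ v + u                                # w = A^T v + u
assert np.isclose(w @ x, v @ z + u @ x)        # w^T x = v^T z + u^T x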

SLIDE 15

Framework for generalization

  Fix A and optimize:

  $(\hat{v}, \{\hat{u}_k\}) = \arg\min_{v, \{u_k\}} \left[ \lambda \|v\|^2 + \lambda_s \sum_{k=1}^{K} \|u_k\|^2 - \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} \log p\left(y_i^k \mid x_i^k; A, v, u_k\right) \right]$

  • the first two terms are the regularization term; λ_s >> 1: to penalize domain-specific features
  • the last term is the likelihood of the labeled data from the K source domains
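As a sketch, the objective above can be written directly in numpy. The shapes and helpers below (C classes, p features, h generalizable features) are assumptions for illustration, not the authors' code.

import numpy as np

def nll(W, X, y):
    """Mean negative log likelihood of one domain's labeled data; W: (C, p)."""
    scores = X @ W.T
    scores = scores - scores.max(axis=1, keepdims=True)
    logp = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def generalization_objective(v, u_list, A, domains, lam=0.1, lam_s=100.0):
    """domains: list of (X_k, y_k); v: (C, h) shared weights; u_list[k]: (C, p)."""
    K = len(domains)
    loss = lam * np.sum(v ** 2) + lam_s * sum(np.sum(u ** 2) for u in u_list)
    for (X_k, y_k), u_k in zip(domains, u_list):
        W_k = v @ A + u_k                      # each class row has the form A^T v + u_k
        loss += nll(W_k, X_k, y_k) / K         # average likelihood over the K source domains
    return loss

Minimizing this with any gradient-based optimizer over (v, {u_k}) gives the Stage-1 model; because λ_s >> 1, the u_k stay near zero and the model leans on the shared weights v over the generalizable features.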

Framework for adaptation

  Fix A and optimize:

  $(\hat{v}, \hat{u}_t, \{\hat{u}_k\}) = \arg\min_{v, u_t, \{u_k\}} \left[ \lambda \|v\|^2 + \lambda_s \sum_{k=1}^{K} \|u_k\|^2 + \lambda_t \|u_t\|^2 - \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} \log p\left(y_i^k \mid x_i^k; A, v, u_k\right) - \frac{1}{m} \sum_{i=1}^{m} \log p\left(\tilde{y}_i^t \mid x_i^t; A, v, u_t\right) \right]$

  • the new likelihood term covers the m pseudo-labeled target-domain examples
  • λ_t = 1 << λ_s: to pick up domain-specific features in the target domain
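Continuing the sketch above, the adaptation objective only adds a target-specific u_t under the weak penalty λ_t plus the pseudo-labeled target likelihood. nll() and generalization_objective() are the helpers from the previous block; again an illustration, not the authors' code.

import numpy as np

def adaptation_objective(v, u_list, u_t, A, domains, X_t, y_t_pseudo,
                         lam=0.1, lam_s=100.0, lam_t=1.0):
    loss = generalization_objective(v, u_list, A, domains, lam=lam, lam_s=lam_s)
    loss += lam_t * np.sum(u_t ** 2)            # lambda_t = 1 << lambda_s
    loss += nll(v @ A + u_t, X_t, y_t_pseudo)   # likelihood of pseudo-labeled target data
    return loss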

SLIDE 16

How to find A? (1)

Joint optimization

  $(\hat{A}, \hat{v}, \{\hat{u}_k\}) = \arg\min_{A, v, \{u_k\}} \left[ \lambda \|v\|^2 + \lambda_s \sum_{k=1}^{K} \|u_k\|^2 - \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} \log p\left(y_i^k \mid x_i^k; A, v, u_k\right) \right]$

  Solved by alternating optimization.

How to find A? (2)

Domain cross validation

  Idea: train on (K – 1) source domains and test on the held-out source domain.

  Approximation:
    w_f^k:  weight for feature f learned from domain k
    w̄_f^k:  weight for feature f learned from the other domains

  Rank features by a score that aggregates, over k = 1 … K, how well w_f^k agrees with w̄_f^k; see the paper for details.
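A hedged numpy sketch of the domain cross validation idea. The leave-one-domain-out training and the top-h selection follow the slide; the concrete score below (sum over domains of the smaller of the two weight magnitudes) is only an illustrative stand-in for the ranking defined in the paper.

import numpy as np

def rank_generalizable_features(domains, train_fn, h):
    """domains: list of (X_k, y_k); train_fn(X, y) -> (p,) weight vector; keep top h features."""
    K = len(domains)
    w_in = [train_fn(X, y) for X, y in domains]                 # w_f^k: trained on domain k
    w_out = []
    for k in range(K):                                          # trained on the other domains
        X_rest = np.vstack([Xi for i, (Xi, _) in enumerate(domains) if i != k])
        y_rest = np.concatenate([yi for i, (_, yi) in enumerate(domains) if i != k])
        w_out.append(train_fn(X_rest, y_rest))
    score = sum(np.minimum(np.abs(w_in[k]), np.abs(w_out[k])) for k in range(K))
    return np.argsort(-score)[:h]                               # feature indices used to build A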

SLIDE 17

Intuition for domain cross validation

  [Diagram: weight vectors learned from the individual domains D1, D2, …, Dk-1, Dk (fly). The feature “expressed” receives a consistently high weight in every domain (e.g. 1.5, 1.8, 2.0), whereas “-less” receives a high weight only in the fly domain (e.g. 1.2 there vs. 0.05, 0.1 elsewhere)]

Experiments

  Data set
    • BioCreative Challenge Task 1B
    • Gene/protein name recognition
    • 3 organisms/domains: fly, mouse and yeast

  Experiment setup
    • 2 organisms for training, 1 for testing
    • F1 as performance measure

SLIDE 18

Experiments: Generalization

  F: fly   M: mouse   Y: yeast   (source domains → target domain)

  Method             F+M → Y   M+Y → F   Y+F → M
  BL                 0.633     0.129     0.416
  DA-1 (joint-opt)   0.627     0.153     0.425
  DA-2 (domain CV)   0.654     0.195     0.470

  Using generalizable features is effective; domain cross validation is more effective than joint optimization.

Experiments: Adaptation

  Method             F+M → Y   M+Y → F   Y+F → M
  BL-SSL             0.633     0.241     0.458
  DA-2-SSL           0.759     0.305     0.501

  Domain-adaptive bootstrapping is more effective than regular bootstrapping.

SLIDE 19

Experiments: Adaptation

  Domain-adaptive SSL is more effective, especially with a small number of pseudo labels.

Conclusions and future work

  • Two-stage domain adaptation
    • Generalization: outperformed standard supervised learning
    • Adaptation: outperformed standard bootstrapping
  • Two ways to find generalizable features
    • Domain cross validation is more effective
  • Future work
    • Single source domain?
    • Setting parameters h and m

SLIDE 20

References

  • S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007.
  • J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006.
  • H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.

Thank you!