what is domain adaptation

What is domain adaptation? - PDF document

A Two-Stage Approach to Domain Adaptation for Statistical Classifiers. Jing Jiang & ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign. What is domain adaptation?


  1. A Two-Stage Approach to Domain Adaptation for Statistical Classifiers. Jing Jiang & ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign. What is domain adaptation?

  2. Example: named entity recognition (persons, locations, organizations, etc.). Standard supervised learning: a classifier is trained on labeled New York Times data and tested on unlabeled New York Times data; NER: 85.5%. Non-standard (realistic) setting: labeled data is not available for the test domain, so the training and test data come from different sources (Reuters and New York Times); NER: 64.1%.

  3. Domain difference leads to a performance drop. Ideal setting (train and test both on NYT): NER, 85.5%. Realistic setting (train and test drawn from different sources, NYT and Reuters): NER, 64.1%. Another NER example, a gene name recognizer. Ideal setting (train and test both on mouse): 54.1%. Realistic setting (fly and mouse): 28.1%.
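The size of the drop can be made explicit with a little arithmetic. The sketch below only reuses the four percentages quoted on the slides; the dictionary layout and the function name `performance_drop` are illustrative assumptions, and the slides do not say which metric the percentages measure.

```python
# Scores quoted on the slides (percent). The layout is illustrative.
scores = {
    "NER (news)": {"ideal": 85.5, "realistic": 64.1},
    "gene name recognition": {"ideal": 54.1, "realistic": 28.1},
}


def performance_drop(ideal, realistic):
    """Return (absolute drop in points, relative drop as a fraction of ideal)."""
    absolute = ideal - realistic
    return absolute, absolute / ideal


drops = {
    task: performance_drop(s["ideal"], s["realistic"])
    for task, s in scores.items()
}

for task, (absolute, relative) in drops.items():
    print(f"{task}: -{absolute:.1f} points ({relative:.0%} relative)")
```

In both tasks the realistic setting loses more than 20 points, which is the gap domain adaptation tries to close.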

  4. Other examples. Spam filtering: public email collection to personal inboxes. Sentiment analysis of product reviews: digital cameras to cell phones; movies to books. Can we do better than standard supervised learning? Domain adaptation: to design learning methods that are aware of the training and test domain difference. How do we solve the problem in general?

  5. Observation 1: domain-specific features. Names such as wingless, daughterless, eyeless, apexless, … describe phenotypes in fly gene nomenclature, so the feature “-less” is weighted high. Is the feature still useful for other organisms, with gene names such as CD38, PABPC5, …? No!

  6. Observation 2: generalizable features. Example sentences: “… decapentaplegic and wingless are expressed in analogous patterns in each …”; “… that CD38 is expressed by both neurons and glial cells …”; “… that PABPC5 is expressed in fetal brain and in a range of adult tissues.” The shared feature is “X be expressed”.
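The two observations can be illustrated with a small feature extractor. This is a minimal sketch, not the feature set used in the presentation: the function name `extract_features`, the feature labels, the five-token window, and the regular expression are all assumptions made for illustration.

```python
import re


def extract_features(tokens, index):
    """Binary features for the candidate token at `index` (illustrative only)."""
    token = tokens[index]
    features = set()
    # Domain-specific cue: fly gene names often end in "-less".
    if token.lower().endswith("less"):
        features.add("suffix=-less")
    # Generalizable cue: within the next five tokens, a form of "be"
    # is followed by "expressed" (the "X be expressed" pattern).
    window = " ".join(tokens[index + 1 : index + 6]).lower()
    if re.search(r"\b(is|are|was|were|be|been)\s+expressed\b", window):
        features.add("context=X_be_expressed")
    return features


fly = "decapentaplegic and wingless are expressed in analogous patterns".split()
mouse = "that CD38 is expressed by both neurons and glial cells".split()

print(extract_features(fly, 2))    # candidate token: wingless
print(extract_features(mouse, 1))  # candidate token: CD38
```

The suffix feature fires only for the fly gene name, while the context feature fires for both candidates, which is what makes it useful across organisms.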

  7. General idea: two-stage approach. [Figure: Source Domain and Target Domain over the features, with domain-specific features and generalizable features marked.] Goal. [Figure: Source Domain and Target Domain over the features.]

  8. Regular classification. [Figure: Source Domain and Target Domain over the features.] Stage 1, generalization: to emphasize generalizable features in the trained model. [Figure: Source Domain and Target Domain over the features.]

  9. Stage 2, adaptation: to pick up domain-specific features for the target domain. [Figure: Source Domain and Target Domain over the features.] Regular semi-supervised learning. [Figure: Source Domain and Target Domain over the features.]

  10. Comparison with related work. (1) We explicitly model generalizable features; previous work models them implicitly [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007]. (2) We do not need labeled target data, but we need multiple source (training) domains; some work requires labeled target data [Daumé III 2007]. (3) We have a second stage of adaptation, which uses semi-supervised learning; previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007]. Next: implementation of the two-stage approach with logistic regression classifiers.

  11. Logistic regression classifiers. With binary features $x$ and a weight vector $w_y$ for each class $y$: $p(y \mid x, w) = \frac{\exp(w_y^T x)}{\sum_{y'} \exp(w_{y'}^T x)}$. Example: for the text “… and wingless are expressed in…”, the slide pairs the weights $w_y = (0.2, 4.5, 5, -0.3, 3.0, \ldots, 2.1, -0.9, 0.4)$ with the binary features $x = (0, 1, 0, 0, 1, \ldots, 0, 1, 0)$, which include features such as “-less” and “X be expressed”, to compute $w_y^T x$. Learning a logistic regression classifier: $\hat{w} = \arg\min_w \left[ \lambda \lVert w \rVert^2 - \frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(w_{y_i}^T x_i)}{\sum_{y'} \exp(w_{y'}^T x_i)} \right]$. The first term is the regularization term (penalize large weights, control model complexity); the second term is the log likelihood of the training data.
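The class probability and the regularized training objective can be written out directly. The sketch below is a plain-Python illustration under stated assumptions: the function names `predict_proba` and `objective`, the two-class setup, and all toy numbers are made up for the example, and no optimizer is included.

```python
import math


def predict_proba(weights, x):
    """p(y | x, w) = exp(w_y . x) / sum over y' of exp(w_y' . x), for every class y."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w_y, x)) for w_y in weights]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def objective(weights, data, lam):
    """lam * ||w||^2 minus the average log likelihood of the labeled data."""
    penalty = lam * sum(w_j ** 2 for w_y in weights for w_j in w_y)
    log_lik = sum(math.log(predict_proba(weights, x)[y]) for x, y in data)
    return penalty - log_lik / len(data)


# Toy setup: two classes, three binary features (all values are made up).
weights = [[0.0, 0.0, 0.0], [4.5, 3.0, -0.9]]
x = [1, 1, 0]
probs = predict_proba(weights, x)
print(probs)
print(objective(weights, [(x, 1), ([0, 0, 1], 0)], lam=0.01))
```

Training would search for the weights that minimize `objective`; a larger `lam` pushes the weights toward zero.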

  12. Generalizable features in weight vectors. There are K source domains $D_1, D_2, \ldots, D_K$ with weight vectors $w^1 = (0.2, 4.5, 5, -0.3, 3.0, \ldots, 2.1, -0.9, 0.4)$, $w^2 = (3.2, 0.5, 4.5, -0.1, 3.5, \ldots, 0.1, -1.0, -0.2)$, …, $w^K = (0.1, 0.7, 4.2, 0.1, 3.2, \ldots, 1.7, 0.1, 0.3)$. Some rows are domain-specific features, while rows whose weights agree across domains (such as 5, 4.5, 4.2 and 3.0, 3.5, 3.2) are generalizable features. We want to decompose w in this way: $(0.2, 4.5, 5, -0.3, 3.0, \ldots, 2.1, -0.9, 0.4) = (0, 0, 4.6, 0, 3.2, \ldots, 0, 0, 0) + (0.2, 4.5, 0.4, -0.3, -0.2, \ldots, 2.1, -0.9, 0.4)$, where the first term has h non-zero entries for the h generalizable features.
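One simple way to see which rows behave like generalizable features is to look for weights that stay high in every source domain. The sketch below uses only the eight entries visible on the slide (the elided middle rows are left out) and a minimum-weight rule; that rule and the name `select_generalizable` are assumptions for illustration, not the selection procedure from the presentation.

```python
# Visible entries of the three weight vectors on the slide.
# The rows hidden behind the ellipsis are omitted.
w_1 = [0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9, 0.4]
w_2 = [3.2, 0.5, 4.5, -0.1, 3.5, 0.1, -1.0, -0.2]
w_K = [0.1, 0.7, 4.2, 0.1, 3.2, 1.7, 0.1, 0.3]


def select_generalizable(weight_vectors, h):
    """Indices of the h features whose smallest weight across domains is largest.

    Illustrative heuristic: a feature counts as generalizable only if it is
    weighted high in every source domain.
    """
    n_features = len(weight_vectors[0])
    min_weight = [min(w[j] for w in weight_vectors) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: min_weight[j], reverse=True)
    return sorted(ranked[:h])


selected = select_generalizable([w_1, w_2, w_K], h=2)
print(selected)
```

With h = 2 this picks the third and fifth rows (zero-based indices 2 and 4), the same positions that carry the non-zero entries 4.6 and 3.2 in the decomposition on the slide.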

  13. Feature selection matrix. A matrix $A$ of 0/1 entries with $h$ rows selects the $h$ generalizable features: $z = A x$. Decomposition of w: $w^T x = v^T z + u^T x$, where $v$ holds the weights for generalizable features and $u$ holds the weights for domain-specific features. In the slide's example, $w = (0.2, 4.5, 5, -0.3, 3.0, \ldots, 2.1, -0.9, 0.4)$ and $x = (0, 1, 0, 0, 1, \ldots, 0, 1, 0)$; $v = (4.6, 3.2, \ldots, 3.6)$ and $z = (0, 1, \ldots, 0)$; $u = (0.2, 4.5, 0.4, -0.3, -0.2, \ldots, 2.1, -0.9, 0.4)$. Since $z = A x$, the decomposition is equivalent to $w = A^T v + u$.
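The identity can be checked numerically on the slide's numbers. The sketch below keeps only the eight visible entries of each vector and assumes h = 2, with the third and fifth features selected (the positions of the non-zero entries 4.6 and 3.2 in the decomposition); the helper names `dot`, `matvec`, and `selection_matrix` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def matvec(matrix, vector):
    """Matrix-vector product, one dot product per row."""
    return [dot(row, vector) for row in matrix]


def selection_matrix(indices, n_features):
    """0/1 matrix with one row per selected feature index."""
    return [[1 if j == i else 0 for j in range(n_features)] for i in indices]


# Visible slide entries only. Assumes h = 2 and zero-based indices 2 and 4.
x = [0, 1, 0, 0, 1, 0, 1, 0]
w = [0.2, 4.5, 5.0, -0.3, 3.0, 2.1, -0.9, 0.4]
u = [0.2, 4.5, 0.4, -0.3, -0.2, 2.1, -0.9, 0.4]
v = [4.6, 3.2]

A = selection_matrix([2, 4], len(x))
z = matvec(A, x)  # the selected entries of x

lhs = dot(w, x)               # w^T x
rhs = dot(v, z) + dot(u, x)   # v^T z + u^T x

# Rebuild w as A^T v + u, column by column.
At_v = [sum(A[i][j] * v[i] for i in range(len(v))) for j in range(len(x))]
w_rebuilt = [a + b for a, b in zip(At_v, u)]

print(z, lhs, rhs)
print(w_rebuilt)
```

Up to floating-point rounding, both sides of the identity agree and `w_rebuilt` matches `w`, so scoring with `w` alone or with the pair `(v, u)` gives the same classifier.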
