SLIDE 1

A Two-Stage Approach to Domain Adaptation for Statistical Classifiers

Jing Jiang & ChengXiang Zhai

Department of Computer Science University of Illinois at Urbana-Champaign


SLIDE 2

What is domain adaptation?

  • Example: named entity recognition

  Standard supervised learning (ideal setting):
    train (labeled): New York Times  →  NER Classifier  →  test (unlabeled): New York Times
    entities: persons, locations, organizations, etc.
    performance: 85.5%

  Non-standard (realistic) setting:
    train (labeled): Reuters (New York Times labeled data not available)  →  NER Classifier  →  test (unlabeled): New York Times
    entities: persons, locations, organizations, etc.
    performance: 64.1%

SLIDE 3

  • Domain difference → performance drop

  NER Classifier:

    Setting             Train              Test              Performance
    ideal setting       New York Times     New York Times    85.5%
    realistic setting   Reuters            New York Times    64.1%

Another NER example

  Gene name recognizer:

    Setting             Train     Test      Performance
    ideal setting       mouse     mouse     54.1%
    realistic setting   fly       mouse     28.1%
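The gap above can be reproduced with any off-the-shelf classifier by training on one domain and scoring on another. The sketch below is a minimal, hypothetical illustration, not the setup used later in this deck: the corpus variables (nyt_*, reuters_*) are placeholders and document-level classification stands in for token-level NER.

# A minimal, hypothetical sketch (not the authors' setup): train a bag-of-words
# classifier on one domain and score it on another to expose the cross-domain drop.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def cross_domain_f1(train_texts, train_labels, test_texts, test_labels):
    vec = CountVectorizer(binary=True)               # binary features, as on the later slides
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_texts), train_labels)
    pred = clf.predict(vec.transform(test_texts))
    return f1_score(test_labels, pred, average="micro")

# ideal setting:     cross_domain_f1(nyt_train, nyt_y, nyt_test, nyt_test_y)
# realistic setting: cross_domain_f1(reuters_train, reuters_y, nyt_test, nyt_test_y)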

SLIDE 4

Other examples

  • Spam filtering: public email collection → personal inboxes
  • Sentiment analysis of product reviews: digital cameras → cell phones; movies → books

  • Can we do better than standard supervised learning?
  • Domain adaptation: to design learning methods that are aware of the difference between the training and test domains.

SLIDE 5

How do we solve the problem in general?
  • Observation 1: domain-specific features

    Fly gene names: wingless, daughterless, eyeless, apexless, …
      • describing phenotype
      • in fly gene nomenclature
      • feature “-less” weighted high

    Gene names from other organisms: CD38, PABPC5, …
      • Is the feature still useful for other organisms? No!

SLIDE 6

  • Observation 2: generalizable features

    • “… decapentaplegic and wingless are expressed in analogous patterns in each …”
    • “… that CD38 is expressed by both neurons and glial cells …”
    • “… that PABPC5 is expressed in fetal brain and in a range of adult tissues …”

    feature “X be expressed” generalizes across organisms

SLIDE 7

  • General idea: two-stage approach

  [Diagram: the feature space of the Source Domain and Target Domain, split into generalizable features and domain-specific features]

Goal

  [Diagram: Source Domain and Target Domain over the shared feature space]

SLIDE 8

  • Regular classification

  [Diagram: a classifier trained on the Source Domain over the full feature space]

Generalization (Stage 1): to emphasize generalizable features in the trained model

  [Diagram: the trained model concentrates on the generalizable features shared by the Source Domain and Target Domain]

SLIDE 9

Adaptation (Stage 2): to pick up domain-specific features for the target domain

  [Diagram: the model extends to the domain-specific features of the Target Domain]

Regular semi-supervised learning

  [Diagram: Source Domain and Target Domain over the shared feature space]

SLIDE 10

  • Comparison with related work
  • We explicitly model generalizable features.
    Previous work models them implicitly [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].
  • We do not need labeled target data, but we do need multiple source (training) domains.
    Some work requires labeled target data [Daumé III 2007].
  • We have a second stage of adaptation, which uses semi-supervised learning.
    Previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].

Implementation of the two-stage approach with logistic regression classifiers

SLIDE 11

Logistic regression classifiers

  An input such as “… and wingless are expressed in …” is represented as a vector x of p binary features (e.g. “-less”, “X be expressed”), and each class y has a weight vector w_y:

  $p(y \mid x; \mathbf{w}) = \frac{\exp(w_y^T x)}{\sum_{y'} \exp(w_{y'}^T x)}$

Learning a logistic regression classifier

  $\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \left[ -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(w_{y_i}^T x_i)}{\sum_{y'} \exp(w_{y'}^T x_i)} + \lambda \|\mathbf{w}\|^2 \right]$

  • first term: negative log likelihood of the training data
  • second term: regularization term: penalize large weights, control model complexity
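A minimal numpy sketch of this objective, assuming a binary feature matrix X of shape (N, p) and integer labels y already exist; it illustrates the formula above and is not the authors' implementation.

# Regularized multinomial logistic regression by plain gradient descent.
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def train_logreg(X, y, num_classes, lam=0.1, lr=0.1, iters=500):
    N, p = X.shape
    W = np.zeros((num_classes, p))              # one weight vector w_y per class
    for _ in range(iters):
        P = softmax(X @ W.T)                    # p(y | x; w) for every example
        Y = np.eye(num_classes)[y]              # one-hot labels
        grad = (P - Y).T @ X / N + 2 * lam * W  # gradient of -1/N sum log p + lam ||w||^2
        W -= lr * grad
    return W

In practice any off-the-shelf L2-regularized logistic regression solver computes the same ŵ.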

SLIDE 12

Generalizable features in weight vectors

  [Diagram: weight vectors w1, w2, …, wK learned separately from K source domains D1, D2, …, DK; the entries for generalizable features carry similar, large weights in every domain, while the entries for domain-specific features do not]

We want to decompose w in this way

  [Diagram: w is written as the sum of a vector whose non-zero entries cover only the h generalizable features and a vector carrying the remaining, domain-specific weights]

SLIDE 13

Feature selection matrix A

  z = Ax

  The h × p matrix A (each row contains a single 1) selects the h generalizable features of x.

Decomposition of w

  [Diagram: a concrete weight vector split into its generalizable part and its domain-specific part]

  w^T x = v^T z + u^T x

  v: weights for generalizable features; u: weights for domain-specific features

SLIDE 14

Decomposition of w

  $w^T x = v^T z + u^T x = v^T (A x) + u^T x = (A^T v)^T x + u^T x$

  $\Rightarrow \quad w = A^T v + u$

  A^T v: shared by all domains; u: domain-specific.
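A tiny numpy sketch of this decomposition, with made-up feature positions and weights, just to check the identity w^T x = v^T z + u^T x:

import numpy as np

p, h = 6, 2                                    # p features, h of them generalizable
generalizable_idx = [1, 4]                     # hypothetical positions of the generalizable features

A = np.zeros((h, p))
A[np.arange(h), generalizable_idx] = 1.0       # feature selection matrix: one 1 per row

x = np.array([1, 1, 0, 0, 1, 0], dtype=float)  # binary feature vector
z = A @ x                                      # the h generalizable features of x

v = np.array([2.0, 1.5])                       # weights shared by all domains
u = np.array([0.0, 0.0, 0.3, -0.2, 0.0, 0.9])  # domain-specific weights

w = A.T @ v + u                                # w = A^T v + u
assert np.isclose(w @ x, v @ z + u @ x)        # w^T x = v^T z + u^T x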

SLIDE 15

Framework for generalization

  Fix A and optimize:

  $(\hat{v}, \{\hat{u}_k\}) = \arg\min_{v, \{u_k\}} \left[ \lambda \|v\|^2 + \lambda_s \sum_{k=1}^{K} \|u_k\|^2 - \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} \log p\left(y_i^k \mid x_i^k; A, v, u_k\right) \right]$

  • the first two terms are the regularization term; λ_s >> 1: to penalize domain-specific features
  • the last term is the likelihood of the labeled data from the K source domains
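As a sketch, the objective above can be written directly in numpy. The shapes and helpers below (C classes, p features, h generalizable features) are assumptions for illustration, not the authors' code.

import numpy as np

def nll(W, X, y):
    """Mean negative log likelihood of one domain's labeled data; W: (C, p)."""
    scores = X @ W.T
    scores = scores - scores.max(axis=1, keepdims=True)
    logp = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def generalization_objective(v, u_list, A, domains, lam=0.1, lam_s=100.0):
    """domains: list of (X_k, y_k); v: (C, h) shared weights; u_list[k]: (C, p)."""
    K = len(domains)
    loss = lam * np.sum(v ** 2) + lam_s * sum(np.sum(u ** 2) for u in u_list)
    for (X_k, y_k), u_k in zip(domains, u_list):
        W_k = v @ A + u_k                      # each class row has the form A^T v + u_k
        loss += nll(W_k, X_k, y_k) / K         # average likelihood over the K source domains
    return loss

Minimizing this with any gradient-based optimizer over (v, {u_k}) gives the Stage-1 model; because λ_s >> 1, the u_k stay near zero and the model leans on the shared weights v over the generalizable features.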

Framework for adaptation

  Fix A and optimize:

  $(\hat{v}, \hat{u}_t, \{\hat{u}_k\}) = \arg\min_{v, u_t, \{u_k\}} \left[ \lambda \|v\|^2 + \lambda_s \sum_{k=1}^{K} \|u_k\|^2 + \lambda_t \|u_t\|^2 - \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} \log p\left(y_i^k \mid x_i^k; A, v, u_k\right) - \frac{1}{m} \sum_{i=1}^{m} \log p\left(\tilde{y}_i^t \mid x_i^t; A, v, u_t\right) \right]$

  • the new likelihood term covers the m pseudo-labeled target-domain examples
  • λ_t = 1 << λ_s: to pick up domain-specific features in the target domain
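Continuing the sketch above, the adaptation objective only adds a target-specific u_t under the weak penalty λ_t plus the pseudo-labeled target likelihood. nll() and generalization_objective() are the helpers from the previous block; again an illustration, not the authors' code.

import numpy as np

def adaptation_objective(v, u_list, u_t, A, domains, X_t, y_t_pseudo,
                         lam=0.1, lam_s=100.0, lam_t=1.0):
    loss = generalization_objective(v, u_list, A, domains, lam=lam, lam_s=lam_s)
    loss += lam_t * np.sum(u_t ** 2)            # lambda_t = 1 << lambda_s
    loss += nll(v @ A + u_t, X_t, y_t_pseudo)   # likelihood of pseudo-labeled target data
    return loss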

SLIDE 16

How to find A? (1)

Joint optimization

  $(\hat{A}, \hat{v}, \{\hat{u}_k\}) = \arg\min_{A, v, \{u_k\}} \left[ \lambda \|v\|^2 + \lambda_s \sum_{k=1}^{K} \|u_k\|^2 - \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} \log p\left(y_i^k \mid x_i^k; A, v, u_k\right) \right]$

  Solved by alternating optimization.

How to find A? (2)

Domain cross validation

  Idea: train on (K – 1) source domains and test on the held-out source domain.

  Approximation:
    w_f^k:  weight for feature f learned from domain k
    w̄_f^k:  weight for feature f learned from the other domains

  Rank features by a score that aggregates, over k = 1 … K, how well w_f^k agrees with w̄_f^k; see the paper for details.
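A hedged numpy sketch of the domain cross validation idea. The leave-one-domain-out training and the top-h selection follow the slide; the concrete score below (sum over domains of the smaller of the two weight magnitudes) is only an illustrative stand-in for the ranking defined in the paper.

import numpy as np

def rank_generalizable_features(domains, train_fn, h):
    """domains: list of (X_k, y_k); train_fn(X, y) -> (p,) weight vector; keep top h features."""
    K = len(domains)
    w_in = [train_fn(X, y) for X, y in domains]                 # w_f^k: trained on domain k
    w_out = []
    for k in range(K):                                          # trained on the other domains
        X_rest = np.vstack([Xi for i, (Xi, _) in enumerate(domains) if i != k])
        y_rest = np.concatenate([yi for i, (_, yi) in enumerate(domains) if i != k])
        w_out.append(train_fn(X_rest, y_rest))
    score = sum(np.minimum(np.abs(w_in[k]), np.abs(w_out[k])) for k in range(K))
    return np.argsort(-score)[:h]                               # feature indices used to build A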

SLIDE 17

Intuition for domain cross validation

  [Diagram: weight vectors learned from the individual domains D1, D2, …, Dk-1, Dk (fly). The feature “expressed” receives a consistently high weight in every domain (e.g. 1.5, 1.8, 2.0), whereas “-less” receives a high weight only in the fly domain (e.g. 1.2 there vs. 0.05, 0.1 elsewhere)]

Experiments

  Data set
    • BioCreative Challenge Task 1B
    • Gene/protein name recognition
    • 3 organisms/domains: fly, mouse and yeast

  Experiment setup
    • 2 organisms for training, 1 for testing
    • F1 as performance measure

SLIDE 18

Experiments: Generalization

  F: fly   M: mouse   Y: yeast   (source domains → target domain)

  Method             F+M → Y   M+Y → F   Y+F → M
  BL                 0.633     0.129     0.416
  DA-1 (joint-opt)   0.627     0.153     0.425
  DA-2 (domain CV)   0.654     0.195     0.470

  Using generalizable features is effective; domain cross validation is more effective than joint optimization.

Experiments: Adaptation

  Method             F+M → Y   M+Y → F   Y+F → M
  BL-SSL             0.633     0.241     0.458
  DA-2-SSL           0.759     0.305     0.501

  Domain-adaptive bootstrapping is more effective than regular bootstrapping.

SLIDE 19

Experiments: Adaptation

  Domain-adaptive SSL is more effective, especially with a small number of pseudo labels.

Conclusions and future work

  • Two-stage domain adaptation
    • Generalization: outperformed standard supervised learning
    • Adaptation: outperformed standard bootstrapping
  • Two ways to find generalizable features
    • Domain cross validation is more effective
  • Future work
    • Single source domain?
    • Setting parameters h and m

SLIDE 20

References

  • S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007.
  • J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006.
  • H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.

Thank you!