SLIDE 1

New Regularized Algorithms for Transductive Learning

Partha Pratim Talukdar, University of Pennsylvania, USA
Koby Crammer, Technion, Israel


SLIDE 4

Graph-based Semi-Supervised Learning

[Figure: example graph with labeled (seed) and unlabeled nodes connected by weighted edges (weights 0.1 to 0.3)]

SLIDE 6

Graph-based Semi-Supervised Learning

Various methods: Label Propagation (Zhu et al., 2003); Quadratic Criterion (Bengio et al., 2007); Adsorption (Baluja et al., 2008)

SLIDE 9

Adsorption Algorithm

  • Successfully used in:
    • YouTube video recommendation [Baluja et al., 2008]
    • Semantic classification [Talukdar et al., 2008]
  • Not analyzed so far:
    • Is it optimizing an objective? If so, which one?
    • This question motivates the proposed work

SLIDE 12

Adsorption Algorithm

[Baluja et al., WWW 2008]

[Figure: Adsorption propagating labels along weighted edges (weights 0.2 to 0.3); every node can also receive a dummy label]

SLIDE 17

Characteristics of Adsorption

  • Highly scalable and iterative
  • Main difference from previous methods: not all nodes are equal; high-degree nodes are discounted
  • Two equivalent views:

[Figure: two equivalent views of Adsorption, label diffusion from labeled (L) to unlabeled (U) nodes vs. a random walk from U to L]

SLIDE 20

Random Walk View

[Figure: random walk currently at node V, having arrived from U; what next?]

  • Continue the walk with probability $p^{\text{cont}}_v$
  • Assign V's seed label to U with probability $p^{\text{inj}}_v$
  • Abandon the random walk with probability $p^{\text{abnd}}_v$, and assign U the dummy label with score $p^{\text{abnd}}_v$
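A minimal sketch of this random-walk view, assuming each node's three probabilities sum to one and the graph is an adjacency map from nodes to weighted neighbors; all names here (adsorption_walk, DUMMY, etc.) are illustrative, not from the paper's implementation:

```python
import random

DUMMY = "__dummy__"  # stand-in name for the dummy label

def adsorption_walk(v, graph, p_inj, p_abnd, seed_label, rng=random):
    """One walk starting at node v; returns the label this walk assigns.
    graph[v] maps neighbors to edge weights; p_inj[v] is zero for
    unlabeled nodes, and p_cont[v] = 1 - p_inj[v] - p_abnd[v]."""
    while True:
        r = rng.random()
        if r < p_inj[v]:
            return seed_label[v]       # stop: emit v's seed label
        if r < p_inj[v] + p_abnd[v]:
            return DUMMY               # abandon the walk: dummy label
        # otherwise continue: move to a neighbor, proportional to edge weight
        nbrs, wts = zip(*graph[v].items())
        v = rng.choices(nbrs, weights=wts, k=1)[0]
```

A node's label distribution can then be estimated by Monte Carlo, e.g. `collections.Counter(adsorption_walk(u, graph, p_inj, p_abnd, seeds) for _ in range(10000))`.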

SLIDE 24

Discounting High-Degree Nodes

  • High-degree nodes can be unreliable: do not allow propagation/walks through them
  • Solution: increase the abandon probability at high-degree nodes (see the sketch below):

$p^{\text{abnd}}_v \propto \text{degree}(v)$
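A simplified sketch of this discounting rule. The paper derives the per-node probabilities from a more involved, entropy-based recipe; here `beta` is an assumed knob controlling how fast the abandon probability grows with degree:

```python
def node_probabilities(graph, seeds, beta=0.1):
    """Assign (p_cont, p_inj, p_abnd) to each node so they sum to one.
    p_abnd grows with degree(v) (capped), discounting high-degree nodes;
    p_inj is nonzero only for seed (labeled) nodes. The 0.5 split between
    injection and continuation at seed nodes is an arbitrary choice."""
    probs = {}
    for v, nbrs in graph.items():
        p_abnd = min(beta * len(nbrs), 0.9)    # proportional to degree, capped
        p_inj = 0.5 * (1.0 - p_abnd) if v in seeds else 0.0
        p_cont = 1.0 - p_inj - p_abnd          # remainder continues the walk
        probs[v] = (p_cont, p_inj, p_abnd)
    return probs
```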

SLIDE 28

Is Adsorption Optimizing an Objective?

  • Under certain assumptions, NO! (theorem in the paper)
  • Our goal: retain Adsorption's desirable properties, but within a well-defined optimization
  • Proposed solution: MAD (next slide)
SLIDE 33

Modified Adsorption (MAD)

[This Paper]

MAD Objective:

$$\min_{\{\hat{y}_{vl}\}} \sum_{l} \Big[\, \mu_1 \sum_{v} p^{\text{inj}}_v \,(y_{vl} - \hat{y}_{vl})^2 \;+\; \mu_2 \sum_{u,v} w_{uv} \,(\hat{y}_{ul} - \hat{y}_{vl})^2 \;+\; \mu_3 \sum_{v} (r_{vl} - \hat{y}_{vl})^2 \Big]$$

The first term is the seed label loss (if any), the second the smoothness loss across edges, and the third the label prior loss (e.g. a prior on the dummy label).

  • High-degree node discounting is enforced through the third term
  • Results in an Adsorption-like, scalable iterative update (sketched below)
SLIDE 36

Extension to Dependent Labels

  • Labels are not always mutually exclusive.

[Figure: label graph whose nodes are labels (Ale, BrownAle, PaleAle, ScotchAle, TopFermentedBeer, White, Porter) and whose edge weights (0.8 to 1.0) are label similarities]

SLIDE 42

MAD with Dependent Labels (MADDL)

MADDL Objective: the MAD objective (edge smoothness loss + label prior loss + seed label loss) plus a fourth term, the dependent label loss, which penalizes a node being assigned different scores for similar labels (e.g. Ale and BrownAle, similarity 1.0).

  • The MADDL objective also results in a scalable iterative update, with a convergence guarantee.
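One natural form of the added dependent-label loss, consistent with the slide's description; this is an assumption for illustration, and the paper's exact formulation may differ. With $C_{ll'}$ the similarity between labels $l$ and $l'$:

$$\mu_4 \sum_{v} \sum_{l, l'} C_{ll'} \, (\hat{y}_{vl} - \hat{y}_{vl'})^2$$

This term is large exactly when similar labels (high $C_{ll'}$) receive different scores at the same node $v$.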

SLIDE 45

Experimental Setup

  • I. Classification experiments:
    • WebKB (4 classes) [Subramanya and Bilmes, 2008]
    • Sentiment classification (4 classes) [Blitzer, Dredze and Pereira, 2007]
    • k-nearest-neighbor graph (k is tuned); see the sketch below
  • II. Smoother sentiment ranking with MADDL
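A minimal sketch of this kind of k-NN graph construction, assuming scikit-learn; the Gaussian kernel and its width `sigma` are illustrative choices (the paper tunes k):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_graph(X, k=10, sigma=1.0):
    """Build a symmetric k-NN similarity graph over the rows of X."""
    D = kneighbors_graph(X, n_neighbors=k, mode="distance")  # sparse (n, n)
    W = D.copy()
    W.data = np.exp(-D.data ** 2 / (2 * sigma ** 2))  # distances -> similarities
    return W.maximum(W.T)                             # symmetrize the graph
```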

SLIDE 46

PRBEP (macro-averaged) on WebKB Dataset, 3148 test instances

[Chart: PRBEP results; compared methods include LP (Zhu et al., 2003)]

SLIDE 47

Precision on 3568 Sentiment test instances

[Chart: precision results; compared methods include LP (Zhu et al., 2003)]

SLIDE 51
II. Smooth Sentiment Ranking

[Figure: non-smooth vs. smooth prediction profiles over ranks 1 to 4; smooth predictions are preferred; MADDL label constraints (similarity 1.0 between adjacent sentiment labels) encourage smoothness]

SLIDE 53

II. Smooth Sentiment Ranking

[Charts: counts of the top predicted label pair (Label 1 x Label 2, labels 1 to 4) in MAD output vs. MADDL output]

MADDL generates a smoother ranking, while preserving prediction accuracy.

SLIDE 58

Conclusion

  • Presented Modified Adsorption (MAD):
    • an Adsorption-like algorithm, but with a well-defined optimization
  • Extended MAD to MADDL:
    • MADDL can handle non-mutually-exclusive labels
  • Demonstrated the effectiveness of MAD and MADDL on real-world datasets
  • Future work: apply MADDL in other domains with dependent labels, e.g. information extraction

SLIDE 59

Thanks!
