

SLIDE 1

Efficient Learning of Topic Ranking by Soft-projections onto Polyhedra

Yoram Singer

Part of the work was performed at The Hebrew University, Jerusalem, Israel

AIM Workshop on the Mathematics of Ranking, Aug. 19, 2010

SLIDE 2

Acknowledgements

  • Koby Crammer, Technion

Initial framework & numerous algorithms for online ranking using dual decomposition

  • Shai Shalev-Shwartz, Hebrew U.

Specialized efficient ranking algorithm, Regret bound analysis using primal-dual method

SLIDE 3

This Workshop (thus far & today)

  • Overviews of structure of ranking problems
  • Foundations from economics to social choice
  • Probabilistic & statistical analysis of orderings
  • Models for expressing & generating orderings
  • PageRank, graph-based methods, Hodge theory
  • Overview of machine learning for ranking (MLR)
  • Learning theoretic analysis of MLR (to follow)

SLIDE 4

This Talk

  • Specific ranking problem setting
  • Loss minimization framework
  • An efficient learning algorithm
  • Brief experimental discussion
  • High-level overview of formal analysis

SLIDE 5

Topic Ranking

Multi-label, bipartite feedback

Document: "The higher minimum wage signed into law… will be welcome relief for millions of workers… . The 90-cent-an-hour increase… ."

Relevant topics:

  • MARKETS
  • LABOUR
  • ECONOMICS
  • REGULATION / POLICY
  • CORPORATE / INDUSTRIAL
  • GOVERNMENT / SOCIAL

Desired ordering:

  • ECONOMICS
  • CORPORATE / INDUSTRIAL
  • REGULATION / POLICY
  • MARKETS
  • LABOUR
  • GOVERNMENT / SOCIAL
  • LEGAL/JUDICIAL
  • REGULATION/POLICY
  • SHARE LISTINGS
  • PERFORMANCE
  • ACCOUNTS/EARNINGS
  • COMMENT / FORECASTS
  • SHARE CAPITAL
  • BONDS / DEBT ISSUES
  • LOANS / CREDITS
  • STRATEGY / PLANS
  • INSOLVENCY / LIQUIDITY

Courtesy of K. Crammer

SLIDE 6

Topic Ranking: Setting

  • Instances: vectors (documents, images, speech signals, …)
  • Predefined set of labels (topics, categories, phonemes)
  • Target ranking of labels: specifies when label r is preferred over label s
  • Ranking functions
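
A minimal Python sketch of this setting, under assumptions the slide text does not confirm: labels are indexed 0..k-1, the target ranking is a vector `gamma` in which label r is preferred over label s when `gamma[r] > gamma[s]`, and a ranking function returns one real score per label. The names `rank_labels` and `prefers` are illustrative only.

```python
def rank_labels(scores):
    """Order label indices from highest to lowest score (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)


def prefers(gamma, r, s):
    """Assumed feedback convention: r is preferred over s iff gamma[r] > gamma[s]."""
    return gamma[r] > gamma[s]


# Hypothetical scores for k = 4 labels and a bipartite target ranking.
scores = [0.2, 1.5, -0.3, 0.9]
gamma = [1, 0, 1, 0]
predicted_order = rank_labels(scores)
```

Here the predicted order places label 1 first even though the assumed target prefers labels 0 and 2, which is the kind of disagreement a ranking loss has to measure.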

SLIDE 7

Special Cases

  • Binary classification
  • Multiclass categorization
  • Multiclass multilabel
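
One way to see these special cases is through the preference pairs each one induces. The sketch below assumes a target vector `gamma` with larger values for preferred labels; the helper name `preference_pairs` and the example vectors are hypothetical, since the slide's own formulas did not survive extraction.

```python
def preference_pairs(gamma):
    """All ordered pairs (r, s) such that label r is preferred over label s."""
    k = len(gamma)
    return {(r, s) for r in range(k) for s in range(k) if gamma[r] > gamma[s]}


# Binary classification: two labels, one preferred over the other.
binary = preference_pairs([1, 0])

# Multiclass categorization: a single correct label preferred over all others.
multiclass = preference_pairs([0, 0, 1, 0])

# Multiclass multilabel: every relevant label preferred over every irrelevant one.
multilabel = preference_pairs([1, 0, 1, 0])
```

In all three cases the pairs form a bipartite graph between a preferred set and the remaining labels.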

SLIDE 8

Preference-Based View

[Figure: preference-based view of a label ranking]

SLIDE 9

Quality of Predicted Ranking

  • Loss for a pair of labels (r, s) such that label r is preferred over s
  • Measures whether we predicted the order of a pair correctly, and with sufficient confidence
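
A hinge-style pairwise loss is one common way to capture both requirements (correct order and sufficient confidence). The sketch below is an assumption, not a formula recovered from the slide: it takes the required margin to be `gamma[r] - gamma[s]` and the name `pair_loss` is illustrative.

```python
def pair_loss(scores, gamma, r, s):
    """Hinge loss for a pair where r should be ranked above s.

    Zero when the score gap reaches the required margin, and
    growing linearly as the gap falls short of it.
    """
    required_margin = gamma[r] - gamma[s]
    achieved_margin = scores[r] - scores[s]
    return max(0.0, required_margin - achieved_margin)


gamma = [1, 0]
confident = pair_loss([2.0, 0.0], gamma, 0, 1)     # correct order, large gap
unconfident = pair_loss([0.5, 0.0], gamma, 0, 1)   # correct order, small gap
wrong = pair_loss([0.0, 1.0], gamma, 0, 1)         # wrong order
```

The second case shows why confidence matters: the pair is ordered correctly but still incurs a loss because the gap is below the margin.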

SLIDE 10

Quality of Predicted Ranking (cont.)

  • Loss of f on a subset E of label pairs
  • Loss of f on an entire example: combines the subset losses using predefined subset weights
  • We will assume that each subset defines a bipartite graph
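
The two levels of loss can be sketched as follows. The choices here are assumptions: the loss on a subset is taken as the worst pairwise hinge loss inside it, and the example loss is the weighted sum over subsets. The names `subset_loss`, `example_loss`, `edge_sets`, and `weights` are hypothetical.

```python
def pair_loss(scores, gamma, r, s):
    """Hinge loss for a pair where r should be ranked above s."""
    return max(0.0, (gamma[r] - gamma[s]) - (scores[r] - scores[s]))


def subset_loss(scores, gamma, edges):
    """Assumed subset loss: the worst pairwise loss in the edge-set."""
    return max((pair_loss(scores, gamma, r, s) for r, s in edges), default=0.0)


def example_loss(scores, gamma, edge_sets, weights):
    """Weighted sum of subset losses, one predefined weight per edge-set."""
    return sum(w * subset_loss(scores, gamma, edges)
               for w, edges in zip(weights, edge_sets))


scores = [0.5, 0.0, 2.0, 1.0]
gamma = [1, 0, 1, 0]
# Two bipartite edge-sets: label 0 versus {1, 3}, and label 2 versus {1, 3}.
edge_sets = [[(0, 1), (0, 3)], [(2, 1), (2, 3)]]
total = example_loss(scores, gamma, edge_sets, weights=[0.5, 0.5])
```

With a single edge-set and weight 1, the same function reduces to the worst pairwise loss over all comparable pairs.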

SLIDE 11

Linear Ranking Functions

  • Linear predictors
  • “Complexity” of a predictor
  • Complexity of a function set
  • Can be used with Mercer kernels
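
A small sketch of a linear ranker, assuming one weight vector per label and taking the complexity of a predictor to be the sum of squared norms of those vectors. Both are assumptions, as the slide's formulas are not available; `linear_scores` and `complexity` are illustrative names.

```python
def linear_scores(W, x):
    """Score of each label r as the inner product of its weight vector with x."""
    return [sum(w_j * x_j for w_j, x_j in zip(w_r, x)) for w_r in W]


def complexity(W):
    """Assumed complexity measure: sum of squared entries of all weight vectors."""
    return sum(w_j * w_j for w_r in W for w_j in w_r)


# Hypothetical weights for k = 3 labels over 2 features.
W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
x = [3.0, 4.0]
scores = linear_scores(W, x)
norm_sq = complexity(W)
```

Because the scores depend on the data only through inner products, the same structure carries over when the inner product is replaced by a kernel.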

SLIDE 12

Loss Minimization & Regularization

  • Complexity of ranker
  • Empirical risk of ranker
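
One common way to write the trade-off between these two terms is sketched here. The symbols are assumptions that cannot be recovered from the slide text: $W$ collects the per-label weight vectors, $\ell$ is the per-example ranking loss, $(x_i, \gamma_i)$ are the $m$ training examples with their target rankings, and $C$ is the trade-off parameter.

```latex
\min_{W} \;\; \frac{1}{2}\,\|W\|^{2} \;+\; C \sum_{i=1}^{m} \ell\bigl(f(x_i), \gamma_i\bigr)
```

The first term penalizes the complexity of the ranker and the second its empirical risk.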

SLIDE 13

Loss → Pair-wise Constraints

  • Each pair of (comparable) labels corresponds to a margin constraint

  • Slack variables distinguish between different edge-sets
  • Focus on a single instance & a single edge-set
  • Make use of the fact that

Loss:

SLIDE 14

A Reduced Problem

  • Current estimate of ranking functions
  • “New” example
  • Update ranking function
  • Framework for online learning
  • Iterative procedure for batch optimization
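
A reduced problem of this kind is often written as a proximal step that stays close to the current estimate while satisfying the new example's margin constraints. The form below is a sketch under assumptions: $u_r$ is the current weight vector for label $r$, $w_r$ the updated one, $x$ the new instance, $E$ its single edge-set, $\xi$ a slack variable, $C$ the trade-off parameter, and $\gamma$ the target ranking. None of these formulas survive in the slide text.

```latex
\min_{w_1,\dots,w_k,\;\xi} \quad
  \frac{1}{2}\sum_{r=1}^{k} \|w_r - u_r\|^{2} \;+\; C\,\xi
\qquad \text{s.t.} \quad
  w_r \cdot x - w_s \cdot x \;\ge\; \gamma_r - \gamma_s - \xi
  \;\;\; \forall (r,s) \in E,
\qquad \xi \ge 0 .
```

Solving it once per incoming example gives an online algorithm; cycling over a fixed training set gives an iterative batch procedure.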

SLIDE 15

Solving the Reduced Problem

  • Direct use of Lagrange multipliers & strong duality leads to as many variables as constraints, since each constraint is associated with a variable.

SLIDE 16

Reducing the Reduced…

  • Introduce |A|+|B| = k new variables

[Figure: bipartite graph between the label sets A and B]

SLIDE 17

Reduced & Compacted Dual

  • More compact dual problem:
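  • A feasible solution of the original dual yields a feasible solution of the compact dual
  • A feasible solution of the compact dual yields a feasible solution of the original dual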

SLIDE 18

Obtaining (Primal) Solution

  • The new set of ranking vectors:
  • But we still need to find α and β

SLIDE 19

“Decoupling” the Dual

  • Compact Dual
  • Suppose we were given C*
  • We can solve two independent problems

SLIDE 20

Solving the Decoupled Problems

  • We need to solve
  • Optimal solution takes the form
  • Can be found in linear time using an improved & generalized (CS’99, SS’06, DSSC’09) projection technique by Bertsekas
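
To illustrate the kind of projection involved, here is a sketch of the standard sort-based Euclidean projection onto a scaled simplex. It assumes each decoupled problem has the form "minimize ½‖α − μ‖² subject to α ≥ 0 and Σα = z", where `mu` and `z` are hypothetical names for the unconstrained point and the given total. Note that this version costs O(k log k) because of the sort; it is not the linear-time variant mentioned on the slide.

```python
def project_to_simplex(mu, z):
    """Project mu onto {a : a_i >= 0, sum(a) = z} in Euclidean norm (z > 0).

    The solution has the thresholded form a_i = max(0, mu_i - theta).
    """
    if z <= 0:
        raise ValueError("z must be positive")
    theta = 0.0
    cumulative = 0.0
    # Scan the entries from largest to smallest; keep the last threshold
    # for which the current entry would still stay strictly positive.
    for j, value in enumerate(sorted(mu, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - z) / j
        if value - candidate > 0:
            theta = candidate
    return [max(0.0, m - theta) for m in mu]


alpha = project_to_simplex([1.0, 3.0, -1.0], z=2.0)
```

In this example only the largest entry stays positive after thresholding, so all of the mass `z` ends up on it.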

SLIDE 21

Finding C*

  • The sets define |A|+|B|=k “knots”

[Figure: value of the dual as a function of C*]

  • Function is piecewise quadratic
  • Minimum is unique
  • Can be computed in O(k)

SLIDE 22

“Coupled” Again

  • Locate the global optimum
  • Check whether C > C*

[Figure: value of the dual]

SLIDE 23

Recapping

  • 1. Focus on a single example
  • 2. Find a compact form of the dual
  • 3. Decouple the reduced & compact dual
  • 4. Compute “knots” for decoupled problems
  • 5. Find the optimal value of C*
  • 6. Once C* is known we use soft projection to find α and β
  • 7. Once α and β are known we find w from u

SLIDE 24

Back to Multiple Examples

  • Cycle through the examples:
  • Online mode:
      • Visit each example only once
      • Can obtain worst case loss bound
  • Batch mode:
      • Visit each example multiple times and “re-project”
      • Can obtain asymptotic convergence
      • Generalization bounds
  • SOPOPO - SOft Projection Onto POlyhedra
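
The difference between the two modes can be sketched with a generic training loop. To keep the example short, the per-example update below is a simple perceptron-style stand-in, not the SOPOPO update; all function names are hypothetical, and the target ranking is assumed to be a vector `gamma` with larger values for preferred labels.

```python
def perceptron_rank_update(W, x, gamma):
    """Stand-in update (NOT SOPOPO): correct the worst-ordered pair, if any.

    Returns 1 if W was changed and 0 otherwise.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w_r, x)) for w_r in W]
    k = len(W)
    pairs = [(r, s) for r in range(k) for s in range(k) if gamma[r] > gamma[s]]
    worst = max(pairs, key=lambda p: scores[p[1]] - scores[p[0]], default=None)
    if worst is None or scores[worst[0]] > scores[worst[1]]:
        return 0
    r, s = worst
    W[r] = [w + x_j for w, x_j in zip(W[r], x)]  # promote the preferred label
    W[s] = [w - x_j for w, x_j in zip(W[s], x)]  # demote the other label
    return 1


def online_pass(W, examples, update):
    """Online mode: visit each example exactly once, in order."""
    return sum(update(W, x, gamma) for x, gamma in examples)


def batch_train(W, examples, update, epochs):
    """Batch mode: cycle through the same examples several times."""
    return [online_pass(W, examples, update) for _ in range(epochs)]


examples = [([1.0, 0.0], [1, 0]), ([0.0, 1.0], [0, 1])]
W = [[0.0, 0.0], [0.0, 0.0]]
updates_per_epoch = batch_train(W, examples, perceptron_rank_update, epochs=3)
```

Swapping `perceptron_rank_update` for a different per-example solver changes the algorithm without changing either loop.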

SLIDE 25

Empirical Evaluation

(Batch convergence)

SLIDE 26

Empirical Evaluation

(Online Error)

[Figure: two panels, Letter (poly ker.) and Letter (linear), each comparing Perc, PA, and Sop; horizontal axis scaled by 10^4]

SLIDE 27

Why Does it “Work”?

(proof by picture)

[Figure: primal objective and dual objective, with the progress made by a single iterate]

SLIDE 28

Summary

  • General framework for online ranking
  • Specialized efficient algorithm - SOPOPO
  • Ranking: rich structure, interesting, useful, …
  • Primal-dual regret bound analysis
  • Based on:
      • “Efficient Learning of Label Ranking by Soft Projections onto Polyhedra”, Journal of Machine Learning Research, Vol. 7, 2006
      • “A Primal-Dual Perspective of Online Learning Algorithms”, Machine Learning Journal, Vol. 69, 2007

  • Code: http://www.cs.huji.ac.il/~shais/code/sopopo.tgz
  • See also: http://www.magicbroom.info