 
Efficient Learning of Topic Ranking by Soft-projections onto Polyhedra
Yoram Singer
Part of the work was performed at The Hebrew University, Jerusalem, Israel
AIM Workshop on the Mathematics of Ranking, Aug. 19, 2010
Acknowledgements
• Koby Crammer, Technion: initial framework & numerous algorithms for online ranking using dual decomposition
• Shai Shalev-Shwartz, Hebrew U.: specialized efficient ranking algorithm, regret bound analysis using primal-dual method
This Workshop (thus far & today)
• Overviews of structure of ranking problems
• Foundations from economics to social choice
• Probabilistic & statistical analysis of orderings
• Models for expressing & generating orderings
• Page rank, graph-based methods, Hodge theory
• Overview of machine learning for ranking (MLR)
• Learning theoretic analysis of MLR (to follow)
This Talk
• Specific ranking problem setting
• Loss minimization framework
• An efficient learning algorithm
• Brief experimental discussion
• High-level overview of formal analysis
Topic Ranking (courtesy of K. Crammer)
• Document: "The higher minimum wage signed into law… will be welcome relief for millions of workers … . The 90-cent-an-hour increase … ."
• Desired Ordering: ECONOMICS, CORPORATE / INDUSTRIAL, REGULATION / POLICY, MARKETS, LABOUR, GOVERNMENT / SOCIAL, LEGAL/JUDICIAL, REGULATION/POLICY, SHARE LISTINGS, PERFORMANCE
• Relevant topics: ACCOUNTS/EARNINGS, COMMENT / FORECASTS, REGULATION / POLICY, MARKETS, SHARE CAPITAL, BONDS / DEBT ISSUES, CORPORATE / INDUSTRIAL, LABOUR, LOANS / CREDITS, STRATEGY / PLANS, GOVERNMENT / SOCIAL, ECONOMICS, INSOLVENCY / LIQUIDITY
• Multi-label, bipartite feedback
Topic Ranking: Setting
• Instances: vectors (documents, images, speech signals, …)
• Predefined set of labels (topics, categories, phonemes)
• Target ranking of labels: specifies when label r is preferred over label s
• Ranking functions
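The setting can be written compactly as follows. The symbols n, k, γ, and f_r are assumed notation, modeled on the JMLR paper cited in the Summary rather than taken from this slide, so treat this as a sketch of one consistent formalization.

```latex
\begin{align*}
  &\text{Instances: } x \in \mathbb{R}^n \\
  &\text{Labels: } \mathcal{Y} = \{1, \dots, k\} \\
  &\text{Target ranking: } \gamma \in \mathbb{R}^k, \quad
    r \text{ preferred over } s \iff \gamma_r > \gamma_s \\
  &\text{Ranking functions: } f : \mathbb{R}^n \to \mathbb{R}^k, \quad
    f \text{ ranks } r \text{ above } s \iff f_r(x) > f_s(x)
\end{align*}
```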
Special Cases
• Binary classification
• Multiclass categorization
• Multiclass multilabel
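As an illustration of how these special cases fit the ranking setting, the sketch below encodes each one as a target vector and lists the label pairs it induces. The 0/1 encoding and the helper name `preference_pairs` are assumptions chosen for illustration, not something stated on the slide.

```python
def preference_pairs(gamma):
    """Return all (r, s) such that label r is preferred over label s,
    assuming r is preferred over s exactly when gamma[r] > gamma[s]."""
    k = len(gamma)
    return [(r, s) for r in range(k) for s in range(k) if gamma[r] > gamma[s]]


# Assumed encodings: relevant labels get 1, all other labels get 0.
binary = [1, 0]             # two labels, the first is the correct class
multiclass = [0, 0, 1, 0]   # exactly one correct label out of k = 4
multilabel = [1, 0, 1, 0]   # a set of relevant labels out of k = 4

for name, gamma in [("binary", binary),
                    ("multiclass", multiclass),
                    ("multilabel", multilabel)]:
    print(name, preference_pairs(gamma))
```

Under this encoding every induced pair goes from a relevant label to a non-relevant one, so each case yields a bipartite preference graph.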
Preference-Based View
[Figure: preference graphs over labels numbered 1 to 5]
Quality of Predicted Ranking
• Loss for a pair of labels (r, s) such that r is preferred over s
• Measures whether we predicted the order of a pair correctly, and with sufficient confidence
Quality of Predicted Ranking (cont.)
• Loss of f on a subset E of label pairs
• Loss of f on an entire example: combines the losses on predefined subsets using subset weights
• We will assume that each subset defines a bipartite graph
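The three levels of loss described above can be sketched in code. This assumes a hinge loss on each pair, a maximum over the pairs of a subset, and a weighted sum over subsets, which is the form used in the JMLR paper cited in the Summary. The function names and the example numbers are hypothetical.

```python
def pair_loss(scores, gamma, r, s):
    """Hinge loss for a pair (r, s) with r preferred over s.
    Zero only if scores[r] beats scores[s] by at least gamma[r] - gamma[s]."""
    required_margin = gamma[r] - gamma[s]
    return max(0.0, required_margin - (scores[r] - scores[s]))


def subset_loss(scores, gamma, edges):
    """Loss on a subset E of label pairs: the worst pair loss in E (assumed)."""
    return max((pair_loss(scores, gamma, r, s) for r, s in edges), default=0.0)


def example_loss(scores, gamma, edge_sets, weights):
    """Loss on a whole example: weighted sum of the subset losses."""
    return sum(w * subset_loss(scores, gamma, edges)
               for edges, w in zip(edge_sets, weights))


# Hypothetical example: label 0 is relevant, labels 1 and 2 are not.
scores = [2.0, 0.5, 1.8]
gamma = [1, 0, 0]
edges = [(0, 1), (0, 2)]
print(subset_loss(scores, gamma, edges))            # pair (0, 2) violates the margin
print(example_loss(scores, gamma, [edges], [0.5]))
```

The pair (0, 1) is ordered correctly with enough margin, while (0, 2) is ordered correctly but without sufficient confidence, so it still incurs a loss.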
Linear Ranking Functions
• Linear predictors
• “Complexity” of a predictor
• Complexity of a function set
• Can be used with Mercer kernels
Loss Minimization & Regularization
• The objective combines two terms: the empirical risk of the ranker and the complexity of the ranker
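A regularized objective of this kind is commonly written as below. The specific form (one weight vector w_r per label, squared-norm complexity, trade-off constant C, m training examples) is an assumption in the spirit of the JMLR paper cited in the Summary, not a transcription of the slide.

```latex
\begin{align*}
  f_r(x) &= w_r \cdot x, \qquad r = 1, \dots, k \\
  \min_{w_1, \dots, w_k} \;\;
  & \underbrace{\tfrac{1}{2} \sum_{r=1}^{k} \|w_r\|^2}_{\text{complexity of ranker}}
  \; + \; C \underbrace{\sum_{i=1}^{m} \ell\bigl(f(x_i), \gamma_i\bigr)}_{\text{empirical risk of ranker}}
\end{align*}
```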
Loss → Pair-wise Constraints
• Each pair of (comparable) labels corresponds to a margin constraint
• Slack variables distinguish between different edge-sets
• Focus on a single instance & a single edge-set
• Make use of the fact that
A Reduced Problem
• Current estimate of ranking functions
• “New” example
• Update ranking function
• Framework for online learning
• Iterative procedure for batch optimization
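One way to write such a reduced problem for a single example x with a single edge-set E is sketched below. It assumes the current estimate is a set of vectors u_r, the updated ranker is a set of vectors w_r, and a single slack variable ξ covers all pairs in E. These symbols follow the JMLR paper cited in the Summary and are assumptions with respect to this slide.

```latex
\begin{align*}
  \min_{w_1, \dots, w_k,\; \xi} \quad
    & \tfrac{1}{2} \sum_{r=1}^{k} \|w_r - u_r\|^2 + C\,\xi \\
  \text{s.t.} \quad
    & w_r \cdot x - w_s \cdot x \;\ge\; \gamma_r - \gamma_s - \xi
      \quad \forall (r, s) \in E, \\
    & \xi \ge 0
\end{align*}
```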
Solving the Reduced Problem
• Direct use of Lagrange multipliers & strong duality leads to one dual variable per constraint, that is, one variable for every pair of labels in the edge-set
Reducing the Reduced…
• Introduce |A|+|B|=k new variables
[Figure: bipartite graph between the label sets A and B]
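Reduced & Compacted Dual
• More compact dual problem
• A feasible solution of the original dual gives a feasible solution of the compact dual
• A feasible solution of the compact dual gives a feasible solution of the original dual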
Obtaining (Primal) Solution
• The new set of ranking vectors w is computed from the current vectors u and the dual variables α and β
• But we still need to find α and β
“Decoupling” the Dual
• Compact dual
• Suppose we were given C*
• We can then solve two independent problems, one for α and one for β
Solving the Decoupled Problems
• We need to solve
• Optimal solution takes the form
• Can be found in linear time using an improved & generalized (CS’99, SS’06, DSSC’09) projection technique by Bertsekas
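A minimal sketch of this kind of projection, assuming each decoupled problem has the form: minimize ½‖α − μ‖² subject to α ≥ 0 and Σα = z. The names `mu`, `z`, and `soft_project` are hypothetical. The sketch uses the simple sort-based method, which runs in O(k log k), rather than the linear-time technique mentioned on the slide.

```python
def soft_project(mu, z):
    """Return argmin over a of 0.5 * ||a - mu||^2 s.t. a >= 0 and sum(a) == z.

    The solution has the thresholded form a[i] = max(0, mu[i] - theta),
    where theta is chosen so that the entries sum to z.
    """
    if z <= 0:
        raise ValueError("z must be positive")
    running_sum = 0.0
    theta = 0.0
    # Scan the entries in decreasing order; the last index j at which the
    # j-th largest entry stays positive after shifting determines theta.
    for j, value in enumerate(sorted(mu, reverse=True), start=1):
        running_sum += value
        candidate = (running_sum - z) / j
        if value - candidate > 0:
            theta = candidate
    return [max(0.0, m - theta) for m in mu]


alpha = soft_project([0.5, 0.2, -0.3], 1.0)
print(alpha, sum(alpha))
```

Every entry is shifted by the same threshold and clipped at zero, so small entries are set exactly to zero while the rest keep their relative differences.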
Finding C*
• The sets define |A|+|B|=k “knots”
• Function is piecewise quadratic
• Minimum is unique
• Can be computed in O(k)
[Figure: value of the dual as a function of C*]
“Coupled” Again
• Locate the global optimum
• Check whether C>C*
[Figure: value of the dual]
Recapping
1. Focus on a single example
2. Find a compact form of the dual
3. Decouple the reduced & compact dual
4. Compute “knots” for decoupled problems
5. Find the optimal value of C*
6. Once C* is known we use soft projection to find α and β
7. Once α and β are known we find w from u
Back to Multiple Examples
• Cycle through the examples
• Online mode:
  • Visit each example only once
  • Can obtain worst case loss bound
• Batch mode:
  • Visit each example multiple times and “re-project”
  • Can obtain asymptotic convergence
  • Generalization bounds
• SOPOPO - SOft Projection Onto POlyhedra
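The online mode can be sketched as a single pass over the examples with a pluggable update rule. The update used here is a simple perceptron-style step on the most violated pair. It is a stand-in chosen for brevity and is not the SOPOPO update, which instead solves the reduced problem for each example. All names in the sketch are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def violation(scores, gamma, pair):
    """How far a pair (r, s) falls short of its required margin."""
    r, s = pair
    return (gamma[r] - gamma[s]) - (scores[r] - scores[s])


def perceptron_update(W, x, gamma, edges, scores):
    """Stand-in update (not SOPOPO): push apart the most violated pair."""
    r, s = max(edges, key=lambda pair: violation(scores, gamma, pair))
    for i, xi in enumerate(x):
        W[r][i] += xi
        W[s][i] -= xi


def online_pass(examples, k, n, update):
    """Visit each example once: predict, suffer loss, then update."""
    W = [[0.0] * n for _ in range(k)]  # one weight vector per label
    cumulative_loss = 0.0
    for x, gamma, edges in examples:
        scores = [dot(w, x) for w in W]
        loss = max((max(0.0, violation(scores, gamma, p)) for p in edges),
                   default=0.0)
        cumulative_loss += loss
        if loss > 0:
            update(W, x, gamma, edges, scores)
    return W, cumulative_loss


examples = [([1.0, 0.0], [1, 0], [(0, 1)]),
            ([1.0, 0.0], [1, 0], [(0, 1)])]
W, total = online_pass(examples, k=2, n=2, update=perceptron_update)
print(W, total)
```

Batch mode would wrap the same loop in repeated passes over the data, revisiting each example until the objective stops improving.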
Empirical Evaluation (Batch convergence)
Empirical Evaluation (Online Error)
[Figure: online error curves for Perc, PA, and Sop on the Letter dataset; panels: Letter (poly ker.) and Letter (linear); horizontal axis scaled by 10^4]
Why Does it “Work”? (proof by picture)
[Figure: primal objective and dual objective, with the change over a single iterate marked]
Summary
• General framework for online ranking
• Specialized efficient algorithm - SOPOPO
• Ranking: rich structure, interesting, useful, …
• Primal-dual regret bound analysis
• Based on:
  • “Efficient Learning of Label Ranking by Soft Projections onto Polyhedra”, Journal of Machine Learning Research, Vol. 7, 2006
  • “A Primal-Dual Perspective of Online Learning Algorithms”, Machine Learning Journal, Vol. 69, 2007
• Code: http://www.cs.huji.ac.il/~shais/code/sopopo.tgz
• See also: http://www.magicbroom.info