SLIDE 1

New Regularized Algorithms for Transductive Learning

Partha Pratim Talukdar, University of Pennsylvania, USA
Koby Crammer, Technion, Israel


SLIDE 4

Graph-based Semi-Supervised Learning

[Figure: example graph with labeled (seed) and unlabeled nodes connected by weighted edges (weights 0.1 to 0.3)]

SLIDE 6

Graph-based Semi-Supervised Learning

Various methods: Label Propagation (Zhu et al., 2003); Quadratic Criterion (Bengio et al., 2007); Adsorption (Baluja et al., 2008)

SLIDE 9

Adsorption Algorithm

  • Successfully used in:
    • YouTube video recommendation [Baluja et al., 2008]
    • Semantic classification [Talukdar et al., 2008]
  • Not analyzed so far:
    • Is it optimizing an objective? If so, which one?
    • This question motivates the proposed work

SLIDE 12

Adsorption Algorithm

[Baluja et al., WWW 2008]

[Figure: Adsorption propagating labels along weighted edges (weights 0.2 to 0.3); every node can also receive a dummy label]

SLIDE 17

Characteristics of Adsorption

  • Highly scalable and iterative
  • Main difference from previous methods: not all nodes are equal; high-degree nodes are discounted
  • Two equivalent views:

[Figure: two equivalent views of Adsorption, label diffusion from labeled (L) to unlabeled (U) nodes vs. a random walk from U to L]

SLIDE 20

Random Walk View

[Figure: random walk currently at node V, having arrived from U; what next?]

  • Continue the walk with probability $p^{\text{cont}}_v$
  • Assign V's seed label to U with probability $p^{\text{inj}}_v$
  • Abandon the random walk with probability $p^{\text{abnd}}_v$, and assign U the dummy label with score $p^{\text{abnd}}_v$
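A minimal sketch of this random-walk view, assuming each node's three probabilities sum to one and the graph is an adjacency map from nodes to weighted neighbors; all names here (adsorption_walk, DUMMY, etc.) are illustrative, not from the paper's implementation:

```python
import random

DUMMY = "__dummy__"  # stand-in name for the dummy label

def adsorption_walk(v, graph, p_inj, p_abnd, seed_label, rng=random):
    """One walk starting at node v; returns the label this walk assigns.
    graph[v] maps neighbors to edge weights; p_inj[v] is zero for
    unlabeled nodes, and p_cont[v] = 1 - p_inj[v] - p_abnd[v]."""
    while True:
        r = rng.random()
        if r < p_inj[v]:
            return seed_label[v]       # stop: emit v's seed label
        if r < p_inj[v] + p_abnd[v]:
            return DUMMY               # abandon the walk: dummy label
        # otherwise continue: move to a neighbor, proportional to edge weight
        nbrs, wts = zip(*graph[v].items())
        v = rng.choices(nbrs, weights=wts, k=1)[0]
```

A node's label distribution can then be estimated by Monte Carlo, e.g. `collections.Counter(adsorption_walk(u, graph, p_inj, p_abnd, seeds) for _ in range(10000))`.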

SLIDE 24

Discounting High-Degree Nodes

  • High-degree nodes can be unreliable: do not allow propagation/walks through them
  • Solution: increase the abandon probability at high-degree nodes (see the sketch below):

$p^{\text{abnd}}_v \propto \text{degree}(v)$
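A simplified sketch of this discounting rule. The paper derives the per-node probabilities from a more involved, entropy-based recipe; here `beta` is an assumed knob controlling how fast the abandon probability grows with degree:

```python
def node_probabilities(graph, seeds, beta=0.1):
    """Assign (p_cont, p_inj, p_abnd) to each node so they sum to one.
    p_abnd grows with degree(v) (capped), discounting high-degree nodes;
    p_inj is nonzero only for seed (labeled) nodes. The 0.5 split between
    injection and continuation at seed nodes is an arbitrary choice."""
    probs = {}
    for v, nbrs in graph.items():
        p_abnd = min(beta * len(nbrs), 0.9)    # proportional to degree, capped
        p_inj = 0.5 * (1.0 - p_abnd) if v in seeds else 0.0
        p_cont = 1.0 - p_inj - p_abnd          # remainder continues the walk
        probs[v] = (p_cont, p_inj, p_abnd)
    return probs
```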

SLIDE 28

Is Adsorption Optimizing an Objective?

  • Under certain assumptions, NO! (theorem in the paper)
  • Our goal: retain Adsorption's desirable properties, but within a well-defined optimization
  • Proposed solution: MAD (next slide)
SLIDE 33

Modified Adsorption (MAD)

[This Paper]

MAD Objective:

$$\min_{\{\hat{y}_{vl}\}} \sum_{l} \Big[\, \mu_1 \sum_{v} p^{\text{inj}}_v \,(y_{vl} - \hat{y}_{vl})^2 \;+\; \mu_2 \sum_{u,v} w_{uv} \,(\hat{y}_{ul} - \hat{y}_{vl})^2 \;+\; \mu_3 \sum_{v} (r_{vl} - \hat{y}_{vl})^2 \Big]$$

The first term is the seed label loss (if any), the second the smoothness loss across edges, and the third the label prior loss (e.g. a prior on the dummy label).

  • High-degree node discounting is enforced through the third term
  • Results in an Adsorption-like, scalable iterative update (sketched below)
SLIDE 36

Extension to Dependent Labels

  • Labels are not always mutually exclusive.

[Figure: label graph whose nodes are labels (Ale, BrownAle, PaleAle, ScotchAle, TopFermentedBeer, White, Porter) and whose edge weights (0.8 to 1.0) are label similarities]

SLIDE 42

MAD with Dependent Labels (MADDL)

MADDL Objective: the MAD objective (edge smoothness loss + label prior loss + seed label loss) plus a fourth term, the dependent label loss, which penalizes a node being assigned different scores for similar labels (e.g. Ale and BrownAle, similarity 1.0).

  • The MADDL objective also results in a scalable iterative update, with a convergence guarantee.
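One natural form of the added dependent-label loss, consistent with the slide's description; this is an assumption for illustration, and the paper's exact formulation may differ. With $C_{ll'}$ the similarity between labels $l$ and $l'$:

$$\mu_4 \sum_{v} \sum_{l, l'} C_{ll'} \, (\hat{y}_{vl} - \hat{y}_{vl'})^2$$

This term is large exactly when similar labels (high $C_{ll'}$) receive different scores at the same node $v$.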

SLIDE 45

Experimental Setup

  • I. Classification experiments:
    • WebKB (4 classes) [Subramanya and Bilmes, 2008]
    • Sentiment classification (4 classes) [Blitzer, Dredze and Pereira, 2007]
    • k-nearest-neighbor graph (k is tuned); see the sketch below
  • II. Smoother sentiment ranking with MADDL
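A minimal sketch of this kind of k-NN graph construction, assuming scikit-learn; the Gaussian kernel and its width `sigma` are illustrative choices (the paper tunes k):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_graph(X, k=10, sigma=1.0):
    """Build a symmetric k-NN similarity graph over the rows of X."""
    D = kneighbors_graph(X, n_neighbors=k, mode="distance")  # sparse (n, n)
    W = D.copy()
    W.data = np.exp(-D.data ** 2 / (2 * sigma ** 2))  # distances -> similarities
    return W.maximum(W.T)                             # symmetrize the graph
```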

SLIDE 46

PRBEP (macro-averaged) on WebKB Dataset, 3148 test instances

[Chart: PRBEP results; compared methods include LP (Zhu et al., 2003)]

SLIDE 47

Precision on 3568 Sentiment test instances

[Chart: precision results; compared methods include LP (Zhu et al., 2003)]

SLIDE 51
II. Smooth Sentiment Ranking

[Figure: non-smooth vs. smooth prediction profiles over ranks 1 to 4; smooth predictions are preferred; MADDL label constraints (similarity 1.0 between adjacent sentiment labels) encourage smoothness]

SLIDE 53

II. Smooth Sentiment Ranking

[Charts: counts of the top predicted label pair (Label 1 x Label 2, labels 1 to 4) in MAD output vs. MADDL output]

MADDL generates a smoother ranking, while preserving prediction accuracy.

SLIDE 58

Conclusion

  • Presented Modified Adsorption (MAD):
    • an Adsorption-like algorithm, but with a well-defined optimization
  • Extended MAD to MADDL:
    • MADDL can handle non-mutually-exclusive labels
  • Demonstrated the effectiveness of MAD and MADDL on real-world datasets
  • Future work: apply MADDL in other domains with dependent labels, e.g. information extraction

SLIDE 59

Thanks!
