
Recursive Regularization for Large-scale Classification with Hierarchical and Graphical Dependencies
Siddharth Gopal, Yiming Yang
Carnegie Mellon University
12th Aug 2013

Outline of the Talk
- Motivation
- Related work
- Proposed model and Optimization
- Experiments

Motivation
Big data era: easy access to lots of structured data. Hierarchies and graphs provide a natural way to organize data. For example:
1. Open Directory Project: a collection of billions of webpages organized into a hierarchy with ∼300,000 classes.
2. International Patent Taxonomy: millions of patents across the world follow this hierarchy.
3. Wikipedia pages: millions of Wikipedia pages have associated categories that are linked to each other.

Challenges
- Assign an unseen webpage/patent/article to one or more nodes in the hierarchy or graph.
- How to use the inter-class dependencies to improve classification? A webpage that belongs to the class ‘medicine’ is unlikely to also belong to ‘mutual funds’.
- How to scale to a large number of classes?

Scalability
Some existing datasets:

Dataset           #Instances  #Labels  #Features      #Parameters
ODP subset           394,756   27,875    594,158   16,562,154,250
Wikipedia subset   2,365,436  325,056  1,617,899  525,907,777,344

ODP subset: ∼66 GB of parameters
Wikipedia subset: ∼2 TB of parameters

Focus
1. How to use interclass dependencies?
2. How to scale?
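The parameter counts on the scalability slide equal #Labels × #Features, that is, one weight per (label, feature) pair. The short check below reproduces them and the approximate storage sizes. The 4 bytes per parameter is an assumption (single-precision floats); the slides do not state the precision, but this value is consistent with the ∼66 GB and ∼2 TB figures. The function names are hypothetical.

```python
# ASSUMPTION: each parameter is stored as a 4-byte float; the slides do not say so.
BYTES_PER_PARAM = 4

# Label and feature counts copied from the dataset table on the slide.
datasets = {
    "ODP subset": {"labels": 27_875, "features": 594_158},
    "Wikipedia subset": {"labels": 325_056, "features": 1_617_899},
}


def parameter_count(labels, features):
    """One weight per (label, feature) pair."""
    return labels * features


def storage_bytes(labels, features, bytes_per_param=BYTES_PER_PARAM):
    """Bytes needed to hold all weights densely."""
    return parameter_count(labels, features) * bytes_per_param


for name, d in datasets.items():
    n_params = parameter_count(**d)
    gigabytes = storage_bytes(**d) / 1e9  # decimal gigabytes
    print(f"{name}: {n_params:,} parameters, about {gigabytes:,.0f} GB")
```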

Related Work
Earlier works
- Top-down pachinko machine style approaches [Dumais and Chen, 2000], [Yang et al., 2003], [Liu et al., 2005], [Koller and Sahami, 1997]
Large-margin methods
1. Maximize the margin between correct and incorrect labels based on a hierarchical loss.
2. Discriminant functions take contributions from all nodes along the path to the root node.
[Tsochantaridis et al., 2006], [Cai and Hofmann, 2004], [Rousu et al., 2006], [Dekel et al., 2004], [Cesa-Bianchi et al., 2006]
Bayesian methods
- Hierarchical Naive Bayes [McCallum et al., 1998], Correlated Multinomial Logit [Shahbaba and Neal, 2007], Hierarchical Bayesian logistic regression [Gopal et al., 2012]

Notations
Given training examples and hierarchy:
1. Hierarchy of nodes $\mathcal{N}$ defined by parent function $\pi(n)$.
2. $N$ training examples; $x_i$ denotes the $i$-th instance and $y_{in}$ denotes whether $x_i$ is labeled to node $n$.
3. $\mathcal{T}$ denotes the set of leaf nodes.
4. $C_n$ denotes the set of child nodes of node $n$.

Proposed model
Learn a prediction function with parameters $W$. Estimate $W$ as
$$\arg\min_{W} \; \lambda(W) + C \times R_{emp}$$
Each node $n$ is associated with a parameter vector $w_n$.

Proposed model
Define $R_{emp}$ as the empirical loss using loss function $L$ at the leaf nodes:
$$R_{emp} = \sum_{i=1}^{N} \sum_{n \in \mathcal{T}} L(w_n^\top x_i, y_{in})$$
Incorporate the hierarchy into the regularization term $\lambda(W)$:
$$\lambda(W) = \sum_{n \in \mathcal{N}} \| w_n - w_{\pi(n)} \|^2$$
With a graph with edges $E \subset \{ (i, j) : i, j \in \mathcal{N} \}$:
$$\lambda(W) = \sum_{(i, j) \in E} \| w_i - w_j \|^2$$
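As a minimal sketch of the two regularization terms, the snippet below evaluates the hierarchy form and the graph form on a toy three-node example. The data structures (a dict from node to weight vector, a dict from node to parent) and the function names are hypothetical. The slides do not say how the root, which has no parent, is handled; comparing it against the zero vector is an assumption made here.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hierarchy_regularizer(W, parent):
    """Sum over all nodes n of ||w_n - w_parent(n)||^2.

    W maps node -> weight vector; parent maps node -> parent node (None for the root).
    ASSUMPTION: the root is compared against the zero vector.
    """
    total = 0.0
    for node, w_n in W.items():
        p = parent[node]
        w_p = W[p] if p is not None else [0.0] * len(w_n)
        total += sq_dist(w_n, w_p)
    return total


def graph_regularizer(W, edges):
    """Sum over edges (i, j) of ||w_i - w_j||^2."""
    return sum(sq_dist(W[i], W[j]) for i, j in edges)


# Toy hierarchy: a root with two children.
parent = {"root": None, "a": "root", "b": "root"}
W = {"root": [1.0, 0.0], "a": [1.0, 1.0], "b": [0.0, 0.0]}

print(hierarchy_regularizer(W, parent))
print(graph_regularizer(W, [("a", "b")]))
```

Both terms pull the parameters of neighbouring classes toward each other; the hierarchy form is the special case where the edges are the parent-child links.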

Advantages
Advantages over other works:
1. Structure not used in the Empirical Risk term.
2. Multiple independent problems that can be parallelized.
3. Flexibility in choosing a loss function.
$$[\text{HR-SVM}] \quad \min_{W} \; \sum_{n \in \mathcal{N}} \frac{1}{2} \| w_n - w_{\pi(n)} \|^2 + C \sum_{n \in \mathcal{T}} \sum_{i=1}^{N} (1 - y_{in} w_n^\top x_i)_+$$
$$[\text{HR-LR}] \quad \min_{W} \; \sum_{n \in \mathcal{N}} \frac{1}{2} \| w_n - w_{\pi(n)} \|^2 + C \sum_{n \in \mathcal{T}} \sum_{i=1}^{N} \log(1 + \exp(-y_{in} w_n^\top x_i))$$
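To illustrate the flexibility in choosing a loss, the sketch below evaluates the HR-SVM and HR-LR objectives with one shared function that takes the loss as an argument. All names and the toy data are hypothetical. Two assumptions not stated on the slides: labels $y_{in}$ are encoded as +1 or -1, and the root's parent is taken to be the zero vector.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(margin):
    """(1 - margin)_+ as used in HR-SVM."""
    return max(0.0, 1.0 - margin)


def logistic(margin):
    """log(1 + exp(-margin)) as used in HR-LR, written to avoid overflow."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def objective(W, parent, leaves, X, Y, C, loss):
    """Regularizer over all nodes plus C times the loss summed over leaf nodes.

    Y[n][i] is +1 or -1 (ASSUMPTION); the root's parent is the zero vector (ASSUMPTION).
    """
    reg = 0.0
    for node, w_n in W.items():
        p = parent[node]
        w_p = W[p] if p is not None else [0.0] * len(w_n)
        reg += 0.5 * sum((a - b) ** 2 for a, b in zip(w_n, w_p))
    emp = sum(loss(Y[n][i] * dot(W[n], x)) for n in leaves for i, x in enumerate(X))
    return reg + C * emp


# Toy problem: two leaf classes under one root, two training instances.
parent = {"root": None, "a": "root", "b": "root"}
W = {"root": [0.0, 0.0], "a": [1.0, 0.0], "b": [-1.0, 0.0]}
X = [[2.0, 0.0], [-2.0, 0.0]]
Y = {"a": [1, -1], "b": [-1, 1]}

hr_svm = objective(W, parent, ["a", "b"], X, Y, C=1.0, loss=hinge)
hr_lr = objective(W, parent, ["a", "b"], X, Y, C=1.0, loss=logistic)
print(hr_svm, hr_lr)
```

Only the `loss` argument differs between the two calls; the structure enters solely through the regularizer, which is the first advantage listed on the slide.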

Optimizing with Hinge-loss
$$[\text{HR-SVM}] \quad \min_{W} \; \sum_{n \in \mathcal{N}} \frac{1}{2} \| w_n - w_{\pi(n)} \|^2 + C \sum_{n \in \mathcal{T}} \sum_{i=1}^{N} (1 - y_{in} w_n^\top x_i)_+$$
Problems
- Large number of parameters (2 Terabytes)
- Non-differentiability of Hinge-loss
Solution
- Block-coordinate descent to handle the large number of parameters (update one $w_n$ at a time).
- Solve the dual problem within each block to handle the non-differentiability.

Optimizing HR-SVM
Update for non-leaf node $w_n$:
$$w_n = \frac{1}{|C_n| + 1} \left( w_{\pi(n)} + \sum_{c \in C_n} w_c \right)$$
For a leaf node, the objective is
$$\min_{w_n} \; \frac{1}{2} \| w_n - w_{\pi(n)} \|^2 + C \sum_{i=1}^{N} (1 - y_{in} w_n^\top x_i)_+$$
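The two block updates can be sketched as follows. The non-leaf update is the closed-form average from the slide, which follows from setting the gradient of the terms involving $w_n$ to zero. For the leaf block, the slides say only that the dual problem is solved; the specific solver below, a dual coordinate descent in the style commonly used for linear SVMs with $w_n = w_{\pi(n)} + \sum_i \alpha_i y_{in} x_i$ and $0 \le \alpha_i \le C$, is an assumption, as are the function names, the +1/-1 label encoding, and the fixed number of epochs.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update_non_leaf(w_parent, child_weights):
    """Closed-form update: average of the parent's and all children's weight vectors."""
    k = len(child_weights) + 1
    return [(p + sum(c[d] for c in child_weights)) / k for d, p in enumerate(w_parent)]


def leaf_objective(w, w_parent, X, y, C):
    """1/2 ||w - w_parent||^2 + C * sum_i (1 - y_i w.x_i)_+ for one leaf node."""
    reg = 0.5 * sum((a - b) ** 2 for a, b in zip(w, w_parent))
    return reg + C * sum(max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y))


def update_leaf(w_parent, X, y, C, epochs=50):
    """ASSUMED solver: dual coordinate descent on the leaf block.

    Maintains w = w_parent + sum_i alpha_i * y_i * x_i with 0 <= alpha_i <= C.
    Labels y_i are +1 or -1 (ASSUMPTION).
    """
    w = list(w_parent)
    alpha = [0.0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            q = dot(xi, xi)
            if q == 0.0:
                continue  # an all-zero instance cannot change w
            grad = yi * dot(w, xi) - 1.0  # gradient of the dual w.r.t. alpha_i
            new_alpha = min(max(alpha[i] - grad / q, 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                w = [wd + delta * yi * xd for wd, xd in zip(w, xi)]
                alpha[i] = new_alpha
    return w


# Toy usage: one non-leaf update and one leaf update.
w_inner = update_non_leaf([0.0, 3.0], [[3.0, 0.0], [0.0, 0.0]])
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
w_leaf = update_leaf([0.0, 0.0], X, y, C=1.0)
print(w_inner, w_leaf, leaf_objective(w_leaf, [0.0, 0.0], X, y, C=1.0))
```

Starting the leaf solve from the parent's weights means that a leaf with no informative training data simply inherits its parent's parameters.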
