A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
Rie Kubota Ando and Tong Zhang, IBM Watson Research Center / Yahoo Research
Presented by Lei Tang, Nov. 20th, 2006


  1. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Rie Kubota Ando and Tong Zhang, IBM Watson Research Center / Yahoo Research. Presented Nov. 20th, 2006. Lei Tang — Framework for Structural Learning

  2. Outline: 1. Introduction; 2. Structural Learning Problem; 3. Algorithm; 4. Experiments.

  3. Semi-supervised Learning. Large amounts of unlabeled data are available, while labeled data are very costly. Various methods exist: transductive inference and co-training (basically label propagation), which fail when imperfect classification introduces noise into the labels being propagated. Another direction: use unlabeled data to define a good functional structure (what is a structure? a distance metric, a kernel, a manifold). But such a graph structure might not be predictive. Can we learn a predictive structure? Yes, if we have multiple related tasks.


  8. Learning Predictive Structures. 1. Structural learning from multiple tasks. 2. Use unlabeled data to generate auxiliary (related) tasks.

  9. A toy example. The intrinsic distance metric should force A, C, and D to be "close" to each other, and likewise E and F.

  10. Connection to Hypothesis Space. Supervised learning: find a predictor in the hypothesis space. Estimation error: the smaller the space, the easier it is to learn the best predictor from limited samples. Approximation error: caused by restricting the size of the hypothesis space. These two types of error must be traded off (model selection). Model selection is typically done by cross-validation; we can achieve better results if we have multiple problems on the same underlying domain.
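The baseline the slide mentions, selecting among hypothesis spaces by cross-validation, can be sketched as follows. This is a generic illustration, not code from the paper; the sklearn-style `fit`/`predict` model interface is an assumption for the example.

```python
import numpy as np

def cv_select(models, X, y, k=5):
    """Pick the model (hypothesis space) with the lowest k-fold CV error.

    `models` is a list of objects with fit(X, y) and predict(X)
    (an assumed sklearn-style interface, used here only for illustration).
    """
    n = len(y)
    folds = np.array_split(np.random.permutation(n), k)
    best, best_err = None, np.inf
    for model in models:
        err = 0.0
        for i in range(k):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            model.fit(X[train_idx], y[train_idx])
            # 0/1 classification error on the held-out fold
            err += np.mean(model.predict(X[test_idx]) != y[test_idx])
        if err / k < best_err:
            best, best_err = model, err / k
    return best
```

With only one task and limited labels this is all we can do; the point of the paper is that multiple related tasks let us choose the shared hypothesis space more reliably than per-task cross-validation.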


  12. Empirical Risk Minimization (ERM). Supervised learning: find a predictor f minimizing the risk

      R(f) = E_{X,Y} [ L(f(X), Y) ]

  Empirically, we use the loss on the training data as an indicator:

      f̂ = argmin_{f ∈ H} Σ_{i=1}^n L(f(X_i), Y_i)

  To avoid over-fitting, a regularization term g(f) is usually added:

      f̂ = argmin_{f ∈ H} Σ_{i=1}^n L(f(X_i), Y_i) + g(f)
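As a concrete instance of the regularized ERM objective above, take squared loss, a linear predictor, and g(f) = λ‖w‖²; the minimizer then has the ridge-regression closed form. This choice of loss is illustrative only — the framework allows any convex loss.

```python
import numpy as np

def erm_ridge(X, y, lam=0.1):
    """Regularized ERM for L(f(x), y) = (f(x) - y)^2, f(x) = w^T x,
    and g(f) = lam * ||w||^2 (the ridge closed form).

    argmin_w  sum_i (w^T x_i - y_i)^2 + lam * ||w||^2
    has solution  w = (X^T X + lam I)^{-1} X^T y.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

For losses without a closed form (logistic, hinge), the same objective is minimized numerically; only the convexity matters for the algorithm later in the talk.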

  13. Joint Empirical Risk Minimization. In single-task learning (STL), the hypothesis space (bias) is fixed:

      f̂ = argmin_{f ∈ H} Σ_{i=1}^n L(f(X_i), Y_i) + g(f)

  Use a parameter θ to represent the hypothesis space; then

      f̂_θ = argmin_{f ∈ H(θ)} Σ_{i=1}^n L(f(X_i), Y_i) + g(f)

  For multiple related tasks, we want to find the hypothesis space shared by all these tasks, i.e. to determine a proper θ:

      [{f̂_l}, θ̂] = argmin_{{f_l}, θ}  r(θ) + Σ_{l=1}^m [ (1/n_l) Σ_{i=1}^{n_l} L(f_l(θ; X_i^l), Y_i^l) + g(f_l(θ)) ]

  where r(θ) is a regularization term on the shared structure.

  14. Structural Learning with Linear Predictors.

      f(x) = w^T φ(x) + v^T ψ_θ(x)

  where w^T φ(x) captures task-specific features and v^T ψ_θ(x) captures the shared internal dimensions. How to represent θ? As a matrix (which can be considered a transformation matrix that finds new dimensions):

      f_θ(w, v; x) = w^T φ(x) + v^T θ ψ(x)
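The predictor form above is just two dot products, one in the original feature space and one in the θ-transformed shared space. A minimal sketch, assuming φ(x) = ψ(x) = x as the next slide does:

```python
import numpy as np

def predict(w, v, theta, x):
    """f_theta(w, v; x) = w^T x + v^T (theta x).

    w : (d,)   task-specific weights in the original space
    v : (h,)   weights on the h shared internal dimensions
    theta : (h, d) shared structure matrix mapping x into those dimensions
    """
    return w @ x + v @ (theta @ x)
```

Each task keeps its own (w, v); only θ is tied across tasks, which is what makes the joint estimation below meaningful.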


  16. Alternating Structure Optimization (1). Assume φ(x) = ψ(x) = x. It follows that

      [{ŵ_l, v̂_l}, θ̂] = argmin_{{w_l, v_l}, θ} Σ_{l=1}^m [ (1/n_l) Σ_{i=1}^{n_l} L((w_l + θ^T v_l)^T X_i^l, Y_i^l) + λ_l ‖w_l‖² ]
      s.t. θ θ^T = I

  (the orthonormality constraint is equivalent to regularization). Let u_l = w_l + θ^T v_l, so that f(x) = u_l^T x. Then

      min_{{u_l, v_l}, θ} Σ_{l=1}^m [ (1/n_l) Σ_{i=1}^{n_l} L(u_l^T X_i^l, Y_i^l) + λ_l ‖u_l − θ^T v_l‖² ]
      s.t. θ θ^T = I


  18. Alternating Structure Optimization (2). Algorithm:
  1. Fix (θ, v), optimize with respect to u (a convex optimization problem).
  2. Fix u, optimize with respect to (θ, v). It turns out θ consists of the top left singular vectors from the SVD of the matrix

      U = [ √λ_1 u_1, √λ_2 u_2, · · · , √λ_m u_m ]

  3. Iterate until convergence.
  4. Usually one iteration is enough.
  Connection to PCA: PCA finds the "principal components" of data points. Here u_l is the predictor for task l, so the algorithm finds the "principal components" of the predictors, treating each predictor as a point in predictor space.
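The two alternating steps can be sketched concretely. This is a simplified illustration of the ASO iteration, not the paper's implementation: squared loss is assumed so that step 1 has a ridge-style closed form, and a single λ is shared by all tasks.

```python
import numpy as np

def aso(tasks, h, lam=0.01, iters=1):
    """Alternating structure optimization sketch (squared loss assumed).

    tasks : list of (X_l, y_l) pairs, X_l of shape (n_l, d)
    h     : number of shared dimensions (rows of theta)
    Returns per-task weights U (d x m) and the shared structure theta (h x d).
    """
    d = tasks[0][0].shape[1]
    m = len(tasks)
    theta = np.zeros((h, d))
    V = np.zeros((h, m))
    U = np.zeros((d, m))
    for _ in range(iters):
        # Step 1: fix (theta, v_l); for squared loss each u_l solves a
        # ridge system pulled toward theta^T v_l.
        for l, (X, y) in enumerate(tasks):
            n = len(y)
            A = X.T @ X / n + lam * np.eye(d)
            b = X.T @ y / n + lam * (theta.T @ V[:, l])
            U[:, l] = np.linalg.solve(A, b)
        # Step 2: fix u_l; theta = top-h left singular vectors of
        # [sqrt(lam_1) u_1, ..., sqrt(lam_m) u_m] (here all lam_l = lam).
        M = np.sqrt(lam) * U
        P, _, _ = np.linalg.svd(M, full_matrices=False)
        theta = P[:, :h].T          # rows orthonormal: theta theta^T = I
        V = theta @ U               # v_l = theta u_l minimizes ||u_l - theta^T v_l||
    return U, theta
```

Step 2 makes the PCA connection visible: the SVD of the stacked (weighted) predictors is exactly a principal-component analysis of the predictors themselves.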


  20. Semi-supervised Learning. 1. Learn the structure parameter θ by joint empirical risk minimization over auxiliary problems. 2. Learn a predictor for the target task based on θ. How to generate auxiliary problems? They must allow automatic labeling and be relevant to the target task. Two strategies: unsupervised and semi-supervised.
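One way to get automatically labeled auxiliary problems in the unsupervised style is to hide an observable feature of the unlabeled data and predict it from the remaining features. The sketch below is an assumed illustration of that idea; which features to pick, and the presence threshold, are design choices, not prescriptions from the paper.

```python
import numpy as np

def make_auxiliary_tasks(X_unlabeled, feature_ids):
    """Create auto-labeled auxiliary problems from unlabeled data.

    For each chosen feature j, the auxiliary task is: predict whether
    feature j is present, using the remaining features as input.
    Returns a list of (X_l, y_l) pairs usable by joint ERM / ASO.
    """
    tasks = []
    for j in feature_ids:
        y = (X_unlabeled[:, j] > 0).astype(float)   # automatic label
        X = X_unlabeled.copy()
        X[:, j] = 0.0                               # mask the target feature
        tasks.append((X, y))
    return tasks
```

Because the labels come for free from the data itself, arbitrarily many such tasks can be created from unlabeled text, and θ learned jointly on them carries over to the target task.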

