RegML 2020 Class 7: Dictionary learning, Lorenzo Rosasco (PowerPoint presentation)



SLIDE 1

RegML 2020 Class 7 Dictionary learning

Lorenzo Rosasco UNIGE-MIT-IIT

SLIDE 2

Data representation

A mapping of data into a new format better suited for further processing.

L.Rosasco, RegML 2020

SLIDE 3

Data representation (cont.)

X data space; a data representation is a map Φ : X → F, to a representation space F. Different names in different fields:
◮ machine learning: feature map
◮ signal processing: analysis operator/transform
◮ information theory: encoder
◮ computational geometry: embedding

SLIDE 4

Outline

Part II: Data representation by learning
◮ Dictionary learning
◮ Metric learning

SLIDE 5

Supervised or Unsupervised?

Supervised (labelled/annotated) data are expensive! Ideally, a good data representation should reduce the need for (human) annotation. . .

⇒ Unsupervised learning of Φ

SLIDE 6

Unsupervised representation learning

Samples S = {x1, . . . , xn} from a distribution ρ on the input space X are available. What are the principles to learn a "good" representation in an unsupervised fashion?

SLIDE 8

Unsupervised representation learning principles

Two main concepts

  • 1. Reconstruction: there exists a map Ψ : F → X such that
Ψ ◦ Φ(x) ∼ x, ∀x ∈ X

  • 2. Similarity preservation: it holds that
Φ(x) ∼ Φ(x′) ⇔ x ∼ x′, ∀x, x′ ∈ X

Most unsupervised work has focused on reconstruction rather than on similarity. We give an overview next.

SLIDE 9

Reconstruction based data representation

Basic idea: the quality of a representation Φ is measured by the reconstruction error provided by an associated reconstruction Ψ,
‖x − Ψ ◦ Φ(x)‖.

SLIDE 11

Empirical data and population

Given S = {x1, . . . , xn}, minimize the empirical reconstruction error

Ê(Φ, Ψ) = (1/n) Σ_{i=1}^n ‖xi − Ψ ◦ Φ(xi)‖²,

as a proxy to the expected reconstruction error

E(Φ, Ψ) = ∫ dρ(x) ‖x − Ψ ◦ Φ(x)‖²,

where ρ is the data distribution (fixed but unknown).

SLIDE 12

Empirical data and population

min_{Φ,Ψ} E(Φ, Ψ),   E(Φ, Ψ) = ∫ dρ(x) ‖x − Ψ ◦ Φ(x)‖²

Caveat. . .
But reconstruction alone is not enough: copying the data, i.e. Ψ ◦ Φ = I, gives zero reconstruction error!

SLIDE 14

Dictionary learning

‖x − Ψ ◦ Φ(x)‖   Let X = R^d, F = R^p.

  • 1. Linear reconstruction:
Ψ ∈ D, with D a subset of the space of linear maps from F to X.

  • 2. Nearest neighbor representation:
Φ(x) = ΦΨ(x) = argmin_{β∈Fλ} ‖x − Ψβ‖²,   Ψ ∈ D,
where Fλ is a subset of F.

SLIDE 16

Linear reconstruction and dictionaries

Each reconstruction Ψ ∈ D can be identified with a dictionary matrix with columns a1, . . . , ap ∈ R^d. The reconstruction of an input x ∈ X corresponds to a suitable linear expansion on the dictionary:

x = Σ_{j=1}^p aj βj,   β1, . . . , βp ∈ R.

SLIDE 17

Nearest neighbor representation

Φ(x) = ΦΨ(x) = argmin_{β∈Fλ} ‖x − Ψβ‖²,   Ψ ∈ D.

The above representation is called nearest neighbor (NN) since, for Ψ ∈ D and Xλ = ΨFλ, the representation Φ(x) provides the closest point to x in Xλ:

d(x, Xλ)² = min_{x′∈Xλ} ‖x − x′‖² = min_{β∈Fλ} ‖x − Ψβ‖².

SLIDE 19

Nearest neighbor representation (cont.)

NN representations are defined by a constrained inverse problem,

min_{β∈Fλ} ‖x − Ψβ‖².

Alternatively, let Fλ = F and add a regularization term Rλ : F → R:

min_{β∈F} ‖x − Ψβ‖² + Rλ(β).

SLIDE 20

Dictionary learning

Then

min_{Ψ,Φ} (1/n) Σ_{i=1}^n ‖xi − Ψ ◦ Φ(xi)‖²

becomes

min_{Ψ∈D} (1/n) Σ_{i=1}^n min_{βi∈Fλ} ‖xi − Ψβi‖²,

where the outer minimization (over Ψ) is the dictionary learning step and the inner minimizations (over the βi) are the representation learning steps.

Dictionary learning
◮ learning a regularized representation on a dictionary. . .
◮ while simultaneously learning the dictionary itself.

SLIDE 21

Examples

The framework introduced above encompasses a large number of approaches:
◮ PCA (& kernel PCA)
◮ KSVD
◮ Sparse coding
◮ K-means
◮ K-flats
◮ . . .

SLIDE 24

Example 1: Principal Component Analysis (PCA)

Let Fλ = Fk = R^k, k ≤ min{n, d}, and D = {Ψ : F → X, linear | Ψ*Ψ = I}.
◮ Ψ is a d × k matrix with orthogonal, unit norm columns,
Ψβ = Σ_{j=1}^k aj βj,   β ∈ F
◮ Ψ* : X → F,   Ψ*x = (⟨a1, x⟩, . . . , ⟨ak, x⟩),   x ∈ X

SLIDE 25

PCA & best subspace

◮ ΨΨ* : X → X,   ΨΨ*x = Σ_{j=1}^k aj ⟨aj, x⟩,   x ∈ X.

(figure: a vector x decomposed into its projection ⟨x, a⟩a along an atom a and the residual x − ⟨x, a⟩a)

◮ P = ΨΨ* is the projection (P = P²) onto the subspace of R^d spanned by a1, . . . , ak.

SLIDE 27

Rewriting PCA

Note that

Φ(x) = Ψ*x = argmin_{β∈Fk} ‖x − Ψβ‖²,   ∀x ∈ X,

so that we can rewrite the PCA minimization as

min_{Ψ∈D} (1/n) Σ_{i=1}^n ‖xi − ΨΨ*xi‖².

Subspace learning
The problem of finding the k-dimensional orthogonal projection giving the best reconstruction.

SLIDE 29

PCA computation

Let X be the n × d data matrix and C = (1/n) XᵀX. The PCA optimization problem is solved by the eigenvectors of C associated with the k largest eigenvalues.
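The eigendecomposition just described can be checked in a few lines; a minimal numpy sketch (numpy and all variable names are my choices, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 2
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))  # toy correlated data

C = X.T @ X / n                    # C = (1/n) X^T X
evals, evecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
Psi = evecs[:, -k:]                # eigenvectors of the k largest eigenvalues

# Psi has orthonormal columns (Psi* Psi = I) and P = Psi Psi* is a projection
P = Psi @ Psi.T
assert np.allclose(Psi.T @ Psi, np.eye(k)) and np.allclose(P @ P, P)

# empirical reconstruction error (1/n) sum_i ||x_i - Psi Psi* x_i||^2
err = np.mean(np.sum((X - X @ P) ** 2, axis=1))
```

Equivalently, Ψ can be read off the top right singular vectors of X; the eigendecomposition of C is used here because the slides phrase the solution in those terms.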

SLIDE 30

Learning a linear representation with PCA

Subspace learning
The problem of finding the k-dimensional orthogonal projection giving the best reconstruction.

(figure: data in X concentrated around a low-dimensional linear subspace)

PCA assumes the support of the data distribution to be well approximated by a low-dimensional linear subspace.

SLIDE 31

PCA beyond linearity

(figure: data supported on a curved, nonlinear set, which a single linear subspace approximates poorly)
SLIDE 34

Kernel PCA

Consider φ : X → H and K(x, x′) = ⟨φ(x), φ(x′)⟩_H, a feature map and associated (reproducing) kernel. We can consider the empirical reconstruction in the feature space,

min_{Ψ∈D} (1/n) Σ_{i=1}^n min_{βi∈H} ‖φ(xi) − Ψβi‖²_H.

Connection to manifold learning. . .

SLIDE 39

Example 2: Sparse coding

One of the first and most famous dictionary learning techniques. It corresponds to
◮ F = R^p,
◮ p ≥ d, Fλ = {β ∈ F : ‖β‖₁ ≤ λ}, λ > 0,
◮ D = {Ψ : F → X | ‖Ψej‖ ≤ 1}.
Hence,

min_{Ψ∈D} (1/n) Σ_{i=1}^n min_{βi∈Fλ} ‖xi − Ψβi‖²,

with the outer minimization the dictionary learning step and the inner minimizations the sparse representation steps.

SLIDE 40

Sparse coding (cont.)

min_{Ψ∈D} (1/n) Σ_{i=1}^n min_{βi∈R^p, ‖βi‖₁≤λ} ‖xi − Ψβi‖²

◮ The problem is not convex. . . but it is separately convex in the βi's and in Ψ.
◮ An alternating minimization is fairly natural (other approaches are possible, see e.g. [Schnass '15, Elad et al. '06]).

SLIDE 41

Representation computation

Given a dictionary, the problems

min_{β∈Fλ} ‖xi − Ψβ‖²,   i = 1, . . . , n,

are convex and correspond to sparse representation problems. They can be solved using convex optimization techniques.

Splitting/proximal methods
Given β⁰, iterate

β^{t+1} = T_{γλ}(β^t + γΨ*(xi − Ψβ^t)),   t = 0, . . . , Tmax,

with T_{γλ} the soft-thresholding operator.
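The iteration above can be sketched as follows, assuming numpy; the function names and the 1/2 factor in the objective (which only rescales λ relative to the slides) are my choices:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: component-wise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sparse_code(x, Psi, lam, n_iter=500):
    """ISTA for min_beta (1/2)||x - Psi beta||^2 + lam*||beta||_1, dictionary fixed."""
    gamma = 1.0 / np.linalg.norm(Psi, 2) ** 2  # step size 1/||Psi||_2^2
    beta = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        # gradient step on the quadratic term, then soft-thresholding
        beta = soft_threshold(beta + gamma * Psi.T @ (x - Psi @ beta), gamma * lam)
    return beta
```

With a dictionary with unit-norm columns and a signal built from a few atoms, the recovered code should be sparse and reconstruct x well.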

SLIDE 43

Dictionary computation

Given Φ(xi) = βi, i = 1, . . . , n, we have

min_{Ψ∈D} (1/n) Σ_{i=1}^n ‖xi − Ψ ◦ Φ(xi)‖² = min_{Ψ∈D} (1/n) ‖X − ΨB‖²_F,

where X is the d × n matrix with columns xi, B is the p × n matrix with columns βi, and ‖·‖_F denotes the Frobenius norm. It is a convex problem, solvable via standard techniques.

Splitting/proximal methods
Given Ψ⁰, iterate

Ψ^{t+1} = P(Ψ^t + γt (X − Ψ^t B)B*),   t = 0, . . . , Tmax,

where P is the projection corresponding to the constraints, applied columnwise:
P(Ψj) = Ψj / ‖Ψj‖ if ‖Ψj‖ > 1,   P(Ψj) = Ψj if ‖Ψj‖ ≤ 1.
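The projected gradient step can be sketched as follows (numpy assumed; the function name and the step-size choice 1/‖B‖² are mine):

```python
import numpy as np

def update_dictionary(X, B, n_iter=200):
    """Projected gradient for min_Psi ||X - Psi B||_F^2 s.t. ||Psi_j|| <= 1.
    X: d x n data matrix (columns x_i); B: p x n code matrix (columns beta_i)."""
    Psi = np.zeros((X.shape[0], B.shape[0]))
    gamma = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-12)  # step size 1/||B||_2^2
    for _ in range(n_iter):
        Psi = Psi + gamma * (X - Psi @ B) @ B.T        # gradient step
        norms = np.linalg.norm(Psi, axis=0)            # projection P: rescale
        Psi = Psi / np.maximum(norms, 1.0)             # columns with norm > 1
    return Psi
```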

SLIDE 44

Sparse coding model

◮ Sparse coding assumes the support of the data distribution to be a union of (p choose s) subspaces, i.e. all the s-dimensional subspaces spanned by s of the p dictionary atoms, where s is the sparsity level.
◮ More general penalties encode more general geometric assumptions.

SLIDE 49

Example 3: K-means & vector quantization

K-means is typically seen as a clustering algorithm in machine learning. . . but it is also a classical vector quantization approach.

Here we revisit this point of view from a data representation perspective. K-means corresponds to
◮ Fλ = Fk = {e1, . . . , ek}, the canonical basis in R^k, k ≤ n,
◮ D = {Ψ : F → X | linear}.

SLIDE 53

K-means computation

min_{Ψ∈D} (1/n) Σ_{i=1}^n min_{βi∈{e1,...,ek}} ‖xi − Ψβi‖²

The K-means problem is not convex.

Alternating minimization
  • 1. Initialize the dictionary Ψ⁰.
  • 2. Let Φ(xi) = βi, i = 1, . . . , n, be the solutions of the problems
min_{β∈{e1,...,ek}} ‖xi − Ψβ‖²,   i = 1, . . . , n,
and set Vj = {x ∈ S | Φ(x) = ej} (multiple points have the same representation since k ≤ n).
  • 3. Letting aj = Ψej, we can write
min_{Ψ∈D} (1/n) Σ_{i=1}^n ‖xi − Ψ ◦ Φ(xi)‖² = min_{a1,...,ak∈R^d} (1/n) Σ_{j=1}^k Σ_{x∈Vj} ‖x − aj‖².
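Steps 1-3 above can be sketched in numpy (the function name and the initialization from random data points are my choices):

```python
import numpy as np

def lloyd(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate the assignment step (2) and the centroid step (3).
    X: n x d data matrix; returns the centroids a_1..a_k and the cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1: init from data
    for _ in range(n_iter):
        # step 2 (assignment): nearest centroid, argmin_j ||x_i - a_j||^2
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # step 3 (centroid update): mean of each Voronoi set V_j
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

On two well-separated groups of points, the two recovered centroids are the group means.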

SLIDE 55

Step 2: assignment

(figure: data points assigned to the nearest of three centroids c1, c2, c3)

The discrete problems

min_{β∈{e1,...,ek}} ‖xi − Ψβ‖²,   i = 1, . . . , n,

can be seen as an assignment step.

Clusters
The sets Vj = {x ∈ S | Φ(x) = ej} are called Voronoi sets and can be seen as data clusters.

SLIDE 58

Step 3: centroid computation

Consider

min_{Ψ∈D} (1/n) Σ_{i=1}^n ‖xi − Ψ ◦ Φ(xi)‖² = min_{a1,...,ak∈R^d} (1/n) Σ_{j=1}^k Σ_{x∈Vj} ‖x − aj‖²,

where aj = Ψej. The minimization with respect to each column is independent of all the others.

Centroid computation

cj = (1/|Vj|) Σ_{x∈Vj} x = argmin_{aj∈R^d} Σ_{x∈Vj} ‖x − aj‖²,   j = 1, . . . , k.

SLIDE 61

K-means convergence

The computational procedure described before is known as Lloyd's algorithm.
◮ Since it is an alternating minimization approach, the value of the objective function can be shown to decrease with the iterations.
◮ Since there is only a finite number of possible partitions of the data into k clusters, Lloyd's algorithm is guaranteed to converge to a local minimum in a finite number of steps.

SLIDE 67

K-means initialization

Convergence to a global minimum can be ensured (with high probability), provided a suitable initialization.

K-means++ [Arthur, Vassilvitskii '07]
  • 1. Choose a centroid uniformly at random from the data.
  • 2. Compute the distance of each data point to the nearest centroid already chosen.
  • 3. Choose a new centroid from the data, with probabilities proportional to such distances (squared).
  • 4. Repeat steps 2 and 3 until k centroids have been chosen.
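Steps 1-4 can be sketched as follows (numpy assumed; the function name is mine, and degenerate data with all points coincident are not handled):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-means++ seeding: each new centroid is drawn from the data with probability
    proportional to its squared distance to the nearest centroid already chosen."""
    centers = [X[rng.integers(len(X))]]                  # step 1: uniform choice
    while len(centers) < k:                              # step 4: repeat until k centers
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])  # steps 2-3: D^2 sampling
    return np.array(centers)
```

Since already-chosen locations have squared distance zero, they are never re-drawn while distinct locations remain.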

SLIDE 70

K-means & piece-wise representation

(figure: data supported on M = supp{ρ}, quantized by centroids c1, c2, c3; a point x is approximated by its nearest centroid)

◮ K-means representation: extreme sparse representation, only one non-zero coefficient (vector quantization).
◮ K-means reconstruction: piecewise constant approximation of the data; each point is reconstructed by the nearest mean.

This latter perspective suggests extensions of K-means considering higher-order data approximations such as, e.g., piecewise linear ones.

SLIDE 73

K-flats & piece-wise linear representation

(figure: data supported on M = supp{ρ}, approximated by flats Ψ1, Ψ2, Ψ3; a point x is reconstructed by projection on the nearest flat)

[Bradley, Mangasarian '00; Canas, R. '12]
◮ K-flats representation: structured sparse representation; the coefficients are the projection on a flat.
◮ K-flats reconstruction: piecewise linear approximation of the data; each point is reconstructed by projection on the nearest flat.

SLIDE 76

Remarks on K-flats

(figure: data supported on M = supp{ρ}, approximated by flats Ψ1, Ψ2, Ψ3)

◮ Principled way to enrich the K-means representation (cf. softmax).
◮ Geometric, structured dictionary learning.
◮ Non-local approximations.

SLIDE 77

K-flats computations

Alternating minimization
  • 1. Initialize the flats Ψ1, . . . , Ψk.
  • 2. Assign each point to the nearest flat,
Vj = {x ∈ X | ‖x − ΨjΨj*x‖ ≤ ‖x − ΨtΨt*x‖, ∀t ≠ j}.
  • 3. Update the flats by computing a (local) PCA in each cell Vj, j = 1, . . . , k.
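The alternating minimization can be sketched as follows, assuming numpy and, for simplicity, linear flats through the origin (affine flats would subtract each cell's mean before the local PCA; all names are mine):

```python
import numpy as np

def k_flats(X, k, dim, n_iter=30, seed=0):
    """Alternating minimization for K-flats (linear flats through the origin).
    X: n x d data; each flat is given by a d x dim orthonormal basis Psi_j."""
    rng = np.random.default_rng(seed)
    # 1. initialize the flats with random orthonormal bases
    flats = [np.linalg.qr(rng.standard_normal((X.shape[1], dim)))[0] for _ in range(k)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # 2. assign each point to the nearest flat: argmin_j ||x - Psi_j Psi_j* x||^2
        residuals = np.stack([((X - X @ P @ P.T) ** 2).sum(-1) for P in flats])
        labels = residuals.argmin(axis=0)
        # 3. update each flat by a local PCA on its cell V_j
        for j in range(k):
            Xj = X[labels == j]
            if len(Xj) >= dim:
                flats[j] = np.linalg.svd(Xj, full_matrices=False)[2][:dim].T
    return flats, labels
```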

SLIDE 78

Kernel K-means & K-flats

It is easy to extend K-means & K-flats using kernels: φ : X → H, with K(x, x′) = ⟨φ(x), φ(x′)⟩_H. Consider the empirical reconstruction problem in the feature space,

min_{Ψ∈D} (1/n) Σ_{i=1}^n min_{βi∈{e1,...,ek}} ‖φ(xi) − Ψβi‖²_H.

Note: it is easy to see that the computations can be performed in closed form.
◮ Kernel K-means: distance computations.
◮ Kernel K-flats: distance computations + local kernel PCA.

SLIDE 79

Geometric Wavelets (GW)- Reconstruction Trees

◮ Select (rather than compute) a partition of the data space.
◮ Approximate the points in each cell via a vector/plane.

The selection is performed via a multi-scale/coarse-to-fine pruning of a partition tree [Maggioni et al. . . . ].

SLIDE 80

K-means/flats and GW

◮ Both can be seen as piecewise representations.
◮ The data model is a manifold, recovered in the limit when the number of pieces goes to infinity.
◮ GMRA is local (cells are connected) while K-flats is not. . .
◮ . . . but GMRA is multi-scale while K-flats is not. . .

(figure: data supported on M = supp{ρ}, approximated by flats Ψ1, Ψ2, Ψ3)

SLIDE 82

Dictionary learning & matrix factorization

PCA, sparse coding, K-means/flats, and reconstruction trees are some examples of methods based on

(P1)   min_{Ψ∈D} (1/n) Σ_{i=1}^n min_{βi∈Fλ} ‖xi − Ψβi‖²,

with the outer minimization the dictionary learning step and the inner minimizations the representation learning steps. In fact, under mild conditions the above problem is a special case of Matrix Factorization: if the minimizations over the βi's are independent, then

(P1) ⇔ min_{B,Ψ} ‖X − ΨB‖²_F,

where B has columns (βi)i, X is the data matrix, and ‖·‖_F is the Frobenius norm. The equivalence holds for all the methods we saw before!
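For instance, for PCA the inner minimizers are βi = Ψ*xi, and the sum-of-errors and Frobenius objectives coincide; a small numerical check (numpy assumed, names mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 6, 40, 2
X = rng.standard_normal((d, n))                     # data matrix, columns x_i
Psi = np.linalg.qr(rng.standard_normal((d, k)))[0]  # any dictionary with Psi* Psi = I

B = Psi.T @ X                                       # inner minimizers beta_i = Psi* x_i
sum_form = np.mean(np.sum((X - Psi @ B) ** 2, axis=0))   # (1/n) sum_i ||x_i - Psi b_i||^2
frob_form = np.linalg.norm(X - Psi @ B, "fro") ** 2 / n  # (1/n) ||X - Psi B||_F^2
assert np.isclose(sum_form, frob_form)              # the two objectives coincide
```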

SLIDE 83

From reconstruction to similarity

We have seen two concepts emerging:
◮ parsimonious reconstruction
◮ similarity preservation

What about similarity preservation?

SLIDE 85

Randomized linear representation

Consider a randomized representation/reconstruction given by a set of random templates, fewer than the data dimension, that is a1, . . . , ak with k < d. Consider Φ : X → F = R^k such that

Φ(x) = Ax = (⟨x, a1⟩, . . . , ⟨x, ak⟩),   ∀x ∈ X,

with A a random matrix with i.i.d. entries and rows a1, . . . , ak.

SLIDE 87

Johnson-Lindenstrauss Lemma

The representation Φ(x) = Ax defines a stable embedding, i.e.

(1 − ε)‖x − x′‖ ≤ ‖Φ(x) − Φ(x′)‖ ≤ (1 + ε)‖x − x′‖,

with high probability and for all x, x′ ∈ C ⊂ X. The precision ε depends on: 1) the number of random atoms k, 2) the set C.

Example: if C is a finite set with |C| = n, then ε ∼ √(log n / k).
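A small numerical check of the embedding (numpy assumed; the customary 1/√k scaling of the Gaussian matrix, which makes the embedded norms unbiased, is my choice):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 600
X = rng.standard_normal((n, d))               # a finite set C of n points in R^d

A = rng.standard_normal((k, d)) / np.sqrt(k)  # i.i.d. Gaussian rows, E||Ax||^2 = ||x||^2
Y = X @ A.T                                   # Phi(x) = Ax for every point

# empirical distortion over all pairwise distances
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(n), 2)]
eps = max(abs(r - 1.0) for r in ratios)
assert eps < 0.25   # small, consistent with eps ~ sqrt(log n / k)
```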

SLIDE 88

Metric learning

Find D : X × X → R such that x is similar to x′ ⇔ D(x, x′) is small.

  • 1. How do we parameterize D?
  • 2. How do we know whether data points are similar?
  • 3. How do we turn all this into an optimization problem?

SLIDE 91

Metric learning (cont.)

  • 1. How do we parameterize D?
Mahalanobis: D(x, x′) = ⟨x − x′, M(x − x′)⟩, where M is symmetric positive definite, or rather Φ(x) = Bx with M = B*B (using kernels is possible).

  • 2. How do we know whether points are similar?
Most works assume supervised data (xi, xj, yi,j)i,j.

  • 3. How do we turn all this into an optimization problem?
Extensions of classification algorithms such as support vector machines.
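The parameterization in point 1 can be sketched as follows (numpy assumed; B here is random for illustration, whereas in practice it is learned):

```python
import numpy as np

def mahalanobis(x, xp, B):
    """D(x, x') = <x - x', M (x - x')> with M = B^T B: the squared Euclidean
    distance between the learned representations Bx and Bx'."""
    diff = x - xp
    return diff @ (B.T @ B) @ diff

rng = np.random.default_rng(0)
B = rng.standard_normal((2, 4))          # random stand-in for a learned map
x, xp = rng.standard_normal(4), rng.standard_normal(4)

# the two formulations coincide: <x - x', M(x - x')> = ||Bx - Bx'||^2
assert np.isclose(mahalanobis(x, xp, B), np.linalg.norm(B @ x - B @ xp) ** 2)
```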

SLIDE 92

This class

◮ dictionary learning
◮ metric learning

SLIDE 93

Next class

Deep learning!
