SLIDE 1

An Introduction to Spectral Learning

Hanxiao Liu November 8, 2013

SLIDE 2

Outline

1. Method of Moments
2. Learning topic models using spectral properties
3. Anchor words

SLIDE 3

Preliminaries

Setting: X1, · · · , Xn ∼ p(x; θ), where θ = (θ1, · · · , θm)⊤.
An estimator is a statistic θ̂ = θ̂n = w(X1, · · · , Xn).

Maximum Likelihood Estimator (MLE):
θ̂ = argmax_θ log L(θ)

Bayes Estimator (BE):
θ̂ = E(θ | X) = ∫ θ p(x|θ) π(θ) dθ / ∫ p(x|θ) π(θ) dθ

SLIDE 4

Preliminaries

Question: What makes a good estimator?

• The MLE is consistent.
• Both the MLE and the BE are asymptotically normal:
  √n (θ̂n − θ) → N(0, 1/I(θ))
  under mild regularity conditions, where I(θ) is the Fisher information.
• But they can be computationally expensive.

SLIDE 5

Preliminaries

Example (Gamma distribution):
p(xi; α, θ) = 1/(Γ(α) θ^α) · xi^(α−1) exp(−xi/θ)

Likelihood of the sample:
L(α, θ) = (1/(Γ(α) θ^α))^n · (∏_{i=1}^n xi)^(α−1) · exp(−∑_{i=1}^n xi / θ)

The MLE is hard to compute because Γ(α) appears in the likelihood: there is no closed-form maximizer, so it must be found by iterative optimization.
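To make this concrete, here is a minimal numerical-MLE sketch (our own illustration, not from the slides; the log-reparametrization and the name `gamma_mle` are our choices). The point is that, unlike the moment estimator coming up, it needs an iterative optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def gamma_mle(x):
    """Numerical MLE for Gamma(alpha, theta). No closed form exists
    because Gamma(alpha) appears in the likelihood, so the negative
    log-likelihood is minimized iteratively."""
    x = np.asarray(x, dtype=float)
    n, sum_x, sum_log_x = x.size, x.sum(), np.log(x).sum()

    def nll(params):
        alpha, theta = np.exp(params)  # log-scale keeps both positive
        return -(-n * gammaln(alpha) - n * alpha * np.log(theta)
                 + (alpha - 1) * sum_log_x - sum_x / theta)

    return np.exp(minimize(nll, x0=[0.0, 0.0]).x)  # (alpha_hat, theta_hat)
```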
SLIDE 6

Method of Moments

j-th theoretical moment, j ∈ [k]:
µj(θ) := Eθ[X^j]

j-th sample moment, j ∈ [k]:
Mj := (1/n) ∑_{i=1}^n Xi^j

Plug in and solve the multivariate polynomial equations
Mj = µj(θ), j ∈ [k],
which can sometimes be recast as a spectral decomposition.

SLIDE 7

Method of Moments

Example (Gamma distribution):
p(xi; α, θ) = 1/(Γ(α) θ^α) · xi^(α−1) exp(−xi/θ)

Match the first two moments:
X̄ = E(Xi) = αθ
(1/n) ∑_{i=1}^n (Xi − X̄)² = Var(Xi) = αθ²

Solving gives closed-form estimators:
θ̂ = (1/(nX̄)) ∑_{i=1}^n (Xi − X̄)²
α̂ = X̄/θ̂ = nX̄² / ∑_{i=1}^n (Xi − X̄)²
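Because the moment equations solve in closed form, the estimator is a few lines of NumPy (a minimal sketch; the name `gamma_mom` is ours):

```python
import numpy as np

def gamma_mom(x):
    """Method-of-moments estimates for Gamma(alpha, theta),
    matching mean = alpha * theta and variance = alpha * theta**2."""
    x = np.asarray(x, dtype=float)
    mean, var = x.mean(), x.var()   # var = (1/n) * sum (x_i - mean)^2
    theta_hat = var / mean          # theta = var / mean
    alpha_hat = mean / theta_hat    # alpha = mean^2 / var
    return alpha_hat, theta_hat

# Sanity check on synthetic data:
rng = np.random.default_rng(0)
x = rng.gamma(shape=3.0, scale=2.0, size=100_000)
print(gamma_mom(x))  # close to (3.0, 2.0)
```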

SLIDE 8

Method of Moments

Drawbacks of the method of moments:
• no general guarantee about the solution
• high-order sample moments are hard to estimate: to reach a specified accuracy, the required sample size and computational cost are exponential in k (or n)!

Question: Can we recover the true θ from only low-order moments?
Question: Can we lower the sample requirement and the computational complexity under some (hopefully mild) assumptions?

SLIDE 9

Learning the Topic Models

Papadimitriou et al. (2000)
• non-overlapping separation condition (strong)

Anandkumar et al. (2012): MoM + SD (spectral decomposition)
• full-rank assumption (weak)
• multinomial mixture, LDA

Arora et al. (2012): MoM + NMF + LP
• anchor words (mild)
• LDA, correlated topic model
• a more practical algorithm proposed in 2013

SLIDE 10

Learning the Topic Models

Suppose there are n documents, k hidden topics, and d features (words).

Topic-word matrix: M = [µ1 | µ2 | … | µk] ∈ R^{d×k}, with µj ∈ Δ^{d−1} ∀j ∈ [k]
Topic proportions: w = (w1, …, wk) ∈ Δ^{k−1}, with P(h = j) = wj, j ∈ [k],
where h is the document's hidden topic.

The v-th word in a document is one-hot encoded, xv ∈ {e1, …, ed}, and
P(xv = ei | h = j) = (µj)i,  j ∈ [k], i ∈ [d]

Goal: recover M using low-order moments.

SLIDE 11

Learning the Topic Models

Construct moment statistics:
Pairs_ij := P(x1 = ei, x2 = ej),   Pairs = E[x1 ⊗ x2] ∈ R^{d×d}
Triples_ijt := P(x1 = ei, x2 = ej, x3 = et),   Triples = E[x1 ⊗ x2 ⊗ x3] ∈ R^{d×d×d}

The empirical plug-ins P̂airs and T̂riples can be obtained from data in a straightforward manner: count word co-occurrences and normalize.

We want to establish an equivalence between these empirical moments and the parameters of interest.
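A minimal sketch of the plug-in step, assuming each document arrives as a list of word indices in {0, …, d−1} and we read off its first three words (the function names and that convention are our choices):

```python
import numpy as np

def empirical_pairs(docs, d):
    """Plug-in estimate of Pairs_ij = P(x1 = e_i, x2 = e_j), using the
    first two words of each document (docs: lists of word indices)."""
    pairs = np.zeros((d, d))
    for doc in docs:
        pairs[doc[0], doc[1]] += 1.0
    return pairs / len(docs)

def empirical_triples_eta(docs, d, eta):
    """Plug-in estimate of the contraction Triples(eta) defined on the
    next slide, i.e. E[<eta, x3> x1 x2^T]; the full d x d x d tensor
    never needs to be formed."""
    t = np.zeros((d, d))
    for doc in docs:
        t[doc[0], doc[1]] += eta[doc[2]]
    return t / len(docs)
```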

SLIDE 12

Learning the Topic Models

Define the contraction
Triples(η) := E[(x1 ⊗ x2) ⟨η, x3⟩] ∈ R^{d×d},  so Triples(η) : R^d → R^{d×d}.

Lemma
Pairs = M diag(w) M⊤
Triples(η) = M diag(M⊤η) diag(w) M⊤

The unknowns M and w are entangled in these moments; the next step is to disentangle them.

SLIDE 13

Learning the Topic Models

Assumption (Non-degeneracy): M has full column rank k.

1. Find U, V ∈ R^{d×k} such that (U⊤M)⁻¹ and (V⊤M)⁻¹ exist.

2. For any η ∈ R^d, define B(η) ∈ R^{k×k} by
   B(η) := (U⊤ Triples(η) V)(U⊤ Pairs V)⁻¹

Lemma (Observable Operator)
B(η) = (U⊤M) diag(M⊤η) (U⊤M)⁻¹
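Why the lemma holds: substitute the factorizations of Pairs and Triples(η) from the previous slide, then cancel the common right factor, using that U⊤M, V⊤M, and diag(w) are all invertible (each wj > 0):

B(η) = (U⊤ Triples(η) V)(U⊤ Pairs V)⁻¹
     = [U⊤M diag(M⊤η) diag(w) M⊤V] [U⊤M diag(w) M⊤V]⁻¹
     = (U⊤M) diag(M⊤η) (U⊤M)⁻¹

So B(η) is similar to a diagonal matrix: its eigenvalues are the entries of M⊤η and its right eigenvectors are the columns of U⊤M, which is exactly what the algorithm on the next slide exploits.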

SLIDE 14

Learning the Topic Models

Input: P̂airs and T̂riples
Output: topic-word distributions M̂

  Û, V̂ ← top-k left and right singular vectors of P̂airs (a)
  η ← random sample from range(Û)
  ξ̂1, ξ̂2, …, ξ̂k ← right eigenvectors of B̂(η) (b)
  for j ← 1 to k do
      µ̂j ← Û ξ̂j / ⟨1, Û ξ̂j⟩
  end
  return M̂ = [µ̂1 | µ̂2 | … | µ̂k]

(a) Pairs = M diag(w) M⊤
(b) B(η) = (U⊤M) diag(M⊤η) (U⊤M)⁻¹
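A compact NumPy sketch of the whole pipeline, run on population moments computed from a synthetic ground truth rather than on P̂airs/T̂riples (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 3

# Synthetic ground truth: topic-word matrix M (columns in the simplex)
# and topic proportions w.
M = rng.dirichlet(np.ones(d), size=k).T            # d x k
w = rng.dirichlet(np.ones(k))                      # k

# Population moments (in practice: empirical plug-ins).
Pairs = M @ np.diag(w) @ M.T

# U, V <- top-k left and right singular vectors of Pairs.
U, _, Vt = np.linalg.svd(Pairs)
U, V = U[:, :k], Vt[:k].T

# eta <- random sample from range(U).
eta = U @ rng.standard_normal(k)
Triples_eta = M @ np.diag(M.T @ eta) @ np.diag(w) @ M.T

# Observable operator B(eta) = (U^T Triples(eta) V)(U^T Pairs V)^{-1}.
B = (U.T @ Triples_eta @ V) @ np.linalg.inv(U.T @ Pairs @ V)

# Right eigenvectors of B(eta) are the columns of U^T M up to scale;
# eigenvalues are real since B is similar to a real diagonal matrix.
_, Xi = np.linalg.eig(B)
Xi = np.real(Xi)

# mu_j <- U xi_j / <1, U xi_j>: lift to R^d, normalize to sum to one.
M_hat = U @ Xi
M_hat /= M_hat.sum(axis=0, keepdims=True)

# Columns come back in arbitrary order; align each with its nearest
# ground-truth column before comparing.
perm = [np.argmin(np.linalg.norm(M - M_hat[:, [j]], axis=0)) for j in range(k)]
print(np.max(np.abs(M_hat - M[:, perm])))  # close to 0 up to numerical error
```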

SLIDE 15

Learning the Topic Models

Lemma (Observable Operator)
B(η) = (U⊤M) diag(M⊤η) (U⊤M)⁻¹

We need M⊤η to have distinct entries, so that the eigenvectors of B(η) are uniquely determined. How do we pick η?

• η ← ei ⇒ M⊤η is the i-th word's distribution over topics, but this requires prior knowledge of a suitable word i.
• Otherwise, η ← Uθ with θ ∼ Uniform(S^{k−1}); the entries of M⊤η are then distinct with probability 1.
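The uninformed choice is tiny in NumPy (reusing `U` and `k` from the sketch above):

```python
import numpy as np

rng = np.random.default_rng()
theta = rng.standard_normal(k)    # isotropic Gaussian ...
theta /= np.linalg.norm(theta)    # ... normalized: theta ~ Uniform(S^{k-1})
eta = U @ theta                   # eta lies in range(U)
```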

SLIDE 16

Learning the Topic Models

• SVD is carried out in R^{k×k}, with k ≪ d.
• Only trigram statistics are involved, i.e., low-order moments.
• The method is guaranteed to recover the parameters.
• Parameters of more complicated models such as LDA can be recovered in the same manner.

SLIDE 17

Tensor Decomposition

Recall:
Pairs = M diag(w) M⊤
Triples(η) = M diag(M⊤η) diag(w) M⊤

Equivalently, as sums of rank-one terms:
Pairs = ∑_{j=1}^k wj · µj ⊗ µj
Triples = ∑_{j=1}^k wj · µj ⊗ µj ⊗ µj

Symmetric tensor decomposition? For that, the µj would need to be orthogonal.

SLIDE 18

Tensor Decomposition

Whiten Pairs: let Pairs = U D U⊤ be its (rank-k) eigendecomposition and set
W := U D^{−1/2}  ⇒  W⊤ Pairs W = I

Define µ′j := √wj · W⊤µj. One can check that the µ′j, j ∈ [k], are orthonormal vectors.

Do an orthogonal tensor decomposition on the whitened third moment:
Triples(W, W, W) = ∑_{j=1}^k wj (W⊤µj)^⊗3 = ∑_{j=1}^k (1/√wj) · µ′j^⊗3

Then recover µj from µ′j.
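A sketch of the whitening step, reusing `M`, `w`, `Pairs`, and `k` from the earlier synthetic example; we verify the two claimed identities numerically:

```python
import numpy as np

# Whiten using the top-k eigenpairs of the symmetric, rank-k Pairs.
evals, evecs = np.linalg.eigh(Pairs)            # ascending eigenvalues
U_k, D_k = evecs[:, -k:], evals[-k:]            # top-k eigenpairs
W = U_k @ np.diag(D_k ** -0.5)                  # W = U D^{-1/2}

print(np.allclose(W.T @ Pairs @ W, np.eye(k)))  # True: Pairs is whitened

# mu'_j = sqrt(w_j) W^T mu_j are orthonormal:
M_prime = (W.T @ M) * np.sqrt(w)
print(np.allclose(M_prime.T @ M_prime, np.eye(k)))  # True

# Whitened third moment Triples(W, W, W): a k x k x k tensor with the
# orthogonal decomposition sum_j (1/sqrt(w_j)) mu'_j^{(x)3}.
WM = W.T @ M
T = np.einsum('ia,ja,ka,a->ijk', WM, WM, WM, w)
```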

SLIDE 19

Anchor Words

Drawbacks of the previous algorithms:
• topics cannot be correlated
• the bound is weak (comparatively speaking)
• empirical runtime performance is not satisfactory

Alternative assumptions?

SLIDE 20

Anchor Words

Definition (p-separable)
M is p-separable if for every topic j there exists a word i (an "anchor word") such that Mij ≥ p and Mij′ = 0 for all j′ ≠ j.

Note that a given document does not necessarily contain anchor words.

Two-stage algorithm (a sketch of the selection stage follows below):

1. Selection: find the anchor word for each topic.
2. Recovery: recover M based on the anchor words.

Good theoretical guarantees and empirical results.
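The selection stage can be sketched as a greedy farthest-point search; the following is a simplified version in the spirit of the FastAnchorWords routine of Arora et al. (2013), not a faithful reproduction (`Q`, the initialization, and the function name are our choices):

```python
import numpy as np

def select_anchors(Q, k):
    """Greedy anchor selection: repeatedly take the row of Q farthest
    from the span of the rows chosen so far.

    Q: row-normalized word co-occurrence matrix; row i is word i's
    co-occurrence profile. Under p-separability the anchor rows are
    extreme points among these profiles.
    """
    R = Q.astype(float).copy()
    anchors = [int(np.argmax(np.linalg.norm(R, axis=1)))]
    for _ in range(k - 1):
        # Project every row orthogonally to the direction of the most
        # recently chosen anchor (a Gram-Schmidt step) ...
        u = R[anchors[-1]]
        u = u / np.linalg.norm(u)
        R -= np.outer(R @ u, u)
        # ... then the row with the largest residual norm is the next
        # anchor; already-chosen rows have (near-)zero residual.
        anchors.append(int(np.argmax(np.linalg.norm(R, axis=1))))
    return anchors
```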

SLIDE 21

Anchor Words

[Illustration of the anchor-words geometry omitted; it is taken from Ankur Moitra's slides, http://people.csail.mit.edu/moitra/docs/IASM.pdf]

SLIDE 22

Discussion

Summary:
• a brief introduction to the method of moments (MoM)
• learning topic models by spectral decomposition
• the anchor-words assumption

Connections with our work?