
Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures

Vincent Tan, Animashree Anandkumar, Alan Willsky

Stochastic Systems Group, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology

Allerton Conference (Sep 30, 2009)



Motivation

Given a set of i.i.d. samples drawn from p, a Gaussian tree model, we want to learn the structure of the tree. Example: inferring the structure of phylogenetic trees from observed data (Carlson et al. 2008, PLoS Comp. Biol.).


More motivation

What is the exact rate of decay of the probability of error?

How do the structure and parameters of the model influence the error exponent (rate of decay)?

What are the extremal tree distributions for learning?

Consistency is well established (Chow and Wagner 1973); the error exponent is a quantitative measure of the "goodness" of learning.


Main Contributions

1. Provide the exact rate of decay of the error probability for a given p.

2. Show that the rate of decay is approximately an SNR for learning.

3. Characterize the extremal tree structures for learning: stars have the slowest rate, Markov chains the fastest.


Notation and Background

p = N(0, Σ): d-dimensional Gaussian tree model. Samples x^n = {x_1, x_2, . . . , x_n} drawn i.i.d. from p. p is Markov on T_p = (V, E_p), a tree, and factorizes according to T_p. For example, for a star with center node 1 and leaves 2, 3, 4:

$$p(\mathbf{x}) = p_1(x_1)\,\frac{p_{1,2}(x_1, x_2)}{p_1(x_1)}\,\frac{p_{1,3}(x_1, x_3)}{p_1(x_1)}\,\frac{p_{1,4}(x_1, x_4)}{p_1(x_1)},$$

and Σ^{−1} has the sparsity pattern of the tree: nonzero off-diagonal entries only in positions (i, j) with (i, j) ∈ E_p.
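To make the factorization concrete, here is a minimal sketch (not from the talk) that builds Σ for a Gaussian tree from its edge correlations, assuming unit variances; tree_covariance is an illustrative name. Non-edge correlations are products of edge correlations along paths, and Σ^{−1} comes out tree-sparse.

```python
import numpy as np

def tree_covariance(d, edges, rho):
    """Covariance of a unit-variance Gaussian tree model: rho_{i,j} for a
    non-edge (i, j) is the product of edge correlations along the path."""
    adj = {v: [] for v in range(d)}
    for (u, v), r in zip(edges, rho):
        adj[u].append((v, r))
        adj[v].append((u, r))
    Sigma = np.eye(d)
    for root in range(d):
        stack, seen = [(root, 1.0)], {root}
        while stack:                      # DFS, multiplying correlations
            u, r_u = stack.pop()
            for v, r_uv in adj[u]:
                if v not in seen:
                    seen.add(v)
                    Sigma[root, v] = Sigma[v, root] = r_u * r_uv
                    stack.append((v, r_u * r_uv))
    return Sigma

# Star with center 0: the inverse covariance is nonzero only on the
# diagonal and in row/column 0, matching the sparsity pattern above.
Sigma = tree_covariance(4, [(0, 1), (0, 2), (0, 3)], [0.8, 0.7, 0.6])
print(np.round(np.linalg.inv(Sigma), 4))
```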


Max-Likelihood Learning of Tree Distributions (Chow-Liu)

Denote p̂ = p̂_{x^n} as the empirical distribution of x^n, i.e., p̂(x) := N(x; 0, Σ̂), where Σ̂ is the empirical covariance matrix of x^n, and p̂_e is the empirical distribution on edge e.

ML learning reduces to a max-weight spanning tree problem (Chow-Liu 1968):

$$E_{CL}(x^n) = \operatorname*{argmax}_{E_q :\, q \in \text{Trees}} \; \sum_{e \in E_q} I(\hat{p}_e),$$

where I(p̂_e) := I(X_i; X_j) is the mutual information of the empirical distribution on edge e = (i, j).
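A minimal sketch of this procedure for Gaussian data, assuming zero-mean samples in an n × d array; the function name and the small constant guarding the logarithm are illustrative choices, not from the talk. For Gaussians, I(p̂_e) = −(1/2) log(1 − ρ̂_e²), so only the empirical correlations are needed.

```python
import numpy as np

def chow_liu_gaussian(X):
    """Max-weight spanning tree with empirical Gaussian MIs as edge weights."""
    n, d = X.shape
    S = X.T @ X / n                                    # empirical covariance
    R = S / np.sqrt(np.outer(np.diag(S), np.diag(S)))  # empirical correlations
    mi = -0.5 * np.log(1.0 - R ** 2 + 1e-12)           # I = -0.5 log(1 - rho^2)
    edges = sorted(((mi[i, j], i, j) for i in range(d) for j in range(i + 1, d)),
                   reverse=True)                       # heaviest first (Kruskal)
    parent = list(range(d))
    def find(u):                                       # union-find, path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                                   # keep edge if no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```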


Max-Likelihood Learning of Tree Distributions

True MIs {I(p_e)} → max-weight spanning tree → E_p.

Empirical MIs {I(p̂_e)} from x^n → max-weight spanning tree → E_CL(x^n), which we hope equals E_p.


Problem Statement

The estimated edge set is E_CL(x^n) and the error event is {E_CL(x^n) ≠ E_p}.

Find and analyze the error exponent K_p:

$$K_p := \lim_{n \to \infty} -\frac{1}{n} \log P\big(E_{CL}(x^n) \neq E_p\big).$$

Alternatively, P(E_CL(x^n) ≠ E_p) ≐ exp(−n K_p).
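One way to see the exponent empirically is a small simulation, sketched below under illustrative settings; it reuses the tree_covariance and chow_liu_gaussian helpers from the earlier sketches. The slope of −log P_err against n approximates K_p.

```python
import numpy as np

def error_probability(edges, rho, n, trials=2000, seed=0):
    """Monte Carlo estimate of P(E_CL(x^n) != E_p) for a Gaussian tree."""
    rng = np.random.default_rng(seed)
    d = len(edges) + 1                     # a tree on d nodes has d - 1 edges
    Sigma = tree_covariance(d, edges, rho)
    true_set = {frozenset(e) for e in edges}
    errors = 0
    for _ in range(trials):
        X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
        errors += ({frozenset(e) for e in chow_liu_gaussian(X)} != true_set)
    return errors / trials

# Example: the error probability of a 4-node chain decays exponentially in n.
for n in (50, 100, 200):
    print(n, error_probability([(0, 1), (1, 2), (2, 3)], [0.6, 0.6, 0.6], n))
```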


The Crossover Rate I

Consider two node pairs $e, e' \in \binom{V}{2}$ with joint distribution p_{e,e′}, such that I(p_e) > I(p_{e′}), and the crossover event {I(p̂_e) ≤ I(p̂_{e′})}.

Definition: Crossover Rate

$$J_{e,e'} := \lim_{n \to \infty} -\frac{1}{n} \log P\big(\{I(\hat{p}_e) \le I(\hat{p}_{e'})\}\big).$$

This event may potentially lead to an error in structure learning. Why? If e is a true edge and e′ a non-edge whose empirical mutual information overtakes it, the max-weight spanning tree can include e′ in place of e.


The Crossover Rate II

Theorem

The crossover rate is

$$J_{e,e'} = \inf_{q \in \text{Gaussians}} \big\{ D(q \,\|\, p_{e,e'}) : I(q_{e'}) = I(q_e) \big\}.$$

By assumption, I(p_e) > I(p_{e′}).

[Figure: p_{e,e′} is projected onto the constraint set {I(q_e) = I(q_{e′})}; the minimizer q*_{e,e′} attains the divergence D(q*_{e,e′} ‖ p_{e,e′}).]


Error Exponent for Structure Learning II

$$P\big(E_{CL}(x^n) \neq E_p\big) \doteq \exp(-n K_p).$$

Theorem (First Result)

$$K_p = \min_{e' \notin E_p} \; \min_{e \in \mathrm{Path}(e'; E_p)} J_{e,e'}.$$

Only crossovers between a non-edge e′ and the true edges on its path Path(e′; E_p) matter, and the smallest such crossover rate dominates.


Approximating the Crossover Rate I

Definition: p_{e,e′} satisfies the very noisy learning condition if ||ρ_e| − |ρ_{e′}|| < ε, which implies I(p_e) ≈ I(p_{e′}) since the Gaussian mutual information depends only on the correlation coefficient. In this regime we can apply Euclidean Information Theory (Borade and Zheng 2007).


Approximating the Crossover Rate II

Theorem (Second Result)

The approximate crossover rate is

$$\tilde{J}_{e,e'} = \frac{\big(I(p_{e'}) - I(p_e)\big)^2}{2 \,\mathrm{Var}(s_{e'} - s_e)},$$

where s_e is the information density on edge e = (i, j):

$$s_e(x_i, x_j) = \log \frac{p_{i,j}(x_i, x_j)}{p_i(x_i)\, p_j(x_j)}.$$

The approximate error exponent is

$$\tilde{K}_p = \min_{e' \notin E_p} \; \min_{e \in \mathrm{Path}(e'; E_p)} \tilde{J}_{e,e'}.$$
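A minimal numerical sketch of these two formulas, under stated assumptions: unit variances (so each pairwise correlation is the corresponding entry of Σ), the Gaussian information density in closed form, and Var(s_{e′} − s_e) estimated by Monte Carlo. It reuses tree_covariance from the earlier sketch; all function names are illustrative.

```python
import numpy as np
from itertools import combinations

def gaussian_mi(rho):
    """I(p_e) for a unit-variance Gaussian pair with correlation rho."""
    return -0.5 * np.log(1.0 - rho ** 2)

def info_density(x, y, rho):
    """s_e(x_i, x_j) = log p_ij / (p_i p_j) for unit-variance Gaussians."""
    return gaussian_mi(rho) + rho * (2*x*y - rho*(x**2 + y**2)) / (2*(1 - rho**2))

def approx_crossover_rate(X, Sigma, e, ep):
    """J~_{e,e'} = (I(p_e') - I(p_e))^2 / (2 Var(s_e' - s_e))."""
    (i, j), (k, l) = e, ep
    diff = (info_density(X[:, k], X[:, l], Sigma[k, l])
            - info_density(X[:, i], X[:, j], Sigma[i, j]))
    return (gaussian_mi(Sigma[k, l]) - gaussian_mi(Sigma[i, j])) ** 2 / (2 * diff.var())

def tree_path(edges, a, b):
    """Edges on the unique path from a to b in the tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    def dfs(u, prev, path):
        if u == b:
            return path
        for w in adj[u]:
            if w != prev:
                found = dfs(w, u, path + [(u, w)])
                if found is not None:
                    return found
        return None
    return dfs(a, None, [])

def approx_error_exponent(edges, Sigma, n_mc=100_000, seed=0):
    """K~_p: min over non-edges e' of min over e in Path(e'; E_p) of J~."""
    d = len(Sigma)
    X = np.random.default_rng(seed).multivariate_normal(np.zeros(d), Sigma, n_mc)
    edge_set = {frozenset(e) for e in edges}
    K = np.inf
    for a, b in combinations(range(d), 2):
        if frozenset((a, b)) not in edge_set:          # e' is a non-edge
            for e in tree_path(edges, a, b):
                K = min(K, approx_crossover_rate(X, Sigma, e, (a, b)))
    return K
```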


Correlation Decay

[Figure: a 4-node chain x_1 − x_2 − x_3 − x_4 with edge correlations ρ_{1,2}, ρ_{2,3}, ρ_{3,4} and dotted non-edge correlations ρ_{1,3}, ρ_{1,4}, where ρ_{i,j} = E[x_i x_j].]

Markov property ⇒ ρ_{1,3} = ρ_{1,2} × ρ_{2,3}. Correlation decay ⇒ |ρ_{1,4}| ≤ |ρ_{1,3}|.

So (1, 4) is not likely to be mistaken for a true edge; we only need to consider triangles in the true tree.
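A quick numeric instance of these two facts, with illustrative correlation values:

$$\rho_{1,3} = \rho_{1,2}\,\rho_{2,3} = 0.8 \times 0.7 = 0.56, \qquad |\rho_{1,4}| = |\rho_{1,3}\,\rho_{3,4}| = 0.56 \times 0.6 = 0.336 \le |\rho_{1,3}|.$$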


Extremal Structures I

Fix ρ, a vector of correlation coefficients on the tree, e.g. ρ := [ρ_1, ρ_2, ρ_3, ρ_4, ρ_5] on a tree with 6 nodes. Which structures give the highest and lowest exponents?


Extremal Structures II

Theorem (Main Result)

Worst: The star minimizes K_p: K_star ≤ K_p.

Best: The Markov chain maximizes K_p: K_chain ≥ K_p.

[Figure: two 5-node chains, one with correlations ρ_1, . . . , ρ_4 and one with the permuted correlations ρ_{π(1)}, . . . , ρ_{π(4)}.]
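As a sanity check, the extremal ordering can be probed numerically with the helpers sketched earlier (tree_covariance, approx_error_exponent); the correlation values below are illustrative.

```python
# Hypothetical comparison for d = 5: the star should give the smallest
# (approximate) exponent and the chain the largest, for the same rho vector.
rho = [0.4, 0.5, 0.6, 0.7]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
K_chain = approx_error_exponent(chain, tree_covariance(5, chain, rho))
K_star = approx_error_exponent(star, tree_covariance(5, star, rho))
print(K_star <= K_chain)   # expected: True
```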


Extremal Structures III

Chain, Star and Hybrid Graphs for d = 10.

[Figure: simulated error probability (left) and simulated error exponent (right) versus the number of samples n, from 10^3 to 10^4, for the chain, hybrid, and star graphs.]

Plot of the error probability and error exponent for the 3 tree graphs.


Extremal Structures IV

Remarks: This is a universal result: the extremal structures with respect to tree diameter are also the extremal structures for learning. It corroborates our intuition about correlation decay.


Extensions

Significant reduction of complexity in computing the error exponent.

Finding the best distributions for a fixed ρ.

Effect of adding and deleting nodes and edges on the error exponent.


Conclusion

1. Found the rate of decay of the error probability using large deviations.

2. Used Euclidean Information Theory to obtain an SNR-like expression for the crossover rate.

3. We can now say which structures are easy and hard to learn based on the error exponent; the extremal structures are extremal in terms of the tree diameter.

Full versions can be found at http://arxiv.org/abs/0905.0940 and http://arxiv.org/abs/0909.5216.