

  1. On the sample complexity of graph selection: Practical methods and fundamental limits
Martin Wainwright, UC Berkeley, Departments of Statistics and EECS
Based on joint work with: John Lafferty (CMU), Pradeep Ravikumar (UT Austin), Prasad Santhanam (Univ. Hawaii)
Martin Wainwright (UC Berkeley), High-dimensional graph selection, August 2009


  3. Introduction
Markov random fields (undirected graphical models) are central to many applications in science and engineering:
◮ communication, coding, information theory, networking
◮ machine learning and statistics
◮ computer vision; image processing
◮ statistical physics
◮ bioinformatics, computational biology ...
Some core computational problems:
◮ counting/integrating: computing marginal distributions and data likelihoods
◮ optimization: computing most probable configurations (or top-M configurations)
◮ model selection: fitting and selecting models on the basis of data

  4. What are graphical models?
Markov random field: random vector (X_1, ..., X_p) with distribution factoring according to a graph G = (V, E).
[Figure: example graph on nodes A, B, C, D]
Hammersley–Clifford theorem: (X_1, ..., X_p) being Markov w.r.t. G implies factorization over the cliques of G.
Studied/used in various fields: spatial statistics, language modeling, computational biology, computer vision, statistical physics, ...


  8. Graphical model selection
Let G = (V, E) be an undirected graph on p = |V| vertices.
Pairwise Markov random field: family of probability distributions
    P(x_1, ..., x_p; θ) = (1/Z(θ)) exp( Σ_{(s,t) ∈ E} ⟨θ_st, φ_st(x_s, x_t)⟩ ).
Problem of graph selection: given n independent and identically distributed (i.i.d.) samples of X = (X_1, ..., X_p), identify the underlying graph structure.
Complexity constraint: restrict to the subset G_{d,p} of graphs with maximum degree d.
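The pairwise factorization can be made concrete in the Ising special case, where each variable takes values in {−1, +1} and φ_st(x_s, x_t) = x_s x_t with a scalar coupling θ_st. A minimal sketch, computing the partition function Z(θ) by brute-force enumeration (feasible only for small p; the function names are illustrative, not from the talk):

```python
import itertools
import numpy as np

def ising_unnormalized(x, theta):
    """exp( sum over edges of theta_st * x_s * x_t ) for x in {-1,+1}^p.
    theta: dict mapping edge (s, t) -> scalar coupling theta_st."""
    return np.exp(sum(w * x[s] * x[t] for (s, t), w in theta.items()))

def ising_prob(x, theta, p):
    """P(x; theta) = exp(...) / Z(theta), with Z computed by enumerating
    all 2^p configurations -- tractable only for small p."""
    Z = sum(ising_unnormalized(y, theta)
            for y in itertools.product([-1, 1], repeat=p))
    return ising_unnormalized(x, theta) / Z

# 3-node chain: edges (0,1) and (1,2), both with coupling 0.5
theta = {(0, 1): 0.5, (1, 2): 0.5}
probs = [ising_prob(x, theta, 3) for x in itertools.product([-1, 1], repeat=3)]
print(sum(probs))  # probabilities sum to 1 (up to float rounding)
```

With positive (ferromagnetic) couplings, aligned configurations such as (+1, +1, +1) receive more mass than mixed ones, which is the qualitative behavior the later toy example exploits.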

  9. Illustration: voting behavior of US senators
Graphical model fit to the voting records of US senators (Banerjee, El Ghaoui & d'Aspremont, 2008).

  10. Outline of the remainder of the talk
1 Background and past work
2 A practical scheme for graphical model selection
  (a) ℓ1-regularized neighborhood regression
  (b) High-dimensional analysis and phase transitions
3 Fundamental limits of graphical model selection
  (a) An unorthodox channel coding problem
  (b) Necessary conditions
  (c) Sufficient conditions (optimal algorithms)
4 Various open questions


  13. Previous/ongoing work on graph selection
Methods for Gaussian MRFs:
◮ ℓ1-regularized neighborhood regression (e.g., Meinshausen & Buhlmann, 2006; Wainwright, 2006; Zhao, 2006)
◮ ℓ1-regularized log-determinant (e.g., Yuan & Lin, 2006; d'Aspremont et al., 2007; Friedman, 2008; Ravikumar et al., 2008)
Methods for discrete MRFs:
◮ exact solution for trees (Chow & Liu, 1968)
◮ local testing (e.g., Spirtes et al., 2000; Kalisch & Buhlmann, 2008)
◮ distribution fits by KL divergence (Abbeel et al., 2005)
◮ ℓ1-regularized logistic regression (Ravikumar, Wainwright & Lafferty, 2006, 2008)
◮ approximate maximum-entropy approach and thinned graphical models (Johnson et al., 2007)
◮ neighborhood-based thresholding method (Bresler, Mossel & Sly, 2008)
Information-theoretic analysis:
◮ pseudolikelihood and BIC criterion (Csiszar & Talata, 2006)
◮ information-theoretic limitations (Santhanam & Wainwright, 2008)
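Of the methods listed above, the Chow–Liu algorithm is simple enough to sketch: estimate the pairwise empirical mutual information between every pair of nodes, then take a maximum-weight spanning tree of that weight matrix. A rough illustration for binary ±1 variables, using Prim's algorithm (helper names are mine, not from the talk):

```python
import numpy as np

def mutual_info_binary(xs, xt):
    """Empirical mutual information (in nats) between two +/-1 sample vectors."""
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((xs == a) & (xt == b))
            p_a, p_b = np.mean(xs == a), np.mean(xt == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(X):
    """Maximum-weight spanning tree (Prim's algorithm) on pairwise empirical
    mutual information; X is an (n, p) array of +/-1 samples."""
    n, p = X.shape
    W = np.array([[mutual_info_binary(X[:, s], X[:, t]) for t in range(p)]
                  for s in range(p)])
    in_tree, edges = {0}, []
    while len(in_tree) < p:
        # greedily add the highest-MI edge leaving the current tree
        best = max(((s, t) for s in in_tree
                    for t in range(p) if t not in in_tree),
                   key=lambda e: W[e])
        edges.append(best)
        in_tree.add(best[1])
    return sorted(tuple(sorted(e)) for e in edges)
```

On samples from a chain-structured model, the two direct edges carry strictly more mutual information than the indirect pair, so the tree estimate recovers the chain.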

  14. High-dimensional analysis
Classical analysis: dimension p fixed, sample size n → +∞.
High-dimensional analysis: allow dimension p, sample size n, and maximum degree d to increase at arbitrary rates.
Take n i.i.d. samples from the MRF defined by G_{p,d}, and study the probability of success as a function of the three parameters:
    Success(n, p, d) = P[Method recovers graph G_{p,d} from n samples].
The theory is non-asymptotic: explicit probabilities for finite (n, p, d).


  16. Some challenges in distinguishing graphs
Clearly, a lower bound on the minimum edge weight is required:
    min_{(s,t) ∈ E} |θ*_st| ≥ θ_min,
although θ_min(p, d) = o(1) is allowed.
In contrast to other testing/detection problems, large |θ_st| is also problematic.
Toy example: graphs from G_{3,2} (i.e., p = 3, d = 2), each with common edge weight θ.
[Figure: three graphs on 3 nodes, each edge labeled θ]
As θ increases, all three Markov random fields become arbitrarily close to
    P(x_1, x_2, x_3) = 1/2 if x ∈ {(−1, −1, −1), (+1, +1, +1)}, and 0 otherwise.
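The toy example can be checked numerically: as θ grows, the total variation distance between Ising models on two different two-edge graphs on three nodes shrinks toward zero, since both concentrate on the two fully aligned configurations. A small brute-force check (illustrative code, not from the talk):

```python
import itertools
import numpy as np

def ising_dist(edges, theta, p=3):
    """Full distribution of an Ising model with coupling theta on each edge,
    over all 2^p configurations of +/-1 variables."""
    xs = list(itertools.product([-1, 1], repeat=p))
    w = np.array([np.exp(theta * sum(x[s] * x[t] for s, t in edges))
                  for x in xs])
    return xs, w / w.sum()

for theta in [0.5, 2.0, 5.0]:
    _, chain = ising_dist([(0, 1), (1, 2)], theta)  # chain centered at node 1
    _, star = ising_dist([(0, 1), (0, 2)], theta)   # star centered at node 0
    tv = 0.5 * np.abs(chain - star).sum()           # total variation distance
    print(f"theta={theta}: TV distance = {tv:.4f}")
```

The printed TV distance decreases sharply with θ: the two structurally different graphs become statistically indistinguishable, which is why large edge weights are also problematic for graph selection.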

  17. Markov property and neighborhood structure
Markov properties encode neighborhood structure:
    (X_s | X_{V∖s})  =_d  (X_s | X_{N(s)}),
where the left side conditions on all other nodes in the graph and the right side conditions only on the Markov blanket N(s).
[Figure: node s with neighbors N(s) = {t, u, v, w}]
This is the basis of the pseudolikelihood method (Besag, 1974), and was used for Gaussian model selection by Meinshausen & Buhlmann (2006).
Martin Wainwright (UC Berkeley) High-dimensional graph selection August 2009 10 / 27
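The conditional-distribution identity can be verified directly on a toy chain: for a three-node chain 0 – 1 – 2, the Markov blanket of node 0 is {1}, so the conditional law of X_0 given everything else depends only on X_1. A brute-force check in the Ising case (the coupling value θ = 0.8 is an arbitrary illustrative choice):

```python
import numpy as np

theta = 0.8
edges = [(0, 1), (1, 2)]  # chain: the Markov blanket of node 0 is {1}

def joint_weight(x):
    """Unnormalized probability exp( theta * sum over edges of x_s * x_t )."""
    return np.exp(theta * sum(x[s] * x[t] for s, t in edges))

def cond_x0(x1, x2):
    """P(X0 = +1 | X1 = x1, X2 = x2), by direct enumeration over X0."""
    num = joint_weight((1, x1, x2))
    den = joint_weight((1, x1, x2)) + joint_weight((-1, x1, x2))
    return num / den

# Conditioning on the full complement {1, 2} vs. only the blanket {1}:
# the value of x2 is irrelevant once x1 is fixed.
for x1 in (-1, 1):
    assert abs(cond_x0(x1, -1) - cond_x0(x1, 1)) < 1e-12
```

Analytically, the factor exp(θ x_1 x_2) cancels between numerator and denominator, leaving P(X_0 = +1 | x_1, x_2) = e^{θ x_1} / (e^{θ x_1} + e^{−θ x_1}), a function of x_1 alone.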

  18. §2. Practical method via neighborhood regression
Observation: recovering the graph G is equivalent to recovering the neighborhood set N(s) for every s ∈ V.
Method: given n i.i.d. samples {X^(1), ..., X^(n)}:
1. For each node s ∈ V, perform ℓ1-regularized logistic regression of X_s on the remaining variables X_∖s := {X_t, t ≠ s}:
       θ̂[s] := arg min_{θ ∈ R^(p−1)}  { (1/n) Σ_{i=1}^n f(θ; X^(i)_∖s)  +  ρ_n ‖θ‖_1 },
   where the first term is the logistic likelihood and the second the regularization.
2. Estimate the local neighborhood N̂(s) as the support (non-zero entries) of the regression vector θ̂[s].
3. Combine the neighborhood estimates in a consistent manner (AND or OR rule).
Martin Wainwright (UC Berkeley) High-dimensional graph selection August 2009 11 / 27
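The three steps can be sketched in a few dozen lines. The sketch below uses a plain proximal-gradient (ISTA) solver for the ℓ1-penalized logistic objective rather than any particular package, combines neighborhoods with the AND rule, and its solver settings (step size, iteration count) are my own illustrative assumptions, not values from the talk:

```python
import numpy as np

def l1_logistic(A, y, rho, steps=2000, lr=0.5):
    """Minimize (1/n) sum_i log(1 + exp(-y_i <a_i, theta>)) + rho * ||theta||_1
    by proximal gradient descent (ISTA with soft-thresholding).
    A: (n, k) design matrix, y: +/-1 labels."""
    n, k = A.shape
    theta = np.zeros(k)
    for _ in range(steps):
        margins = y * (A @ theta)
        # gradient of the smooth logistic-loss term
        grad = -(A * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        z = theta - lr * grad
        # soft-threshold: produces exact zeros, giving a sparse support
        theta = np.sign(z) * np.maximum(np.abs(z) - lr * rho, 0.0)
    return theta

def select_graph(X, rho):
    """Steps 1-3: per-node l1-regularized logistic regression, support
    extraction, AND rule.  X: (n, p) array of +/-1 samples."""
    n, p = X.shape
    nbhd = []
    for s in range(p):
        rest = np.delete(X, s, axis=1)             # the variables X_{\ s}
        theta_s = l1_logistic(rest, X[:, s], rho)  # step 1
        others = [t for t in range(p) if t != s]
        nbhd.append({others[j] for j in np.flatnonzero(theta_s)})  # step 2
    # step 3 (AND rule): keep edge (s, t) only if s selects t AND t selects s
    return {(s, t) for s in range(p) for t in nbhd[s] if s < t and s in nbhd[t]}
```

On samples from a small chain-structured Ising model with a moderate ρ_n, this recovers exactly the chain's edge set; the AND rule discards edges that only one endpoint's regression proposes.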

  19. Empirical behavior: unrescaled plots
[Figure: probability of success versus number of samples n, star graph with a linear fraction of neighbors, for p ∈ {64, 100, 225}.]
Martin Wainwright (UC Berkeley) High-dimensional graph selection August 2009 12 / 27

  20. Empirical behavior: appropriately rescaled
[Figure: plots of success probability versus the control parameter θ(n, p, d), star graph with a linear fraction of neighbors, for p ∈ {64, 100, 225}.]
Martin Wainwright (UC Berkeley) High-dimensional graph selection August 2009 13 / 27
