

SLIDE 1

Scalable Posterior Approximation

Galen Reeves

Departments of ECE and Statistical Science, Duke University

August 2015

SLIDE 2

Collaborators at Duke

David B. Dunson
Willem van den Boom

SLIDE 3

variable selection / support recovery

  • identify the locations / identities of agents which have significant effects on observed behaviors
  • e.g., gene expression, face recognition, etc.
  • find relevant features for building a model
  • machine learning
  • recover a sparse signal from noisy linear measurements
  • compressed sensing
  • determine which entries of an unknown parameter vector are significant
  • statistics

SLIDE 4

high-dimensional inference

[Figure: bipartite graph connecting parameters β1, β2, …, βp to observations y1, y2, …, yn]

p unknown parameters, n observations (the data)

SLIDE 5

high-dimensional inference

[Figure: bipartite graph connecting parameters β1, β2, …, βp to observations y1, y2, …, yn]

p unknown parameters, n observations (the data)

Types of questions:

  • posterior distribution p(β|y): a high-dimensional distribution
  • posterior mean E[β|y] and covariance Cov[β|y]: a p x 1 vector and a p x p matrix
  • posterior marginal distribution p(β1|y): a one-dimensional distribution

SLIDE 6

edges mean dependencies

[Figure: bipartite graph of β1, …, βp and y1, …, yn with edges marking dependencies]

p unknown parameters, n observations (the data)

SLIDE 7

inference is easy if graph is sparse…

[Figure: sparsely connected bipartite graph of β1, …, βp and y1, …, yn]

p unknown parameters, n observations (the data)

SLIDE 8

… but dense graphs are challenging

[Figure: densely connected bipartite graph of β1, …, βp and y1, …, yn]

p unknown parameters, n observations (the data)

SLIDE 9

statistical model for parameters

p(β|θ) = ∏_{j=1}^p p(βj|θ)

each marginal prior p(βj|θ) places a probability mass at zero and a distribution on the nonzero values

entries of β are conditionally independent given the hyperparameters θ; the marginal prior is a mixed discrete-continuous distribution
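For concreteness, here is a minimal sketch of drawing β from the conditionally independent spike-and-slab prior; the function name and the hyperparameter values pi0 and slab_sd are illustrative assumptions rather than values from the talk.

```python
import numpy as np

def sample_spike_slab(p, pi0=0.9, slab_sd=1.0, seed=None):
    """Draw beta with independent spike-and-slab (Bernoulli-Gaussian) entries.

    Each beta_j equals 0 with probability pi0 (the spike) and is drawn
    from N(0, slab_sd**2) otherwise (the slab)."""
    rng = np.random.default_rng(seed)
    nonzero = rng.random(p) >= pi0                     # Bernoulli indicators
    return np.where(nonzero, rng.normal(0.0, slab_sd, size=p), 0.0)

beta = sample_spike_slab(p=1000, pi0=0.9, seed=0)
print(float(np.mean(beta == 0)))  # close to pi0 = 0.9
```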

SLIDE 10

standard linear model

y = Xβ + ε

p unknown parameters, n observations, n x p matrix X, Gaussian errors ε ∼ N(0, σ²I)
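Simulating the standard linear model takes only a few lines; the dimensions n, p, the sparsity level, and the noise scale sigma below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 50, 200, 0.5                 # illustrative dimensions

X = rng.normal(size=(n, p))                # n x p design matrix
beta = np.where(rng.random(p) < 0.05,      # sparse spike-and-slab draw
                rng.normal(size=p), 0.0)
eps = rng.normal(0.0, sigma, size=n)       # Gaussian errors N(0, sigma^2 I)
y = X @ beta + eps                         # the standard linear model
print(y.shape)
```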

SLIDE 11

why challenging?

  • number of feature subsets grows exponentially with p
  • curse of dimensionality
  • exact inference requires computing high-dimensional integrals
  • brute-force integration is computationally infeasible
  • extensive research focuses on methods for approximate inference

SLIDE 12

tradeoffs for high-dimensional inference

[Figure: methods placed on an accuracy vs. scalability plane: linear methods (least-squares), LASSO, BCR, AMP, and YFA on the scalable side; MCMC (unbounded time) and brute-force numerical integration on the accurate side; the accurate-and-scalable corner is the focus of recent research]

SLIDE 13

problems with existing methods

SLIDE 14

problems with existing methods

  • regularized least-squares (e.g. LASSO)
  • lack measures of statistical significance

SLIDE 15

problems with existing methods

  • regularized least-squares (e.g. LASSO)
  • lack measures of statistical significance
  • sampling methods like MCMC
  • not clear when sufficiently converged / sampled

SLIDE 16

problems with existing methods

  • regularized least-squares (e.g. LASSO)
  • lack measures of statistical significance
  • sampling methods like MCMC
  • not clear when sufficiently converged / sampled
  • variational approximations
  • difficulty with multimodal posteriors, hard to interpret

SLIDE 17

Example: one-dimensional problem with spike & slab (Bernoulli-Gaussian) prior

y = β + ε

[Figure: prior distribution with a probability mass at zero]

SLIDE 18

Example: one-dimensional problem with spike & slab (Bernoulli-Gaussian) prior

y = β + ε

[Figure: prior distribution and the posterior distribution for a large observation; each has a probability mass at zero]

SLIDE 19

Example: one-dimensional problem with spike & slab (Bernoulli-Gaussian) prior

y = β + ε

[Figure: prior distribution and the posterior distributions for a small and a large observation; each has a probability mass at zero]
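The small-versus-large-observation behavior in these figures can be computed exactly in the scalar case, since the posterior inclusion probability has a closed form; the values of pi0, tau, and sigma below are illustrative.

```python
import math

def posterior_nonzero_prob(y, pi0=0.5, tau=1.0, sigma=1.0):
    """Exact P(beta != 0 | y) for the scalar model y = beta + eps with
    eps ~ N(0, sigma^2) and a Bernoulli-Gaussian prior on beta."""
    def normpdf(x, var):
        return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    spike = pi0 * normpdf(y, sigma**2)                  # beta = 0 branch
    slab = (1.0 - pi0) * normpdf(y, tau**2 + sigma**2)  # beta ~ N(0, tau^2)
    return slab / (spike + slab)

print(posterior_nonzero_prob(0.1))  # small observation: below 1/2
print(posterior_nonzero_prob(4.0))  # large observation: close to 1
```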

SLIDE 20

Example: one-dimensional problem with spike & slab (Bernoulli-Gaussian) prior

y = β + ε

[Figure: posterior distributions for a small and a large observation, each overlaid with its Gaussian approximation]

SLIDE 21

problems with existing methods

  • regularized least-squares (e.g. LASSO)
  • lack measures of statistical significance
  • sampling methods like MCMC
  • not clear when sufficiently converged / sampled
  • variational approximations
  • difficulty with multimodal posteriors, hard to interpret

SLIDE 22

problems with existing methods

  • regularized least-squares (e.g. LASSO)
  • lack measures of statistical significance
  • sampling methods like MCMC
  • not clear when sufficiently converged / sampled
  • variational approximations
  • difficulty with multimodal posteriors, hard to interpret
  • loopy belief propagation, approximate message passing (AMP)
  • lack theoretical guarantees for general matrices

SLIDE 23

high-dimensional variable selection

y = Xβ + ε

p unknown parameters drawn independently with known distribution (e.g. spike & slab), n observations, n x p matrix X, Gaussian errors ε ∼ N(0, σ²I)

SLIDE 24

high-dimensional variable selection

y = Xβ + ε

p unknown parameters drawn independently with known distribution (e.g. spike & slab), n observations, n x p matrix X, Gaussian errors ε ∼ N(0, σ²I)

Goal: compute posterior marginal distribution of first entry

p(β1|y) = ∫ p(β|y) dβ2 ⋯ dβp
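For very small p this marginal can be computed exactly by enumerating all 2^p supports (plausibly how a small-problem ground truth is obtained, though the slides do not say); this sketch is a reconstruction assuming a Bernoulli-Gaussian prior with illustrative hyperparameters sigma, tau, and pi0.

```python
import itertools
import numpy as np

def exact_inclusion_prob(X, y, sigma=1.0, tau=1.0, pi0=0.5):
    """Exact P(beta_1 != 0 | y) by enumerating all 2^p supports.

    Each support S has prior pi0^(p-|S|) (1-pi0)^|S| and Gaussian marginal
    likelihood N(y; 0, sigma^2 I + tau^2 X_S X_S^T); the 2*pi constants
    cancel in the ratio. Cost is exponential in p, hence tiny p only."""
    n, p = X.shape
    log_terms, has_first = [], []
    for size in range(p + 1):
        for S in itertools.combinations(range(p), size):
            cov = sigma**2 * np.eye(n)
            if S:
                Xs = X[:, list(S)]
                cov += tau**2 * Xs @ Xs.T
            _, logdet = np.linalg.slogdet(cov)
            quad = y @ np.linalg.solve(cov, y)
            log_prior = (len(S) * np.log(1 - pi0)
                         + (p - len(S)) * np.log(pi0))
            log_terms.append(log_prior - 0.5 * (logdet + quad))
            has_first.append(0 in S)
    log_terms = np.array(log_terms)
    has_first = np.array(has_first)
    w = np.exp(log_terms - log_terms.max())     # stabilized weights
    return float(w[has_first].sum() / w.sum())

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 4))
y = X @ np.array([2.0, 0.0, 0.0, 0.0]) + 0.5 * rng.normal(size=8)
print(exact_inclusion_prob(X, y, sigma=0.5))  # close to 1: beta_1 is clearly active
```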

SLIDE 25

overview of our approach

  • rotate the data to isolate the parameter of interest
  • introduce an auxiliary variable which summarizes the influence of the other parameters
  • use any means possible to compute / estimate the posterior mean and posterior variance of the auxiliary variable
  • apply Gaussian approx. to auxiliary variable and solve one-dimensional integration problem to obtain posterior approximation

SLIDE 26

step 1: reparameterize

  • Apply a rotation matrix to the data which zeros out all but one entry in the first column of the design matrix
  • Only the first rotated observation depends on the first entry:

ỹ1 = x̃1,1 β1 + ∑_{j=2}^p x̃1,j βj + ε̃1

where the auxiliary variable φ(β2:p) = ∑_{j=2}^p x̃1,j βj captures the influence of the other parameters. In matrix form:

ỹ = [ x̃1,1  x̃1,2  ⋯  x̃1,p ]
    [  0    x̃2,2  ⋯  x̃2,p ] β + ε̃
    [  ⋮     ⋮    ⋱   ⋮   ]
    [  0    x̃n,2  ⋯  x̃n,p ]
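The slides do not specify how the rotation is built; one concrete choice (an assumption here) is a Householder reflection that maps the first column of the design matrix onto the first coordinate axis. Because the reflection is orthogonal, the Gram matrix and the isotropic Gaussian noise distribution are preserved.

```python
import numpy as np

def rotate_out_first_column(X, y):
    """Householder reflection Q with Q @ X[:, 0] = ||X[:, 0]|| * e1.

    After applying Q to (X, y), only the first rotated observation
    depends on beta_1; since Q is orthogonal, Q @ eps is still
    N(0, sigma^2 I)."""
    x1 = X[:, 0]
    v = x1.copy()
    v[0] -= np.linalg.norm(x1)                  # Householder vector
    v /= np.linalg.norm(v)
    Q = np.eye(len(x1)) - 2.0 * np.outer(v, v)  # symmetric orthogonal reflector
    return Q @ X, Q @ y

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=6)
Xt, yt = rotate_out_first_column(X, y)
print(np.round(Xt[:, 0], 8))  # only the first entry is nonzero
```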

SLIDE 27

step 1: reparameterize

[Figure: graph of parameters β1, …, βp and observations y1, …, yn]

SLIDE 28

step 1: reparameterize

[Figure: graph of parameters β1, …, βp and observations y1, …, yn, build continued]

SLIDE 29

step 1: reparameterize

[Figure: graph of parameters β1, …, βp and observations y1, …, yn, build continued]

SLIDE 30

step 1: reparameterize

[Figure: graph of parameters β1, …, βp and observations y1, …, yn, build continued]

SLIDE 31

step 1: reparameterize

[Figure: graph of parameters β1, …, βp and rotated observations ỹ1, …, ỹn]

SLIDE 32

step 1: reparameterize

φ(β2:p) = ∑_{j=2}^p x̃1,j βj

the auxiliary variable encapsulates the influence of the other parameters

[Figure: graph of parameters and rotated observations ỹ1, …, ỹn, with φ(β2:p) mediating the influence of β2, …, βp on ỹ1]

SLIDE 33

step 1: reparameterize

φ(β2:p) = ∑_{j=2}^p x̃1,j βj

the auxiliary variable encapsulates the influence of the other parameters

p(β1, ỹ1 | ỹ2:n) = ∫ p(β1, ỹ1 | φ(β2:p)) p(φ(β2:p) | ỹ2:n) dφ(β2:p)

SLIDE 34

step 2: estimate / compute

  • compute the posterior mean and variance of the auxiliary variable: E[φ(β2:p) | ỹ2:n] and Var[φ(β2:p) | ỹ2:n]
  • can use a variety of methods
  • AMP (if iterations converge)
  • LASSO
  • Bayesian Compressed Regression (BCR)
  • [your favorite method]
  • these quantities are independent of the target parameter!
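As a stand-in for the methods listed above, here is a plug-in sketch of the two moments using a ridge (Gaussian working prior) estimate; the regularizer lam, the noise level, and the toy data are illustrative assumptions, not part of the original method.

```python
import numpy as np

def auxiliary_moments(Xt, yt, sigma=1.0, lam=1.0):
    """Plug-in mean and variance of phi = sum_{j>=2} x~_{1,j} beta_j.

    A ridge estimate (the Bayes posterior under a Gaussian working prior
    N(0, (sigma^2/lam) I)) replaces AMP / LASSO / BCR here. Only rotated
    rows 2..n are used, so the moments do not depend on beta_1."""
    A = Xt[1:, 1:]                           # rotated rows 2..n, columns 2..p
    b = yt[1:]
    w = Xt[0, 1:]                            # weights x~_{1,j}, j = 2..p
    G = A.T @ A + lam * np.eye(A.shape[1])
    beta_hat = np.linalg.solve(G, A.T @ b)   # working posterior mean
    cov = sigma**2 * np.linalg.inv(G)        # working posterior covariance
    return float(w @ beta_hat), float(w @ cov @ w)

Xt = np.array([[2.0, 0.5, -0.2],
               [0.0, 1.0, 0.3],
               [0.0, -0.4, 1.1],
               [0.0, 0.2, 0.8]])
yt = np.array([0.7, 1.2, -0.3, 0.5])
m, v = auxiliary_moments(Xt, yt)
print(m, v)  # a posterior mean and a strictly positive variance
```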

SLIDE 35

step 3: approximate

  • apply the Gaussian approximation to the auxiliary variable to compute the posterior approximation:

p(β1|y) ∝ ∫ p(ỹ1 | φ(β2:p), β1) p(β1) p(φ(β2:p) | ỹ2:n) dφ(β2:p)

the likelihood term is Gaussian by assumption on the noise; p(β1) is the prior distribution; p(φ(β2:p) | ỹ2:n) is replaced with a Gaussian using the mean and variance from the previous step

  • the approximation can be accurate even if the prior and posterior are highly non-Gaussian
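For a Bernoulli-Gaussian prior, the one-dimensional integral above collapses in closed form once the auxiliary variable is replaced by N(m, v); for other priors a numerical one-dimensional quadrature would be needed. The inputs m, v and the hyperparameters below are illustrative.

```python
import math

def approx_inclusion_prob(y1t, x11, m, v, sigma=1.0, pi0=0.5, tau=1.0):
    """Step 3 for a Bernoulli-Gaussian prior: with the Gaussian
    approximation N(m, v) for the auxiliary variable phi, the integral
    over phi collapses, leaving a scalar spike-and-slab update for
    beta_1. Returns the approximate P(beta_1 != 0 | y)."""
    def normpdf(x, var):
        return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    noise_var = sigma**2 + v                        # noise plus phi uncertainty
    spike = pi0 * normpdf(y1t - m, noise_var)       # beta_1 = 0
    slab = (1.0 - pi0) * normpdf(y1t - m, noise_var + (x11 * tau)**2)
    return slab / (spike + slab)

print(approx_inclusion_prob(y1t=5.0, x11=1.0, m=0.0, v=0.5))  # large residual: near 1
print(approx_inclusion_prob(y1t=0.2, x11=1.0, m=0.0, v=0.5))  # small residual: below 1/2
```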

SLIDE 36

advantages of our framework

  • does not apply a Gaussian approximation directly to the posterior
  • has precise theoretical guarantees under the same assumptions as AMP
  • can leverage other methods (e.g. LASSO) to produce accurate approximations in settings where AMP fails

SLIDE 37

results: accuracy of posterior inclusion probabilities p(β1 ≠ 0 | y)

  • for a small problem (p = 12) we can compute the MSE with respect to the true posterior inclusion probability

[Figure: MSE vs. correlation between columns of the matrix for approximate message passing (AMP) and Bayesian compressed regression (BCR)]

SLIDE 38

results: accuracy of posterior inclusion probabilities p(β1 ≠ 0 | y)

  • for large problems the ground truth is intractable, so we compare methods using empirical ROC curves

[Figure: ROC curves (true positive rate vs. false positive rate) for AMP and LASSO; left panel: matrix with iid entries (incoherent columns); right panel: matrix with correlated columns]

SLIDE 39

further directions

framework extends to more general models:

p(β, y) = ∫ p(β|θ) p(y|β, θ) dθ,  p(β|θ) = ∏_{j=1}^p p(βj|θ)

entries of β conditionally independent, observations conditionally Gaussian

SLIDE 40

further directions

framework extends to more general models:

p(β, y) = ∫ p(β|θ) p(y|β, θ) dθ,  p(β|θ) = ∏_{j=1}^p p(βj|θ)

entries of β conditionally independent, observations conditionally Gaussian

provide theoretical guarantees for the approximate Gaussianity of the auxiliary variable:

p(φ(β2:p) | ỹ2:n) ≈ N( E[φ(β2:p) | ỹ2:n], Var[φ(β2:p) | ỹ2:n] )

how close is our Gaussian approximation to the true posterior?

SLIDE 41

main points

  • high-dimensional variable selection is an important problem that is studied heavily
  • many estimators lack measures of statistical significance
  • we introduce a framework which can turn point estimates into marginal posterior approximations
  • the key idea is to reparameterize the data, via a rotation, and apply a Gaussian approximation to an auxiliary variable

SLIDE 42

the end

SLIDE 43

precise theoretical characterization

95% accuracy in detecting nonzeros [Reeves & Gastpar, '12]

[Figure: sampling rate (ratio of observations to unknown parameters) vs. signal-to-noise ratio SNR (dB); regions and curves for Not Achievable, Linear MMSE, AMP with soft thresholding, AMP with MMSE, and Maximum Likelihood]