SLIDE 1

Empirical Bayes

Will Penny

3rd March 2011

Outline:

  • Linear Models: fMRI analysis
  • Gradient Ascent: Online learning, Delta Rule, Newton Method
  • Bayesian Linear Models: MAP Learning, MEG Source Reconstruction
  • Empirical Bayes: Model Evidence, Isotropic Covariances, Linear Covariances, Gradient Ascent, MEG Source Reconstruction
  • Restricted Maximum Likelihood: Augmented Form, ReML Objective Function
  • References

SLIDE 2

General Linear Model

The General Linear Model (GLM) is given by

$$y = Xw + e$$

where $y$ are data, $X$ is a design matrix, and $e$ are zero-mean Gaussian errors with covariance $V$. The above equation implicitly defines the likelihood function

$$p(y|w) = N(y; Xw, V)$$

where the Normal density is given by

$$N(x; \mu, C) = \frac{1}{(2\pi)^{N/2}|C|^{1/2}} \exp\left[ -\frac{1}{2}(x - \mu)^T C^{-1} (x - \mu) \right]$$

SLIDE 3

Maximum Likelihood

If we know $V$ then we can estimate $w$ by maximising the likelihood, or equivalently the log-likelihood

$$L = -\frac{N}{2}\log 2\pi - \frac{1}{2}\log|V| - \frac{1}{2}(y - Xw)^T V^{-1} (y - Xw)$$

We can compute the gradient with help from the Matrix Reference Manual

$$\frac{dL}{dw} = X^T V^{-1} y - X^T V^{-1} X w$$

and set it to zero. This leads to the solution

$$\hat{w}_{ML} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$$

This is often referred to as Weighted Least Squares (WLS), $\hat{w}_{ML} = \hat{w}_{WLS}$. The weighting accommodates, for example, some observations being more reliable than others (Penny et al., 2007).
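As a concrete sketch (assuming NumPy; the function name `wls` and a known, full-rank $V$ are my assumptions), the WLS estimate takes a few lines:

```python
import numpy as np

def wls(X, y, V):
    """Weighted Least Squares: (X'V^-1 X)^-1 X'V^-1 y."""
    Vinv = np.linalg.inv(V)
    A = X.T @ Vinv @ X
    return np.linalg.solve(A, X.T @ Vinv @ y)
```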

SLIDE 4

fMRI analysis

For fMRI time series analysis we have a linear model at each voxel $i$

$$y_i = X w_i + e_i$$

$V_i = \mathrm{Cov}(e_i)$ is estimated first (see later) and then the regression coefficients are computed using Maximum Likelihood (ML) estimation

$$\hat{w}_i = (X^T V_i^{-1} X)^{-1} X^T V_i^{-1} y_i$$

The fitted responses are then $\hat{y}_i = X \hat{w}_i$ (SPM Manual).

SLIDE 5

fMRI analysis

The uncertainty in the ML estimates is given by

$$S = (X^T V_i^{-1} X)^{-1}$$

Contrast vectors $c$ can then be used to test for specific effects

$$\mu_c = c^T \hat{w}_i$$

The uncertainty in the effect is then

$$\sigma_c^2 = c^T S c$$

and a t-score is then given by $t = \mu_c / \sigma_c$.
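A minimal sketch of the contrast computation at a single voxel (NumPy assumed; the name `t_contrast` and its interface are illustrative, not SPM's):

```python
import numpy as np

def t_contrast(X, y, V, c):
    """ML estimate, effect size and t-score for contrast vector c."""
    Vinv = np.linalg.inv(V)
    S = np.linalg.inv(X.T @ Vinv @ X)   # parameter covariance
    w_hat = S @ X.T @ Vinv @ y          # ML (weighted least squares) estimate
    mu_c = c @ w_hat                    # effect size
    sigma_c = np.sqrt(c @ S @ c)        # standard deviation of the effect
    return mu_c / sigma_c
```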

SLIDE 6

Least Squares

For isotropic error covariance $V = \lambda^{-1} I$, setting the gradient

$$\frac{dL}{dw} = \lambda X^T y - \lambda X^T X w$$

to zero gives the normal equations. These lead to the Ordinary Least Squares (OLS) solution $\hat{w}_{ML} = \hat{w}_{OLS}$,

$$\hat{w}_{OLS} = (X^T X)^{-1} X^T y$$

SLIDE 7

Gradient Ascent

In gradient ascent approaches an objective function $L$ is maximised by changing the parameters $w$ to follow the local gradient

$$\tau \frac{dw}{dt} = \frac{dL}{dw}$$

where $\tau$ is the time constant that defines the learning rate. In discrete time, parameters are then updated as

$$w_t = w_{t-1} + \frac{1}{\tau} \frac{dL}{dw_{t-1}}$$

Smaller time constants $\tau$ correspond to bigger updates at each step, that is, to faster learning rates. In the batch version of gradient ascent the gradient is computed from all pattern pairs $(x_n, y_n)$ for $n = 1..N$. In the sequential version updates are based on gradients from individual patterns (see later).
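A sketch of the batch version for the linear model used in the following slides (the isotropic precision `lam`, the step count and the zero initialisation are illustrative assumptions):

```python
import numpy as np

def batch_gradient_ascent(X, y, lam=1.0, tau=100.0, n_steps=1000):
    """Maximise the log-likelihood of y = Xw + e by batch gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = lam * X.T @ (y - X @ w)   # dL/dw, computed from all patterns
        w = w + grad / tau               # discrete-time update with step 1/tau
    return w
```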

SLIDE 8

Neural Implementations

Many 'neural implementations' or neural network models are derived by taking a standard statistical model, e.g. linear models, hierarchical linear models, (non-)linear dynamical systems, and then maximising some cost function (e.g. the likelihood or posterior probability) using a sequential gradient ascent approach. When the same model is applied to, for example, neuroimaging data, more sophisticated optimisation methods, e.g. Newton methods (see later), are used.

SLIDE 9

Online Learning - Sequential Gradient Ascent

In some situations observations may be made sequentially. For independent observations we have

$$p(y|w) = \prod_{n=1}^{N} p(y_n|w)$$

where

$$p(y_n|w) = N(y_n; x_n w, \lambda^{-1}) = \frac{1}{Z} \exp\left[ -\frac{\lambda}{2}(y_n - x_n w)^2 \right]$$

and $x_n$ is the $n$th row of $X$. Taking logs gives

$$L_n = \log p(y_n|w) = -\frac{\lambda}{2}(y_n - x_n w)^2 - \log Z$$

Predictions with smaller error have higher likelihood. Online learning then proceeds by following the gradients based on individual patterns.

SLIDE 10

Online Learning

For the linear model the learning rule for the $i$th coefficient is

$$\tau \frac{dw_i}{dt} = \frac{dL_n}{dw_i} = \lambda x_n(i)(y_n - x_n w)$$

Learning is faster for high-precision observations, larger inputs, and bigger prediction errors. This can be used in signal processing applications such as real-time fMRI.
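As a sketch, one sequential update looks like this (names and default constants are illustrative):

```python
import numpy as np

def online_update(w, x_n, y_n, lam=1.0, tau=50.0):
    """One online gradient step from a single pattern (x_n, y_n)."""
    err = y_n - x_n @ w                 # prediction error
    return w + (lam / tau) * x_n * err  # bigger error, input or precision => bigger step
```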

SLIDE 11

Delta Rule

If $\lambda$ is the same for all observations it can be absorbed into the learning rate. The above expression then reduces to the Delta Rule (Widrow and Hoff, 1960)

$$\tau \frac{dw_i}{dt} = x_n(i)(y_n - x_n w)$$

If observations have different precisions then

$$\tau \frac{dw_i}{dt} = \lambda_n x_n(i)(y_n - x_n w)$$

SLIDE 12

Example - Linear Regression

For the linear model $y = Xw + e$ with $\mathrm{Cov}(e) = \lambda^{-1} I$ the log-likelihood is, up to a constant,

$$L(w) = -\frac{\lambda}{2}(y - Xw)^T (y - Xw)$$

The gradient is

$$j(w) = \frac{dL}{dw} = \lambda X^T y - \lambda X^T X w = \lambda X^T (y - Xw)$$

Following this gradient corresponds to the Delta rule.

SLIDE 13

Example

For this log-likelihood $L(w)$ the local gradient does not always point in the direction of the optimum ($\hat{w}_{ML} = [3, 6]^T$), and convergence is slower for $w_2$ than for $w_1$. This is because the regressors did not have the same variance, and were also correlated.

SLIDE 14

The Problem with Gradient Ascent

A problem with (the batch version of) gradient ascent is that large learning rates (big steps) lead to instabilities. This is because for many objective functions the local gradient does not point in the direction of the optimum. Conversely, small learning rates lead to very slow convergence (in terms of the number of discrete steps).

SLIDE 15

Newton Method

This can be remedied with the Newton Method, in which information about the curvature of the error surface is also used (Press, 1988; the update follows from a 2nd-order Taylor expansion)

$$w_t = w_{t-1} - H_w^{-1} j_w$$

with

$$j_w(i) = \frac{dL}{dw(i)}, \qquad H_w(i,j) = \frac{d^2 L}{dw(i)\,dw(j)}$$

where $j_w$ is the gradient vector and $H_w$ is the curvature matrix, also referred to as the Hessian. As the maximum is approached the gradient gets smaller, and near a maximum the curvature is negative (hence the minus sign above).

SLIDE 16

Example - Linear Regression

The gradient is

$$j(w) = \lambda X^T y - \lambda X^T X w$$

as before, and the curvature is

$$H = -\lambda X^T X$$

The parameter update is therefore

$$w_t = w_{t-1} + (X^T X)^{-1} X^T (y - X w_{t-1})$$

Hence

$$w_1 = w_0 + \hat{w}_{ML} - (X^T X)^{-1} X^T X w_0 = \hat{w}_{ML}$$

That is, learning in one step!
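The one-step property is easy to check numerically; a sketch (`newton_step` is an illustrative name):

```python
import numpy as np

def newton_step(X, y, w):
    """One Newton update for linear regression: w - H^-1 j."""
    return w + np.linalg.solve(X.T @ X, X.T @ (y - X @ w))

# From any starting point w0 this lands on the ML estimate:
# np.allclose(newton_step(X, y, w0), np.linalg.lstsq(X, y, rcond=None)[0])
```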

SLIDE 17

Example - Linear Regression

The Newton weight update is

$$w_1 = w_0 + (X^T X)^{-1} X^T (y - X w_0)$$

Learning in one step.

SLIDE 18

Bayesian GLM

A Bayesian GLM is defined as

$$y = Xw + e_1$$
$$w = \mu_w + e_2$$

where the errors are zero-mean Gaussian with covariances $\mathrm{Cov}[e_1] = C_y$ and $\mathrm{Cov}[e_2] = C_w$. This gives the likelihood and prior

$$p(y|w) \propto \exp\left[ -\frac{1}{2}(y - Xw)^T C_y^{-1} (y - Xw) \right]$$
$$p(w) \propto \exp\left[ -\frac{1}{2}(w - \mu_w)^T C_w^{-1} (w - \mu_w) \right]$$

SLIDE 19

Bayesian GLM

The posterior distribution is then

$$p(w|y) \propto p(y|w)\,p(w)$$

Taking logs and keeping only those terms that depend on $w$ gives

$$\log p(w|y) = -\frac{1}{2}(y - Xw)^T C_y^{-1} (y - Xw) - \frac{1}{2}(w - \mu_w)^T C_w^{-1} (w - \mu_w) + \ldots$$
$$= -\frac{1}{2} w^T (X^T C_y^{-1} X + C_w^{-1}) w + w^T (X^T C_y^{-1} y + C_w^{-1} \mu_w) + \ldots$$

SLIDE 20

Bayesian GLM

If $p(x) = N(x; m, S)$ then

$$p(x) \propto \exp\left[ -\frac{1}{2}(x - m)^T S^{-1} (x - m) \right]$$

Taking logs of the Gaussian density $p(x)$ and keeping only those terms that depend on $x$ gives

$$\log p(x) = -\frac{1}{2} x^T S^{-1} x + x^T S^{-1} m + \ldots$$

For our posterior we have

$$\log p(w|y) = -\frac{1}{2} w^T (X^T C_y^{-1} X + C_w^{-1}) w + w^T (X^T C_y^{-1} y + C_w^{-1} \mu_w) + \ldots$$

Equating terms gives $p(w|y) = N(m_w, S_w)$ with

$$S_w^{-1} = X^T C_y^{-1} X + C_w^{-1}$$
$$m_w = S_w (X^T C_y^{-1} y + C_w^{-1} \mu_w)$$
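These two equations translate directly into code; a sketch assuming NumPy and dense, invertible covariances:

```python
import numpy as np

def glm_posterior(X, y, C_y, C_w, mu_w):
    """Posterior N(m_w, S_w) for the Bayesian GLM y = Xw + e."""
    Cy_inv = np.linalg.inv(C_y)
    Cw_inv = np.linalg.inv(C_w)
    S_inv = X.T @ Cy_inv @ X + Cw_inv               # posterior precision
    S_w = np.linalg.inv(S_inv)
    m_w = S_w @ (X.T @ Cy_inv @ y + Cw_inv @ mu_w)  # posterior mean
    return m_w, S_w
```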

SLIDE 21

GLM posterior

The posterior density is $p(w|y) = N(m_w, S_w)$ with

$$S_w^{-1} = X^T C_y^{-1} X + C_w^{-1}$$
$$m_w = S_w (X^T C_y^{-1} y + C_w^{-1} \mu_w)$$

The posterior precision is the sum of the prior precision and the data precision. The posterior mean is a relative precision-weighted combination of the data mean and the prior mean. If $\mu_w = 0$ we have a shrinkage prior.

SLIDE 22

Bayesian GLM with two parameters

The prior (dashed line) has mean $\mu_w = [0, 0]^T$ (cross) and precision $C_w^{-1} = \mathrm{diag}([1, 1])$. The likelihood (dotted line) has mean $[3, 2]^T$ (circle) and data precision $X^T C_y^{-1} X = \mathrm{diag}([10, 1])$. The posterior (solid line) has mean $m_w = [2.73, 1]^T$ (cross) and precision $S_w^{-1} = \mathrm{diag}([11, 2])$. In this example the measurements are more informative about $w(1)$ than about $w(2)$, and this is reflected in the posterior distribution.

SLIDE 23

Tennis

From Wolpert and Ghahramani (2004):

$$p(w|y) = N(m_w, S_w)$$
$$S_w^{-1} = X^T C_y^{-1} X + C_w^{-1}$$
$$m_w = S_w (X^T C_y^{-1} y + C_w^{-1} \mu_w)$$

SLIDE 24

MAP Learning

The posterior density is given by Bayes' rule

$$p(w|y) = \frac{p(y|w)\,p(w)}{p(y)}$$

The Maximum A Posteriori (MAP) estimate is given by

$$\hat{w} = \arg\max_w p(w|y)$$

Because the maximum of $\log x$ occurs at the same point as the maximum of $x$, we can also write

$$\hat{w} = \arg\max_w L(y, w)$$

where $L = \log[p(y|w)\,p(w)]$ is the joint log-likelihood. For linear Gaussian models the MAP parameters are equal to the posterior mean.

SLIDE 25

MAP Learning

Online MAP learning follows the gradient of the joint log-likelihood

$$\tau \frac{dw}{dt} = \frac{dL}{dw}$$

This splits into two derivatives: one for the likelihood (shown earlier) and one for the prior. For prior mean $\mu_w$ and isotropic prior covariance $C_w = \lambda_w^{-1} I_p$ we have

$$\log p(w) = -\frac{\lambda_w}{2}(w - \mu_w)^T (w - \mu_w) - \log Z$$

Hence

$$\frac{d \log p(w)}{dw} = \lambda_w (\mu_w - w)$$

SLIDE 26

MAP Learning

The overall MAP learning rule is

$$\tau \frac{dw}{dt} = \lambda_w (\mu_w - w) + \lambda_n x_n^T (y_n - x_n w)$$

For $\mu_w = 0$ we have the ML update plus a decay term

$$\tau \frac{dw_i}{dt} = -\lambda_w w_i + \lambda_n x_n(i)(y_n - x_n w)$$
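A sketch of the $\mu_w = 0$ case, which is just the delta rule plus weight decay (names and default constants are illustrative):

```python
import numpy as np

def map_update(w, x_n, y_n, lam_w=0.1, lam_n=1.0, tau=50.0):
    """One online MAP update: delta rule plus a decay term (zero prior mean)."""
    err = y_n - x_n @ w
    return w + (lam_n * x_n * err - lam_w * w) / tau
```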

SLIDE 27

MEG Source Reconstruction

MEG source reconstruction is achieved through inversion of the linear model

$$y = Xw + e$$
$$(d \times 1) = (d \times p)(p \times 1) + (d \times 1)$$

for MEG data $y$ from $d$ sensors and $p$ potential sources $w$ lying perpendicular to the cortical surface. The lead field matrix is specified by $X$. For our example we have $d = 274$ and $p = 8192$. The above equation is for a single time point.

SLIDE 28

Generative Models

Likelihood: $p(y|w) = N(y; Xw, C_y)$. Prior: $p(w) = N(w; 0, C_w)$. We let

$$C_y = \lambda_1 Q_1, \qquad C_w = \lambda_2 Q_2$$

For shrinkage priors, $Q_2 = I_p$, MAP estimation results in the minimum norm method of source reconstruction. This is implemented in SPM as the 'IID' option.

SLIDE 29

Smoothness Priors

For smoothness priors $Q_2 = K K^T$, corresponding to the operation of a Gaussian smoothing kernel $K$, MAP estimation results in something similar to the Low Resolution Tomography (LORETA) method. This is implemented in SPM as the 'COH' option. Note, these are not location priors.

SLIDE 30

Posterior Density

From earlier we have

$$S_w^{-1} = X^T C_y^{-1} X + C_w^{-1}$$
$$m_w = S_w X^T C_y^{-1} y$$

However, $S_w$ is $p \times p$ with $p = 8192$, so it cannot be obtained by direct inversion. But we can use the matrix inversion lemma, also known as the Woodbury identity (Bishop, 2006),

$$(A + BCD)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}$$

to ensure that only $d \times d$ matrices need inverting.
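For the zero prior mean used here, applying the lemma gives the standard equivalent form $m_w = C_w X^T (X C_w X^T + C_y)^{-1} y$, in which only a $d \times d$ system is solved. A sketch (not SPM's implementation):

```python
import numpy as np

def posterior_mean_woodbury(X, y, C_y, C_w):
    """Posterior mean via the Woodbury identity: only d x d solves."""
    G = X @ C_w @ X.T + C_y                  # d x d, never p x p
    return C_w @ X.T @ np.linalg.solve(G, y)
```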

SLIDE 31

Simulation

Two sinusoidal sources were placed in bilateral auditory cortex, producing MEG data (Barnes, 2010) comprising $d = 274$ sensor time series (shown as a butterfly plot).

SLIDE 32

LORETA

We fix $\lambda_1 = 1$. Here we set $\lambda_2 = 0.01$. This shows the posterior mean activity for the 500 dipoles with the greatest power (over peristimulus time).

SLIDE 33

LORETA

We fix $\lambda_1 = 1$. Here we set $\lambda_2 = 0.1$.

SLIDE 34

LORETA

We fix $\lambda_1 = 1$. Here we set $\lambda_2 = 1$.

SLIDE 35

Empirical Bayes

Hyperparameters $\lambda$ can be estimated so as to maximise the model evidence. This forms the basis of Empirical Bayes. The marginal likelihood or model evidence is given by

$$p(y|\lambda) = \int p(y, w|\lambda)\,dw = \int p(y|w, \lambda)\,p(w|\lambda)\,dw$$

The log model evidence is $L(\lambda) = \log p(y|\lambda)$. For linear models this can be derived as in Bishop (2006) or as in my Maths for Brain Imaging notes. In this formulation the $\lambda$ are not treated as random variables; there is no prior on them.

SLIDE 36

Model Evidence

The model evidence is composed of sum-squared precision-weighted prediction errors and Occam factors

$$L(\lambda) = -\frac{1}{2} e_y^T C_y^{-1} e_y - \frac{1}{2}\log|C_y| - \frac{d}{2}\log 2\pi - \frac{1}{2} e_w^T C_w^{-1} e_w - \frac{1}{2}\log\frac{|C_w|}{|S_w|}$$

where $\lambda$ is a vector of hyperparameters that parameterise the covariances $C_w$ and $C_y$. The prediction errors are the differences between what is expected and what is observed

$$e_y = y - X m_w$$
$$e_w = m_w - \mu_w$$

SLIDE 37

Empirical Bayes

We iterate between finding the parameters $w$ and the hyperparameters $\lambda$. For linear Gaussian models this corresponds to computing the posterior over $w$

$$S_w^{-1} = X^T C_y^{-1} X + C_w^{-1}$$
$$m_w = S_w (X^T C_y^{-1} y + C_w^{-1} \mu_w)$$

and then setting $\lambda$ to maximise the model evidence

$$\hat{\lambda} = \arg\max_\lambda L(\lambda)$$

These two steps are iterated, and can be thought of as the E and M steps of an EM optimisation algorithm.

SLIDE 38

Isotropic Covariances

For a Bayesian GLM

$$y = Xw + e_1$$
$$w = \mu_w + e_2$$

with isotropic covariances $C_y = \lambda_y^{-1} I_d$ and $C_w = \lambda_w^{-1} I_p$, where there are $d$ data points and $p$ parameters, the equations for updating the precisions $\lambda$ can be derived as shown in Chapter 10 of Bishop (1995).

SLIDE 39

Well-determined parameters

Define

$$\gamma = \sum_{j=1}^{p} \frac{\alpha_j}{\alpha_j + \hat{\lambda}_w}$$

where the $\alpha_j$ are eigenvalues of the data precision term $X^T C_y^{-1} X$. If $\alpha_j \gg \hat{\lambda}_w$ for all $j$ then $\gamma = p$: all parameters have been determined by the data. So $\gamma$ is equivalent to the number of well-determined parameters.

SLIDE 40

M-Step

Then

$$\frac{1}{\hat{\lambda}_w} = \frac{e_w^T e_w}{\gamma}, \qquad \frac{1}{\hat{\lambda}_y} = \frac{e_y^T e_y}{d - \gamma}$$

where the prediction errors are

$$e_y = y - X m_w$$
$$e_w = m_w - \mu_w$$

This effectively partitions the degrees of freedom in the data into those for estimating the prior and those for estimating the likelihood. Setting $\lambda$ to maximise the marginal likelihood produces unbiased estimates of the variances, whereas ML estimation produces biased estimates.
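A sketch of the whole EM-style loop for this isotropic case (zero prior mean; the initial precisions and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def empirical_bayes_isotropic(X, y, n_iters=20):
    """Alternate posterior (E) and precision re-estimation (M) steps."""
    d, p = X.shape
    lam_w, lam_y = 1.0, 1.0                        # initial precisions
    for _ in range(n_iters):
        # E-step: posterior over w for the current precisions
        S_inv = lam_y * X.T @ X + lam_w * np.eye(p)
        m_w = lam_y * np.linalg.solve(S_inv, X.T @ y)
        # number of well-determined parameters
        alpha = np.linalg.eigvalsh(lam_y * X.T @ X)
        gamma = np.sum(alpha / (alpha + lam_w))
        # M-step: re-estimate precisions from the prediction errors
        lam_w = gamma / (m_w @ m_w)
        lam_y = (d - gamma) / np.sum((y - X @ m_w) ** 2)
    return m_w, lam_w, lam_y
```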

SLIDE 41

Linear Covariances

For a Bayesian GLM

$$y = Xw + e_1$$
$$w = \mu_w + e_2$$

with covariances

$$C_y = \sum_i \lambda_i Q_i, \qquad C_w = \sum_{i'} \lambda_{i'} Q_{i'}$$

where the $Q$ are known covariance basis functions, the M-step is

$$\hat{\lambda} = \arg\max_\lambda L(\lambda)$$

SLIDE 42

Gradient Ascent

This maximisation is effected by first computing the gradient and curvature of $L(\lambda)$ at the current parameter estimate $\lambda_{old}$

$$j_\lambda(i) = \frac{dL(\lambda)}{d\lambda(i)}, \qquad H_\lambda(i,j) = \frac{d^2 L(\lambda)}{d\lambda(i)\,d\lambda(j)}$$

where $i$ and $j$ index the $i$th and $j$th parameters, $j_\lambda$ is the gradient vector and $H_\lambda$ is the curvature matrix. The new estimate is then given by

$$\lambda_{new} = \lambda_{old} - H_\lambda^{-1} j_\lambda$$

SLIDE 43

MEG Source Reconstruction

Hyperparameters are set using Empirical Bayes. This is the minimum norm method, implemented in SPM as the IID option.

SLIDE 44

Smoothness Priors

Hyperparameters are set using Empirical Bayes. This is similar to the LORETA method, implemented in SPM as the COH option.

SLIDE 45

Restricted Maximum Likelihood

The posterior over $w$

$$S_w^{-1} = X^T C_y^{-1} X + C_w^{-1}$$
$$m_w = S_w (X^T C_y^{-1} y + C_w^{-1} \mu_w)$$

can also be written in a more compact form.

SLIDE 46

Augmented Form

This compact form is

$$S_w^{-1} = \bar{X}^T V^{-1} \bar{X}, \qquad m_w = S_w \bar{X}^T V^{-1} \bar{y}$$

where

$$\bar{X} = \begin{bmatrix} X \\ I_p \end{bmatrix}, \qquad V = \begin{bmatrix} C_y & 0 \\ 0 & C_w \end{bmatrix}, \qquad \bar{y} = \begin{bmatrix} y \\ \mu_w \end{bmatrix}$$

That is, we have augmented the data matrix with the prior expectations; $\bar{y}$ is $(d + p) \times 1$ and $\bar{X}$ is $(d + p) \times p$.

SLIDE 47

Augmented Form

Estimation in a Bayesian GLM is therefore equivalent to Maximum Likelihood (i.e. Weighted Least Squares) estimation with augmented data

$$m_w = (\bar{X}^T V^{-1} \bar{X})^{-1} \bar{X}^T V^{-1} \bar{y}$$

Prior beliefs can be thought of as extra data points.
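A sketch of the augmented computation (SciPy's block_diag builds $V$; for diagonal covariances one would exploit sparsity instead):

```python
import numpy as np
from scipy.linalg import block_diag

def augmented_wls(X, y, C_y, C_w, mu_w):
    """Posterior mean as WLS on prior-augmented data."""
    p = X.shape[1]
    X_bar = np.vstack([X, np.eye(p)])   # (d+p) x p
    y_bar = np.concatenate([y, mu_w])   # prior mean appended as extra 'data'
    V = block_diag(C_y, C_w)            # (d+p) x (d+p) block-diagonal covariance
    Vinv = np.linalg.inv(V)
    A = X_bar.T @ Vinv @ X_bar
    return np.linalg.solve(A, X_bar.T @ Vinv @ y_bar)
```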

SLIDE 48

Model Evidence

The previous expression for the model evidence

$$L(\lambda) = -\frac{1}{2} e_y^T C_y^{-1} e_y - \frac{1}{2}\log|C_y| - \frac{N_y}{2}\log 2\pi - \frac{1}{2} e_w^T C_w^{-1} e_w - \frac{1}{2}\log\frac{|C_w|}{|S_w|}$$

can now be written more compactly as

$$L(\lambda) = -\frac{1}{2} \bar{e}^T V^{-1} \bar{e} - \frac{1}{2}\log|V| - \frac{N_y}{2}\log 2\pi + \frac{1}{2}\log|S_w|$$

where the overall prediction errors are $\bar{e}^T = [e_y^T, e_w^T]$.

SLIDE 49

Restricted Maximum Likelihood

If we eliminate $m_w$ and $S_w$ from the model evidence equation we end up with the Restricted Maximum Likelihood (ReML) objective function. Substituting for $S_w$ gives

$$L(\lambda) = -\frac{1}{2} \bar{e}^T V^{-1} \bar{e} - \frac{1}{2}\log|V| - \frac{N_y}{2}\log 2\pi - \frac{1}{2}\log|\bar{X}^T V^{-1} \bar{X}|$$

where $\bar{e} = \bar{y} - \bar{X} m_w$.

SLIDE 50

Restricted Maximum Likelihood

$$\bar{e} = \bar{y} - \bar{X} m_w = \bar{y} - \bar{X} S_w \bar{X}^T V^{-1} \bar{y} = \bar{y} - \bar{X}(\bar{X}^T V^{-1} \bar{X})^{-1} \bar{X}^T V^{-1} \bar{y} = R \bar{y}$$

where $R$ is called the residual-forming matrix

$$R = I - \bar{X}(\bar{X}^T V^{-1} \bar{X})^{-1} \bar{X}^T V^{-1}$$

Hence

$$\bar{e}^T V^{-1} \bar{e} = \bar{y}^T R^T V^{-1} R \bar{y} = \mathrm{Tr}(V^{-1} R \bar{y} \bar{y}^T R^T)$$

SLIDE 51

Restricted Maximum Likelihood

The Restricted Maximum Likelihood (ReML) objective function is therefore

$$L(\lambda) = -\frac{1}{2}\mathrm{Tr}(V^{-1} R \bar{y} \bar{y}^T R^T) - \frac{1}{2}\log|V| - \frac{N_y}{2}\log 2\pi - \frac{1}{2}\log|\bar{X}^T V^{-1} \bar{X}|$$

This depends only on $\bar{X}$, $V$ and $\bar{y}\bar{y}^T$, and can also be used for non-augmented matrices. This function is optimised in SPM's ReML function (Friston et al., 2002).
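A sketch of evaluating this objective for a given $V$ (slogdet is used for numerical stability; it applies to augmented or non-augmented $\bar{X}$, $\bar{y}$):

```python
import numpy as np

def reml_objective(X_bar, y_bar, V):
    """Evaluate the ReML objective L(lambda) for covariance V."""
    Ny = len(y_bar)
    Vinv = np.linalg.inv(V)
    A = X_bar.T @ Vinv @ X_bar                                   # X'V^-1 X
    R = np.eye(Ny) - X_bar @ np.linalg.solve(A, X_bar.T @ Vinv)  # residual-forming matrix
    e = R @ y_bar                                                # prediction errors
    return (-0.5 * e @ Vinv @ e
            - 0.5 * np.linalg.slogdet(V)[1]
            - 0.5 * Ny * np.log(2 * np.pi)
            - 0.5 * np.linalg.slogdet(A)[1])
```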
SLIDE 52

References

  • G. Barnes (2010) MEG Source Localisation. SPM Manual, Chapter 35.
  • C. Bishop (1995) Neural Networks for Pattern Recognition. OUP.
  • C. Bishop (2006) Pattern Recognition and Machine Learning. Springer.
  • K. Friston et al. (2002) Neuroimage 16, 465-483.
  • W. Penny, J. Kilner and F. Blankenburg (2007) Neuroimage 36, 661-671.
  • W. Press et al. (1988) Numerical Recipes. Cambridge.
  • SPM Manual. http://www.fil.ion.ucl.ac.uk/spm/doc/
  • B. Widrow and M. Hoff (1960) IRE WESCON Convention Record, 96-104, New York.
  • D. Wolpert and Z. Ghahramani (2004) In Gregory RL (ed) Oxford Companion to the Mind. OUP.