Optimal Information Passing: How much vs. How fast
Abbas Kazemipour
MAST Group Meeting, University of Maryland, College Park. kaazemi@umd.edu
March 24, 2016
Overview
1. Introduction
1. Discrete Hawkes process: given the recent history, spikes are conditionally Bernoulli, $x_t \mid x_{t-p}^{t-1} \sim \mathrm{Bernoulli}(\lambda_t)$, with rate $\lambda_t = \mu + \sum_{k=1}^{p} \theta_k x_{t-k}$ (a simulation sketch follows this list).
2. The history components $x_{t-p}^{t-1}$ form a Markov chain on $\{0,1\}^p$.
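As a concrete illustration, here is a minimal simulation sketch of this model. The linear link, the clipping of $\lambda_t$ to $[0,1]$, and all names and parameter values are assumptions for illustration, not necessarily the talk's exact formulation.

```python
import numpy as np

def simulate_dhp(mu, theta, n, seed=None):
    """Simulate a discrete Hawkes (conditionally Bernoulli) spike train.

    Assumed link: lambda_t = mu + theta . (most recent p spikes),
    clipped to [0, 1] so it is a valid Bernoulli rate.
    """
    rng = np.random.default_rng(seed)
    p = len(theta)
    x = np.zeros(n + p, dtype=int)      # first p entries: all-zero initial history
    for t in range(p, n + p):
        lam = np.clip(mu + theta @ x[t - p:t][::-1], 0.0, 1.0)
        x[t] = rng.random() < lam       # theta[0] weights the most recent spike
    return x[p:]

spikes = simulate_dhp(mu=0.05, theta=np.array([0.4, 0.2]), n=1000, seed=0)
print(spikes[:20], spikes.mean())
```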
1. Each spike train under this model corresponds to a walk across the state graph of this Markov chain.
2. The corresponding likelihood is the product of the weights of the edges traversed along the walk (see the sketch below).
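In the log domain, that product of edge weights becomes a sum of Bernoulli log-probabilities. A sketch under the same assumed parameterization as above:

```python
import numpy as np

def neg_log_likelihood(x, mu, theta):
    """Negative log of the product of edge weights along the walk."""
    p = len(theta)
    nll = 0.0
    for t in range(p, len(x)):
        # Transition (edge) weight: Bernoulli probability of x[t] given history.
        lam = np.clip(mu + theta @ x[t - p:t][::-1], 1e-12, 1 - 1e-12)
        nll -= x[t] * np.log(lam) + (1 - x[t]) * np.log1p(-lam)
    return nll
```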
1. We observe $n$ consecutive snapshots of length $p$ (a total of $n + p$ binary samples).
2. Observed data: $x_{-p+1}^{n} = (x_{-p+1}, \ldots, x_n)$, stacked into a design matrix in the sketch below.
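For estimation, the snapshots can be arranged into a covariate matrix paired with the spike outcomes. A small sketch (the layout and names are illustrative assumptions):

```python
import numpy as np

def history_design(x, p):
    """Stack the n = len(x) - p history windows, most recent sample first,
    into an n x p design matrix, paired with the observed spikes."""
    X = np.array([x[t - p:t][::-1] for t in range(p, len(x))])
    return X, np.asarray(x[p:])
```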
1. Consider the discrete Hawkes process model above.
2. Negative (conditional) log-likelihood: $-\log p(x_1^n \mid x_{-p+1}^{0}) = -\sum_{t=1}^{n}\bigl[x_t \log \lambda_t + (1 - x_t)\log(1 - \lambda_t)\bigr]$.
3. Bernoulli approximation: for small bins, each bin carries at most one spike.
4. In expectation, the negative log-likelihood equals the joint entropy (information content) of the spike train (a one-line derivation follows).
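A one-line sketch of the last claim, using the chain rule for entropy under the Markov structure above:

```latex
\mathbb{E}\!\left[-\log p\!\left(x_1^n \mid x_{-p+1}^{0}\right)\right]
  = \sum_{t=1}^{n} H\!\left(x_t \mid x_{t-p}^{t-1}\right)
  = H\!\left(x_1^n \mid x_{-p+1}^{0}\right).
```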
1. Maximizes the joint entropy of spiking, so as to transfer the maximum amount of information.
2. What does regularization do apart from promoting sparsity?
3. To show: regularization determines the speed of data transfer.
4. A trade-off between the speed and the amount of information.
1. The Markov chain defined by the history components of the DHP has a stationary distribution $\pi$.
2. It converges to $\pi$ irrespective of the initial state.
3. How fast this happens determines how fast the data has been transferred.
4. The transition probability matrix is a function of $\theta$.
5. Perron-Frobenius theorem: $P$ has a unique largest eigenvalue $\lambda_1 = 1$.
6. The second-largest eigenvalue modulus (SLEM) determines the speed of convergence (a numeric sketch follows this list).
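The numeric sketch referenced above, on a toy 3-state chain (the matrix is an arbitrary assumption, not the DHP chain): the total-variation distance to stationarity decays on the order of the SLEM raised to the number of steps.

```python
import numpy as np

P = np.array([[0.6, 0.4, 0.0],           # a small ergodic toy chain
              [0.2, 0.5, 0.3],
              [0.0, 0.6, 0.4]])

moduli = sorted(np.abs(np.linalg.eigvals(P)))
slem = moduli[-2]                        # largest modulus is lambda_1 = 1
pi = np.linalg.matrix_power(P, 200)[0]   # stationary distribution

for n in (1, 5, 10, 20):
    tv = 0.5 * np.abs(np.linalg.matrix_power(P, n)[0] - pi).sum()
    print(f"n={n:2d}  TV={tv:.2e}  SLEM^n={slem**n:.2e}")
```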
[Figure: simulation results for λ = 0 and λ = 0.5.]
1. Mixing rate of the chain; the spectral gap
2. Bounds on the total-variation distance; eigenvalue decomposition
3. Dirichlet forms and the Poincaré inequality
4. Comparison techniques
5. Wilson's method
6. Nash inequalities
7. Evolving sets and martingales
8. Representation theory
1. FMMC (fastest mixing Markov chain) problem: choose $P$ such that the SLEM is minimized.
2. Quick review: semidefinite programming (SDP).
1. Assumption: $P = P^{\mathsf{T}}$ (symmetric transition matrix).
2. Stationary distribution: $(1/n)\mathbf{1}$.
3. Idea: project onto the orthogonal complement of $(1/n)\mathbf{1}$.
4. Thus the SLEM equals a spectral norm: $\mu(P) = \|P - (1/n)\mathbf{1}\mathbf{1}^{\mathsf{T}}\|_2$ (a numeric check follows).
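The numeric check referenced above: for a symmetric stochastic matrix, the SLEM matches the spectral norm after projecting out $(1/n)\mathbf{1}$ (the 3-state matrix is an arbitrary example).

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],          # symmetric, rows sum to one
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
n = P.shape[0]
J = np.ones((n, n)) / n

slem = sorted(np.abs(np.linalg.eigvalsh(P)))[-2]
print(slem, np.linalg.norm(P - J, 2))   # both are 0.5
```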
1. The FMMC problem: minimize $\|P - (1/n)\mathbf{1}\mathbf{1}^{\mathsf{T}}\|_2$ subject to $P \ge 0$, $P\mathbf{1} = \mathbf{1}$, $P = P^{\mathsf{T}}$.
2. Equivalently, as an SDP: minimize $s$ subject to $-sI \preceq P - (1/n)\mathbf{1}\mathbf{1}^{\mathsf{T}} \preceq sI$, $P \ge 0$, $P\mathbf{1} = \mathbf{1}$, $P = P^{\mathsf{T}}$ (a solver sketch follows).
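The solver sketch referenced above, using CVXPY on a small example graph (the 4-cycle and all names are assumptions; this is not the talk's code):

```python
import cvxpy as cp
import numpy as np

n = 4
edges = {(0, 1), (1, 2), (2, 3), (0, 3)}                  # a 4-cycle
allowed = edges | {(j, i) for i, j in edges} | {(i, i) for i in range(n)}

P = cp.Variable((n, n), symmetric=True)
J = np.ones((n, n)) / n
constraints = [P >= 0, cp.sum(P, axis=1) == 1]
# Forbid transitions between non-adjacent vertices.
constraints += [P[i, j] == 0 for i in range(n) for j in range(n)
                if (i, j) not in allowed]

prob = cp.Problem(cp.Minimize(cp.sigma_max(P - J)), constraints)
prob.solve()
print("optimal SLEM:", round(prob.value, 4))
print(np.round(P.value, 3))
```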
1. Cannot directly apply this machinery to the DHP.
2. It assumes $P = P^{\mathsf{T}}$, which does not hold for the DHP chain.
3. The number of states ($2^p$) grows exponentially.
4. What can we do then?
5. Important implication: $\ell_1$-regularization not only enforces sparsity, it also determines the speed of data transfer.
1. Given a large $P$, find the stationary distribution(s).
2. Applications: statistical inference, network analysis, etc.
3. Power-iteration methods (the Lanczos algorithm, the Arnoldi algorithm, etc.); a basic version is sketched below.
4. Can we write this problem as a convex optimization problem?
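The basic version referenced in the list: plain power iteration for the stationary distribution (a sketch; Lanczos and Arnoldi are Krylov-subspace refinements of the same idea).

```python
import numpy as np

def stationary_power_iteration(P, tol=1e-12, max_iter=100_000):
    """Iterate pi <- pi P from the uniform distribution until it stops moving."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        new = pi @ P
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi
```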
1. Quick review: for a Markov chain, the $n$-step transition probabilities are given by $P^n$, i.e. $\Pr(X_n = j \mid X_0 = i) = (P^n)_{ij}$.
2. Goal: minimize over the stationary states of the Markov chain to find the sparse solution.
1. Suppose $x$ is a positive $s$-sparse vector.
2. We have measurements $y = Ax$.
3. WLOG assume $\|x\|_1 = 1$ (measure $y_1 = \mathbf{1}^{\mathsf{T}} x$, for example).
4. Then $x$ can be viewed as the stationary distribution of a Markov chain (one trivial construction is sketched below)!
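The trivial construction referenced above: make every row of $P$ equal to $x$; then $xP = x$, so $x$ is stationary.

```python
import numpy as np

x = np.array([0.5, 0.0, 0.3, 0.2])   # nonnegative, sparse, sums to one
P = np.outer(np.ones(len(x)), x)     # every row of P equals x
print(np.allclose(x @ P, x))         # True: x is a stationary distribution
```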
1. By the Borel-Cantelli lemma (and a little more work), state $i$ is visited finitely often almost surely when $\pi_i = 0$.
2. Goal: try to minimize the expected number of visits, $\sum_{n \ge 0} P^n$.
3. Unfortunately, the objective function does not converge!
1. Need to remove $\lambda_1 = 1$!
2. Fundamental matrix $Z$ of the Markov chain: $Z = \sum_{n \ge 0}\left(P^n - \mathbf{1}\pi^{\mathsf{T}}\right) = \left(I - P + \mathbf{1}\pi^{\mathsf{T}}\right)^{-1}$.
3. The elements of $Z$ are finite (computed in the sketch below)!
4. $Z_{jk}$ represents how quickly the probability mass at node $k$, for a chain started at node $j$, settles to its stationary value $\pi_k$.
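The sketch referenced above, computing $Z$ for the earlier toy chain (approximating $\pi$ by a large matrix power, which is adequate for small, well-mixing examples):

```python
import numpy as np

def fundamental_matrix(P, n_pow=500):
    """Z = (I - P + 1 pi^T)^{-1}: finite even though sum_n P^n diverges."""
    n = P.shape[0]
    pi = np.linalg.matrix_power(P, n_pow)[0]     # a row of P^n approaches pi
    return np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))

P = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.6, 0.4]])
print(np.round(fundamental_matrix(P), 3))        # every entry is finite
```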
1. Still not convex! Need to relax.
2. Might be tougher than $\ell_1$-regularization; just a new perspective.
3. Might help in coming up with new algorithms [Ozdaglar et al.].