Optimal Information Passing: How much vs. How fast (Abbas Kazemipour) - PowerPoint PPT presentation

  1. Optimal Information Passing: How much vs. How fast. Abbas Kazemipour, MAST Group Meeting, University of Maryland, College Park. kaazemi@umd.edu. March 24, 2016.

  2. Overview
     1. Introduction
        - Discrete Hawkes Process as a Markov Chain
     2. Part 2: Stationary Distributions

  3. Discrete Hawkes Process as a Markov Chain
     1. Discrete Hawkes process: $x_k = \mathrm{Ber}\big(\phi(\theta^T x_{k-p}^{k-1})\big)$.  (1)  (simulated in the sketch below)
     2. The history components $x_{k-p}^{k-1}$, binary vectors of length $p$, form a Markov chain.
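
A minimal simulation sketch of model (1), assuming a logistic link $\phi$, a baseline rate $\mu$, and an illustrative 3-sparse kernel $\theta$; the link choice, the kernel values, and all names below are assumptions made here, not taken from the slides.

    import numpy as np

    def simulate_discrete_hawkes(theta, mu, n, rng=None):
        """Draw x_k ~ Ber(phi(mu + theta . x_{k-p}^{k-1})), with phi taken to be logistic."""
        rng = np.random.default_rng() if rng is None else rng
        p = len(theta)
        x = np.zeros(n + p, dtype=int)          # first p entries: an all-zero initial history
        for k in range(p, n + p):
            history = x[k - p:k][::-1]          # length-p history, most recent sample first
            rate = 1.0 / (1.0 + np.exp(-(mu + theta @ history)))
            x[k] = rng.binomial(1, rate)
        return x[p:]                            # drop the initial history

    # Example: p = 5 history, 3-sparse kernel, n = 500 samples
    theta = np.array([0.8, 0.0, -0.4, 0.0, 0.3])
    spikes = simulate_discrete_hawkes(theta, mu=-1.0, n=500)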

  4. Simulation: $p = 100$, $n = 500$, $s = 3$, and $\gamma_n = 0.1$
     1. Each spike train under this model corresponds to a walk across the states.
     2. The corresponding likelihood is the product of the weights of the edges visited along the walk (see the sketch below).
     3. Figure: state space for $p = 3$.
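
A sketch of the state-space picture for small $p$: each state is one binary history of length $p$, a spike train is a walk over the $2^p$ states, and its likelihood is the product of the visited edge weights. The logistic link and the lag ordering follow the assumptions of the earlier simulation sketch; all names are illustrative.

    from itertools import product
    import numpy as np

    def transition_matrix(theta, mu, p):
        """2^p x 2^p transition matrix of the history chain; a state is (x_{k-1}, ..., x_{k-p})."""
        states = list(product([0, 1], repeat=p))
        P = np.zeros((2 ** p, 2 ** p))
        for i, s in enumerate(states):
            rate = 1.0 / (1.0 + np.exp(-(mu + np.dot(theta, s))))   # spiking probability from state s
            spike_next = states.index((1,) + s[:-1])                # shift the new sample into the history
            rest_next = states.index((0,) + s[:-1])
            P[i, spike_next] = rate
            P[i, rest_next] = 1.0 - rate
        return P, states

    def walk_likelihood(spikes, theta, mu, p):
        """Likelihood of a spike train as the product of edge weights along its walk."""
        P, states = transition_matrix(theta, mu, p)
        state = tuple(int(v) for v in spikes[:p][::-1])             # first p samples as the initial history
        lik = 1.0
        for x in spikes[p:]:
            nxt = (int(x),) + state[:-1]
            lik *= P[states.index(state), states.index(nxt)]
            state = nxt
        return lik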

  5. Introduction
     1. We observe $n$ consecutive snapshots of length $p$ (a total of $n + p - 1$ samples): $\{x_k\}_{k=-p+1}^{n}$ (see the sketch below).
     2. $x_1^n$ can be approximated by a sequence of Bernoulli random variables with rates $\lambda_1^n$.
     What is a good optimization problem for estimating $\theta$? Answer: $\ell_1$-regularized ML.
     How does a suitable $n$ compare to $p$, $s$ order-wise? Answer: $n = O(p^{2/3})$.
     How does such an estimator perform compared to the traditional estimation methods? Answer: much better! But why?
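
One way to read the snapshot setup in item 1: the observed record is split into $n$ overlapping history windows of length $p$, one covariate vector per observation. A minimal sketch, assuming the record array already contains the pre-history; the names and the lag ordering are illustrative.

    import numpy as np

    def history_windows(record, p):
        """Build the covariate matrix of length-p history windows from one long binary record.

        record is assumed to hold the pre-history followed by the observed samples;
        row k of H is the window preceding the k-th response in y, most recent sample first.
        """
        record = np.asarray(record)
        n = len(record) - p
        H = np.stack([record[k:k + p][::-1] for k in range(n)])   # n x p history matrix
        y = record[p:]                                            # the n observed responses
        return H, y

    # e.g. H, y = history_windows(spikes, p=5)   # reusing the simulated spikes from the first sketch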

  6. Preliminaries
     1. Consider the discrete Hawkes process model $\lambda_i = \mu + \theta' x_{i-p}^{i-1}$.  (2)
     2. Negative (conditional) log-likelihood: $L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left( x_i \log \lambda_i - \lambda_i \right)$.  (3)
     3. Bernoulli approximation: $L(\theta) \approx -\frac{1}{n} \sum_{i=1}^{n} \left[ x_i \log \lambda_i + (1 - x_i) \log(1 - \lambda_i) \right] = h(x_1, x_2, \cdots, x_n)$.  (4)  (both (3) and (4) are transcribed in the sketch below)
     4. The negative log-likelihood equals the joint entropy (information) of the spiking.
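
A numerical transcription of (3) and (4), assuming the history matrix H and responses y from the earlier sketch; the clipping of the rates is only a numerical safeguard added here, not part of the slides.

    import numpy as np

    def neg_log_likelihood(theta, mu, H, y, eps=1e-12):
        """Point-process negative log-likelihood (3) under the linear rate model (2)."""
        lam = np.clip(mu + H @ theta, eps, 1.0 - eps)   # clipping keeps the logs finite
        return -np.mean(y * np.log(lam) - lam)

    def bernoulli_neg_log_likelihood(theta, mu, H, y, eps=1e-12):
        """Bernoulli approximation (4); per the slides, this is the joint entropy of the spiking."""
        lam = np.clip(mu + H @ theta, eps, 1.0 - eps)
        return -np.mean(y * np.log(lam) + (1 - y) * np.log(1 - lam))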

  7. ML vs. $\ell_1$-regularization
     Maximum likelihood estimate: $\hat{\theta}_{\mathrm{ML}} = \arg\min_{\theta \in \Theta} L(\theta)$.  (5)
     1. Maximizes the joint entropy of spiking, so as to transfer the maximum amount of information.
     $\ell_1$-regularized estimate: $\hat{\theta}_{\mathrm{sp}} := \arg\min_{\theta \in \Theta} L(\theta) + \gamma_n \| \theta \|_1$.  (6)  (see the sketch below)
     2. What does regularization do apart from promoting sparsity?
     3. To show: regularization determines the speed of data transfer.
     4. A battle between speed and the amount of information.
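
A sketch of the $\ell_1$-regularized estimate (6) via proximal gradient descent on the Bernoulli approximation (4). The use of proximal gradient, the fixed step size, the iteration count, and holding the baseline $\mu$ fixed are assumptions made here for illustration; this is not necessarily the author's solver.

    import numpy as np

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def l1_regularized_ml(H, y, mu, gamma_n, step=0.01, iters=5000, eps=1e-12):
        """Proximal gradient for argmin_theta L(theta) + gamma_n * ||theta||_1, with L as in (4)."""
        n, p = H.shape
        theta = np.zeros(p)
        for _ in range(iters):
            lam = np.clip(mu + H @ theta, eps, 1.0 - eps)
            grad = -H.T @ (y / lam - (1 - y) / (1 - lam)) / n    # gradient of the Bernoulli NLL (4)
            theta = soft_threshold(theta - step * grad, step * gamma_n)
        return theta

    # e.g. theta_hat = l1_regularized_ml(H, y, mu=0.1, gamma_n=0.1)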

  8. Second Largest Eigenvalue Modulus and its Significance
     1. The Markov chain defined by the history components of the Hawkes process has a stationary distribution $\pi$.
     2. The chain converges to $\pi$ irrespective of the initial state.
     3. How fast this happens determines how fast the data has been transferred.
     4. The transition probability matrix is a function of $\theta$.
     5. Perron-Frobenius theorem: it has a unique largest eigenvalue $\lambda_1 = 1$.
     6. The second largest eigenvalue modulus determines the speed of convergence (computed in the sketch below):
        $\lambda = \max\{\lambda_2, -\lambda_n\}$, and $\max_{i \in S} \| P^t(i, \cdot) - \pi(\cdot) \| \sim C \lambda^t$ as $t \to \infty$.
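
A sketch for computing the stationary distribution and the second largest eigenvalue modulus of the history chain's transition matrix P (for example, the matrix built by the transition_matrix helper assumed earlier); the function names are illustrative.

    import numpy as np

    def stationary_distribution(P):
        """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
        w, V = np.linalg.eig(P.T)
        pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
        return pi / pi.sum()

    def slem(P):
        """Second largest eigenvalue modulus of a row-stochastic matrix P."""
        moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
        # moduli[0] should be 1 (Perron-Frobenius); moduli[1] sets the geometric convergence rate
        return moduli[1]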
