

  1. On-line estimation with the multivariate Gaussian distribution
     Sanjoy Dasgupta and Daniel Hsu (UC San Diego)

  2. Outline
     1. On-line density estimation and previous work
     2. On-line multivariate Gaussian density estimation
     3. Regret analysis of follow-the-leader
     4. Open problem

  3. On-line density estimation
     Learning protocol. For trial t = 1, 2, ...:
     1. Learner chooses parameter θ_t ∈ Θ
     2. Nature chooses instance x_t ∈ X
     3. Learner incurs loss ℓ_t(θ_t) = ℓ(θ_t, x_t)
     In on-line (parametric) density estimation, ℓ(θ, x) = −log p(x | θ), where {p(· | θ) : θ ∈ Θ} is a parametric family of densities.
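The protocol above can be sketched as a simple loop. This is an illustrative sketch, not code from the talk: the names `gaussian_nll`, `run_protocol`, and the constant learner/adversary used in the example are all assumptions, and the loss here is the full Gaussian negative log-likelihood (later slides drop the additive (1/2)·ln(2π) constant, which cancels in regret).

```python
import math

def gaussian_nll(theta, x):
    """Loss ell(theta, x) = -log p(x | theta) for a 1-D Gaussian theta = (mu, var)."""
    mu, var = theta
    return 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)

def run_protocol(learner, nature, T):
    """For t = 1..T: learner picks theta_t from the history,
    nature picks x_t, learner incurs ell(theta_t, x_t)."""
    total_loss = 0.0
    history = []
    for t in range(1, T + 1):
        theta_t = learner(history)                 # 1. learner chooses theta_t
        x_t = nature(t)                            # 2. nature chooses x_t
        total_loss += gaussian_nll(theta_t, x_t)   # 3. loss is incurred
        history.append(x_t)
    return total_loss

# Example: a learner that always plays theta = (0, 1) against x_t = 0;
# each trial costs exactly (1/2)*ln(2*pi).
loss = run_protocol(lambda hist: (0.0, 1.0), lambda t: 0.0, T=10)
```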

  4. On-line density estimation: loss and regret
     L_T  = Σ_{t=1}^T ℓ_t(θ_t)              total loss of learner after T trials
     L*_T = inf_{θ∈Θ} Σ_{t=1}^T ℓ_t(θ)      best-in-hindsight fixed-parameter loss after T trials
     R_T  = L_T − L*_T                       regret of learner after T trials
     Goal: on-line density estimation strategies with regret bounds. As usual, no stochastic assumption is made about how Nature generates data.
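These quantities can be computed directly for the one-dimensional Gaussian family, where L*_T is attained at the sample mean and variance of x_1, …, x_T. A minimal sketch (the function names and test values are illustrative, not from the slides); note it breaks down when all x_t are equal, which is exactly the degeneracy addressed later by the hallucinated zeroth trial:

```python
import math

def nll(mu, var, x):
    # Gaussian loss without the (1/2)*ln(2*pi) constant, which cancels in regret.
    return (x - mu) ** 2 / (2 * var) + 0.5 * math.log(var)

def regret(thetas, xs):
    """R_T = L_T - L*_T; for this family, L*_T is achieved by the
    sample mean and (biased) sample variance of x_1..x_T."""
    L_T = sum(nll(mu, var, x) for (mu, var), x in zip(thetas, xs))
    n = len(xs)
    mu_star = sum(xs) / n
    var_star = sum((x - mu_star) ** 2 for x in xs) / n  # must be > 0
    L_star = sum(nll(mu_star, var_star, x) for x in xs)
    return L_T - L_star

# Example: always playing (0, 1) against the sequence 0, 2.
r = regret([(0.0, 1.0), (0.0, 1.0)], [0.0, 2.0])   # = 1.0
```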

  5. Previous work in on-line density estimation
     Some on-line learning literature:
     • Freund, 1996: Bernoulli (weighted coins)
     • Azoury & Warmuth, 1999: some exponential families, including the fixed-covariance Gaussian
     • Takimoto & Warmuth, 2000a: fixed-covariance Gaussian
     • Takimoto & Warmuth, 2000b: some one-dimensional exponential families
     • Straightforward on-line parameter estimation yields O(log T) regret; subtle variations can improve the constants.
       – In the case of the fixed-covariance Gaussian, a recursively defined update rule yields the minimax strategy.
     • Often, simple random sequences yield lower bounds.

  6. On-line Gaussian density estimation
     For simplicity, consider the one-dimensional case; the results generalize to the multivariate case with linear algebra.
     • Parameter space: Θ = R × R_{>0} (mean and variance); the learner chooses (µ_t, σ²_t) in trial t
     • Data space: X = R
     • Loss function: ℓ((µ, σ²), x) = (x − µ)² / (2σ²) + (1/2) ln σ²
     This can be viewed as squared loss on the prediction µ with “confidence” parameter σ².
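A quick numerical sanity check (illustrative, not from the slides) that this loss is the Gaussian negative log-likelihood up to the additive constant (1/2)·ln(2π), which is why dropping the constant changes no regret quantity:

```python
import math

def loss(mu, var, x):
    # Slide-6 loss: squared error scaled by confidence, plus log-variance term.
    return (x - mu) ** 2 / (2 * var) + 0.5 * math.log(var)

def full_nll(mu, var, x):
    # -log of the Gaussian density N(x | mu, var), computed from its definition.
    return -math.log(math.exp(-(x - mu) ** 2 / (2 * var))
                     / math.sqrt(2 * math.pi * var))

# The two differ by exactly (1/2)*ln(2*pi), independent of (mu, var, x).
mu, var, x = 1.5, 0.7, -0.3
assert abs(full_nll(mu, var, x) - loss(mu, var, x)
           - 0.5 * math.log(2 * math.pi)) < 1e-12
```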

  7. Main results
     • The standard formulation suffers from degenerate cases, similar to the problems in maximum-likelihood estimation of Gaussian distributions.
     • Instead, consider an alternative formulation with a hallucinated zeroth trial.
     • We study the strategy that chooses the sample mean and variance of previous instances (follow-the-leader). The trivial regret bound is O(T²).
     1. For any p > 1, there are sequences (x_t) for which the regret is Ω(T^{1−1/p}); similarly for any sublinear function of T.
     2. A linear bound on regret holds for all sequences.
     3. For any sequence, the average regret tends to 0; i.e., for any sequence, lim sup_{T→∞} R_T / T ≤ 0.

  8. Problems with the standard formulation: unbounded instances
     • The learner’s means satisfy |µ_t| < ∞.
     • Nature can choose x_t so that |x_t − µ_t| is arbitrarily large.
     • ∴ The regret is unbounded.
     Fix: assume |x_t| ≤ r for all t, for some r ≥ 0 (as in the fixed-variance case).

  9. Problems with the standard formulation: non-varying instances
     • If x_1 = x_2 = … = x_T, then L*_T = lim_{σ²→0} (T/2) ln σ² = −∞.
     • ∴ The regret is unbounded.
     Fix: force some variance by hallucinating a zeroth trial, and include it in the loss and regret quantities:
     ℓ_0(µ, σ²) = (1/2) Σ_{x ∈ {±s}} [ (x − µ)² / (2σ²) + (1/2) ln σ² ]   for some constant s > 0.
     Consequence: L*_T > −∞ for all T.
     (Alternative: compare to the best-in-hindsight loss plus a Bregman divergence to the initial parameter.)

  10. Follow-the-leader
     Follow-the-leader: use the parameter setting that minimizes the total loss over all previous trials. This is the “natural” strategy: choose the sample mean and variance of the previously seen instances.
     (µ_1, σ²_1) = (µ_0, σ²_0) = (0, s²)   for some s² > 0, due to trial zero
     µ_{t+1}  = (1/(t+1)) Σ_{i=1}^t x_i
     σ²_{t+1} = (1/(t+1)) ( s² + Σ_{i=1}^t x_i² ) − µ²_{t+1}
     • No randomization/perturbation (cf. follow-the-perturbed-leader).
     • Similar to algorithms proposed by Azoury & Warmuth (1999) for exponential families, which enjoyed O(log T) regret bounds. In our setting, this looks like an O(T²) bound (without further assumptions).
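The follow-the-leader updates above can be run incrementally by maintaining running sums. A minimal sketch, with an illustrative data sequence (the values in `xs` are not from the slides):

```python
s2 = 1.0          # trial-zero variance s^2 (here s = 1)
xs = [0.5, -1.0, 2.0, 0.25]   # illustrative instance sequence

mu, var = 0.0, s2  # (mu_1, var_1) = (0, s^2), from the hallucinated trial
sum_x, sum_x2 = 0.0, 0.0
for t, x in enumerate(xs, start=1):
    # ... here the learner would play (mu, var) and incur its loss on x ...
    sum_x += x
    sum_x2 += x * x
    mu = sum_x / (t + 1)                       # mu_{t+1}: trial zero contributes mean 0
    var = (s2 + sum_x2) / (t + 1) - mu ** 2    # var_{t+1}: trial zero contributes s^2
```

After the four instances above, mu = 1.75/5 = 0.35 and var = (1 + 5.3125)/5 − 0.35² = 1.14; the hallucinated trial keeps var strictly positive even on constant sequences.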

  11. #1: Regret lower bound for follow-the-leader
     Finite sequence: s = 1; the sequence is x_1 = … = x_{T−1} = 0 and x_T = 1.
     Learner’s parameters: µ_t ≡ 0, σ²_t = 1/t.
     Final regret: R_T = Ω(1/σ²_T) = Ω(T).
     [Plots: σ²_t and R_t versus t, for t = 1, …, 20.]
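This sequence is easy to simulate. The sketch below (names illustrative) runs follow-the-leader on x_1 = … = x_{T−1} = 0, x_T = 1 with s = 1: the variance shrinks as σ²_t = 1/t, so the surprise instance on the final trial alone costs 1/(2σ²_T) + (1/2)·ln σ²_T = T/2 − (ln T)/2, which is linear in T.

```python
import math

def loss(mu, var, x):
    # Slide-6 loss (constant-free Gaussian negative log-likelihood).
    return (x - mu) ** 2 / (2 * var) + 0.5 * math.log(var)

T, s2 = 20, 1.0
xs = [0.0] * (T - 1) + [1.0]   # the lower-bound sequence from the slide

total, sum_x, sum_x2 = 0.0, 0.0, 0.0
mu, var = 0.0, s2              # (mu_1, var_1) = (0, 1)
for t, x in enumerate(xs, start=1):
    total += loss(mu, var, x)                  # learner plays (mu_t, var_t)
    sum_x += x
    sum_x2 += x * x
    mu = sum_x / (t + 1)                       # follow-the-leader updates
    var = (s2 + sum_x2) / (t + 1) - mu ** 2

# On trial T the learner had mu_T = 0 and var_T = 1/T, so that trial cost
# T/2 - (ln T)/2 by itself.
final_trial_loss = loss(0.0, 1.0 / T, 1.0)
```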

  12. #1: Regret lower bound for follow-the-leader
     Infinite sequence: iterate the finite sequence; the regret comes arbitrarily close to linear.
     [Plot: R_t versus t, for t up to 2.5 × 10⁴.]
     Let f : N → N be increasing. There is a sequence such that for any T in the range of f,
     R_T ≥ C · (T + 1) / (f^{−1}(T) + 1)².

  13. #2: Regret upper bound
     An expression for the regret of follow-the-leader can be derived either directly or, say, via the Bregman divergence formulation (Azoury & Warmuth, 1999):
     R_T ≤ Σ_{t=1}^T [ ( (x_t − µ_{t+1})² / σ²_t )² / (4(t+1)) ] + (1/4) ln(T + 1) + Θ(1)
     (the bound is due to a second-order Taylor approximation).
     Since σ²_t ≥ s²/t (where s² is the trial-zero variance), the squared ratio is O(t²) and each summand is O(t), giving the trivial bound of O(T²).
     Problem: small variances … (but they can’t always be small).

  14. #2: Regret upper bound
     Rewrite the variance parameter:
     σ²_t = (1/t) ( s² + Σ_{k=1}^{t−1} ∆_k ),   where ∆_k = (k/(k+1)) (x_k − µ_k)².
     (∆_k is approximately the squared distance of the new instance to the average of the old instances.)
     ∴ Use a potential function argument plus algebra:
     R_T ≤ (1/4) Σ_{t=1}^T ( ∆_t / (s² + Σ_{k=1}^{t−1} ∆_k) )² (t + 1) + O(log T)
         ≤ (C/4) (T + 1) ( 1 − s² / (s² + Σ_{k=1}^T ∆_k) ) + O(log T),
     i.e., a linear bound.

  15. Two regimes for follow-the-leader
     1. The sequences achieving the lower bounds have σ²_t → 0: the regret can be arbitrarily close to linear.
     2. If, instead, lim inf σ²_t > 0, then there is some T_0 such that for all T > T_0,
        R_T ≤ c · T_0 + O(T_0 log T),
        i.e., eventually the average regret R_T / T tends to zero at rate O(log T / T).

  16. #3: lim sup average regret bound
     In fact, even when σ²_t → 0, the average regret tends to zero. Formally, for any sequence, lim sup_{T→∞} R_T / T ≤ 0.
     Proof idea: show lim sup R_T / T ≤ ε for any ε > 0. There are two types of trials, depending on ∆_t ≈ (x_t − µ_t)²:
     1. ∆_t small → contributes ≪ ε to the regret.
     2. ∆_t large → causes the variance to rise substantially; the behavior is more like the second regime.

  17. Multivariate Gaussians
     • For d-dimensional Gaussians, there is essentially an extra factor of d in front of all the bounds.
     • “Progress” in the covariance happens one dimension at a time, so lower bounds can also exploit each dimension (almost) independently.
     • The potential function for the upper bound is tr(Σ^{−1}).

  18. Open problem
     • This work: analysis of follow-the-leader for on-line Gaussian density estimation with arbitrary covariance.
     • Still open (from Takimoto & Warmuth, 2000a): what is the minimax strategy?

  19. Thanks!
     The authors were supported by:
     • NSF grant IIS-0347646
     • An Engineering Institute (Los Alamos National Laboratory / UC San Diego) graduate fellowship

  20. Incremental off-line algorithm
     Derived by Azoury & Warmuth (1999) for general exponential families.
     Update rule: choose an initial parameter (µ_1, σ²_1) ∈ R × R_{>0}; then
     (µ_{t+1}, σ²_{t+1}) = arg min_{(µ,σ²)} [ η_1^{−1} ∆((µ, σ²), (µ_1, σ²_1)) + Σ_{i=1}^t ℓ_i(µ, σ²) ],
     where ∆(·, ·) is the Bregman divergence for Gaussians,
     ∆((µ, σ²), (µ̃, σ̃²)) = (1/2) [ (µ − µ̃)²/σ̃² + σ²/σ̃² − ln(σ²/σ̃²) − 1 ],
     and η_1^{−1} is a parameter (e.g., η_1^{−1} = 1).
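The Gaussian Bregman divergence above coincides with the KL divergence KL(N(µ, σ²) ‖ N(µ̃, σ̃²)). A minimal sketch with two sanity checks (the function name is illustrative): the divergence from a parameter to itself is zero, and it is strictly positive otherwise.

```python
import math

def bregman(mu, var, mu0, var0):
    """Delta((mu, var), (mu0, var0)) =
    (1/2) [ (mu - mu0)^2 / var0 + var / var0 - ln(var / var0) - 1 ]."""
    return 0.5 * ((mu - mu0) ** 2 / var0 + var / var0
                  - math.log(var / var0) - 1.0)

assert bregman(0.0, 1.0, 0.0, 1.0) == 0.0   # divergence to itself is zero
assert bregman(1.0, 2.0, 0.0, 1.0) > 0.0    # and strictly positive otherwise
```

With η_1^{−1} = 1, the divergence term acts like one extra pseudo-observation anchored at the initial parameter, which is what regularizes the early trials.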
