
Nonlinear Signal Processing 2007-2008 Course Overview, Instituto Superior Técnico



  1. Nonlinear Signal Processing 2007-2008 Course Overview
  Instituto Superior Técnico, Lisbon, Portugal
  João Xavier, jxavier@isr.ist.utl.pt

  2. Introduction
  • This course is about applications of differential geometry in signal processing
  • What is differential geometry? A generalization of differential calculus to manifolds
  • What is a manifold?
    − a smooth curved set
    − no vector space structure, no canonical coordinate system
    − looks locally like a Euclidean space, but not globally

  3. Introduction
  • General idea
  [Figure: two examples of manifolds and one set that is not a manifold]

  4. Introduction
  • Example: the graph of f(x, y) = 1 − x² − y², i.e. the subset {(x, y, z) : z = f(x, y)} of ℝ³

  5. Introduction
  • Example: the n × n orthogonal matrices, {X ∈ ℝ^{n×n} : X⊤X = I_n}

  6. Introduction
  • Example: the n × m matrices with rank r, {X ∈ ℝ^{n×m} : rank X = r}
  • Note: the set of n × m matrices with rank ≤ r is not a manifold

  7. Introduction
  • Example: the n × m matrices with prescribed singular values s_i, {X ∈ ℝ^{n×m} : σ_i(X) = s_i}

  8. Introduction
  • Example: the n × n symmetric matrices whose largest eigenvalue has multiplicity k, {X ∈ ℝ^{n×n} : X = X⊤, λ_1(X) = ··· = λ_k(X) > λ_{k+1}(X)}

  9. Introduction
  • Not all manifolds are "naturally" embedded in a Euclidean space
  • Example: the set of k-dimensional subspaces of ℝⁿ (the Grassmann manifold)

  10. Introduction
  • How is differential geometry useful?
    − systematic framework for nonlinear problems (generalizes linear algebra)
    − elegant geometric re-interpretations of existing solutions:
      • Karmarkar's algorithm for linear programming
      • Sequential Quadratic Programming methods in optimization
      • the Rao distance between pdfs in parametric statistical families
      • Jeffreys' noninformative prior in Bayesian setups
      • the Cramér-Rao bound for parametric estimation with ambiguities
      • ... many more
    − suggests powerful new solutions

  11. Introduction
  • Where has differential geometry been applied?
    − optimization on manifolds
    − Kendall's theory of shapes
    − random matrix theory
    − information geometry
    − geometrical interpretation of Jeffreys' prior
    − performance bounds for estimation problems posed on manifolds
    − statistics on manifolds (e.g. generalized PCA)
    − ... a lot more (signal processing, econometrics, control, etc.)

  12. Application: optimization on manifolds
  • Unconstrained problem: min_{x ∈ ℝⁿ} f(x)
  • Line-search algorithm: x_{k+1} = x_k + α_k d_k
  • Search directions: d_k = −∇f(x_k) [gradient], d_k = −∇²f(x_k)⁻¹ ∇f(x_k) [Newton], and others
  [Figure: one line-search step from x_k along d_k to x_{k+1}]
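A minimal numerical sketch of the line-search iteration above, in Python; the quadratic objective, the fixed step size, and the function names are illustrative assumptions, not from the slides:

```python
import numpy as np

def line_search_descent(f, grad, hess, x0, alpha=0.1, newton=False, iters=50):
    """Euclidean line search x_{k+1} = x_k + alpha_k d_k (fixed step size here)."""
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        # d_k = -grad f(x_k), or d_k = -hess(x_k)^{-1} grad f(x_k) for Newton
        d = -np.linalg.solve(hess(x), g) if newton else -g
        x = x + alpha * d
    return x

# Illustrative objective: f(x) = 0.5 x^T A x - b^T x (assumed example)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
hess = lambda x: A

x_star = line_search_descent(f, grad, hess, np.zeros(2), newton=True, alpha=1.0)
print(x_star, np.linalg.solve(A, b))  # Newton reaches the minimizer in one step
```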

  13. Application: optimization on manifolds
  • Constrained problem: min_{x ∈ M} f(x)
  • Re-interpreted as an unconstrained problem on the manifold M
  • Geodesic-search algorithm: x_{k+1} = exp_{x_k}(α_k d_k)
  [Figure: one geodesic-search step on M from x_k along d_k to x_{k+1}]
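For a concrete manifold where the exponential map is known in closed form, here is a sketch of geodesic gradient descent on the unit sphere, minimizing x⊤Ax; the objective, step size, and helper names are assumptions for illustration, while the slides treat the general theory:

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: exp_x(v) for v tangent at x (v ⟂ x)."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def geodesic_gradient_descent(A, x0, alpha=0.1, iters=200):
    """Minimize f(x) = x^T A x over the unit sphere by geodesic search."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2 * A @ x                  # Euclidean gradient
        rgrad = egrad - (x @ egrad) * x    # project onto the tangent space at x
        x = sphere_exp(x, -alpha * rgrad)  # x_{k+1} = exp_{x_k}(alpha_k d_k)
    return x

A = np.diag([1.0, 2.0, 5.0])
x = geodesic_gradient_descent(A, np.ones(3))
print(x, x @ A @ x)  # converges to an eigenvector of the smallest eigenvalue
```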

  14. Application: optimization on manifolds
  • Works for abstract spaces (e.g. the Grassmann manifold)
  • The theory provides generalizations of the gradient and of the Newton direction (not obvious)
  • Closed-form formulas are available for important manifolds (e.g. orthogonal matrices)
  • Geodesic search is not the only possibility:
    − optimization in local coordinates
    − generalizations of trust-region methods
  • Numerous applications: blind source separation, image processing, rank-reduced Wiener filters, ...

  15. Application: optimization on manifolds
  • Example signal model: y[t] = Q x[t] + w[t], t = 1, 2, ..., T
    − Q: unknown orthogonal matrix (Q⊤Q = I_N)
    − x[t]: known landmarks
    − w[t]: iid noise, w[t] ~ N(0, Σ)
  • Maximum-likelihood estimate: Q* = arg max_{Q ∈ O(N)} p(Y; Q)
    − O(N) = group of N × N orthogonal matrices
    − Y = matrix of observations [y[1] y[2] ··· y[T]]
    − X = matrix of landmarks [x[1] x[2] ··· x[T]]

  16. Application: optimization on manifolds
  • Optimization problem: orthogonal Procrustes rotation
      Q* = arg min_{Q ∈ O(N)} ‖Y − QX‖²_{Σ⁻¹}
         = arg min_{Q ∈ O(N)} tr(Q⊤Σ⁻¹Q R̂_xx) − 2 tr(Q⊤Σ⁻¹ R̂_yx)
    where R̂_yx = (1/T) ∑_{t=1}^T y[t] x[t]⊤ and R̂_xx = (1/T) ∑_{t=1}^T x[t] x[t]⊤
  • The eigenstructure of Σ controls the Hessian of the objective: κ(Σ⁻¹) = λ_max(Σ⁻¹)/λ_min(Σ⁻¹) is the condition number of Σ⁻¹
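The slides solve this by manifold optimization for general Σ. For the special case Σ = I_N, the orthogonal Procrustes problem has a standard closed-form solution via an SVD; the sketch below uses that as a sanity check (the synthetic data, seed, and helper name are assumptions):

```python
import numpy as np

def procrustes_identity_cov(Y, X):
    """Closed-form estimate for Sigma = I: argmin_{Q in O(N)} ||Y - QX||_F^2."""
    R_yx = Y @ X.T / X.shape[1]
    U, _, Vt = np.linalg.svd(R_yx)
    return U @ Vt

# Hypothetical sanity check with synthetic data from the slide's signal model
rng = np.random.default_rng(0)
N, T = 5, 100
Q_true, _ = np.linalg.qr(rng.standard_normal((N, N)))
X = rng.standard_normal((N, T))
Y = Q_true @ X + 0.01 * rng.standard_normal((N, T))
Q_hat = procrustes_identity_cov(Y, X)
print(np.linalg.norm(Q_hat - Q_true))  # small for low noise
```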

  17. Application: optimization on manifolds
  • Example: N = 5, T = 100, Σ = diag(1, 1, 1, 1, 1), κ(Σ⁻¹) = 1
  [Figure: objective error (log scale) vs. iteration (0 to 30); ◦ = projected gradient, □ = gradient geodesic descent, ⋄ = Newton geodesic descent]

  18. Application: optimization on manifolds
  • Example: N = 5, T = 100, Σ = diag(0.2, 0.4, 0.6, 0.8, 1), κ(Σ⁻¹) = 5
  [Figure: objective error (log scale) vs. iteration (0 to 30); ◦ = projected gradient, □ = gradient geodesic descent, ⋄ = Newton geodesic descent]

  19. Application: optimization on manifolds
  • Example: N = 5, T = 100, Σ = diag(0.02, 0.05, 0.14, 0.37, 1), κ(Σ⁻¹) = 50
  [Figure: objective error (log scale) vs. iteration (0 to 30); ◦ = projected gradient, □ = gradient geodesic descent, ⋄ = Newton geodesic descent]

  20. Application: Kendall's theory of shapes
  [Figure: shapes identified with points of a manifold (a quotient space)]
  • Applications: morphing one shape into another, statistics ("mean" shape), clustering, ...

  21. Application: random matrix theory
  • Basic statistics: transformation of random objects in Euclidean spaces
      x is a random vector in ℝⁿ, x ~ p_X(x)
      F : ℝⁿ → ℝⁿ smooth, bijective
      y = F(x)
    ⇒ y ~ p_Y(y) = p_X(F⁻¹(y)) J(y), where J(y) = 1 / |det(DF(F⁻¹(y)))|
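A quick Monte Carlo illustration of this change-of-variables formula, using an assumed affine map F(x) = Ax + b with Gaussian x (all numbers are illustrative, not from the slides):

```python
import numpy as np

# Check p_Y(y) = p_X(F^{-1}(y)) / |det DF| for the assumed map F(x) = A x + b
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.5], [0.0, 1.0]])
b = np.array([1.0, -2.0])

x = rng.standard_normal((100000, 2))   # x ~ N(0, I_2)
y = x @ A.T + b                        # y = F(x)

def p_X(x):
    return np.exp(-0.5 * np.sum(x**2, axis=-1)) / (2 * np.pi)

def p_Y(y):
    x = (y - b) @ np.linalg.inv(A).T           # F^{-1}(y)
    return p_X(x) / abs(np.linalg.det(A))      # Jacobian factor

# Compare the empirical fraction of samples in a small box with the density
center, h = np.array([1.5, -1.8]), 0.2
inside = np.all(np.abs(y - center) < h / 2, axis=1)
print(inside.mean() / h**2, p_Y(center))       # the two numbers should be close
```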

  22. Application: random matrix theory
  • Generalization: transformation of random objects in manifolds M, N
      x is a random point in M, x ~ Ω_X (an exterior form)
      F : M → N smooth, bijective
      y = F(x)
    ⇒ y ~ Ω_Y = ...
  • The answer is provided by the calculus of exterior differential forms

  23. Application: random matrix theory
  • Example: decoupling a random vector into amplitude and direction
      F(x) = (‖x‖, x/‖x‖)
      M = ℝⁿ − {0},   N = ℝ₊₊ × S^{n−1} = {(R, u) : R > 0, ‖u‖ = 1}
  • Answer: x ~ p_X(x) ⇒ p(R, u) = p_X(Ru) R^{n−1}
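As a numerical check of this formula (not from the slides): for x ~ N(0, I_n) the direction u is uniform on the sphere, and the radial density implied by p(R, u) = p_X(Ru) R^{n−1} is the chi distribution with n degrees of freedom, which can be compared against samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 4
x = rng.standard_normal((200000, n))   # x ~ N(0, I_n)
R = np.linalg.norm(x, axis=1)          # amplitude R = ||x||

# Compare an empirical histogram bin around r0 with the chi(n) pdf
r0, h = 1.5, 0.05
emp = np.mean(np.abs(R - r0) < h / 2) / h
print(emp, stats.chi.pdf(r0, df=n))    # should agree up to Monte Carlo error
```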

  24. Application: random matrix theory
  • Example: decoupling a random matrix by the polar decomposition X = PQ
      M = GL(n) = {X ∈ ℝ^{n×n} : det X ≠ 0}
      N = S^n_{++} × O(n) = {(P, Q) : P ≻ 0, Q⊤Q = I_n}
  • Answer: X ~ p_X(X) ⇒ p(P, Q) = ... (known)
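A small illustration of the polar factorization itself (the distributional result p(P, Q) is not reproduced here); scipy.linalg.polar with side='left' returns factors matching the slide's convention X = PQ:

```python
import numpy as np
from scipy.linalg import polar

# Left polar decomposition X = P Q, with P symmetric positive definite and Q orthogonal
rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))        # generic X is invertible with probability 1

Q, P = polar(X, side='left')           # scipy returns (u, p) with X = p @ u
print(np.allclose(X, P @ Q))           # True: factorization holds
print(np.allclose(Q.T @ Q, np.eye(4)), np.all(np.linalg.eigvalsh(P) > 0))
```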

  25. Application: random matrix theory
  • Example: decoupling a random symmetric matrix by the eigendecomposition X = QΛQ⊤
      M = S^n = {X ∈ ℝ^{n×n} : X = X⊤}
      N = O(n) × D(n) = {(Q, Λ) : Q⊤Q = I_n, Λ diagonal}
  • Answer: X ~ p_X(X) ⇒ p(Q, Λ) = ... (known)
  • Technicality: in fact, the range of F is a quotient of an open subset of N

  26. Application: random matrix theory
  • Many more examples:
    − Cholesky decomposition (e.g., leads to the Wishart distribution)
    − LU decomposition
    − QR decomposition
    − SVD

  27. Application of RMT: coherent capacity of multi-antenna systems
  • Scenario: point-to-point single-user communication with multiple Tx antennas
  [Figure: Tx with inputs x_1, ..., x_{N_t}, Rx with outputs y_1, ..., y_{N_r}, channel gains h_{ij} between antenna pairs]

  28. Application of RMT: coherent capacity of multi-antenna systems
  • Data model: y = Hx + n, with y, n ∈ ℂ^{N_r}, H ∈ ℂ^{N_r × N_t}, x ∈ ℂ^{N_t}
    − N_t = number of Tx antennas, N_r = number of Rx antennas
    − Assumption: n_i iid ~ CN(0, 1)
  • Decoupled data model:
    − SVD: H = UΣVᴴ with U ∈ U(N_r), V ∈ U(N_t), Σ = Diag(σ_1, ..., σ_f, 0), where (σ_1, ..., σ_f) are the nonzero singular values of H and f = min{N_r, N_t}
    − Transform the data: ỹ = Uᴴy, x̃ = Vᴴx, ñ = Uᴴn
    − Equivalent diagonal model: ỹ = Σx̃ + ñ
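A short numerical check of the SVD decoupling step, with an assumed iid Rayleigh channel realization (dimensions and seed are arbitrary):

```python
import numpy as np

# Decouple y = H x + n into f = min(Nr, Nt) parallel scalar channels via the SVD
rng = np.random.default_rng(4)
Nr, Nt = 3, 2
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
x = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)
n = (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr)) / np.sqrt(2)
y = H @ x + n

U, s, Vh = np.linalg.svd(H)            # H = U Sigma V^H
y_t = U.conj().T @ y                   # y~ = U^H y
x_t = Vh @ x                           # x~ = V^H x
n_t = U.conj().T @ n                   # n~ = U^H n

Sigma = np.zeros((Nr, Nt))
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(y_t, Sigma @ x_t + n_t))   # equivalent diagonal model holds
```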

  29. Application of RMT: coherent capacity of multi-antenna systems
  • Interpretation: the matrix channel H is equivalent to f parallel scalar channels
  [Figure: scalar channels ỹ_i = σ_i x̃_i + ñ_i, i = 1, ..., f]

  30. Application of RMT: coherent capacity of multi-antenna systems
  • Assumption: the channel matrix H is random and known only at the Rx
  • Channel capacity: C = max_{p(x), E{‖x‖²} ≤ P} I(x; (y, H)), where I denotes mutual information
  • Solution: C = E_H[ ∑_{i=1}^f log(1 + (P/N_t) σ_i²) ]
    Recall: (σ_1, ..., σ_f) are the random singular values of H, f = min{N_r, N_t}
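A Monte Carlo sketch of this capacity expression for H with iid CN(0, 1) entries (the entry distribution is the model used on the next slide; capacity is reported in nats, and the number of trials is arbitrary):

```python
import numpy as np

def ergodic_capacity_mc(Nr, Nt, P, trials=20000, rng=None):
    """Monte Carlo estimate of C = E_H[ sum_i log(1 + (P/Nt) sigma_i^2) ] in nats."""
    rng = rng or np.random.default_rng(5)
    total = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
        s = np.linalg.svd(H, compute_uv=False)     # singular values of H
        total += np.sum(np.log(1 + (P / Nt) * s**2))
    return total / trials

print(ergodic_capacity_mc(Nr=2, Nt=2, P=10.0))
```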

  31. Application of RMT: coherent capacity of multi-antenna systems
  • H is random and H = UΣVᴴ (SVD); the SVD maps p(H) on ℂ^{N_r × N_t} to p(U, Σ, V) on U(N_r) × D(f) × U(N_t)
  • Capacity when H_{ij} iid ~ CN(0, 1):
      C = ∫₀^∞ log(1 + (P/N_t)λ) ∑_{k=0}^{f−1} [k! / (k + g − f)!] (L_k^{g−f}(λ))² λ^{g−f} e^{−λ} dλ
    where g = max{N_r, N_t} and L_k^{g−f} are generalized Laguerre polynomials
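The integral can be evaluated numerically; below is a sketch using scipy's generalized Laguerre polynomials, assuming the iid CN(0, 1) model above (it should agree with the Monte Carlo estimate from the previous sketch, up to sampling error):

```python
import numpy as np
from scipy.special import genlaguerre, gammaln
from scipy.integrate import quad

def capacity_laguerre(Nr, Nt, P):
    """Ergodic capacity (nats) via the Laguerre-polynomial integral on the slide."""
    f, g = min(Nr, Nt), max(Nr, Nt)
    def integrand(lam):
        dens = sum(np.exp(gammaln(k + 1) - gammaln(k + g - f + 1))
                   * genlaguerre(k, g - f)(lam)**2 for k in range(f))
        return np.log(1 + (P / Nt) * lam) * dens * lam**(g - f) * np.exp(-lam)
    val, _ = quad(integrand, 0, np.inf)
    return val

print(capacity_laguerre(2, 2, 10.0))   # compare with ergodic_capacity_mc(2, 2, 10.0)
```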

  32. Application: information geometry
  • Problem: given a parametric statistical family F = {p(x; θ) : θ ∈ Θ}, assign a distance function d : F × F → ℝ
  • Example: F = {N(θ, Σ) : θ ∈ Θ = ℝⁿ} (covariance Σ is fixed)
  • Naive choice: d : Θ × Θ → ℝ, d(θ, η) = ‖θ − η‖
  • This choice does not produce an "intrinsic" (parametrization-invariant) distance
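For this particular family the intrinsic alternative is simple: with Σ fixed, the Fisher information metric is the constant matrix Σ⁻¹, so the Rao distance reduces to the Mahalanobis distance. A minimal sketch (variable names and numbers are illustrative, not from the slides):

```python
import numpy as np

def rao_distance_fixed_cov(theta, eta, Sigma):
    """Rao distance in {N(theta, Sigma): theta in R^n} with Sigma fixed.
    The Fisher metric is the constant matrix Sigma^{-1}, so geodesics are
    straight lines and the distance is the Mahalanobis distance."""
    d = theta - eta
    return np.sqrt(d @ np.linalg.solve(Sigma, d))

Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])
theta, eta = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(np.linalg.norm(theta - eta), rao_distance_fixed_cov(theta, eta, Sigma))
# the naive Euclidean distance and the intrinsic (Fisher/Rao) distance differ
```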
