  1. MIT 9.520/6.860 Statistical Learning Theory and Applications. Class 0: Mathcamp. Lorenzo Rosasco

  2. Outline: Vector Spaces; Hilbert Spaces; Functionals and Operators (Matrices); Linear Operators; Probability Theory

  3. R^D
  We like R^D because we can
  ◮ add elements: v + w
  ◮ multiply by numbers: 3v
  ◮ take scalar products: v^T w = ∑_{j=1}^D v_j w_j
  ◮ ... and norms: ‖v‖ = √(v^T v) = √(∑_{j=1}^D v_j²)
  ◮ ... and distances: d(v, w) = ‖v − w‖ = √(∑_{j=1}^D (v_j − w_j)²).
  We want to do the same thing with D = ∞ ...
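
  These operations are exactly what a numerics library provides; a minimal NumPy sketch (the vectors are arbitrary examples):

    import numpy as np

    v = np.array([1.0, 2.0, 3.0])
    w = np.array([4.0, 5.0, 6.0])

    print(v + w)                  # addition: [5. 7. 9.]
    print(3 * v)                  # multiplication by a number: [3. 6. 9.]
    print(v @ w)                  # scalar product v^T w = 32.0
    print(np.linalg.norm(v))      # norm ‖v‖ = sqrt(v^T v)
    print(np.linalg.norm(v - w))  # distance d(v, w) = ‖v − w‖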

  4. Vector Space
  ◮ A vector space is a set V with binary operations
  + : V × V → V and · : R × V → V
  such that for all a, b ∈ R and v, w, x ∈ V:
  1. v + w = w + v
  2. (v + w) + x = v + (w + x)
  3. there exists 0 ∈ V such that v + 0 = v for all v ∈ V
  4. for every v ∈ V there exists −v ∈ V such that v + (−v) = 0
  5. a(bv) = (ab)v
  6. 1v = v
  7. (a + b)v = av + bv
  8. a(v + w) = av + aw
  ◮ Examples: R^n, the space of polynomials, spaces of functions.

  5. Inner Product
  ◮ An inner product is a function ⟨·, ·⟩ : V × V → R such that for all a, b ∈ R and v, w, x ∈ V:
  1. ⟨v, w⟩ = ⟨w, v⟩
  2. ⟨av + bw, x⟩ = a⟨v, x⟩ + b⟨w, x⟩
  3. ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 if and only if v = 0.
  ◮ v, w ∈ V are orthogonal if ⟨v, w⟩ = 0.
  ◮ Given a (closed) subspace W ⊆ V, we have V = W ⊕ W⊥, where W⊥ = { v ∈ V | ⟨v, w⟩ = 0 for all w ∈ W }.
  ◮ Cauchy-Schwarz inequality: |⟨v, w⟩| ≤ ⟨v, v⟩^{1/2} ⟨w, w⟩^{1/2}.
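
  A quick numerical sanity check of Cauchy-Schwarz on random vectors (a sketch; the dimension and the sampling distribution are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(5):
        v = rng.standard_normal(10)
        w = rng.standard_normal(10)
        lhs = abs(v @ w)                       # |<v, w>|
        rhs = np.sqrt(v @ v) * np.sqrt(w @ w)  # <v, v>^{1/2} <w, w>^{1/2}
        print(lhs <= rhs + 1e-12)              # True on every draw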

  6. Norm
  ◮ A norm is a function ‖·‖ : V → R such that for all a ∈ R and v, w ∈ V:
  1. ‖v‖ ≥ 0, and ‖v‖ = 0 if and only if v = 0
  2. ‖av‖ = |a| ‖v‖
  3. ‖v + w‖ ≤ ‖v‖ + ‖w‖
  ◮ A norm can be defined from an inner product: ‖v‖ = ⟨v, v⟩^{1/2}.

  7. Metric
  ◮ A metric is a function d : V × V → R such that for all v, w, x ∈ V:
  1. d(v, w) ≥ 0, and d(v, w) = 0 if and only if v = w
  2. d(v, w) = d(w, v)
  3. d(v, w) ≤ d(v, x) + d(x, w)
  ◮ A metric can be defined from a norm: d(v, w) = ‖v − w‖.

  8. Basis
  ◮ B = { v_1, ..., v_n } is a basis of V if every v ∈ V can be uniquely decomposed as
  v = a_1 v_1 + ... + a_n v_n
  for some a_1, ..., a_n ∈ R.
  ◮ An orthonormal basis is a basis that is orthogonal (⟨v_i, v_j⟩ = 0 for i ≠ j) and normalized (‖v_i‖ = 1).
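
  For an orthonormal basis the coefficients are simply inner products, a_i = ⟨v, v_i⟩. A minimal sketch (the basis here is built from an arbitrary orthogonal matrix):

    import numpy as np

    rng = np.random.default_rng(1)
    # The columns of an orthogonal matrix Q form an orthonormal basis of R^4.
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

    v = rng.standard_normal(4)
    a = Q.T @ v                   # coefficients a_i = <v, q_i>
    print(np.allclose(v, Q @ a))  # True: v = sum_i a_i q_i, exactly and uniquely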

  9. Outline: Vector Spaces; Hilbert Spaces; Functionals and Operators (Matrices); Linear Operators; Probability Theory

  10. Hilbert Space, Overview
  ◮ Goal: understand Hilbert spaces (complete inner product spaces) and make sense of the expression
  f = ∑_{i=1}^∞ ⟨f, φ_i⟩ φ_i,   f ∈ H.
  ◮ Need to talk about:
  1. Cauchy sequences
  2. Completeness
  3. Density
  4. Separability

  11. Cauchy Sequence
  ◮ Recall: lim_{n→∞} x_n = x if for every ε > 0 there exists N ∈ N such that ‖x − x_n‖ < ε whenever n ≥ N.
  ◮ (x_n)_{n ∈ N} is a Cauchy sequence if for every ε > 0 there exists N ∈ N such that ‖x_m − x_n‖ < ε whenever m, n ≥ N.
  ◮ Every convergent sequence is a Cauchy sequence (why?).
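
  A concrete instance: the partial sums x_n = ∑_{k=1}^n 1/k² converge in R (to π²/6), so they form a Cauchy sequence, and the gaps shrink as N grows. A minimal sketch:

    import numpy as np

    def x(n):
        # partial sum of 1/k^2: a convergent, hence Cauchy, sequence in R
        return np.sum(1.0 / np.arange(1, n + 1) ** 2)

    for N in [10, 100, 1000]:
        print(N, abs(x(2 * N) - x(N)))  # |x_m - x_n| for m, n >= N shrinks with N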

  12. Completeness
  ◮ A normed vector space V is complete if every Cauchy sequence in V converges in V.
  ◮ Examples:
  1. Q is not complete.
  2. R is complete (axiom).
  3. R^n is complete.
  4. Every finite-dimensional normed vector space (over R) is complete.

  13. Hilbert Space
  ◮ A Hilbert space is a complete inner product space.
  ◮ Examples:
  1. R^n
  2. every finite-dimensional inner product space
  3. ℓ² = { (a_n)_{n=1}^∞ | a_n ∈ R, ∑_{n=1}^∞ a_n² < ∞ }
  4. L²([0, 1]) = { f : [0, 1] → R | ∫_0^1 f(x)² dx < ∞ }

  14. Density
  ◮ Y is dense in X if the closure of Y is X (Ȳ = X).
  ◮ Examples:
  1. Q is dense in R.
  2. Q^n is dense in R^n.
  3. Weierstrass approximation theorem: the polynomials are dense in the continuous functions (with the supremum norm, on compact domains).

  15. Separability
  ◮ X is separable if it has a countable dense subset.
  ◮ Examples:
  1. R is separable (Q is a countable dense subset).
  2. R^n is separable.
  3. ℓ² and L²([0, 1]) are separable.

  16. Orthonormal Basis
  ◮ A Hilbert space has a countable orthonormal basis if and only if it is separable.
  ◮ Then we can write f = ∑_{i=1}^∞ ⟨f, φ_i⟩ φ_i for all f ∈ H.
  ◮ Examples:
  1. A basis of ℓ² is (1, 0, 0, ...), (0, 1, 0, ...), (0, 0, 1, 0, ...), ...
  2. A basis of L²([0, 1]) is 1, √2 sin(2πnx), √2 cos(2πnx) for n ∈ N.
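
  A sketch of the expansion truncated at n ≤ 10 with this Fourier basis, approximating the inner products on a grid (the test function f(x) = x(1 − x) is an arbitrary choice):

    import numpy as np

    x = np.linspace(0, 1, 2001)
    f = x * (1 - x)                # an arbitrary test function in L^2([0, 1])

    def inner(g, h):
        # approximate <g, h> = integral of g h over [0, 1] on the grid
        return np.mean(g * h)

    approx = inner(f, np.ones_like(x)) * np.ones_like(x)  # constant basis function 1
    for n in range(1, 11):
        for phi in (np.sqrt(2) * np.sin(2 * np.pi * n * x),
                    np.sqrt(2) * np.cos(2 * np.pi * n * x)):
            approx += inner(f, phi) * phi

    print(np.max(np.abs(f - approx)))  # truncation error, around 1e-2 here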

  17. Outline: Vector Spaces; Hilbert Spaces; Functionals and Operators (Matrices); Linear Operators; Probability Theory

  18. Maps
  Next we review basic properties of maps on a Hilbert space:
  ◮ functionals Ψ : H → R;
  ◮ linear operators A : H → H, i.e. maps such that A(af + bg) = aAf + bAg for all a, b ∈ R and f, g ∈ H.

  19. Representation of Continuous Functionals
  ◮ Let H be a Hilbert space and g ∈ H. Then Ψ_g(f) = ⟨f, g⟩, f ∈ H, is a continuous linear functional.
  ◮ Riesz representation theorem: every continuous linear functional Ψ on H can be written uniquely in the form Ψ(f) = ⟨f, g⟩ for some g ∈ H.

  20. Matrix
  ◮ Every linear operator L : R^n → R^m can be represented by an m × n matrix A.
  ◮ If A ∈ R^{m×n}, the transpose of A is the matrix A^T ∈ R^{n×m} satisfying
  ⟨Ax, y⟩_{R^m} = (Ax)^T y = x^T A^T y = ⟨x, A^T y⟩_{R^n}
  for every x ∈ R^n and y ∈ R^m.
  ◮ A is symmetric if A^T = A.
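
  A quick check of the defining identity of the transpose on random data (the sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 3, 5
    A = rng.standard_normal((m, n))
    x = rng.standard_normal(n)
    y = rng.standard_normal(m)

    lhs = (A @ x) @ y            # <Ax, y> in R^m
    rhs = x @ (A.T @ y)          # <x, A^T y> in R^n
    print(np.isclose(lhs, rhs))  # True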

  21. Eigenvalues and Eigenvectors
  ◮ Let A ∈ R^{n×n}. A nonzero vector v ∈ R^n is an eigenvector of A with corresponding eigenvalue λ ∈ R if Av = λv.
  ◮ Symmetric matrices have real eigenvalues.
  ◮ Spectral theorem: let A be a symmetric n × n matrix. Then there is an orthonormal basis of R^n consisting of eigenvectors of A.
  ◮ Eigendecomposition: A = VΛV^T, or equivalently A = ∑_{i=1}^n λ_i v_i v_i^T.
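
  Both forms of the eigendecomposition can be verified numerically; a sketch using np.linalg.eigh, which handles the symmetric case and returns orthonormal eigenvectors:

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4))
    A = (B + B.T) / 2                # symmetrize to obtain a symmetric matrix

    lam, V = np.linalg.eigh(A)       # real eigenvalues, orthonormal eigenvectors
    print(np.allclose(V @ V.T, np.eye(4)))         # V is orthogonal
    print(np.allclose(A, V @ np.diag(lam) @ V.T))  # A = V Lambda V^T
    print(np.allclose(A, sum(lam[i] * np.outer(V[:, i], V[:, i])
                             for i in range(4))))  # A = sum_i lambda_i v_i v_i^T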

  22. Singular Value Decomposition
  ◮ Every A ∈ R^{m×n} can be written as A = UΣV^T, where U ∈ R^{m×m} is orthogonal, Σ ∈ R^{m×n} is diagonal, and V ∈ R^{n×n} is orthogonal.
  ◮ Singular system:
  A v_i = σ_i u_i,    A^T u_i = σ_i v_i,
  A A^T u_i = σ_i² u_i,    A^T A v_i = σ_i² v_i.
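
  The singular system is easy to check with np.linalg.svd (a sketch; the singular values come back sorted in s, the right singular vectors as rows of Vt):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 3))
    U, s, Vt = np.linalg.svd(A)      # A = U diag(s) Vt

    for i in range(len(s)):
        u, v, sigma = U[:, i], Vt[i, :], s[i]
        print(np.allclose(A @ v, sigma * u),            # A v_i = sigma_i u_i
              np.allclose(A.T @ u, sigma * v),          # A^T u_i = sigma_i v_i
              np.allclose(A.T @ A @ v, sigma**2 * v))   # A^T A v_i = sigma_i^2 v_i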

  23. Matrix Norm
  ◮ The spectral norm of A ∈ R^{m×n} is ‖A‖_spec = σ_max(A) = √(λ_max(AA^T)) = √(λ_max(A^T A)).
  ◮ The Frobenius norm of A ∈ R^{m×n} is ‖A‖_F = √(∑_{i=1}^m ∑_{j=1}^n a_{ij}²) = √(∑_{i=1}^{min{m,n}} σ_i²).
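
  Both norms can be cross-checked against the singular values (a minimal sketch):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 6))
    s = np.linalg.svd(A, compute_uv=False)  # singular values of A

    print(np.isclose(np.linalg.norm(A, 2), s.max()))                    # spectral norm = sigma_max
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))  # Frobenius norm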

  24. Positive Definite Matrix
  ◮ A real symmetric matrix A ∈ R^{m×m} is positive definite if x^T A x > 0 for all nonzero x ∈ R^m.
  ◮ A positive definite matrix has positive eigenvalues.
  ◮ Note: for positive semi-definite matrices, > is replaced by ≥ (and the restriction to nonzero x can be dropped).
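
  In practice, positive definiteness is checked via the eigenvalues or a Cholesky factorization; a sketch (the example matrix is constructed to be positive definite):

    import numpy as np

    rng = np.random.default_rng(6)
    B = rng.standard_normal((4, 4))
    A = B @ B.T + 1e-3 * np.eye(4)   # B B^T is PSD; adding a small ridge makes it PD

    print(np.all(np.linalg.eigvalsh(A) > 0))  # all eigenvalues positive => PD
    np.linalg.cholesky(A)                     # succeeds exactly when A is PD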

  25. Outline: Vector Spaces; Hilbert Spaces; Functionals and Operators (Matrices); Linear Operators; Probability Theory

  26. Linear Operator
  ◮ An operator L : H_1 → H_2 is linear if it preserves the linear structure.
  ◮ A linear operator L : H_1 → H_2 is bounded if there exists C > 0 such that ‖Lf‖_{H_2} ≤ C‖f‖_{H_1} for all f ∈ H_1.
  ◮ A linear operator is continuous if and only if it is bounded.

  27. Adjoint and Compactness
  ◮ The adjoint of a bounded linear operator L : H_1 → H_2 is the bounded linear operator L* : H_2 → H_1 satisfying ⟨Lf, g⟩_{H_2} = ⟨f, L*g⟩_{H_1} for all f ∈ H_1, g ∈ H_2.
  ◮ L is self-adjoint if L* = L. Self-adjoint operators have real eigenvalues.
  ◮ A bounded linear operator L : H_1 → H_2 is compact if the image of the unit ball in H_1 has compact closure in H_2.

  28. Spectral Theorem for Compact Self-Adjoint Operators
  ◮ Let L : H → H be a compact self-adjoint operator. Then there exists an orthonormal basis of H consisting of eigenfunctions of L, Lφ_i = λ_i φ_i, and the only possible limit point of the λ_i as i → ∞ is 0.
  ◮ Eigendecomposition: L = ∑_{i=1}^∞ λ_i ⟨φ_i, ·⟩ φ_i.

  29. Probability Space
  A triple (Ω, A, P), where:
  ◮ Ω is a set;
  ◮ A is a sigma-algebra, i.e. a family of subsets of Ω such that
  1. Ω, ∅ ∈ A,
  2. A ∈ A ⇒ Ω \ A ∈ A,
  3. A_i ∈ A, i = 1, 2, ... ⇒ ∪_{i=1}^∞ A_i ∈ A;
  ◮ P is a probability measure, i.e. a function P : A → [0, 1] such that
  1. P(Ω) = 1 (hence P(∅) = 0),
  2. sigma-additivity: if A_i ∈ A, i = 1, 2, ..., are disjoint, then P(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).

  30. Real Random Variables (RV)
  A real random variable is a measurable function X : Ω → R, i.e. the preimage X^{-1}(I) of every open subset I ⊆ R belongs to the sigma-algebra A.
  ◮ Law of a random variable: the probability measure ρ on R defined by ρ(I) = P(X^{-1}(I)) for all open subsets I ⊆ R.
  ◮ Probability density function of a probability measure ρ on R: a function p : R → R such that ∫_I dρ(x) = ∫_I p(x) dx for all open subsets I ⊆ R.

  31. Convergence of Random Variables
  Let X_i, i = 1, 2, ..., be a sequence of random variables.
  ◮ Convergence in probability: for all ε ∈ (0, ∞), lim_{i→∞} P(|X_i − X| > ε) = 0.
  ◮ Almost sure convergence: P(lim_{i→∞} X_i = X) = 1.

  32. Law of Large Numbers
  Let X_i, i = 1, 2, ..., be a sequence of independent copies of a random variable X.
  ◮ Weak law of large numbers: for all ε ∈ (0, ∞),
  lim_{n→∞} P( |(1/n) ∑_{i=1}^n X_i − E[X]| > ε ) = 0.
  ◮ Strong law of large numbers:
  P( lim_{n→∞} (1/n) ∑_{i=1}^n X_i = E[X] ) = 1.
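
  A simulation of the empirical mean approaching E[X] (a sketch; the Uniform(0, 1) distribution, with E[X] = 0.5, is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(7)
    X = rng.uniform(0, 1, size=100_000)  # i.i.d. copies of X ~ Uniform(0, 1)

    for n in [10, 1000, 100_000]:
        print(n, X[:n].mean())           # the running mean approaches E[X] = 0.5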
