  1. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html Carlos Fernandez-Granda

  2. Vector space Consists of: ◮ A set V ◮ A scalar field (usually R or C ) ◮ Two operations + and ·

  3. Properties
  ◮ For any $\vec{x}, \vec{y} \in V$, $\vec{x} + \vec{y}$ belongs to V
  ◮ For any $\vec{x} \in V$ and any scalar $\alpha$, $\alpha \cdot \vec{x} \in V$
  ◮ There exists a zero vector $\vec{0}$ such that $\vec{x} + \vec{0} = \vec{x}$ for any $\vec{x} \in V$
  ◮ For any $\vec{x} \in V$ there exists an additive inverse $\vec{y}$ such that $\vec{x} + \vec{y} = \vec{0}$, usually denoted by $-\vec{x}$

  4. Properties
  ◮ The vector sum is commutative and associative, i.e. for all $\vec{x}, \vec{y}, \vec{z} \in V$: $\vec{x} + \vec{y} = \vec{y} + \vec{x}$ and $(\vec{x} + \vec{y}) + \vec{z} = \vec{x} + (\vec{y} + \vec{z})$
  ◮ Scalar multiplication is associative: for any scalars $\alpha$ and $\beta$ and any $\vec{x} \in V$, $\alpha(\beta \cdot \vec{x}) = (\alpha\beta) \cdot \vec{x}$
  ◮ Scalar and vector sums are both distributive, i.e. for any scalars $\alpha$ and $\beta$ and any $\vec{x}, \vec{y} \in V$: $(\alpha + \beta) \cdot \vec{x} = \alpha \cdot \vec{x} + \beta \cdot \vec{x}$ and $\alpha \cdot (\vec{x} + \vec{y}) = \alpha \cdot \vec{x} + \alpha \cdot \vec{y}$

  5. Subspaces A subspace of a vector space V is any subset of V that is also itself a vector space

  6. Linear dependence/independence
  A set of m vectors $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_m$ is linearly dependent if there exist m scalar coefficients $\alpha_1, \alpha_2, \ldots, \alpha_m$, not all equal to zero, such that
  $\sum_{i=1}^{m} \alpha_i \vec{x}_i = \vec{0}$
  Equivalently, at least one vector in a linearly dependent set can be expressed as a linear combination of the rest (see the numerical sketch below)
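
  A quick numerical test of linear dependence: stack the vectors as columns of a matrix and compare its rank to the number of vectors. A minimal NumPy sketch with illustrative vectors (not from the slides):

      import numpy as np

      # Stack vectors as columns; they are linearly dependent exactly when
      # the rank of the matrix is smaller than the number of vectors.
      x1 = np.array([1.0, 0.0, 2.0])
      x2 = np.array([0.0, 1.0, 1.0])
      x3 = x1 + 2 * x2          # a linear combination of the first two
      X = np.column_stack([x1, x2, x3])
      dependent = np.linalg.matrix_rank(X) < X.shape[1]
      print(dependent)          # True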

  7. Span
  The span of $\{\vec{x}_1, \ldots, \vec{x}_m\}$ is the set of all possible linear combinations:
  $\mathrm{span}(\vec{x}_1, \ldots, \vec{x}_m) := \{ \vec{y} \mid \vec{y} = \sum_{i=1}^{m} \alpha_i \vec{x}_i \text{ for some scalars } \alpha_1, \alpha_2, \ldots, \alpha_m \}$
  The span of any set of vectors in V is a subspace of V

  8. Basis and dimension
  A basis of a vector space V is a set of linearly independent vectors $\{\vec{x}_1, \ldots, \vec{x}_m\}$ such that $V = \mathrm{span}(\vec{x}_1, \ldots, \vec{x}_m)$
  If V has a basis with finite cardinality, then every basis contains the same number of vectors
  The dimension $\dim(V)$ of V is the cardinality of any of its bases
  Equivalently, the dimension is the number of linearly independent vectors that span V

  9. Standard basis
  $\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$
  The dimension of $\mathbb{R}^n$ is n

  10. Inner product
  Operation $\langle \cdot, \cdot \rangle$ that maps a pair of vectors to a scalar

  11. Properties
  ◮ If the scalar field is $\mathbb{R}$, it is symmetric: for any $\vec{x}, \vec{y} \in V$, $\langle \vec{x}, \vec{y} \rangle = \langle \vec{y}, \vec{x} \rangle$. If the scalar field is $\mathbb{C}$, then for any $\vec{x}, \vec{y} \in V$, $\langle \vec{x}, \vec{y} \rangle = \overline{\langle \vec{y}, \vec{x} \rangle}$, where for any $\alpha \in \mathbb{C}$, $\overline{\alpha}$ is the complex conjugate of $\alpha$

  12. Properties
  ◮ It is linear in the first argument, i.e. for any $\alpha \in \mathbb{R}$ and any $\vec{x}, \vec{y}, \vec{z} \in V$, $\langle \alpha \vec{x}, \vec{y} \rangle = \alpha \langle \vec{x}, \vec{y} \rangle$ and $\langle \vec{x} + \vec{y}, \vec{z} \rangle = \langle \vec{x}, \vec{z} \rangle + \langle \vec{y}, \vec{z} \rangle$. If the scalar field is $\mathbb{R}$, it is also linear in the second argument
  ◮ It is positive definite: $\langle \vec{x}, \vec{x} \rangle$ is nonnegative for all $\vec{x} \in V$, and if $\langle \vec{x}, \vec{x} \rangle = 0$ then $\vec{x} = \vec{0}$

  13. Dot product
  Inner product between $\vec{x}, \vec{y} \in \mathbb{R}^n$:
  $\vec{x} \cdot \vec{y} := \sum_{i} \vec{x}[i]\, \vec{y}[i]$
  $\mathbb{R}^n$ endowed with the dot product is usually called a Euclidean space of dimension n
  If $\vec{x}, \vec{y} \in \mathbb{C}^n$:
  $\vec{x} \cdot \vec{y} := \sum_{i} \vec{x}[i]\, \overline{\vec{y}[i]}$
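
  Both versions of the dot product are easy to evaluate in NumPy; note the complex conjugate on the second argument. A small sketch with made-up vectors:

      import numpy as np

      x = np.array([1.0, 2.0, 3.0])
      y = np.array([4.0, 5.0, 6.0])
      print(np.dot(x, y))                    # sum_i x[i] y[i] = 32.0

      xc = np.array([1 + 1j, 2 - 1j])
      yc = np.array([3 - 2j, 1 + 4j])
      print(np.sum(xc * np.conj(yc)))        # sum_i x[i] conj(y[i])
      print(np.vdot(yc, xc))                 # same value; vdot conjugates its first argument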

  14. Sample covariance
  Quantifies joint fluctuations of two quantities or features. For a data set $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
  $\mathrm{cov}((x_1, y_1), \ldots, (x_n, y_n)) := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mathrm{av}(x_1, \ldots, x_n))(y_i - \mathrm{av}(y_1, \ldots, y_n))$
  where the average or sample mean is defined by
  $\mathrm{av}(a_1, \ldots, a_n) := \frac{1}{n} \sum_{i=1}^{n} a_i$
  If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are iid samples from x and y,
  $\mathrm{E}(\mathrm{cov}((x_1, y_1), \ldots, (x_n, y_n))) = \mathrm{Cov}(x, y) := \mathrm{E}((x - \mathrm{E}(x))(y - \mathrm{E}(y)))$
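
  The sample covariance can be computed directly from the definition and cross-checked against np.cov, which also uses the n − 1 normalization by default. Illustrative data, assumed for the sketch:

      import numpy as np

      x = np.array([2.0, 4.0, 6.0, 8.0])
      y = np.array([1.0, 3.0, 2.0, 5.0])
      n = len(x)

      cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
      print(cov_xy)
      print(np.cov(x, y)[0, 1])   # same value; np.cov also divides by n - 1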

  15. Matrix inner product
  The inner product between two m × n matrices A and B is
  $\langle A, B \rangle := \mathrm{tr}(A^T B) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ij}$
  where the trace of an n × n matrix is defined as the sum of its diagonal:
  $\mathrm{tr}(M) := \sum_{i=1}^{n} M_{ii}$
  For any pair of m × n matrices A and B,
  $\mathrm{tr}(A B^T) = \mathrm{tr}(B^T A)$
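
  A short numerical check, with randomly generated matrices, that the trace form of the matrix inner product matches the entrywise sum and that tr(AB^T) = tr(B^T A):

      import numpy as np

      rng = np.random.default_rng(0)
      A = rng.standard_normal((3, 2))
      B = rng.standard_normal((3, 2))

      inner_trace = np.trace(A.T @ B)          # <A, B> = tr(A^T B)
      inner_entrywise = np.sum(A * B)          # sum_ij A_ij B_ij
      print(np.isclose(inner_trace, inner_entrywise))            # True
      print(np.isclose(np.trace(A @ B.T), np.trace(B.T @ A)))    # True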

  16. Function inner product
  The inner product between two complex-valued square-integrable functions f, g defined on an interval [a, b] of the real line is
  $f \cdot g := \int_a^b f(x)\, \overline{g(x)} \, dx$
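
  In practice the function inner product can be approximated on a grid; a rough sketch using a Riemann sum over [0, 1] with two illustrative complex exponentials:

      import numpy as np

      a, b = 0.0, 1.0
      t = np.linspace(a, b, 10_001)
      dt = t[1] - t[0]

      f = np.exp(2j * np.pi * t)           # example complex-valued functions
      g = np.exp(2j * np.pi * 2 * t)

      inner_fg = np.sum(f * np.conj(g)) * dt   # approximates the integral of f(x) conj(g(x))
      print(np.round(inner_fg, 6))             # close to 0: these complex exponentials are orthogonal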

  17. Norm
  Let V be a vector space. A norm is a function $\|\cdot\|$ from V to $\mathbb{R}$ with the following properties:
  ◮ It is homogeneous: for any scalar $\alpha$ and any $\vec{x} \in V$, $\|\alpha \vec{x}\| = |\alpha| \, \|\vec{x}\|$
  ◮ It satisfies the triangle inequality: $\|\vec{x} + \vec{y}\| \le \|\vec{x}\| + \|\vec{y}\|$. In particular, $\|\vec{x}\| \ge 0$
  ◮ $\|\vec{x}\| = 0$ implies $\vec{x} = \vec{0}$

  18. Inner-product norm
  Square root of the inner product of a vector with itself:
  $\|\vec{x}\|_{\langle \cdot, \cdot \rangle} := \sqrt{\langle \vec{x}, \vec{x} \rangle}$

  19. Inner-product norm
  ◮ Vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$: $\ell_2$ norm
  $\|\vec{x}\|_2 := \sqrt{\vec{x} \cdot \vec{x}} = \sqrt{\sum_{i=1}^{n} \vec{x}[i]^2}$
  ◮ Matrices in $\mathbb{R}^{m \times n}$ or $\mathbb{C}^{m \times n}$: Frobenius norm
  $\|A\|_F := \sqrt{\mathrm{tr}(A^T A)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}$
  ◮ Square-integrable complex-valued functions: $L_2$ norm
  $\|f\|_{L_2} := \sqrt{\langle f, f \rangle} = \sqrt{\int_a^b |f(x)|^2 \, dx}$
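
  Each of these inner-product norms is available, or easy to assemble, in NumPy; a brief sketch with made-up inputs:

      import numpy as np

      x = np.array([3.0, 4.0])
      print(np.linalg.norm(x))                 # l2 norm: sqrt(x . x) = 5.0

      A = np.array([[1.0, 2.0], [3.0, 4.0]])
      print(np.linalg.norm(A, 'fro'))          # Frobenius norm
      print(np.sqrt(np.trace(A.T @ A)))        # same value via the trace definition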

  20. Cauchy-Schwarz inequality
  For any two vectors $\vec{x}$ and $\vec{y}$ in an inner-product space,
  $|\langle \vec{x}, \vec{y} \rangle| \le \|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle}$
  Assume $\|\vec{x}\|_{\langle \cdot, \cdot \rangle} \neq 0$, then
  $\langle \vec{x}, \vec{y} \rangle = -\|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle} \iff \vec{y} = -\frac{\|\vec{y}\|_{\langle \cdot, \cdot \rangle}}{\|\vec{x}\|_{\langle \cdot, \cdot \rangle}} \, \vec{x}$
  $\langle \vec{x}, \vec{y} \rangle = \|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle} \iff \vec{y} = \frac{\|\vec{y}\|_{\langle \cdot, \cdot \rangle}}{\|\vec{x}\|_{\langle \cdot, \cdot \rangle}} \, \vec{x}$
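
  A quick numerical illustration of the inequality and of the equality case where y is a negative multiple of x, using the ℓ2 norm and made-up vectors:

      import numpy as np

      rng = np.random.default_rng(1)
      x = rng.standard_normal(5)
      y = rng.standard_normal(5)

      lhs = abs(np.dot(x, y))
      rhs = np.linalg.norm(x) * np.linalg.norm(y)
      print(lhs <= rhs + 1e-12)                # True for any x, y

      y_neg = -2.0 * x                         # y a negative multiple of x
      print(np.isclose(np.dot(x, y_neg),
                       -np.linalg.norm(x) * np.linalg.norm(y_neg)))   # equality case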

  21. Sample variance and standard deviation
  The sample variance quantifies fluctuations around the average:
  $\mathrm{var}(x_1, x_2, \ldots, x_n) := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mathrm{av}(x_1, x_2, \ldots, x_n))^2$
  If $x_1, x_2, \ldots, x_n$ are iid samples from x,
  $\mathrm{E}(\mathrm{var}(x_1, x_2, \ldots, x_n)) = \mathrm{Var}(x) := \mathrm{E}((x - \mathrm{E}(x))^2)$
  The sample standard deviation is
  $\mathrm{std}(x_1, \ldots, x_n) := \sqrt{\mathrm{var}(x_1, \ldots, x_n)}$
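
  NumPy reproduces these quantities when asked for the n − 1 normalization (ddof=1); illustrative data:

      import numpy as np

      x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
      n = len(x)

      var_manual = np.sum((x - x.mean()) ** 2) / (n - 1)
      print(var_manual, np.var(x, ddof=1))   # identical: ddof=1 gives the n - 1 normalization
      print(np.std(x, ddof=1))               # sample standard deviation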

  22. Correlation coefficient
  Normalized covariance:
  $\rho_{(x_1,y_1),\ldots,(x_n,y_n)} := \frac{\mathrm{cov}((x_1,y_1),\ldots,(x_n,y_n))}{\mathrm{std}(x_1,\ldots,x_n)\,\mathrm{std}(y_1,\ldots,y_n)}$
  Corollary of Cauchy-Schwarz: $-1 \le \rho_{(x_1,y_1),\ldots,(x_n,y_n)} \le 1$ and
  $\rho = -1 \iff y_i = \mathrm{av}(y_1,\ldots,y_n) - \frac{\mathrm{std}(y_1,\ldots,y_n)}{\mathrm{std}(x_1,\ldots,x_n)} \,(x_i - \mathrm{av}(x_1,\ldots,x_n))$
  $\rho = 1 \iff y_i = \mathrm{av}(y_1,\ldots,y_n) + \frac{\mathrm{std}(y_1,\ldots,y_n)}{\mathrm{std}(x_1,\ldots,x_n)} \,(x_i - \mathrm{av}(x_1,\ldots,x_n))$
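
  The correlation coefficient follows directly from the earlier definitions; np.corrcoef returns the same value. A minimal sketch with made-up data:

      import numpy as np

      x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
      y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
      n = len(x)

      cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
      rho = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
      print(rho, np.corrcoef(x, y)[0, 1])    # same value, guaranteed to lie in [-1, 1]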

  23. Correlation coefficient
  [Scatter plots of data with ρ ≈ 0.50, 0.90, 0.99 and 0.00, −0.90, −0.99]

  24. Temperature data
  Temperature in Oxford over 150 years
  ◮ Feature 1: Temperature in January
  ◮ Feature 2: Temperature in August
  ρ = 0.269
  [Scatter plot of the two features]

  25. Temperature data
  Temperature in Oxford over 150 years (monthly)
  ◮ Feature 1: Maximum temperature
  ◮ Feature 2: Minimum temperature
  ρ = 0.962
  [Scatter plot: minimum vs. maximum temperature]

  26. Parallelogram law
  A norm $\|\cdot\|$ on a vector space V is an inner-product norm if and only if
  $2\|\vec{x}\|^2 + 2\|\vec{y}\|^2 = \|\vec{x} - \vec{y}\|^2 + \|\vec{x} + \vec{y}\|^2$
  for any $\vec{x}, \vec{y} \in V$
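
  A quick numerical illustration: the ℓ2 norm satisfies the identity, while the ℓ1 norm of the next slide generally does not. Vectors chosen purely for illustration:

      import numpy as np

      x = np.array([1.0, 2.0])
      y = np.array([3.0, 1.0])

      def parallelogram_gap(norm):
          # zero exactly when the parallelogram law holds for these x, y
          return 2 * norm(x) ** 2 + 2 * norm(y) ** 2 - norm(x - y) ** 2 - norm(x + y) ** 2

      print(parallelogram_gap(lambda v: np.linalg.norm(v, 2)))   # 0.0 for the l2 norm
      print(parallelogram_gap(lambda v: np.linalg.norm(v, 1)))   # nonzero: l1 is not an inner-product norm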

  27. ℓ1 and ℓ∞ norms
  Norms in $\mathbb{R}^n$ or $\mathbb{C}^n$ not induced by an inner product:
  $\|\vec{x}\|_1 := \sum_{i=1}^{n} |\vec{x}[i]|$
  $\|\vec{x}\|_\infty := \max_i |\vec{x}[i]|$
  Hölder's inequality: $|\langle \vec{x}, \vec{y} \rangle| \le \|\vec{x}\|_1 \, \|\vec{y}\|_\infty$
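
  A numerical check of Hölder's inequality in this ℓ1 / ℓ∞ form, with randomly generated vectors:

      import numpy as np

      rng = np.random.default_rng(2)
      x = rng.standard_normal(6)
      y = rng.standard_normal(6)

      lhs = abs(np.dot(x, y))
      rhs = np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf)
      print(lhs <= rhs)    # True: |<x, y>| <= ||x||_1 ||y||_inf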

  28. Norm balls
  [Unit balls of the ℓ1, ℓ2, and ℓ∞ norms]

  29. Distance
  The distance between two vectors $\vec{x}$ and $\vec{y}$ induced by a norm $\|\cdot\|$ is
  $d(\vec{x}, \vec{y}) := \|\vec{x} - \vec{y}\|$

  30. Classification
  Aim: assign a signal to one of k predefined classes
  Training data: n pairs of signals (represented as vectors) and labels: $\{\vec{x}_1, l_1\}, \ldots, \{\vec{x}_n, l_n\}$

  31. Nearest-neighbor classification
  [Diagram: a test point is assigned the label of its nearest training point; a small sketch follows below]
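
  A minimal nearest-neighbor classifier in the sense of these slides: assign a test vector the label of the closest training vector under the ℓ2 distance. The data below are illustrative placeholders, not the face-recognition set:

      import numpy as np

      def nearest_neighbor(X_train, labels, x_test):
          # distance from x_test to every training vector, then pick the closest one
          dists = np.linalg.norm(X_train - x_test, axis=1)
          return labels[np.argmin(dists)]

      X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])   # training vectors (rows)
      labels = np.array(["A", "A", "B"])
      print(nearest_neighbor(X_train, labels, np.array([0.9, 1.2])))   # "A"
      print(nearest_neighbor(X_train, labels, np.array([4.0, 4.5])))   # "B"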

  32. Face recognition
  Training set: 360 images of size 64 × 64 from 40 different subjects (9 each)
  Test set: 1 new image from each subject
  We model each image as a vector in $\mathbb{R}^{4096}$ and use the $\ell_2$-norm distance

  33. Face recognition Training set

  34. Nearest-neighbor classification
  Errors: 4 / 40
  [Each test image shown next to its closest training image]

  35. Orthogonality
  Two vectors $\vec{x}$ and $\vec{y}$ are orthogonal if and only if $\langle \vec{x}, \vec{y} \rangle = 0$
  A vector $\vec{x}$ is orthogonal to a set $S$ if $\langle \vec{x}, \vec{s} \rangle = 0$ for all $\vec{s} \in S$
  Two sets $S_1, S_2$ are orthogonal if $\langle \vec{x}, \vec{y} \rangle = 0$ for any $\vec{x} \in S_1$, $\vec{y} \in S_2$
  The orthogonal complement of a subspace $S$ is $S^\perp := \{\vec{x} \mid \langle \vec{x}, \vec{y} \rangle = 0 \text{ for all } \vec{y} \in S\}$
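
  For a subspace given as the span of the columns of a matrix, an orthonormal basis of the orthogonal complement can be read off from the SVD (equivalently, the null space of the transpose); a sketch with made-up vectors in R^3:

      import numpy as np

      # S = span of two vectors in R^3, stacked as columns of A
      A = np.column_stack([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

      # Right singular vectors of A.T with zero singular value span the
      # orthogonal complement, i.e. the null space of A.T.
      U, s, Vt = np.linalg.svd(A.T)
      rank = np.sum(s > 1e-10)
      complement = Vt[rank:].T          # columns form an orthonormal basis of S-perp

      print(complement)
      print(np.allclose(A.T @ complement, 0))   # every column of A is orthogonal to S-perp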
