

  1. Background Material DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science https://cims.nyu.edu/~cfgranda/pages/MTDS_spring20/index.html Sreyas Mohan and Carlos Fernandez-Granda

  2. Vector spaces Inner product Norms Mean, Variance and Correlation Sample mean, variance and correlation Orthogonality Orthogonal projection Denoising

  3. Vector space Consists of:
     ◮ A set $V$
     ◮ A scalar field (usually $\mathbb{R}$ or $\mathbb{C}$)
     ◮ Two operations $+$ and $\cdot$

  4. Properties
     ◮ For any $\vec{x}, \vec{y} \in V$, $\vec{x} + \vec{y}$ belongs to $V$
     ◮ For any $\vec{x} \in V$ and any scalar $\alpha$, $\alpha \cdot \vec{x} \in V$
     ◮ There exists a zero vector $\vec{0}$ such that $\vec{x} + \vec{0} = \vec{x}$ for any $\vec{x} \in V$
     ◮ For any $\vec{x} \in V$ there exists an additive inverse $\vec{y}$ such that $\vec{x} + \vec{y} = \vec{0}$, usually denoted by $-\vec{x}$

  5. Properties
     ◮ The vector sum is commutative and associative, i.e. for all $\vec{x}, \vec{y}, \vec{z} \in V$
         $\vec{x} + \vec{y} = \vec{y} + \vec{x}$,  $(\vec{x} + \vec{y}) + \vec{z} = \vec{x} + (\vec{y} + \vec{z})$
     ◮ Scalar multiplication is associative: for any scalars $\alpha$ and $\beta$ and any $\vec{x} \in V$
         $\alpha \cdot (\beta \cdot \vec{x}) = (\alpha \beta) \cdot \vec{x}$
     ◮ Scalar and vector sums are both distributive, i.e. for any scalars $\alpha$ and $\beta$ and any $\vec{x}, \vec{y} \in V$
         $(\alpha + \beta) \cdot \vec{x} = \alpha \cdot \vec{x} + \beta \cdot \vec{x}$,  $\alpha \cdot (\vec{x} + \vec{y}) = \alpha \cdot \vec{x} + \alpha \cdot \vec{y}$

  6. Concept Check Let $V = \{ x \mid x \in \mathbb{R},\ x \geq 0 \}$. Define addition for $x, y \in V$ as $x + y$ (ordinary addition) and scalar multiplication for $x \in V$ and $\alpha \in \mathbb{R}$ as $\alpha \cdot x$ (ordinary scaling). Is $V$ a vector space?

  7. Subspaces A subspace of a vector space V is any subset of V that is also itself a vector space

  8. Linear dependence/independence A set of $m$ vectors $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_m$ is linearly dependent if there exist $m$ scalar coefficients $\alpha_1, \alpha_2, \ldots, \alpha_m$, not all equal to zero, such that
         $\sum_{i=1}^{m} \alpha_i \vec{x}_i = \vec{0}$
     Equivalently, at least one vector in a linearly dependent set can be expressed as a linear combination of the rest
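A quick numerical check of this definition (a NumPy sketch, not from the slides; the example vectors are made up): stacking the vectors as columns, the set is linearly dependent exactly when the matrix rank is smaller than the number of vectors.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([2.0, 4.0, 6.0])   # 2 * x1, so the set is dependent
x3 = np.array([0.0, 1.0, 0.0])

# Stack the vectors as columns of a matrix.
A = np.column_stack([x1, x2, x3])

# Dependent iff rank < number of vectors (some nontrivial
# combination of the columns equals the zero vector).
dependent = np.linalg.matrix_rank(A) < A.shape[1]
print(dependent)  # True, since x2 = 2 * x1
```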

  9. Span The span of $\{\vec{x}_1, \ldots, \vec{x}_m\}$ is the set of all possible linear combinations
         $\mathrm{span}(\vec{x}_1, \ldots, \vec{x}_m) := \left\{ \vec{y} \mid \vec{y} = \sum_{i=1}^{m} \alpha_i \vec{x}_i \text{ for some scalars } \alpha_1, \alpha_2, \ldots, \alpha_m \right\}$
     The span of any set of vectors in $V$ is a subspace of $V$

  10. Basis and dimension A basis of a vector space $V$ is a set of linearly independent vectors $\{\vec{x}_1, \ldots, \vec{x}_m\}$ such that $V = \mathrm{span}(\vec{x}_1, \ldots, \vec{x}_m)$. If $V$ has a basis with finite cardinality, then every basis contains the same number of vectors. The dimension $\dim(V)$ of $V$ is the cardinality of any of its bases. Equivalently, the dimension is the number of linearly independent vectors needed to span $V$.

  11. Standard basis
         $\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$
     The dimension of $\mathbb{R}^n$ is $n$

  12. Concept Check
     ◮ (True/False) If $S$ is a subset of a vector space $V$, then $\mathrm{span}(S)$ contains the intersection of all subspaces of $V$ that contain $S$.
     ◮ The set of all $n \times n$ matrices with trace zero forms a subspace $W$ of the space of $n \times n$ matrices. Find a basis for $W$ and calculate its dimension.

  13. Concept Check - Answers
     ◮ True.
     ◮ We need to enforce that the sum of the diagonal entries is zero, i.e. $A_{11} + A_{22} + \cdots + A_{nn} = 0$. A basis is $\{E_{ij}\}_{i \neq j} \cup \{E_{ii} - E_{nn}\}_{i = 1, 2, \ldots, n-1}$, where $E_{ij}$ denotes the matrix with a one in entry $(i, j)$ and zeros elsewhere. The dimension of $W$ is $n^2 - 1$.

  14. Vector spaces Inner product Norms Mean, Variance and Correlation Sample mean, variance and correlation Orthogonality Orthogonal projection Denoising

  15. Inner product Operation $\langle \cdot, \cdot \rangle$ that maps a pair of vectors to a scalar

  16. Properties
     ◮ If the scalar field is $\mathbb{R}$, it is symmetric: for any $\vec{x}, \vec{y} \in V$
         $\langle \vec{x}, \vec{y} \rangle = \langle \vec{y}, \vec{x} \rangle$
       If the scalar field is $\mathbb{C}$, then for any $\vec{x}, \vec{y} \in V$
         $\langle \vec{x}, \vec{y} \rangle = \overline{\langle \vec{y}, \vec{x} \rangle}$,
       where $\overline{\alpha}$ denotes the complex conjugate of $\alpha \in \mathbb{C}$

  17. Properties
     ◮ It is linear in the first argument, i.e. for any $\alpha \in \mathbb{R}$ and any $\vec{x}, \vec{y}, \vec{z} \in V$
         $\langle \alpha \vec{x}, \vec{y} \rangle = \alpha \langle \vec{x}, \vec{y} \rangle$,  $\langle \vec{x} + \vec{y}, \vec{z} \rangle = \langle \vec{x}, \vec{z} \rangle + \langle \vec{y}, \vec{z} \rangle$.
       If the scalar field is $\mathbb{R}$, it is also linear in the second argument
     ◮ It is positive definite: $\langle \vec{x}, \vec{x} \rangle$ is nonnegative for all $\vec{x} \in V$, and if $\langle \vec{x}, \vec{x} \rangle = 0$ then $\vec{x} = \vec{0}$

  18. Dot product Inner product between $\vec{x}, \vec{y} \in \mathbb{R}^n$
         $\vec{x} \cdot \vec{y} := \sum_i \vec{x}[i] \, \vec{y}[i]$
     $\mathbb{R}^n$ endowed with the dot product is usually called a Euclidean space of dimension $n$. If $\vec{x}, \vec{y} \in \mathbb{C}^n$
         $\vec{x} \cdot \vec{y} := \sum_i \overline{\vec{x}[i]} \, \vec{y}[i]$
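A minimal NumPy sketch of both cases (not from the slides; the vectors are made up). Note that `np.vdot` conjugates its first argument, which matches the complex convention above and makes $\vec{x} \cdot \vec{x}$ real.

```python
import numpy as np

# Real case: the dot product is the sum of entrywise products.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(np.dot(x, y))  # 4 + 10 + 18 = 32.0

# Complex case: conjugate the first argument (np.vdot does this),
# so x . x equals the squared l2 norm and is real.
u = np.array([1 + 1j, 2 - 1j])
print(np.vdot(u, u).real)  # |1+1j|^2 + |2-1j|^2 = 2 + 5 = 7.0
```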

  19. Matrix inner product The inner product between two $m \times n$ matrices $A$ and $B$ is
         $\langle A, B \rangle := \mathrm{tr}\left( A^T B \right) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ij}$
     where the trace of an $n \times n$ matrix is defined as the sum of its diagonal entries
         $\mathrm{tr}(M) := \sum_{i=1}^{n} M_{ii}$
     For any pair of $m \times n$ matrices $A$ and $B$
         $\mathrm{tr}\left( A B^T \right) = \mathrm{tr}\left( B^T A \right)$
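Both identities are easy to verify numerically (a NumPy sketch, not from the slides; the random matrices are for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))

# <A, B> = tr(A^T B) equals the sum of entrywise products.
assert np.isclose(np.trace(A.T @ B), np.sum(A * B))

# tr(A B^T) = tr(B^T A) for any pair of m x n matrices.
assert np.isclose(np.trace(A @ B.T), np.trace(B.T @ A))
print("both identities hold")
```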

  20. Function inner product The inner product between two complex-valued square-integrable functions $f, g$ defined on an interval $[a, b]$ of the real line is
         $\langle f, g \rangle := \int_a^b \overline{f(x)} \, g(x) \, dx$
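In practice this integral can be approximated on a grid. A sketch (not from the slides; the grid size and the example pair sin/cos are arbitrary choices):

```python
import numpy as np

# Approximate <f, g> = integral of conj(f(x)) g(x) over [a, b]
# with a simple Riemann sum on a fine grid.
a, b = 0.0, np.pi
x = np.linspace(a, b, 10001)
dx = x[1] - x[0]
f, g = np.sin(x), np.cos(x)
inner = np.sum(np.conj(f) * g) * dx

# sin and cos are orthogonal on [0, pi], so the result is ~0.
print(abs(inner) < 1e-3)  # True
```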

  21. Vector spaces Inner product Norms Mean, Variance and Correlation Sample mean, variance and correlation Orthogonality Orthogonal projection Denoising

  22. Norms Let $V$ be a vector space. A norm is a function $\|\cdot\|$ from $V$ to $\mathbb{R}$ with the following properties
     ◮ It is homogeneous: for any scalar $\alpha$ and any $\vec{x} \in V$
         $\|\alpha \vec{x}\| = |\alpha| \, \|\vec{x}\|$
     ◮ It satisfies the triangle inequality
         $\|\vec{x} + \vec{y}\| \leq \|\vec{x}\| + \|\vec{y}\|$
       In particular, $\|\vec{x}\| \geq 0$
     ◮ $\|\vec{x}\| = 0$ implies $\vec{x} = \vec{0}$

  23. Inner-product norm Square root of the inner product of a vector with itself
         $\|\vec{x}\|_{\langle \cdot, \cdot \rangle} := \sqrt{\langle \vec{x}, \vec{x} \rangle}$

  24. Inner-product norm
     ◮ Vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$: $\ell_2$ norm
         $\|\vec{x}\|_2 := \sqrt{\vec{x} \cdot \vec{x}} = \sqrt{\sum_{i=1}^{n} |\vec{x}[i]|^2}$
     ◮ Matrices in $\mathbb{R}^{m \times n}$ or $\mathbb{C}^{m \times n}$: Frobenius norm
         $\|A\|_F := \sqrt{\mathrm{tr}\left( A^T A \right)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |A_{ij}|^2}$
     ◮ Square-integrable complex-valued functions: $L_2$ norm
         $\|f\|_{L_2} := \sqrt{\langle f, f \rangle} = \sqrt{\int_a^b |f(x)|^2 \, dx}$
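The vector and matrix cases can be checked directly in NumPy (a sketch, not from the slides; the example values are made up):

```python
import numpy as np

# l2 norm: sqrt of the dot product of a vector with itself.
x = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(x), np.sqrt(x @ x))  # 3-4-5 triangle: 5.0

# Frobenius norm: sqrt(tr(A^T A)) = sqrt of the sum of squared entries.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
frob = np.sqrt(np.trace(A.T @ A))
assert np.isclose(np.linalg.norm(A, 'fro'), frob)
assert np.isclose(frob, np.sqrt((A ** 2).sum()))
print(np.linalg.norm(x))  # 5.0
```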

  25. Cauchy-Schwarz inequality For any two vectors $\vec{x}$ and $\vec{y}$ in an inner-product space
         $|\langle \vec{x}, \vec{y} \rangle| \leq \|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle}$
     Assume $\|\vec{x}\|_{\langle \cdot, \cdot \rangle} \neq 0$; then
         $\langle \vec{x}, \vec{y} \rangle = -\|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle} \iff \vec{y} = -\dfrac{\|\vec{y}\|_{\langle \cdot, \cdot \rangle}}{\|\vec{x}\|_{\langle \cdot, \cdot \rangle}} \, \vec{x}$
         $\langle \vec{x}, \vec{y} \rangle = \|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle} \iff \vec{y} = \dfrac{\|\vec{y}\|_{\langle \cdot, \cdot \rangle}}{\|\vec{x}\|_{\langle \cdot, \cdot \rangle}} \, \vec{x}$
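A numerical illustration of both the inequality and the equality case (a NumPy sketch, not from the slides; the random vectors and the scaling factor 2.5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

# The inequality holds for any pair of vectors.
assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

# Equality holds exactly when y is a nonnegative multiple of x.
y_aligned = 2.5 * x
assert np.isclose(x @ y_aligned,
                  np.linalg.norm(x) * np.linalg.norm(y_aligned))
print("Cauchy-Schwarz verified")
```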

  26. $\ell_1$ and $\ell_\infty$ norms Norms in $\mathbb{R}^n$ or $\mathbb{C}^n$ not induced by an inner product
         $\|\vec{x}\|_1 := \sum_{i=1}^{n} |\vec{x}[i]|$
         $\|\vec{x}\|_\infty := \max_i |\vec{x}[i]|$
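Both norms are available through `np.linalg.norm` via the `ord` argument (a sketch, not from the slides; the example vector is made up):

```python
import numpy as np

x = np.array([-3.0, 1.0, 2.0])

# l1 norm: sum of absolute values.
assert np.linalg.norm(x, 1) == np.abs(x).sum()    # 3 + 1 + 2 = 6

# l-infinity norm: largest absolute value.
assert np.linalg.norm(x, np.inf) == np.abs(x).max()  # 3
print(np.linalg.norm(x, 1), np.linalg.norm(x, np.inf))  # 6.0 3.0
```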

  27. Norm balls [Figure: unit balls of the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms]

  28. Distance The distance between two vectors $\vec{x}$ and $\vec{y}$ induced by a norm $\|\cdot\|$ is
         $d(\vec{x}, \vec{y}) := \|\vec{x} - \vec{y}\|$

  29. Classification Aim: Assign a signal to one of $k$ predefined classes. Training data: $n$ pairs of signals (represented as vectors) and labels: $\{\vec{x}_1, l_1\}, \ldots, \{\vec{x}_n, l_n\}$

  30. Nearest-neighbor classification [Figure: nearest-neighbor classification illustration]
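The nearest-neighbor rule assigns a test signal the label of the closest training vector. A minimal sketch under the $\ell_2$ distance (not from the slides; the toy data in $\mathbb{R}^2$ is made up for illustration):

```python
import numpy as np

def nearest_neighbor(train_X, train_labels, test_x):
    """Return the label of the training vector closest to test_x
    in l2 distance."""
    dists = np.linalg.norm(train_X - test_x, axis=1)
    return train_labels[np.argmin(dists)]

# Toy training set: two well-separated classes in R^2.
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
train_labels = np.array([0, 0, 1, 1])

print(nearest_neighbor(train_X, train_labels, np.array([0.05, 0.1])))  # 0
print(nearest_neighbor(train_X, train_labels, np.array([4.8, 5.2])))   # 1
```

The same rule scales to the face-recognition example below: each image becomes a vector and the test image is matched to the closest training image.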

  31. Face recognition Training set: 360 images (64 × 64 pixels) from 40 different subjects (9 each). Test set: 1 new image from each subject. We model each image as a vector in $\mathbb{R}^{4096}$ and use the $\ell_2$-norm distance

  32. Face recognition [Figure: training-set images]

  33. Nearest-neighbor classification Errors: 4 / 40 [Figure: test images alongside the closest training images]

  34. Vector spaces Inner product Norms Mean, Variance and Correlation Sample mean, variance and correlation Orthogonality Orthogonal projection Denoising

  35. Mean, Variance and Correlation
     ◮ Consider real-valued data corresponding to a single quantity or feature. We model such data as a scalar continuous random variable.
     ◮ In reality we usually have access to a finite number of data points, not to a continuous distribution.
     ◮ The mean of a random variable is the point that minimizes the expected distance to the random variable.
     ◮ Intuitively, it is the center of mass of the probability density, and hence of the dataset.

  36. Mean Lemma: For any random variable $\tilde{a}$ with mean $\mathbb{E}(\tilde{a})$,
         $\mathbb{E}(\tilde{a}) = \arg\min_{c \in \mathbb{R}} \mathbb{E}\left( (c - \tilde{a})^2 \right)$.

  37. Proof Let $g(c) := \mathbb{E}\left( (c - \tilde{a})^2 \right) = c^2 - 2 c \, \mathbb{E}(\tilde{a}) + \mathbb{E}\left( \tilde{a}^2 \right)$. We have
         $g'(c) = 2 \left( c - \mathbb{E}(\tilde{a}) \right)$,  $g''(c) = 2$.
     The function is strictly convex and has a minimum where the derivative equals zero, i.e. when $c$ equals the mean.
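The sample analogue of the lemma is easy to check numerically: the sample mean minimizes the average squared distance to the data points. A sketch (not from the slides; the sample size, seed, and perturbation 0.1 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(1000)  # stand-in for draws of the random variable

def avg_sq_dist(c):
    """Average squared distance from c to the data points."""
    return np.mean((c - a) ** 2)

# The sample mean beats nearby candidates, as the lemma predicts.
m = a.mean()
assert avg_sq_dist(m) <= avg_sq_dist(m + 0.1)
assert avg_sq_dist(m) <= avg_sq_dist(m - 0.1)
print("sample mean minimizes the average squared distance")
```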

  38. Variance The variance of a random variable $\tilde{a}$
         $\mathrm{Var}(\tilde{a}) := \mathbb{E}\left( (\tilde{a} - \mathbb{E}(\tilde{a}))^2 \right)$
     quantifies how much it fluctuates around its mean. The standard deviation, defined as the square root of the variance, is therefore a measure of how spread out the dataset is around its center.
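For a finite dataset, this definition matches NumPy's default (population) variance and standard deviation (a sketch, not from the slides; the example values are made up):

```python
import numpy as np

a = np.array([1.0, 3.0, 5.0, 7.0])

# Var = mean of squared deviations from the mean:
# mean is 4, deviations are [-3, -1, 1, 3], so Var = (9+1+1+9)/4 = 5.
var = np.mean((a - a.mean()) ** 2)
assert np.isclose(var, np.var(a))        # np.var uses the same definition
assert np.isclose(np.sqrt(var), np.std(a))
print(var)  # 5.0
```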
