CS7015 (Deep Learning) : Lecture 6 - Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition


  1. CS7015 (Deep Learning) : Lecture 6. Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition. Prof. Mitesh M. Khapra, Department of Computer Science and Engineering, Indian Institute of Technology Madras.

  2. Module 6.1 : Eigenvalues and Eigenvectors

  3. What happens when a matrix hits a vector? The vector gets transformed into a new vector (it strays from its path). The vector may also get scaled (elongated or shortened) in the process. For example:
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad Ax = \begin{bmatrix} 7 \\ 5 \end{bmatrix}$$

  4. For a given square matrix $A$, there exist special vectors which refuse to stray from their path. These vectors are called eigenvectors. More formally,
$$Ax = \lambda x \quad \text{[direction remains the same]}$$
The vector only gets scaled but does not change its direction. For example, with the same $A$ as above and $x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$:
$$Ax = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3x$$
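To make this concrete, here is a minimal sketch (not part of the slides) that checks the eigenvector property for the matrix above with NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# x = [1, 1] is an eigenvector of A: Ax = 3x.
x = np.array([1.0, 1.0])
print(A @ x)              # [3. 3.] -- same direction, scaled by 3

# NumPy recovers the eigenvalues 3 and -1 (order may vary)
# together with unit-norm eigenvectors as columns.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)
print(eigvecs)            # one column is a multiple of [1, 1] / sqrt(2)
```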

  5. So what is so special about eigenvectors? Why are they always in the limelight? It turns out that several properties of matrices can be analyzed based on their eigenvalues (for example, see spectral graph theory). We will now see two cases where eigenvalues/eigenvectors will help us in this course.

  6. Let us assume that on day 0, $k_1$ students eat Chinese food and $k_2$ students eat Mexican food. (Of course, no one eats in the mess!) On each subsequent day $i$, a fraction $p$ of the students who ate Chinese food on day $(i-1)$ continue to eat Chinese food on day $i$, and $(1-p)$ shift to Mexican food. Similarly, a fraction $q$ of the students who ate Mexican food on day $(i-1)$ continue to eat Mexican food on day $i$, and $(1-q)$ shift to Chinese food. Thus:
$$v^{(0)} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix}, \quad v^{(1)} = \begin{bmatrix} p k_1 + (1-q) k_2 \\ (1-p) k_1 + q k_2 \end{bmatrix} = \begin{bmatrix} p & 1-q \\ 1-p & q \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = M v^{(0)}$$
Similarly, $v^{(2)} = M v^{(1)} = M^2 v^{(0)}$. The number of customers in the two restaurants is thus given by the series $v^{(0)}, M v^{(0)}, M^2 v^{(0)}, M^3 v^{(0)}, \ldots$ In general, $v^{(n)} = M^n v^{(0)}$.
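A small sketch of this evolution; the values of $p$, $q$, $k_1$, $k_2$ below are hypothetical choices for illustration (the slides leave them symbolic):

```python
import numpy as np

p, q = 0.8, 0.9        # hypothetical stay probabilities
k1, k2 = 300, 200      # hypothetical day-0 populations

M = np.array([[p,     1 - q],
              [1 - p, q    ]])
v = np.array([k1, k2], dtype=float)   # v(0)

# v(n) = M^n v(0): apply M once per day.
for day in range(1, 6):
    v = M @ v
    print(day, v)
```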

  7. This is a problem for the two restaurant owners: the number of patrons is changing constantly. Or is it? Will the system eventually reach a steady state, i.e., will the number of customers in the two restaurants become constant over time? Turns out it will! Let's see how. (Figure: transition diagram between the two restaurants, with stay probabilities $p$ and $q$ and switch probabilities $1-p$ and $1-q$.)

  8. Definition: A matrix $M$ is called a stochastic matrix if all the entries are positive and the sum of the elements in each column is equal to 1. (Note that the matrix in our example is a stochastic matrix.)
Definition: Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of an $n \times n$ matrix $A$. $\lambda_1$ is called the dominant eigenvalue of $A$ if $|\lambda_1| \geq |\lambda_i|,\ i = 2, \ldots, n$.
Theorem: If $A$ is an $n \times n$ square matrix with a dominant eigenvalue, then the sequence of vectors $A v_0, A^2 v_0, \ldots, A^n v_0, \ldots$ approaches a multiple of the dominant eigenvector of $A$. See proof here. (The theorem is slightly misstated here for ease of explanation.)
Theorem: The largest (dominant) eigenvalue of a stochastic matrix is 1.
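The first theorem is the idea behind the classic power method for finding the dominant eigenvector. A minimal sketch, with a normalization step added at each iteration for numerical stability (the raw sequence $A v_0, A^2 v_0, \ldots$ would overflow or underflow whenever $|\lambda_d| \neq 1$); the function name and iteration count are illustrative:

```python
import numpy as np

def power_iteration(A, num_iters=100):
    """Approximate the dominant eigenvalue/eigenvector of A by
    repeatedly applying A, as in the sequence A v0, A^2 v0, ..."""
    v = np.random.rand(A.shape[0])
    for _ in range(num_iters):
        v = A @ v
        v = v / np.linalg.norm(v)   # keep the iterate at unit norm
    eigval = v @ (A @ v)            # Rayleigh quotient of the limit vector
    return eigval, v

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
lam, e = power_iteration(A)
print(lam, e)   # ~3.0 and ~[0.707, 0.707] (sign may flip)
```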

  9. Let $e_d$ be the dominant eigenvector of $M$ and $\lambda_d = 1$ the corresponding dominant eigenvalue. Given the previous definitions and theorems, what can you say about the sequence $M v^{(0)}, M^2 v^{(0)}, M^3 v^{(0)}, \ldots$? There exists an $n$ such that $v^{(n)} = M^n v^{(0)} = k e_d$ (some multiple of $e_d$). Now what happens at time step $(n+1)$?
$$v^{(n+1)} = M v^{(n)} = M(k e_d) = k(M e_d) = k(\lambda_d e_d) = k e_d$$
The population in the two restaurants becomes constant after time step $n$. See proof here.
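Iterating the hypothetical stochastic $M$ from the earlier sketch well past convergence shows this fixed point, and also that the total number of students is conserved (the columns of $M$ sum to 1):

```python
import numpy as np

p, q = 0.8, 0.9                        # same hypothetical rates as before
M = np.array([[p, 1 - q], [1 - p, q]])
v = np.array([300.0, 200.0])

for _ in range(200):                   # far past the convergence point
    v = M @ v
print(v)          # steady state: M v = v, since lambda_d = 1
print(v.sum())    # 500.0 -- total population conserved every day
```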

  10. Now, instead of a stochastic matrix, let us consider any square matrix $A$. Let $p$ be the time step at which the sequence $x_0, A x_0, A^2 x_0, \ldots$ approaches a multiple of $e_d$ (the dominant eigenvector of $A$):
$$A^p x_0 = k e_d$$
$$A^{p+1} x_0 = A(A^p x_0) = k A e_d = k \lambda_d e_d$$
$$A^{p+2} x_0 = A(A^{p+1} x_0) = k \lambda_d A e_d = k \lambda_d^2 e_d$$
$$A^{p+n} x_0 = k \lambda_d^n e_d$$
In general, if $\lambda_d$ is the dominant eigenvalue of a matrix $A$, what happens to the sequence $x_0, A x_0, A^2 x_0, \ldots$? If $|\lambda_d| > 1$ it will explode, if $|\lambda_d| < 1$ it will vanish, and if $|\lambda_d| = 1$ it will reach a steady state. (We will use this in the course at some point.)
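The three regimes are easy to observe numerically. A quick sketch; the diagonal matrices and iteration count below are illustrative choices, not from the slides (for a diagonal matrix the dominant eigenvalue is simply the largest diagonal entry in magnitude):

```python
import numpy as np

x0 = np.array([1.0, 1.0])
for lam_d, label in [(1.5, "explode"), (0.5, "vanish"), (1.0, "steady")]:
    A = np.diag([lam_d, 0.2])   # dominant eigenvalue is lam_d
    x = x0.copy()
    for _ in range(50):
        x = A @ x
    print(label, np.linalg.norm(x))   # ~6e8, ~9e-16, ~1.0 respectively
```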

  11. Module 6.2 : Linear Algebra - Basic Definitions

  12. We will see some more examples where eigenvectors are important, but before that let's revisit some basic definitions from linear algebra.

  13. Basis: A set of vectors $\in \mathbb{R}^n$ is called a basis if they are linearly independent and every vector $\in \mathbb{R}^n$ can be expressed as a linear combination of these vectors.
Linearly independent vectors: A set of $n$ vectors $v_1, v_2, \ldots, v_n$ is linearly independent if no vector in the set can be expressed as a linear combination of the remaining $n-1$ vectors. In other words, the only solution to $c_1 v_1 + c_2 v_2 + \ldots + c_n v_n = 0$ is $c_1 = c_2 = \cdots = c_n = 0$ (the $c_i$'s are scalars).
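A quick numerical way to test linear independence is to stack the vectors as columns and check the matrix rank; a small sketch, using the two vectors that appear on a later slide:

```python
import numpy as np

v1 = np.array([2.0, 3.0])
v2 = np.array([5.0, 7.0])
V = np.column_stack([v1, v2])

# Full column rank <=> the only solution to c1 v1 + c2 v2 = 0 is c = 0.
print(np.linalg.matrix_rank(V) == V.shape[1])   # True -> linearly independent
```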

  14. For example, consider the space $\mathbb{R}^2$ and the vectors $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Any vector $\begin{bmatrix} a \\ b \end{bmatrix} \in \mathbb{R}^2$ can be expressed as a linear combination of these two vectors, i.e.,
$$\begin{bmatrix} a \\ b \end{bmatrix} = a \begin{bmatrix} 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
Further, $x$ and $y$ are linearly independent (the only solution to $c_1 x + c_2 y = 0$ is $c_1 = c_2 = 0$).

  15. In fact, it turns out that $x$ and $y$ are unit vectors in the direction of the coordinate axes, and indeed we are used to representing all vectors in $\mathbb{R}^2$ as a linear combination of these two vectors. But there is nothing sacrosanct about this particular choice of $x$ and $y$. We could have chosen any 2 linearly independent vectors in $\mathbb{R}^2$ as the basis vectors. For example, consider the linearly independent vectors $[2, 3]^T$ and $[5, 7]^T$. See how any vector $[a, b]^T \in \mathbb{R}^2$ can be expressed as a linear combination of these two vectors:
$$\begin{bmatrix} a \\ b \end{bmatrix} = x_1 \begin{bmatrix} 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 5 \\ 7 \end{bmatrix} \quad \Rightarrow \quad a = 2x_1 + 5x_2, \quad b = 3x_1 + 7x_2$$
We can find $x_1$ and $x_2$ by solving this system of linear equations, as in the sketch below.
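A minimal sketch of that solve; the target vector $[1, 1]^T$ is a hypothetical choice, and np.linalg.solve performs the kind of elimination-based solve mentioned on the next slide:

```python
import numpy as np

# Columns are the basis vectors [2, 3]^T and [5, 7]^T.
U = np.array([[2.0, 5.0],
              [3.0, 7.0]])

# Express a hypothetical target vector [a, b]^T = [1, 1]^T in this basis.
z = np.array([1.0, 1.0])
x = np.linalg.solve(U, z)   # solves U x = z, i.e. a = 2x1 + 5x2, b = 3x1 + 7x2
print(x)                    # coefficients x1, x2 (here [-2, 1])
print(U @ x)                # reconstructs [1, 1]
```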

  16. In general, given a set of linearly independent vectors $u_1, u_2, \ldots, u_n \in \mathbb{R}^n$, we can express any vector $z \in \mathbb{R}^n$ as a linear combination of these vectors:
$$z = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$$
$$\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \alpha_1 \begin{bmatrix} u_{11} \\ u_{12} \\ \vdots \\ u_{1n} \end{bmatrix} + \alpha_2 \begin{bmatrix} u_{21} \\ u_{22} \\ \vdots \\ u_{2n} \end{bmatrix} + \ldots + \alpha_n \begin{bmatrix} u_{n1} \\ u_{n2} \\ \vdots \\ u_{nn} \end{bmatrix}$$
$$\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} u_{11} & u_{21} & \ldots & u_{n1} \\ u_{12} & u_{22} & \ldots & u_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ u_{1n} & u_{2n} & \ldots & u_{nn} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix}$$
(Basically rewriting in matrix form.) We can now find the $\alpha_i$'s using Gaussian elimination (time complexity: $O(n^3)$).

  17. Now let us see what happens if we have an orthonormal basis, i.e.,
$$u_i^T u_j = 0 \ \forall i \neq j \quad \text{and} \quad \|u_i\| = 1$$
Again we have $z = \alpha_1 u_1 + \alpha_2 u_2 + \ldots + \alpha_n u_n$. Taking the dot product with $u_1$:
$$u_1^T z = \alpha_1 u_1^T u_1 + \ldots + \alpha_n u_1^T u_n = \alpha_1$$
We can directly find each $\alpha_i$ using a dot product between $z$ and $u_i$ (time complexity $O(n)$). Similarly, $\alpha_2 = z^T u_2$, and the total complexity is $O(n^2)$. Geometrically, $\alpha_1 = z^T u_1 = \|z\| \cos\theta$, where $\theta$ is the angle between $z$ and $u_1$. When $u_1$ and $u_2$ are unit vectors along the coordinate axes,
$$z = \begin{bmatrix} a \\ b \end{bmatrix} = a \begin{bmatrix} 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
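With an orthonormal basis, no linear solve is needed: each coefficient is a single dot product. A minimal sketch; the rotated basis and the vector $z$ below are illustrative choices:

```python
import numpy as np

# An orthonormal basis for R^2 (the coordinate axes rotated by 45 degrees).
u1 = np.array([1.0,  1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)

z = np.array([3.0, 1.0])

# Each alpha_i is just a dot product: O(n) per coefficient, O(n^2) total,
# versus O(n^3) for Gaussian elimination with a general basis.
alpha1 = u1 @ z
alpha2 = u2 @ z
print(alpha1 * u1 + alpha2 * u2)   # reconstructs z exactly: [3. 1.]
```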

  18. Remember: An orthogonal basis is the most convenient basis that one can hope for.
