SLIDE 1

CS7015 (Deep Learning) : Lecture 6

Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition

  • Prof. Mitesh M. Khapra

Department of Computer Science and Engineering Indian Institute of Technology Madras

SLIDE 2

Module 6.1 : Eigenvalues and Eigenvectors

SLIDE 3

[Figure: the vector x and the transformed vector Ax plotted on the x–y axes]

  • x = [1, 3]^T

  • A = [1 2; 2 1]

  • Ax = [7, 5]^T

  • What happens when a matrix hits a vector? The vector gets transformed into a new vector (it strays from its path). The vector may also get scaled (elongated or shortened) in the process.

SLIDE 4

[Figure: the vector x and the transformed vector Ax plotted on the x–y axes]

  • x = [1, 1]^T

  • A = [1 2; 2 1]

  • Ax = [3, 3]^T = 3·[1, 1]^T

  • For a given square matrix A, there exist special vectors which refuse to stray from their path. These vectors are called eigenvectors. More formally, Ax = λx [the direction remains the same]. The vector only gets scaled but does not change its direction.

SLIDE 5

  • x = [1, 1]^T, A = [1 2; 2 1], Ax = [3, 3]^T = 3·[1, 1]^T (as on the previous slide)

  • So what is so special about eigenvectors? Why are they always in the limelight? It turns out that several properties of matrices can be analyzed based on their eigenvalues (for example, see spectral graph theory). We will now see two cases where eigenvalues/eigenvectors will help us in this course.

SLIDE 6

[Figure: two restaurants, Chinese (k1 customers) and Mexican (k2 customers)]

  • v(0) = [k1, k2]^T

  • v(1) = [p·k1 + (1 − q)·k2, (1 − p)·k1 + q·k2]^T = [p 1−q; 1−p q]·[k1, k2]^T = Mv(0)

  • v(2) = Mv(1) = M^2·v(0), and in general v(n) = M^n·v(0)

  • Let us assume that on day 0, k1 students eat Chinese food and k2 students eat Mexican food. (Of course, no one eats in the mess!) On each subsequent day i, a fraction p of the students who ate Chinese food on day (i − 1) continue to eat Chinese food on day i, and (1 − p) shift to Mexican food. Similarly, a fraction q of the students who ate Mexican food on day (i − 1) continue to eat Mexican food on day i, and (1 − q) shift to Chinese food.

  • The number of customers in the two restaurants is thus given by the series: v(0), Mv(0), M^2·v(0), M^3·v(0), . . .

SLIDE 7

[Figure: the two restaurants with transition fractions p, 1 − p, q, 1 − q]

  • This is a problem for the two restaurant owners. The number of patrons is changing constantly. Or is it? Will the system eventually reach a steady state (i.e., will the number of customers in the two restaurants become constant over time)? It turns out they will! Let's see how.

SLIDE 8

Definition: Let λ1, λ2, . . . , λn be the eigenvalues of an n × n matrix A. λ1 is called the dominant eigenvalue of A if |λ1| ≥ |λi|, i = 2, . . . , n.

Definition: A matrix M is called a stochastic matrix if all the entries are positive and the sum of the elements in each column is equal to 1. (Note that the matrix in our example is a stochastic matrix.)

Theorem: The largest (dominant) eigenvalue of a stochastic matrix is 1. [See proof here]

Theorem: If A is an n × n square matrix with a dominant eigenvalue, then the sequence of vectors given by Av0, A^2·v0, . . . , A^n·v0, . . . approaches a multiple of the dominant eigenvector of A. (The theorem is slightly misstated here for ease of explanation.)

SLIDE 9

Let e_d be the dominant eigenvector of M and λ_d = 1 the corresponding dominant eigenvalue. Given the previous definitions and theorems, what can you say about the sequence Mv(0), M^2·v(0), M^3·v(0), . . . ? There exists an n such that v(n) = M^n·v(0) = k·e_d (some multiple of e_d). Now what happens at time step (n + 1)? v(n+1) = Mv(n) = M(k·e_d) = k(M·e_d) = k(λ_d·e_d) = k·e_d. The population in the two restaurants becomes constant after time step n. [See proof here]
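(Not from the lecture.) A minimal NumPy sketch of the restaurant example, assuming illustrative values p = 0.8, q = 0.9 and k1 = 300, k2 = 200: repeatedly applying M drives v(n) to a multiple of the dominant eigenvector, i.e., to the steady state.

```python
import numpy as np

# Illustrative (assumed) values, not from the lecture.
p, q = 0.8, 0.9            # fraction that sticks with Chinese / Mexican food
k1, k2 = 300.0, 200.0      # customers on day 0

M = np.array([[p, 1 - q],
              [1 - p, q]])  # column-stochastic transition matrix
v = np.array([k1, k2])

for day in range(50):
    v = M @ v               # v(n) = M v(n-1)

print(v)                    # settles to the steady-state split of the 500 customers

# Compare with the dominant eigenvector of M (eigenvalue 1).
w, vecs = np.linalg.eig(M)
e_d = vecs[:, np.argmax(w.real)].real
print(e_d / e_d.sum() * (k1 + k2))   # same steady-state split
```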

SLIDE 10

Now, instead of a stochastic matrix, let us consider any square matrix A. Let p be the time step at which the sequence x0, Ax0, A^2·x0, . . . approaches a multiple of e_d (the dominant eigenvector of A):

A^p·x0 = k·e_d
A^(p+1)·x0 = A(A^p·x0) = k·A·e_d = k·λ_d·e_d
A^(p+2)·x0 = A(A^(p+1)·x0) = k·λ_d·A·e_d = k·λ_d^2·e_d
. . .
A^(p+n)·x0 = k·λ_d^n·e_d

In general, if λ_d is the dominant eigenvalue of a matrix A, what happens to the sequence x0, Ax0, A^2·x0, . . . if

|λ_d| > 1 (it will explode)
|λ_d| < 1 (it will vanish)
|λ_d| = 1 (it will reach a steady state)

(We will use this in the course at some point.)
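(Not from the lecture.) A small sketch illustrating the three regimes: the symmetric matrix from the earlier slides is rescaled (an assumption made for illustration) so that its dominant eigenvalue is greater than, less than, or equal to 1, and the sequence A^n·x0 explodes, vanishes, or settles accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)

base = np.array([[1.0, 2.0],
                 [2.0, 1.0]])           # dominant eigenvalue is 3
lam_d = np.max(np.abs(np.linalg.eigvals(base)))

for target in (1.5, 0.5, 1.0):           # desired |lambda_d|: explode, vanish, steady
    A = base * (target / lam_d)          # rescale so the dominant eigenvalue is `target`
    x = x0.copy()
    for _ in range(50):
        x = A @ x
    print(target, np.linalg.norm(x))     # huge, ~0, or a finite non-zero value
```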

SLIDE 11

Module 6.2 : Linear Algebra - Basic Definitions

SLIDE 12

We will see some more examples where eigenvectors are important, but before that let’s revisit some basic definitions from linear algebra.

SLIDE 13

Basis: A set of vectors ∈ R^n is called a basis if they are linearly independent and every vector ∈ R^n can be expressed as a linear combination of these vectors.

Linearly independent vectors: A set of n vectors v1, v2, . . . , vn is linearly independent if no vector in the set can be expressed as a linear combination of the remaining n − 1 vectors. In other words, the only solution to c1·v1 + c2·v2 + · · · + cn·vn = 0 is c1 = c2 = · · · = cn = 0 (the ci's are scalars).

SLIDE 14

For example, consider the space R^2. Now consider the vectors x = [1, 0]^T and y = [0, 1]^T.

  • Any vector [a, b]^T ∈ R^2 can be expressed as a linear combination of these two vectors, i.e., [a, b]^T = a·[1, 0]^T + b·[0, 1]^T

  • Further, x and y are linearly independent (the only solution to c1·x + c2·y = 0 is c1 = c2 = 0).

SLIDE 15

[a, b]^T = x1·[2, 3]^T + x2·[5, 7]^T, i.e., a = 2·x1 + 5·x2 and b = 3·x1 + 7·x2

  • In fact, it turns out that x = (1, 0) and y = (0, 1) are unit vectors in the direction of the co-ordinate axes, and indeed we are used to representing all vectors in R^2 as a linear combination of these two vectors. But there is nothing sacrosanct about this particular choice of x and y. We could have chosen any 2 linearly independent vectors in R^2 as the basis vectors. For example, consider the linearly independent vectors [2, 3]^T and [5, 7]^T. See how any vector [a, b]^T ∈ R^2 can be expressed as a linear combination of these two vectors. We can find x1 and x2 by solving a system of linear equations.

SLIDE 16

  • In general, given a set of linearly independent vectors u1, u2, . . . , un ∈ R^n, we can express any vector z ∈ R^n as a linear combination of these vectors:

z = α1·u1 + α2·u2 + · · · + αn·un

[z1, z2, . . . , zn]^T = α1·[u11, u12, . . . , u1n]^T + α2·[u21, u22, . . . , u2n]^T + · · · + αn·[un1, un2, . . . , unn]^T

which in matrix form is

[z1, z2, . . . , zn]^T = U·[α1, α2, . . . , αn]^T, where the columns of U are u1, u2, . . . , un

(basically rewriting in matrix form). We can now find the αi's using Gaussian elimination (time complexity: O(n^3)).
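(Not from the lecture.) A minimal sketch of finding the αi's for a general basis, using the illustrative basis vectors [2, 3]^T and [5, 7]^T from the earlier slide; np.linalg.solve performs the Gaussian elimination.

```python
import numpy as np

# Basis vectors as the columns of U (the example basis from the earlier slide).
U = np.array([[2.0, 5.0],
              [3.0, 7.0]])
z = np.array([4.0, 9.0])          # an arbitrary (assumed) vector to express in this basis

alpha = np.linalg.solve(U, z)     # Gaussian elimination, O(n^3)
print(alpha)                      # [17., -6.]
print(U @ alpha)                  # reconstructs z as alpha_1*u1 + alpha_2*u2
```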

SLIDE 17

[Figure: z = [a, b]^T at angle θ to the basis vectors u1 and u2]

  • α1 = |z|·cos θ = |z| · (z^T u1)/(|z||u1|) = z^T u1. Similarly, α2 = z^T u2.

  • When u1 and u2 are unit vectors along the co-ordinate axes, z = [a, b]^T = a·[1, 0]^T + b·[0, 1]^T

  • Now let us see what happens if we have an orthonormal basis:

u_i^T u_j = 0 ∀ i ≠ j and u_i^T u_i = ||u_i||^2 = 1

Again we have: z = α1·u1 + α2·u2 + · · · + αn·un

u_1^T z = α1·u_1^T u_1 + · · · + αn·u_1^T un = α1

We can directly find each αi using a dot product between z and ui (time complexity O(n)). The total complexity will be O(n^2).
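(Not from the lecture.) A small sketch, with an assumed orthonormal basis of R^2, showing that the dot-product shortcut gives the same coefficients as solving the full linear system.

```python
import numpy as np

# Orthonormal basis of R^2: the coordinate axes rotated by 45 degrees.
u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)
U = np.column_stack([u1, u2])

z = np.array([3.0, 5.0])                 # an arbitrary (assumed) vector

alpha_dot = U.T @ z                      # each alpha_i = u_i^T z, O(n) per coefficient
alpha_solve = np.linalg.solve(U, z)      # the general O(n^3) route

print(alpha_dot, alpha_solve)            # identical (up to round-off)
print(alpha_dot[0] * u1 + alpha_dot[1] * u2)   # reconstructs z
```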

SLIDE 18

Remember: An orthogonal basis is the most convenient basis that one can hope for.

SLIDE 19

But what does any of this have to do with eigenvectors? It turns out that the eigenvectors can form a basis.

Theorem 1: The eigenvectors of a matrix A ∈ R^(n×n) having distinct eigenvalues are linearly independent. [Proof: see here]

Theorem 2: The eigenvectors of a square symmetric matrix are orthogonal. [Proof: see here]

In fact, the eigenvectors of a square symmetric matrix are even more special. Thus they form a very convenient basis. Why would we want to use the eigenvectors as a basis instead of the more natural co-ordinate axes? We will answer this question soon.

SLIDE 20

Module 6.3 : Eigenvalue Decomposition

SLIDE 21

Before proceeding let’s do a quick recap of eigenvalue decomposition.

SLIDE 22

Let u1, u2, . . . , un be the eigenvectors of a matrix A and let λ1, λ2, . . . , λn be the corresponding eigenvalues. Consider a matrix U whose columns are u1, u2, . . . , un. Now

AU = A·[u1 u2 . . . un] = [Au1 Au2 . . . Aun] = [λ1·u1 λ2·u2 . . . λn·un] = [u1 u2 . . . un]·diag(λ1, λ2, . . . , λn) = UΛ

where Λ is a diagonal matrix whose diagonal elements are the eigenvalues of A.

SLIDE 23

AU = UΛ

If U^(−1) exists, then we can write

A = UΛU^(−1)   [eigenvalue decomposition]
U^(−1)AU = Λ   [diagonalization of A]

Under what conditions would U^(−1) exist? If the columns of U are linearly independent [see proof here], i.e., if A has n linearly independent eigenvectors, i.e., if A has n distinct eigenvalues [a sufficient condition; proof: Slide 19, Theorem 1].
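(Not from the lecture.) A quick NumPy check of AU = UΛ, A = UΛU^(−1) and U^(−1)AU = Λ for the illustrative matrix used earlier.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])            # distinct eigenvalues, so U is invertible

lam, U = np.linalg.eig(A)             # columns of U are the eigenvectors
Lam = np.diag(lam)

print(np.allclose(A @ U, U @ Lam))                    # AU = U Lambda
print(np.allclose(A, U @ Lam @ np.linalg.inv(U)))     # A = U Lambda U^-1
print(np.round(np.linalg.inv(U) @ A @ U, 6))          # the diagonal matrix Lambda
```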

SLIDE 24

If A is symmetric, then the situation is even more convenient: the eigenvectors are orthogonal [proof: Slide 19, Theorem 2]. Further, let's assume that the eigenvectors have been normalized [u_i^T u_i = 1].

Q = U^T U, where the rows of U^T are u1^T, u2^T, . . . , un^T and the columns of U are u1, u2, . . . , un.

Each cell of this matrix is Q_ij = u_i^T u_j = 0 if i ≠ j, and 1 if i = j.

∴ U^T U = I (the identity matrix), so U^T is the inverse of U (very convenient to calculate).

SLIDE 25

Something to think about: Given the EVD A = UΛU^T, what can you say about the sequence x0, Ax0, A^2·x0, . . . in terms of the eigenvalues of A? (Hint: you should arrive at the same conclusion we saw earlier.)

SLIDE 26

Theorem (one more important property of eigenvectors): If A is a square symmetric N × N matrix, then the solution to the optimization problem

max_x x^T A x   s.t. ||x|| = 1

is given by the eigenvector corresponding to the largest eigenvalue of A, and the solution to

min_x x^T A x   s.t. ||x|| = 1

is given by the eigenvector corresponding to the smallest eigenvalue of A. Proof: next slide.

SLIDE 27

This is a constrained optimization problem that can be solved using Lagrange multipliers:

L = x^T A x − λ(x^T x − 1)

∂L/∂x = 2Ax − λ(2x) = 0  ⇒  Ax = λx

Hence x must be an eigenvector of A with eigenvalue λ. Multiplying by x^T:

x^T A x = λ·x^T x = λ   (since x^T x = 1)

Therefore, the value of the objective at the critical points of this constrained problem is an eigenvalue of A. The maximum value is the largest eigenvalue, while the minimum value is the smallest eigenvalue.
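(Not from the lecture.) A small numerical check of the theorem, using an assumed random symmetric matrix: the values of x^T A x over random unit vectors stay between, and approach, the smallest and largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # a random square symmetric matrix

X = rng.standard_normal((4, 100000))
X /= np.linalg.norm(X, axis=0)           # random unit vectors as columns
vals = np.einsum('ij,ij->j', X, A @ X)   # x^T A x for each column x

lam = np.linalg.eigvalsh(A)              # eigenvalues in ascending order
print(vals.min(), vals.max())            # approach the extreme eigenvalues
print(lam[0], lam[-1])                   # smallest and largest eigenvalue of A
```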

SLIDE 28

The story so far... The eigenvectors corresponding to different eigenvalues are linearly independent. The eigenvectors of a square symmetric matrix are orthogonal. The eigenvectors of a square symmetric matrix can thus form a convenient basis. We will put all of this to use.

SLIDE 29

Module 6.4 : Principal Component Analysis and its Interpretations

SLIDE 30

The story ahead... Over the next few slides we will introduce Principal Component Analysis and see three different interpretations of it

SLIDE 31

[Figure: scatter of data points in the x–y plane]

Consider the following data. Each point (vector) here is represented using a linear combination of the x and y axes (i.e., using the point's x and y co-ordinates). In other words, we are using x and y as the basis. What if we choose a different basis?

SLIDE 32

[Figure: the same data with rotated basis vectors u1 and u2]

For example, what if we use u1 and u2 as a basis instead of x and y? We observe that all the points have a very small component in the direction of u2 (almost noise). It seems that the same data which was originally in R^2 (x, y) can now be represented in R^1 (u1) by making a smarter choice of basis.

SLIDE 33

Let's try stating this more formally. Why do we not care about u2? Because the variance in the data in this direction is very small (all data points have almost the same value in the u2 direction). If we were to build a classifier on top of this data, then u2 would not contribute to the classifier, as the points are not distinguishable along this direction.

SLIDE 34

In general, we are interested in representing the data using fewer dimensions such that the data has high variance along these dimensions. Is that all? No, there is something else that we desire. Let's see what.
SLIDE 35

[Table: sample data with columns x, y and z; the values are garbled in this transcript, but the y and z columns are highly correlated]

ρ_yz = Σ_{i=1}^{n} (y_i − ȳ)(z_i − z̄) / √( Σ_{i=1}^{n} (y_i − ȳ)^2 · Σ_{i=1}^{n} (z_i − z̄)^2 )

Consider the following data. Is z adding any new information beyond what is already contained in y? The two columns are highly correlated (or they have a high covariance). In other words, the column z is redundant since it is linearly dependent on y.

SLIDE 36

In general, we are interested in representing the data using fewer dimensions such that

  • the data has high variance along these dimensions
  • the dimensions are linearly independent (uncorrelated)

(even better if they are orthogonal, because that is a very convenient basis)

SLIDE 37

Let p1, p2, · · · , pn be a set of n such linearly independent orthonormal vectors, and let P be the n × n matrix whose columns are p1, p2, · · · , pn. Let x1, x2, · · · , xm ∈ R^n be m data points and let X be the m × n matrix whose rows are x1, x2, · · · , xm. Further, let us assume that the data is 0-mean and unit variance.

We want to represent each xi using this new basis P:

xi = αi1·p1 + αi2·p2 + αi3·p3 + · · · + αin·pn

For an orthonormal basis we know that we can find these αij's using

αij = xi^T pj   (a row vector times a column vector)

SLIDE 38

In general, the transformed data point x̂i is given by

x̂i = xi^T [p1 · · · pn] = xi^T P

and X̂ = XP (X̂ is the matrix of transformed points).

SLIDE 39

Theorem: If X is a matrix such that its columns have zero mean and X̂ = XP, then the columns of X̂ will also have zero mean.
Proof: For any matrix A, 1^T A gives us a row vector whose i-th element is the sum of the i-th column of A (this is easy to see using the row–column picture of matrix multiplication). Consider 1^T X̂ = 1^T XP = (1^T X)P. But 1^T X is the row vector containing the sums of the columns of X, so 1^T X = 0. Therefore 1^T X̂ = 0, and hence the transformed matrix also has columns whose sums are 0.

Theorem: X^T X is a symmetric matrix.
Proof: (X^T X)^T = X^T (X^T)^T = X^T X.

SLIDE 40

Definition: If X is a matrix whose columns are zero mean, then Σ = (1/m)·X^T X is the covariance matrix. In other words, each entry Σij stores the covariance between columns i and j of X.

Explanation: Let C be the covariance matrix of X, and let µi, µj denote the means of the i-th and j-th columns of X respectively. Then, by the definition of covariance, we can write:

C_ij = (1/m) Σ_{k=1}^{m} (X_ki − µi)(X_kj − µj)
     = (1/m) Σ_{k=1}^{m} X_ki·X_kj      (∵ µi = µj = 0)
     = (1/m)·X_i^T X_j = (1/m)·(X^T X)_ij

SLIDE 41

X̂ = XP

Using the previous theorem & definition, (1/m)·X̂^T X̂ is the covariance matrix of the transformed data. We can write:

(1/m)·X̂^T X̂ = (1/m)·(XP)^T XP = (1/m)·P^T X^T X P = P^T ((1/m)·X^T X) P = P^T Σ P

Each cell (i, j) of the covariance matrix (1/m)·X̂^T X̂ stores the covariance between columns i and j of X̂. Ideally we want

((1/m)·X̂^T X̂)_ij = 0,  i ≠ j   (covariance = 0)
((1/m)·X̂^T X̂)_ij ≠ 0,  i = j   (variance ≠ 0)

In other words, we want (1/m)·X̂^T X̂ = P^T Σ P = D   [where D is a diagonal matrix]

SLIDE 42

We want P^T Σ P = D. Now, Σ is a square symmetric matrix and P is an orthogonal matrix. Which orthogonal matrix P satisfies P^T Σ P = D? In other words, which orthogonal matrix P diagonalizes Σ? Answer: a matrix P whose columns are the eigenvectors of Σ = (1/m)·X^T X [by eigenvalue decomposition; note that X^T X and (1/m)·X^T X have the same eigenvectors]. Thus, the new basis P used to transform X is the basis consisting of the eigenvectors of X^T X.

SLIDE 43

Why is this a good basis? Because the eigenvectors of X^T X are linearly independent (proof: Slide 19, Theorem 1), and because the eigenvectors of X^T X are orthogonal (∵ X^T X is symmetric; we saw the proof earlier). This method is called Principal Component Analysis: it transforms the data to a new basis where the dimensions are non-redundant (low covariance) & not noisy (high variance). In practice, we select only the top-k dimensions along which the variance is high (this will become clearer when we look at an alternate interpretation of PCA).
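(Not from the lecture.) A minimal PCA sketch in NumPy following exactly this recipe, on assumed synthetic 2-D data with high variance along the direction [1, 1]: centre the columns, eigendecompose the covariance, and keep the top-k eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data: large spread along [1, 1], tiny noise along [-1, 1].
t = rng.standard_normal(500)
X = np.column_stack([t, t]) + 0.05 * rng.standard_normal((500, 2))
X = X - X.mean(axis=0)                      # zero-mean columns

cov = X.T @ X / X.shape[0]                  # covariance matrix Sigma
lam, P = np.linalg.eigh(cov)                # eigenvalues ascending, eigenvectors as columns
order = np.argsort(lam)[::-1]               # sort by decreasing variance
lam, P = lam[order], P[:, order]

k = 1
X_hat = X @ P[:, :k]                        # data expressed in the top-k eigenbasis
X_rec = X_hat @ P[:, :k].T                  # reconstruction back in the original space

print(lam)                                  # variance along each principal direction
print(np.mean(np.sum((X - X_rec) ** 2, axis=1)))   # small reconstruction error
```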
SLIDE 44

Module 6.5 : PCA : Interpretation 2

SLIDE 45

Given n orthogonal (linearly independent) vectors p1, p2, · · · , pn, we can represent xi exactly as a linear combination of these vectors:

xi = Σ_{j=1}^{n} αij·pj   [we know how to estimate the αij's, but we will come back to that later]

But we are interested only in the top-k dimensions (we want to get rid of the noisy & redundant dimensions):

x̂i = Σ_{j=1}^{k} αij·pj

We want to select the pj's such that we minimise the reconstruction error

e = Σ_{i=1}^{m} (xi − x̂i)^T (xi − x̂i)

SLIDE 46

e = Σ_{i=1}^{m} (xi − x̂i)^T (xi − x̂i)

  = Σ_{i=1}^{m} ( Σ_{j=1}^{n} αij·pj − Σ_{j=1}^{k} αij·pj )^2

  = Σ_{i=1}^{m} ( Σ_{j=k+1}^{n} αij·pj )^2

  = Σ_{i=1}^{m} ( Σ_{j=k+1}^{n} αij·pj )^T ( Σ_{j=k+1}^{n} αij·pj )

  = Σ_{i=1}^{m} (α_{i,k+1}·p_{k+1} + α_{i,k+2}·p_{k+2} + . . . + α_{i,n}·p_n)^T (α_{i,k+1}·p_{k+1} + α_{i,k+2}·p_{k+2} + . . . + α_{i,n}·p_n)

  = Σ_{i=1}^{m} Σ_{j=k+1}^{n} αij·p_j^T p_j·αij + Σ_{i=1}^{m} Σ_{j=k+1}^{n} Σ_{L=k+1, L≠j}^{n} αij·p_j^T p_L·α_{iL}

  = Σ_{i=1}^{m} Σ_{j=k+1}^{n} αij^2      (∵ p_j^T p_j = 1 and p_i^T p_j = 0 ∀ i ≠ j)

  = Σ_{i=1}^{m} Σ_{j=k+1}^{n} (xi^T pj)^2

  = Σ_{i=1}^{m} Σ_{j=k+1}^{n} (pj^T xi)(xi^T pj)

  = Σ_{j=k+1}^{n} pj^T ( Σ_{i=1}^{m} xi·xi^T ) pj

  = Σ_{j=k+1}^{n} pj^T (mC) pj      (∵ (1/m) Σ_{i=1}^{m} xi·xi^T = X^T X / m = C)

SLIDE 47

We want to minimize e:

min_{p_{k+1}, p_{k+2}, · · · , p_n}  Σ_{j=k+1}^{n} pj^T (mC) pj
s.t.  pj^T pj = 1,  ∀ j = k+1, k+2, · · · , n

The solution to the above problem is given by the eigenvectors corresponding to the smallest eigenvalues of C (proof: refer to Slide 26). Thus we select p1, p2, · · · , pn to be the eigenvectors of C and retain only the top-k eigenvectors to express the data [i.e., discard the eigenvectors k+1, · · · , n].

SLIDE 48

Key Idea: Minimize the error in reconstructing xi after projecting the data onto a new basis.

SLIDE 49

Let’s look at the ‘Reconstruction Error’ in the context of our toy example

SLIDE 50

u1 = [1, 1]^T and u2 = [−1, 1]^T are the new basis vectors. Let us convert them to unit vectors:

u1 = [1/√2, 1/√2]^T   &   u2 = [−1/√2, 1/√2]^T

  • Consider the point x = [3.3, 3]^T in the original data.

α1 = x^T u1 = 6.3/√2,   α2 = x^T u2 = −0.3/√2

  • The perfect reconstruction of x is given by (using n = 2 dimensions)

x = α1·u1 + α2·u2 = [3.3, 3]^T

  • But we are going to reconstruct it using fewer dimensions (only k = 1 < n, ignoring the low-variance u2 dimension):

x̂ = α1·u1 = [3.15, 3.15]^T   (the reconstruction with minimum error)
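(Not from the lecture.) The same toy computation in NumPy, using the values from this slide:

```python
import numpy as np

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)
x = np.array([3.3, 3.0])

a1, a2 = x @ u1, x @ u2
print(a1 * np.sqrt(2), a2 * np.sqrt(2))   # 6.3 and -0.3, i.e. alpha_1 = 6.3/sqrt(2), alpha_2 = -0.3/sqrt(2)
print(a1 * u1 + a2 * u2)                  # perfect reconstruction: [3.3, 3.0]
print(a1 * u1)                            # rank-1 reconstruction: [3.15, 3.15]
```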
SLIDE 51

Recap:

  • The eigenvectors of a matrix with distinct eigenvalues are linearly independent.
  • The eigenvectors of a square symmetric matrix are orthogonal.
  • PCA exploits this fact by representing the data using a new basis comprising only the top-k eigenvectors.
  • The n − k dimensions which contribute very little to the reconstruction error are discarded.
  • These are also the directions along which the variance is minimum.

SLIDE 52

Module 6.6 : PCA : Interpretation 3

SLIDE 53

We started off with the following wishlist: we are interested in representing the data using fewer dimensions such that

  • the dimensions have low covariance
  • the dimensions have high variance

So far we have paid a lot of attention to the covariance: it has indeed played a central role in all our analysis. But what about variance? Have we achieved our stated goal of high variance along the dimensions? To answer this question we will see yet another interpretation of PCA.

SLIDE 54

The i-th dimension of the transformed data X̂ is given by X̂_i = X·p_i. The variance along this dimension is given by

X̂_i^T X̂_i / m = (1/m)·p_i^T X^T X p_i = (1/m)·p_i^T λ_i p_i   [∵ p_i is an eigenvector of X^T X]
              = (1/m)·λ_i·(p_i^T p_i) = λ_i / m   [∵ p_i^T p_i = 1]

Thus the variance along the i-th dimension (the i-th eigenvector of X^T X) is given by the corresponding (scaled) eigenvalue. Hence, we did the right thing by discarding the dimensions (eigenvectors) corresponding to the lower eigenvalues!
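(Not from the lecture.) A quick check of this identity on assumed synthetic data: the variance of each column of X̂ = XP equals the corresponding eigenvalue of X^T X divided by m.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 3
X = rng.standard_normal((m, n)) @ rng.standard_normal((n, n))  # correlated columns
X = X - X.mean(axis=0)                     # zero-mean columns

lam, P = np.linalg.eigh(X.T @ X)           # eigendecomposition of X^T X
X_hat = X @ P                              # transformed data

var_along_dims = (X_hat ** 2).sum(axis=0) / m
print(var_along_dims)
print(lam / m)                             # matches: variance along p_i is lambda_i / m
```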

SLIDE 55

A Quick Summary: We have seen 3 different interpretations of PCA. It ensures that the covariance between the new dimensions is minimized. It picks dimensions such that the data exhibits high variance across these dimensions. It ensures that the data can be represented using a smaller number of dimensions.

SLIDE 56

Module 6.7 : PCA : Practical Example

SLIDE 57

Suppose we are given a large number of images of human faces (say, m images). Each image is 100 × 100 [10K dimensions]. We would like to represent and store the images using far fewer dimensions (around 50–200). We construct a matrix X ∈ R^(m×10K): each row of the matrix corresponds to one image, and each image is represented using 10K dimensions.

SLIDE 58

X ∈ R^(m×10K) (as explained on the previous slide). We retain the top 100 dimensions corresponding to the top 100 eigenvectors of X^T X. Note that X^T X is an n × n matrix, so its eigenvectors will be n-dimensional (n = 10K in this case). We can convert each eigenvector into a 100 × 100 matrix and treat it as an image. Let's see what we get: what we have plotted here are the first 16 eigenvectors of X^T X (basically, treating each 10K-dimensional eigenvector as a 100 × 100 image).

SLIDE 59

[Figure: reconstructions of a face using Σ_{i=1}^{K} α1i·pi for K = 1, 2, 4, 8, 12, 16 eigenfaces]

These images are called eigenfaces and form a basis for representing any face in our database. In other words, we can now represent a given image (face) as a linear combination of these eigenfaces. In practice, we just need to store p1, p2, · · · , pk (a one-time storage cost); then for each image i we just need to store the scalar values αi1, αi2, · · · , αik. This significantly reduces the storage cost without much loss in image quality.
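(Not from the lecture.) A rough sketch of the eigenfaces pipeline; the random stand-in data is an assumption (real face images would be used), and the SVD of X is just a computational shortcut for the top eigenvectors of X^T X.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, k = 200, 100 * 100, 100      # m images, 100x100 pixels, keep 100 eigenfaces

X = rng.standard_normal((m, d))    # stand-in for the face images (one image per row)
X = X - X.mean(axis=0)

# Eigenfaces = top-k eigenvectors of X^T X, obtained here via the SVD of X
# (cheaper than forming the 10K x 10K matrix X^T X explicitly).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:k].T                       # d x k matrix; each column is one "eigenface"

alphas = X @ P                     # k coefficients per image (what we actually store)
X_rec = alphas @ P.T               # approximate reconstruction of all images

print(P.shape, alphas.shape)       # (10000, 100) and (200, 100)
print(np.mean((X - X_rec) ** 2))   # reconstruction error
```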

SLIDE 60

Module 6.8 : Singular Value Decomposition

SLIDE 61

Let us get some more perspective on eigen vectors before moving ahead

SLIDE 62

Let v1, v2, · · · , vn be the eigenvectors of A and let λ1, λ2, · · · , λn be the corresponding eigenvalues:

Av1 = λ1·v1, Av2 = λ2·v2, · · · , Avn = λn·vn

If a vector x in R^n is represented using v1, v2, · · · , vn as the basis, then

x = Σ_{i=1}^{n} αi·vi

Now,

Ax = Σ_{i=1}^{n} αi·A·vi = Σ_{i=1}^{n} αi·λi·vi

The matrix multiplication reduces to scalar multiplications if the eigenvectors of A are used as a basis.
SLIDE 63

So far all the discussion was centered around square matrices (A ∈ R^(n×n)). What about rectangular matrices A ∈ R^(m×n)? Can they have eigenvectors? Is it possible to have A_{m×n}·x_{n×1} = λ·x_{n×1}? Not possible! The result of A_{m×n}·x_{n×1} is a vector belonging to R^m (whereas x ∈ R^n). So do we miss out on the advantage that a basis of eigenvectors provides for square matrices (i.e., converting matrix multiplications into scalar multiplications)? We will see the answer to this question over the next few slides.

SLIDE 64

Note that a matrix A_{m×n} provides a transformation R^n → R^m. What if we could have pairs of vectors (v1, u1), (v2, u2), · · · , (vk, uk) such that vi ∈ R^n, ui ∈ R^m and

A·vi = σi·ui

Further, let's assume that v1, · · · , vk, · · · , vn are orthogonal & thus form a basis V of R^n. Similarly, let's assume that u1, · · · , uk, · · · , um are orthogonal & thus form a basis U of R^m. Now what if every vector x ∈ R^n is represented using the basis V:

x = Σ_{i=1}^{k} αi·vi   [note we are using k instead of n; we will clarify this in a minute]

Ax = Σ_{i=1}^{k} αi·A·vi = Σ_{i=1}^{k} αi·σi·ui

Once again, the matrix multiplication reduces to scalar multiplications.

SLIDE 65

Let’s look at a geometric interpretation of this

SLIDE 66

[Figure: the row space of A (a subspace of R^n, dim = k = rank(A)) mapped by A onto the column space of A (a subspace of R^m, dim = k = rank(A))]

  • R^n - the space of all vectors which can multiply with A to give Ax [this is the space of inputs of the function]
  • R^m - the space of all vectors which are outputs of the function Ax
  • We are interested in finding bases U, V such that V is a basis for the inputs and U is a basis for the outputs, and such that if the inputs and outputs are represented using these bases then the operation Ax reduces to a scalar operation.

SLIDE 67

What do we mean by saying that the dimension of the row space is k? If x ∈ R^n, then why is the dimension not n? It means that of all the possible vectors in R^n, only those lying in a k-dimensional subspace can act as inputs to Ax and produce a non-zero output. The vectors in the remaining (n − k)-dimensional subspace produce a zero output. Hence we need only k dimensions to represent x:

x = Σ_{i=1}^{k} αi·vi

SLIDE 68

Let's look at a way of writing this as a matrix operation:

Av1 = σ1·u1, Av2 = σ2·u2, · · · , Avk = σk·uk

A_{m×n} V_{n×k} = U_{m×k} Σ_{k×k}   [Σ is a diagonal matrix]

If we have k orthogonal vectors (V_{n×k}), then using Gram–Schmidt orthogonalization we can find n − k more orthogonal vectors to complete the basis for R^n [we can do the same for U]:

A_{m×n} V_{n×n} = U_{m×m} Σ_{m×n}
U^T A V = Σ   [U^(−1) = U^T]
A = U Σ V^T   [V^(−1) = V^T]

Σ is a diagonal matrix with only the first k diagonal elements non-zero. Now the question is: how do we find V, U and Σ?

SLIDE 69

Suppose V, U and Σ exist. Then

A^T A = (UΣV^T)^T (UΣV^T) = V Σ^T U^T U Σ V^T = V Σ^2 V^T

What does this look like? The eigenvalue decomposition of A^T A. Similarly, we can show that A A^T = U Σ^2 U^T. Thus the columns of U and V are the eigenvectors of A A^T and A^T A respectively, and Σ^2 = Λ, where Λ is the diagonal matrix containing the eigenvalues of A^T A.
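(Not from the lecture.) A numerical check, on an assumed rectangular matrix, that the SVD factors line up with the eigendecompositions of A^T A and AA^T:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))             # a rectangular matrix

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of A^T A are the squared singular values.
lam = np.linalg.eigvalsh(A.T @ A)[::-1]     # sort descending
print(np.allclose(lam, S ** 2))

# Columns of V are eigenvectors of A^T A; columns of U are eigenvectors of A A^T.
print(np.allclose(A.T @ A @ Vt.T, Vt.T * (S ** 2)))
print(np.allclose(A @ A.T @ U, U * (S ** 2)))
print(np.allclose(A, U @ np.diag(S) @ Vt))  # A = U Sigma V^T
```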

SLIDE 70

A_{m×n} = [u1 · · · uk]_{m×k} · diag(σ1, . . . , σk)_{k×k} · [v1 · · · vk]^T_{k×n} = Σ_{i=1}^{k} σi·ui·vi^T

Theorem: σ1·u1·v1^T is the best rank-1 approximation of the matrix A; Σ_{i=1}^{2} σi·ui·vi^T is the best rank-2 approximation of A; in general, Σ_{i=1}^{k} σi·ui·vi^T is the best rank-k approximation of A. In other words, the solution to

min_B ||A − B||_F^2   (over matrices B of rank at most k)

is given by B = U_{·,k} Σ_{k,k} V_{k,·}^T (which minimizes the reconstruction error of A).
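(Not from the lecture.) A minimal sketch, on an assumed matrix, of the truncated-SVD rank-k approximation; by the theorem (Eckart–Young), the squared Frobenius error equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, S, Vt = np.linalg.svd(A, full_matrices=False)

for k in range(1, 5):
    B = U[:, :k] @ np.diag(S[:k]) @ Vt[:k]          # sum_{i<=k} sigma_i u_i v_i^T
    err = np.linalg.norm(A - B, 'fro') ** 2
    # Error equals the sum of the discarded squared singular values.
    print(k, err, np.sum(S[k:] ** 2))
```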

SLIDE 71

σi = √λi is the i-th singular value of A

U = the left singular matrix of A
V = the right singular matrix of A
