
SLIDE 1

Linear Algebra

Shan-Hung Wu

shwu@cs.nthu.edu.tw

Department of Computer Science, National Tsing Hua University, Taiwan

Large-Scale ML, Fall 2016

Shan-Hung Wu (CS, NTHU) Linear Algebra Large-Scale ML, Fall 2016 1 / 26

SLIDE 2

Outline

1

Span & Linear Dependence

2

Norms

3

Eigendecomposition

4

Singular Value Decomposition

5

Traces and Determinant



SLIDE 4

Matrix Representation of Linear Functions

A linear function (or map, or transformation) $f : \mathbb{R}^n \to \mathbb{R}^m$ can be represented by a matrix $A \in \mathbb{R}^{m \times n}$ such that $f(x) = Ax = y$, $\forall x \in \mathbb{R}^n, y \in \mathbb{R}^m$.

$\mathrm{span}(A_{:,1}, \cdots, A_{:,n})$ is called the column space of $A$.

$\mathrm{rank}(A) = \dim(\mathrm{span}(A_{:,1}, \cdots, A_{:,n}))$

SLIDE 6

System of Linear Equations

Given $A$ and $y$, solve $x$ in $Ax = y$.

What kind of $A$ makes $Ax = y$ always have a solution?

Since $Ax = \sum_i x_i A_{:,i}$, the column space of $A$ must contain $\mathbb{R}^m$, i.e., $\mathbb{R}^m \subseteq \mathrm{span}(A_{:,1}, \cdots, A_{:,n})$. This implies $n \ge m$.

When does $Ax = y$ always have exactly one solution?

$A$ must have at most $m$ columns; otherwise there is more than one $x$ parametrizing each $y$. This implies $n = m$ and that the columns of $A$ are linearly independent of each other.

In this case $A^{-1}$ exists, and $x = A^{-1}y$.
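The square, full-rank case above can be checked numerically; a minimal NumPy sketch (the matrix and right-hand side are made up for illustration):

```python
import numpy as np

# A hypothetical 2x2 system with linearly independent columns,
# so Ax = y has exactly one solution x = A^{-1} y.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0])

# Solving directly is preferred over forming A^{-1} explicitly
x = np.linalg.solve(A, y)
assert np.allclose(A @ x, y)            # x indeed solves the system

# rank(A) = dim(column space); full rank here, so A is invertible
assert np.linalg.matrix_rank(A) == 2
```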


SLIDE 13

Vector Norms

A norm of vectors is a function $\|\cdot\|$ that maps vectors to non-negative values, satisfying:

$\|x\| = 0 \Rightarrow x = 0$

$\|x + y\| \le \|x\| + \|y\|$ (the triangle inequality)

$\|cx\| = |c| \cdot \|x\|, \forall c \in \mathbb{R}$

E.g., the $L^p$ norm: $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$

$L^2$ (Euclidean) norm: $\|x\| = (x^\top x)^{1/2}$

$L^1$ norm: $\|x\|_1 = \sum_i |x_i|$

Max norm: $\|x\|_\infty = \max_i |x_i|$

$x^\top y = \|x\| \|y\| \cos\theta$, where $\theta$ is the angle between $x$ and $y$.

$x$ and $y$ are orthonormal iff $x^\top y = 0$ (orthogonal) and $\|x\| = \|y\| = 1$ (unit vectors).
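The norms and properties above can be sketched with NumPy (the vectors are made up for illustration):

```python
import numpy as np

x = np.array([3.0, -4.0])

l2   = np.linalg.norm(x)          # L2: sqrt(9 + 16) = 5
l1   = np.linalg.norm(x, 1)       # L1: |3| + |-4| = 7
linf = np.linalg.norm(x, np.inf)  # max norm: max(|3|, |-4|) = 4
assert l2 == 5.0 and l1 == 7.0 and linf == 4.0

# Triangle inequality and absolute homogeneity
y = np.array([1.0, 2.0])
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
assert np.isclose(np.linalg.norm(-2 * x), 2 * np.linalg.norm(x))
```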

SLIDE 16

Matrix Norms

Frobenius norm: $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$, analogous to the $L^2$ norm of a vector.

An orthogonal matrix is a square matrix whose columns (resp. rows) are mutually orthonormal, i.e., $A^\top A = I = A A^\top$.

This implies $A^{-1} = A^\top$.
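Both facts on this slide can be checked numerically; a small NumPy sketch using a 2D rotation matrix as the (assumed) example of an orthogonal matrix:

```python
import numpy as np

# A rotation matrix is orthogonal: Q^T Q = I = Q Q^T, hence Q^{-1} = Q^T
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(np.linalg.inv(Q), Q.T)

# Frobenius norm agrees with the elementwise definition
A = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt((A**2).sum()))
```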


SLIDE 18

Decomposition

Integers can be decomposed into prime factors, e.g., $12 = 2 \times 2 \times 3$. This helps identify useful properties, e.g., that 12 is not divisible by 5.

Can we decompose matrices to identify information about their functional properties more easily?

SLIDE 20

Eigenvectors and Eigenvalues

An eigenvector of a square matrix $A$ is a non-zero vector $v$ such that multiplication by $A$ alters only the scale of $v$: $Av = \lambda v$, where $\lambda \in \mathbb{R}$ is called the eigenvalue corresponding to this eigenvector.

If $v$ is an eigenvector, so is any scaling $cv$, $c \in \mathbb{R}$, $c \ne 0$; $cv$ has the same eigenvalue. Thus, we usually look for unit eigenvectors.

SLIDE 22

Eigendecomposition I

Every real symmetric matrix $A \in \mathbb{R}^{n \times n}$ can be decomposed into $A = Q\,\mathrm{diag}(\lambda)\,Q^\top$:

$\lambda \in \mathbb{R}^n$ consists of real-valued eigenvalues (usually sorted in descending order)

$Q = [v^{(1)}, \cdots, v^{(n)}]$ is an orthogonal matrix whose columns are the corresponding eigenvectors

Eigendecomposition may not be unique: when two or more eigenvectors share the same eigenvalue, any set of orthogonal vectors lying in their span are also eigenvectors with that eigenvalue.

What can we tell after decomposition?
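The decomposition of a real symmetric matrix can be reproduced with NumPy's `eigh` (a made-up 2x2 matrix; note `eigh` returns eigenvalues in ascending rather than descending order):

```python
import numpy as np

# A made-up real symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric (Hermitian) matrices:
# real eigenvalues, orthonormal eigenvectors as columns of Q
lam, Q = np.linalg.eigh(A)

# Reconstruct A = Q diag(lambda) Q^T
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)
# Q is orthogonal
assert np.allclose(Q.T @ Q, np.eye(2))
# Each column satisfies A v = lambda v
for i in range(2):
    assert np.allclose(A @ Q[:, i], lam[i] * Q[:, i])
```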

SLIDE 25

Eigendecomposition II

Because $Q = [v^{(1)}, \cdots, v^{(n)}]$ is an orthogonal matrix, we can think of $A$ as scaling space by $\lambda_i$ in direction $v^{(i)}$.

SLIDE 26

Rayleigh’s Quotient

Theorem (Rayleigh’s Quotient). Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$, then $\forall x \in \mathbb{R}^n$,
$$\lambda_{\min} \le \frac{x^\top A x}{x^\top x} \le \lambda_{\max},$$
where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $A$.

$\frac{x^\top A x}{x^\top x} = \lambda_i$ when $x$ is the eigenvector corresponding to $\lambda_i$.
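The bound can be sanity-checked empirically with a random symmetric matrix (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                    # symmetrize

lam = np.linalg.eigvalsh(A)          # ascending: lam[0]=min, lam[-1]=max

# Rayleigh quotients of random vectors stay within [lam_min, lam_max]
for _ in range(100):
    x = rng.standard_normal(4)
    r = (x @ A @ x) / (x @ x)
    assert lam[0] - 1e-9 <= r <= lam[-1] + 1e-9
```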

SLIDE 27

Singularity

Suppose $A = Q\,\mathrm{diag}(\lambda)\,Q^\top$; then $A^{-1} = Q\,\mathrm{diag}(\lambda)^{-1}\,Q^\top$.

$A$ is non-singular (invertible) iff none of the eigenvalues is zero.

SLIDE 29

Positive Definite Matrices I

$A$ is positive semidefinite (denoted $A \succeq O$) iff its eigenvalues are all non-negative. Equivalently, $x^\top A x \ge 0$ for any $x$.

$A$ is positive definite (denoted $A \succ O$) iff its eigenvalues are all positive. This further ensures that $x^\top A x = 0 \Rightarrow x = 0$.

Why do these matter?
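The two characterizations above (eigenvalue signs vs. the sign of the quadratic form) can be checked against each other; a NumPy sketch using a Gram matrix, which is a standard way to construct a positive definite example:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = B.T @ B + 1e-3 * np.eye(3)   # Gram matrix plus a small ridge: positive definite

# All eigenvalues positive  <=>  positive definite
assert np.all(np.linalg.eigvalsh(A) > 0)

# Hence x^T A x > 0 for any non-zero x
for _ in range(100):
    x = rng.standard_normal(3)
    assert x @ A @ x > 0
```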

SLIDE 30

Positive Definite Matrices II

A function $f$ is quadratic iff it can be written as $f(x) = \frac{1}{2} x^\top A x - b^\top x + c$, where $A$ is symmetric.

$x^\top A x$ is called the quadratic form.

Figure: Graph of a quadratic form when $A$ is a) positive definite; b) negative definite; c) positive semidefinite (singular); d) an indefinite matrix.
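One reason definiteness matters for quadratic functions (a standard fact, assumed here rather than taken from the slide): when $A$ is positive definite, $f$ is bowl-shaped as in panel a) and has a unique minimizer $x^*$ satisfying $Ax^* = b$. A NumPy sketch with made-up values:

```python
import numpy as np

# f(x) = 0.5 x^T A x - b^T x + c with A positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
c = 0.5
f = lambda x: 0.5 * x @ A @ x - b @ x + c

# Unique minimizer solves A x* = b (assumed standard fact)
x_star = np.linalg.solve(A, b)

# Every perturbed point has a value no smaller than f(x*)
rng = np.random.default_rng(2)
for _ in range(100):
    x = x_star + rng.standard_normal(2)
    assert f(x) >= f(x_star) - 1e-12
```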


SLIDE 34

Singular Value Decomposition (SVD)

Eigendecomposition requires square matrices. What if $A$ is not square?

Every real matrix $A \in \mathbb{R}^{m \times n}$ has a singular value decomposition $A = UDV^\top$, where $U \in \mathbb{R}^{m \times m}$, $D \in \mathbb{R}^{m \times n}$, and $V \in \mathbb{R}^{n \times n}$.

$U$ and $V$ are orthogonal matrices, and their columns are called the left- and right-singular vectors, respectively. Elements along the diagonal of $D$ are called the singular values.

Left-singular vectors of $A$ are eigenvectors of $AA^\top$. Right-singular vectors of $A$ are eigenvectors of $A^\top A$. Non-zero singular values of $A$ are the square roots of the eigenvalues of $AA^\top$ (or $A^\top A$).
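The decomposition and its relation to $AA^\top$ can be reproduced with NumPy (a made-up 2x3 matrix; note `np.linalg.svd` returns $V^\top$, not $V$):

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])        # 2x3, not square

U, s, Vt = np.linalg.svd(A)            # s holds the singular values, descending

# Reconstruct A = U D V^T with D in R^{m x n}
D = np.zeros(A.shape)
D[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ D @ Vt, A)

# Non-zero singular values are square roots of eigenvalues of A A^T
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # descending
assert np.allclose(s**2, eig_AAt)
```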

SLIDE 37

Moore-Penrose Pseudoinverse I

Matrix inversion is not defined for matrices that are not square.

Suppose we want a left-inverse $B \in \mathbb{R}^{n \times m}$ of a matrix $A \in \mathbb{R}^{m \times n}$, so that we can solve a linear equation $Ax = y$ by left-multiplying each side to obtain $x = By$.

If $m > n$, it is possible that no such $B$ exists. If $m < n$, there could be multiple $B$'s.

By letting $B = A^\dagger$, the Moore-Penrose pseudoinverse, we can make headway in these cases:

When $m = n$ and $A^{-1}$ exists, $A^\dagger$ degenerates to $A^{-1}$.

When $m > n$, $A^\dagger$ returns the $x$ for which $Ax$ is closest to $y$ in terms of the Euclidean norm $\|Ax - y\|$.

When $m < n$, $A^\dagger$ returns the solution $x = A^\dagger y$ with minimal Euclidean norm $\|x\|$ among all possible solutions.

SLIDE 43

Moore-Penrose Pseudoinverse II

The Moore-Penrose pseudoinverse is defined as
$$A^\dagger = \lim_{\alpha \searrow 0} (A^\top A + \alpha I_n)^{-1} A^\top.$$

$A^\dagger A = I$ when the columns of $A$ are linearly independent.

In practice, it is computed by $A^\dagger = V D^\dagger U^\top$, where $UDV^\top = A$ and $D^\dagger \in \mathbb{R}^{n \times m}$ is obtained by taking the reciprocals of the non-zero diagonal elements of $D$ and then taking the transpose.
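Both behaviors of $A^\dagger$ described above can be sketched with NumPy's `pinv` (random made-up systems; the comparison with `lstsq` assumes the overdetermined system has full column rank, so its least-squares solution is unique):

```python
import numpy as np

rng = np.random.default_rng(3)

# Overdetermined (m > n): pinv gives the least-squares solution
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
x = np.linalg.pinv(A) @ y
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_ls)

# Underdetermined (m < n): pinv gives the minimum-norm exact solution
A = rng.standard_normal((3, 5))
y = rng.standard_normal(3)
x = np.linalg.pinv(A) @ y
assert np.allclose(A @ x, y)                      # exact solution

# Any other solution differs by a null-space vector and is no shorter
e = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
x_other = x + (e - np.linalg.pinv(A) @ (A @ e))   # A @ (x_other - x) = 0
assert np.allclose(A @ x_other, y)
assert np.linalg.norm(x) <= np.linalg.norm(x_other) + 1e-9
```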


SLIDE 46

Traces

$\mathrm{tr}(A) = \sum_i A_{i,i}$

$\mathrm{tr}(A) = \mathrm{tr}(A^\top)$

$\mathrm{tr}(aA + bB) = a\,\mathrm{tr}(A) + b\,\mathrm{tr}(B)$

$\|A\|_F^2 = \mathrm{tr}(AA^\top) = \mathrm{tr}(A^\top A)$

$\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$; this holds even if the products have different shapes.
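The cyclic property and the trace form of the Frobenius norm can be checked with random matrices of deliberately different shapes (a numerical sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Cyclic property: ABC is 2x2, BCA is 3x3, CAB is 4x4,
# yet the three traces agree
t = np.trace(A @ B @ C)
assert np.isclose(t, np.trace(B @ C @ A))
assert np.isclose(t, np.trace(C @ A @ B))

# Frobenius norm via the trace
M = rng.standard_normal((3, 3))
assert np.isclose(np.linalg.norm(M, 'fro')**2, np.trace(M @ M.T))
```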

SLIDE 48

Determinant I

The determinant $\det(\cdot)$ is a function that maps a square matrix $A \in \mathbb{R}^{n \times n}$ to a real value:
$$\det(A) = \sum_i (-1)^{i+1} A_{1,i} \det(M_{1,i}),$$
where $M_{1,i}$ is the $(n-1) \times (n-1)$ matrix obtained by deleting the 1st row and the $i$-th column of $A$.

$\det(A^\top) = \det(A)$

$\det(A^{-1}) = 1/\det(A)$

$\det(AB) = \det(A)\det(B)$

$\det(A) = \prod_i \lambda_i$. What does it mean?

$\det(A)$ can also be regarded as the signed area of the image of the “unit square”.
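The determinant identities above can be verified on small made-up matrices with NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # symmetric; eigenvalues 1 and 3

# det(A) equals the product of the eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvalsh(A)))

# det(AB) = det(A) det(B)  and  det(A^{-1}) = 1 / det(A)
B = np.array([[0.0, -1.0],
              [1.0,  0.0]])            # rotation by 90 degrees, det = 1
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))

# det(A^T) = det(A)
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
```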

SLIDE 52

Determinant II

Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. We have $[1, 0]A = [a, b]$, $[0, 1]A = [c, d]$, and $\det(A) = ad - bc$.

Figure: The area of the parallelogram is the absolute value of the determinant of the matrix formed by the images of the standard basis vectors representing the parallelogram’s sides.

SLIDE 53

Determinant III

The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space.

If $\det(A) = 0$, then space is contracted completely along at least one dimension. $A$ is invertible iff $\det(A) \ne 0$.

If $|\det(A)| = 1$, then the transformation is volume-preserving.