Linear Algebra Review (with a Small Dose of Optimization)
SLIDE 1

Linear Algebra Review (with a Small Dose of Optimization)

Hristo Paskov CS246

SLIDE 2

Outline

  • Basic definitions
  • Subspaces and Dimensionality
  • Matrix functions: inverses and eigenvalue decompositions

  • Convex optimization
SLIDE 3

Vectors and Matrices

  • Vector x ∈ ℝ^n, by convention a column

x = [x_1 ⋮ x_n]

  • May also write x = (x_1, …, x_n)^T

SLIDE 4

Vectors and Matrices

  • Matrix A ∈ ℝ^{m×n}

A = [a_11 ⋯ a_1n; ⋮ ⋱ ⋮; a_m1 ⋯ a_mn]

  • Written in terms of rows or columns

  • A = [a_1^T ⋮ a_m^T], with rows a_i^T, a_i ∈ ℝ^n

  • A = [a^(1) … a^(n)], with columns a^(j) ∈ ℝ^m

SLIDE 5

Multiplication

  • Vector-vector: x, y ∈ ℝ^n → x^T y ∈ ℝ

x^T y = Σ_i x_i y_i

  • Matrix-vector: x ∈ ℝ^n, A ∈ ℝ^{m×n} → Ax ∈ ℝ^m

Ax = [a_1^T x ⋮ a_m^T x], one inner product per row

  • Equivalently Ax = Σ_j x_j a^(j), a linear combination of the columns of A
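To make the two products concrete, here is a minimal NumPy sketch (my addition, not from the slides; the array values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])   # A in R^{2x3}

# Vector-vector: x^T y is a scalar
print(x @ y)          # 1*4 + 2*5 + 3*6 = 32.0

# Matrix-vector: Ax lives in R^2; entry i is a_i^T x
print(A @ x)          # [7. 9.]

# Equivalently, Ax is a linear combination of the columns of A
print(sum(x[j] * A[:, j] for j in range(3)))  # same as A @ x
```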
SLIDE 6

Multiplication

  • Matrix-matrix: A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p} → AB ∈ ℝ^{m×p}

(diagram: the inner dimensions must agree, e.g. a 5×3 matrix times a 3×4 matrix gives a 5×4 result)

SLIDE 7

Multiplication

  • Matrix-matrix: A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p} → AB ∈ ℝ^{m×p}

– a_i^T are the rows of A, b^(j) the columns of B

AB = A [b^(1) … b^(p)] = [Ab^(1) … Ab^(p)]

  • Entrywise, (AB)_{ij} = a_i^T b^(j)
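A small NumPy check (added for illustration; the matrices are random) that entry (i, j) of AB is the inner product of row i of A with column j of B:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # m x n
B = rng.standard_normal((3, 4))   # n x p

C = A @ B                          # m x p
# (AB)_{ij} = a_i^T b^(j): row i of A dotted with column j of B
i, j = 2, 1
print(np.isclose(C[i, j], A[i, :] @ B[:, j]))  # True
```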
SLIDE 8

Multiplication Properties

  • Associative

(AB)C = A(BC)

  • Distributive

A(B + C) = AB + AC

  • NOT commutative: in general AB ≠ BA

– Dimensions may not even be conformable
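These properties are easy to sanity-check numerically; a throwaway NumPy sketch (my addition) with random square matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

print(np.allclose((A @ B) @ C, A @ (B @ C)))    # associative: True
print(np.allclose(A @ (B + C), A @ B + A @ C))  # distributive: True
print(np.allclose(A @ B, B @ A))                # commutative? False in general
```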

SLIDE 9

Useful Matrices

  • Identity matrix I ∈ ℝ^{n×n}

– AI = A and IA = A

I_{ij} = 1 if i = j, 0 if i ≠ j

  • Diagonal matrix D ∈ ℝ^{n×n}

D = diag(d_1, …, d_n), zero everywhere off the diagonal
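A short NumPy illustration (added, with arbitrary values) of the identity and diagonal matrices:

```python
import numpy as np

I = np.eye(3)                     # identity
D = np.diag([1.0, 2.0, 3.0])      # diagonal with entries d_1..d_3

A = np.arange(9.0).reshape(3, 3)
print(np.allclose(A @ I, A) and np.allclose(I @ A, A))  # AI = IA = A: True
print(D @ np.ones(3))             # scales coordinate i by d_i: [1. 2. 3.]
```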

SLIDE 10

Useful Matrices

  • Symmetric A ∈ ℝ^{n×n}: A = A^T
  • Orthogonal Q ∈ ℝ^{n×n}:

Q^T Q = Q Q^T = I

– Columns/rows are orthonormal

  • Positive semidefinite A ∈ ℝ^{n×n}:

x^T A x ≥ 0 for all x ∈ ℝ^n

– Equivalently, there exists B ∈ ℝ^{n×n} with A = B^T B
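A minimal NumPy sketch (my addition): B^T B is always symmetric positive semidefinite, and a QR factorization is one convenient way to produce an orthogonal Q:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))

A = B.T @ B                        # B^T B is symmetric positive semidefinite
print(np.allclose(A, A.T))         # symmetric: True
print(np.all(np.linalg.eigvalsh(A) >= -1e-10))  # eigenvalues >= 0: True

# An orthogonal Q (here from a QR factorization) satisfies Q^T Q = I
Q, _ = np.linalg.qr(B)
print(np.allclose(Q.T @ Q, np.eye(4)))          # True
```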

SLIDE 11

Outline

  • Basic definitions
  • Subspaces and Dimensionality
  • Matrix functions: inverses and eigenvalue decompositions

  • Convex optimization
SLIDE 12

Norms

  • Quantify the “size” of a vector
  • Given x ∈ ℝ^n, a norm ‖·‖ satisfies

1. ‖αx‖ = |α| ‖x‖
2. ‖x‖ = 0 ⇔ x = 0
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖

  • Common norms:

1. Euclidean (ℓ2) norm: ‖x‖_2 = √(x_1² + ⋯ + x_n²)
2. ℓ1 norm: ‖x‖_1 = |x_1| + ⋯ + |x_n|
3. ℓ∞ norm: ‖x‖_∞ = max_i |x_i|
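The three norms map directly onto np.linalg.norm; a quick sketch (added for reference):

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x, 2))       # l2 norm: sqrt(9 + 16) = 5.0
print(np.linalg.norm(x, 1))       # l1 norm: |3| + |-4| = 7.0
print(np.linalg.norm(x, np.inf))  # l-infinity norm: max |x_i| = 4.0
```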

SLIDE 13

Linear Subspaces

SLIDE 14

Linear Subspaces

  • A subspace S ⊂ ℝ^n satisfies

1. 0 ∈ S
2. If x, y ∈ S and α ∈ ℝ, then αx + y ∈ S

  • Vectors v_1, …, v_k span S if

S = { Σ_i α_i v_i : α_i ∈ ℝ }
SLIDE 15

Linear Independence and Dimension

  • Vectors v_1, …, v_k are linearly independent if

Σ_i α_i v_i = 0 ⟺ all α_i = 0

– Every linear combination of the v_i is unique

  • dim(S) = k if v_1, …, v_k span S and are linearly independent

– If u_1, …, u_m also span S, then

  • m ≥ k
  • If m > k, the u_i are NOT linearly independent
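One practical test of linear independence is to stack the vectors as columns and compare the rank to the vector count; a NumPy sketch (my addition, with a deliberately dependent v3):

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2                  # deliberately a combination of v1, v2

V = np.column_stack([v1, v2, v3])
# Independent iff the rank equals the number of vectors
print(np.linalg.matrix_rank(V))   # 2, so v1, v2, v3 are NOT independent
```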
SLIDE 16

Linear Independence and Dimension

SLIDE 17

Matrix Subspaces

  • A matrix A ∈ ℝ^{m×n} defines two subspaces

– Column space: col(A) = { Aα : α ∈ ℝ^n } ⊂ ℝ^m
– Row space: row(A) = { A^T β : β ∈ ℝ^m } ⊂ ℝ^n

  • Nullspace of A: null(A) = { x ∈ ℝ^n : Ax = 0 }

– null(A) ⊥ row(A)
– dim(null(A)) + dim(row(A)) = n
– Analog for the column space
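A small NumPy illustration (added; the matrix is a hand-picked rank-1 example) of rank–nullity and of reading a nullspace basis off the SVD:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1: the second row is twice the first

r = np.linalg.matrix_rank(A)      # dim row(A) = dim col(A)
n = A.shape[1]
print(r, n - r)                   # rank 1, nullspace dimension 2 (rank-nullity)

# A basis for null(A): right singular vectors with zero singular value
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[r:]               # rows spanning null(A)
print(np.allclose(A @ null_basis.T, 0))  # True
```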

SLIDE 18

Matrix Rank

  • rank(A) gives the dimensionality of the row and column spaces
  • If A ∈ ℝ^{m×n} has rank k, it can be decomposed into a product of an m×k and a k×n matrix

(diagram: an m×n matrix of rank k factors as an m×k matrix times a k×n matrix)
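A sketch (my addition) that builds a rank-2 matrix as such a product and recovers an equivalent factorization from the truncated SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.standard_normal((6, 2))   # m x k
W = rng.standard_normal((2, 5))   # k x n
A = U @ W                         # rank-2 matrix in R^{6x5}

print(np.linalg.matrix_rank(A))   # 2
# Recover such a factorization from the truncated SVD
u, s, vt = np.linalg.svd(A)
A2 = (u[:, :2] * s[:2]) @ vt[:2]  # product of 6x2 and 2x5 factors
print(np.allclose(A, A2))         # True
```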

SLIDE 19

Properties of Rank

  • For A, B ∈ ℝ^{m×n}

1. rank(A) ≤ min(m, n)
2. rank(A) = rank(A^T)
3. rank(AB) ≤ min(rank(A), rank(B))
4. rank(A + B) ≤ rank(A) + rank(B)

  • A has full rank if rank(A) = min(m, n)
  • If m > rank(A), the rows are not linearly independent

– Same for the columns if n > rank(A)

SLIDE 20

Outline

  • Basic definitions
  • Subspaces and Dimensionality
  • Matrix functions: inverses and eigenvalue decompositions

  • Convex optimization
SLIDE 21

Matrix Inverse

  • A ∈ ℝ^{n×n} is invertible iff rank(A) = n
  • The inverse is unique and satisfies

1. A^{-1} A = A A^{-1} = I
2. (A^{-1})^{-1} = A
3. (A^T)^{-1} = (A^{-1})^T
4. If A and B are invertible, then AB is invertible and (AB)^{-1} = B^{-1} A^{-1}

SLIDE 22

Systems of Equations

  • Given A ∈ ℝ^{m×n} and b ∈ ℝ^m, wish to solve

Ax = b

– A solution exists only if b ∈ col(A)

  • Possibly an infinite number of solutions
  • If A is invertible, then x = A^{-1} b

– Notational device; do not actually invert matrices
– Computationally, use solving routines like Gaussian elimination
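In code this is exactly the np.linalg.solve vs. np.linalg.inv distinction; a minimal sketch (added, with a hand-picked 2×2 system):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)         # preferred: a solver, not an explicit inverse
print(x)                          # [2. 3.]
print(np.allclose(A @ x, b))      # True
print(np.allclose(x, np.linalg.inv(A) @ b))  # same answer, but slower and less stable
```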

SLIDE 23

Systems of Equations

  • What if b ∉ col(A)?
  • Find the x whose image

ŷ = Ax is closest to b

– ŷ is the projection of b onto col(A)
– Also known as regression

  • Assume rank(A) = n < m

x = (A^T A)^{-1} A^T b,  ŷ = A (A^T A)^{-1} A^T b

– A^T A is invertible; A (A^T A)^{-1} A^T is the projection matrix
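A NumPy sketch (my addition, random data) showing that np.linalg.lstsq agrees with the normal-equations formula, and that the residual is orthogonal to col(A):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 3))              # m > n, full column rank
b = rng.standard_normal(20)

x, *_ = np.linalg.lstsq(A, b, rcond=None)     # minimizes ||Ax - b||_2
x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # x = (A^T A)^{-1} A^T b
print(np.allclose(x, x_normal))               # True

y_hat = A @ x                                 # projection of b onto col(A)
print(np.allclose(A.T @ (b - y_hat), 0))      # residual orthogonal to col(A)
```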

SLIDE 24

Systems of Equations

A^T (b − Ax) = 0   ⟹   A^T A x = A^T b   ⟹   x = (A^T A)^{-1} A^T b

(the residual b − Ax is orthogonal to col(A); solving the normal equations gives the least-squares solution from the previous slide)

SLIDE 25

Eigenvalue Decomposition

  • The eigenvalue decomposition of a symmetric A ∈ ℝ^{n×n} is

A = U Σ U^T = Σ_i λ_i u_i u_i^T

– Σ = diag(λ_1, …, λ_n) contains the eigenvalues of A
– U is orthogonal and its columns are the eigenvectors u_i of A

  • If A is not symmetric but diagonalizable,

A = U Σ U^{-1}

– Σ is diagonal but possibly complex
– U is not necessarily orthogonal
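For symmetric matrices, np.linalg.eigh returns exactly this decomposition; a verification sketch (added, random symmetric A):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                     # symmetrize

lam, U = np.linalg.eigh(A)            # eigenvalues (ascending), orthonormal eigenvectors
print(np.allclose(U @ np.diag(lam) @ U.T, A))   # A = U Sigma U^T: True
print(np.allclose(U.T @ U, np.eye(4)))          # U orthogonal: True

# Rank-one expansion A = sum_i lambda_i u_i u_i^T
A_sum = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(4))
print(np.allclose(A_sum, A))                    # True
```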

SLIDE 26

Characterizations of Eigenvalues

  • Traditional formulation

Ax = λx

– Leads to the characteristic polynomial

det(A − λI) = 0

  • Rayleigh quotient (symmetric A)

λ_max = max_{x ≠ 0} (x^T A x) / (x^T x)

SLIDE 27

Eigenvalue Properties

  • For A ∈ ℝ^{n×n} with eigenvalues λ_i

1. tr(A) = Σ_i λ_i
2. det(A) = λ_1 λ_2 ⋯ λ_n
3. rank(A) = #{ i : λ_i ≠ 0 }

  • When A is symmetric

– The eigenvalue decomposition is the singular value decomposition
– The eigenvectors for nonzero eigenvalues give an orthogonal basis for row(A) = col(A)
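Properties 1–3 are quick to verify numerically; a sketch (my addition, random symmetric A):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

lam = np.linalg.eigvalsh(A)
print(np.isclose(np.trace(A), lam.sum()))        # tr(A) = sum of eigenvalues
print(np.isclose(np.linalg.det(A), lam.prod()))  # det(A) = product of eigenvalues
print(np.linalg.matrix_rank(A) == np.sum(~np.isclose(lam, 0)))  # rank = # nonzero
```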
SLIDE 28

Simple Eigenvalue Proof

  • Why det − [& = 0?
  • Assume is symmetric and full rank
  • 1. = YΣY
  • 2. − [& = YΣY − [& = Y Σ − [& Y
  • 3. If [ = [, *ab eigenvalue of − [& is 0
  • 4. Since det − [& is product of eigenvalues,
  • ne of the terms is 0, so product is 0

YY = &

SLIDE 29

Outline

  • Basic definitions
  • Subspaces and Dimensionality
  • Matrix functions: inverses and eigenvalue decompositions

  • Convex optimization
SLIDE 30

Convex Optimization

  • Find the minimum of a function subject to constraints on the solution
  • Business / economics / game theory

– Resource allocation
– Optimal planning and strategies

  • Statistics and Machine Learning

– All forms of regression and classification
– Unsupervised learning

  • Control theory

– Keeping planes in the air!

SLIDE 31

Convex Sets

  • A set C is convex if ∀ x, y ∈ C and ∀ α ∈ [0, 1],

αx + (1 − α)y ∈ C

– The line segment between any two points of C also lies in C

  • Examples

– Intersections of halfspaces
– ℓp balls
– Intersections of convex sets

SLIDE 32

Convex Functions

  • A real-valued function f is convex if dom f is convex and, ∀ x, y ∈ dom f and ∀ α ∈ [0, 1],

f(αx + (1 − α)y) ≤ α f(x) + (1 − α) f(y)

– The graph of f is upper bounded by the line segment between the points (x, f(x)) and (y, f(y)) on the graph

SLIDE 33

Gradients

  • Differentiable convex f with dom f = ℝ^n
  • The gradient ∇f(x) gives a linear approximation of f at x

∇f(x) = ( ∂f/∂x_1, …, ∂f/∂x_n )^T

– Near x, f(x + h) ≈ f(x) + h^T ∇f(x)
SLIDE 34

Gradients

SLIDE 35

Gradient Descent

  • To minimize f, move down the gradient

– But not too far!
– Optimum when ∇f(x) = 0

  • Given f, learning rate α, starting point x_0

x = x_0
Do until ∇f(x) = 0:
    x = x − α ∇f(x)
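A minimal implementation sketch (my addition; the quadratic test function and step size are arbitrary choices):

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Minimize a differentiable convex f given its gradient."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # optimum when the gradient vanishes
            break
        x = x - alpha * g             # step down the gradient
    return x

# Example: f(x) = ||x - c||^2 / 2 has gradient x - c and minimizer c
c = np.array([1.0, -2.0])
print(gradient_descent(lambda x: x - c, np.zeros(2)))  # ~[ 1. -2.]
```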

SLIDE 36

Stochastic Gradient Descent

  • Many learning problems have extra structure

f(w) = Σ_i ℓ(w; x_i)

  • Computing the gradient requires iterating over all points, which can be too costly
  • Instead, compute the gradient at a single training example

SLIDE 37

Stochastic Gradient Descent

  • Given f(w) = Σ_i ℓ(w; x_i), learning rate α, and starting point w_0

w = w_0
Do until f(w) is nearly optimal:
    For i = 1 to n in random order:
        w = w − α ∇ℓ(w; x_i)

  • Finds a nearly optimal w
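A sketch of this loop for the least-squares objective on the next slide (my addition; the data, learning rate, and epoch count are arbitrary):

```python
import numpy as np

def sgd_least_squares(X, y, alpha=0.01, epochs=50, rng=None):
    """SGD for f(w) = sum_i (y_i - w^T x_i)^2, one example at a time."""
    if rng is None:
        rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):        # random order each pass
            g = -2.0 * (y[i] - X[i] @ w) * X[i]  # gradient at one example
            w = w - alpha * g
    return w

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -1.0, 2.0])
y = X @ w_true
print(sgd_least_squares(X, y))   # close to w_true
```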
SLIDE 38

Minimize Σ_i (y_i − w^T x_i)²  (least-squares objective, the running SGD example)

SLIDE 39

Learning Parameter