
SLIDE 1
SLIDE 2

Conjugate Gradient (CG)

Majid Lesani Alireza Masoum

SLIDE 3

Overview

  • Backpropagation
  • Gradient Descent
  • Quadratic Forms
  • Gradient Descent in Quadratic Forms
  • Eigenvectors and Eigenvalues
  • Gradient Descent Convergence
  • Conjugate Gradient

SLIDE 4

Backpropagation

Abstraction / generalization problem, addressed by:

  • Heuristic features
  • Small networks
  • Early stopping
  • Regularization

Search / convergence problem

SLIDE 5

Gradient Descent (or Steepest Descent)

Step in the direction of the negative gradient:

$$\nabla f(x, y) = \left( \frac{\partial f(x, y)}{\partial x}, \; \frac{\partial f(x, y)}{\partial y} \right)$$

SLIDE 6

Faster Training

Gradient descent modifications:

  • BP with momentum
  • Variable learning rate BP

Numerical optimization techniques:

  • Conjugate gradient BP
  • Quasi-Newton BP

SLIDE 7

Gradient Descent

The problem is choosing the step size

SLIDE 8

Gradient Descent: Choosing the Best Step Size

Choose $\alpha_i$ where $f(x_{i+1})$ is minimum:

$$\frac{\partial f(x_{i+1})}{\partial \alpha_i} = 0$$

By the chain rule, with $x_{i+1} = x_i + \alpha_i r_i$ and $r_i = -\nabla f(x_i)$:

$$\frac{\partial f(x_{i+1})}{\partial \alpha_i} = \nabla f(x_{i+1})^T \, \frac{\partial x_{i+1}}{\partial \alpha_i} = \nabla f(x_{i+1})^T r_i = 0 \;\Rightarrow\; r_{i+1}^T r_i = 0$$
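To see this concretely, here is a minimal sketch (not from the slides; the toy function, starting point, and use of SciPy's scalar minimizer are assumptions) that performs the exact line search numerically and checks that successive steepest-descent directions come out orthogonal:

```python
# Verify r_{i+1}^T r_i ~ 0 under an exact line search. The quadratic
# bowl f and the starting point are made up for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    return 3 * x[0]**2 + 0.5 * x[1]**2

def grad(x):
    return np.array([6 * x[0], x[1]])

x = np.array([2.0, 2.0])
r = -grad(x)                          # steepest-descent direction
for i in range(5):
    # exact line search: minimize f(x + alpha * r) over alpha
    alpha = minimize_scalar(lambda a: f(x + a * r)).x
    x = x + alpha * r
    r_new = -grad(x)
    print(f"step {i}: r_new . r = {r_new @ r:.2e}")  # ~ 0
    r = r_new
```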

SLIDE 9

Gradient Descent: Choosing the Best Step Size

SLIDE 10

Quadratic Forms

Our goal is to minimize the quadratic function:

$$f(x) = \frac{1}{2} x^T A x - b^T x + c$$

SLIDE 11

Positive definite: for every nonzero vector $v$,

$$v^T A v > 0$$

SLIDE 12

Quadratic Forms

For a symmetric positive-definite matrix $A$, $f$ has a global minimum where the gradient is zero.

Solving the equation $Ax = b$ is equivalent to minimizing $f$:

$$f(x) = \frac{1}{2} x^T A x - b^T x + c$$

$$\nabla f(x) = Ax - b = 0 \;\Rightarrow\; Ax = b$$
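As a quick illustration of this equivalence, a sketch with a made-up symmetric positive-definite $A$ and $b$ (the matrix, vector, and use of scipy.optimize.minimize are assumptions, not from the slides):

```python
# The minimizer of f(x) = 1/2 x^T A x - b^T x + c coincides with the
# solution of Ax = b when A is symmetric positive-definite.
import numpy as np
from scipy.optimize import minimize

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])   # symmetric positive-definite
b = np.array([2.0, -8.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
x_min = minimize(f, x0=np.zeros(2)).x   # numerical minimizer of f
x_solve = np.linalg.solve(A, b)         # direct solution of Ax = b

print(x_min, x_solve)   # the two agree (up to solver tolerance)
```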

SLIDE 13

Gradient Descent for Quadratic Forms

SLIDE 14
SLIDE 15

Steepest descent for the quadratic form is:

$$r_i = b - A x_i, \qquad \alpha_i = \frac{r_i^T r_i}{r_i^T A r_i}, \qquad x_{i+1} = x_i + \alpha_i r_i$$
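A minimal sketch of this iteration (the example matrix, right-hand side, and tolerance are made up):

```python
# Steepest descent for f(x) = 1/2 x^T A x - b^T x + c, a direct
# transcription of the update above.
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                    # residual = -gradient
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))  # optimal step size
        x = x + alpha * r
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.zeros(2)))  # ~ np.linalg.solve(A, b)
```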

SLIDE 16
SLIDE 17
SLIDE 18
SLIDE 19

Eigenvectors and Eigenvalues

An eigenvector of a matrix $A$ is a nonzero vector that does not rotate when $A$ is applied to it; it is only scaled by a constant (its eigenvalue).

Every symmetric $n \times n$ matrix has $n$ orthogonal eigenvectors, each with its related eigenvalue.
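A small numerical check of both statements (the matrix is made up; numpy.linalg.eigh is NumPy's eigensolver for symmetric matrices):

```python
# Eigenvectors of a symmetric matrix satisfy A v = lambda v
# and are mutually orthogonal.
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
eigvals, eigvecs = np.linalg.eigh(A)     # eigh: symmetric matrices

v0, v1 = eigvecs[:, 0], eigvecs[:, 1]
print(np.allclose(A @ v0, eigvals[0] * v0))  # True: A v = lambda v
print(np.isclose(v0 @ v1, 0.0))              # True: orthogonal
```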

SLIDE 20

Using Eigen Vectors

Think of a vector as a sum of other vectors (its eigenvector components) whose behavior is understood.

SLIDE 21

Using Eigen Vectors

A positive-definite matrix is a matrix whose eigenvalues are all positive.

The eigenvectors are the axes of our rotated ellipse, and each radius is related to the corresponding eigenvalue.

SLIDE 22

General Convergence of Steepest Descent

Convergence depends on the relation between the eigenvalues of $A$ and the eigenvector components of the error.

SLIDE 23

Fast Convergence

When the eigenvalues are all equal, convergence is fast: the contours are circles and steepest descent reaches the minimum in one step.

SLIDE 24

Poor Convergence

Convergence is poor when the eigenvalues differ widely and the error has components in the direction of the eigenvectors with smaller eigenvalues.
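A made-up experiment illustrating both cases, reusing the steepest-descent update from earlier (the diagonal matrices, right-hand side, and tolerance are assumptions):

```python
# Steepest descent converges in one step when the eigenvalues are
# equal, and needs many steps when they are spread far apart.
import numpy as np

def iterations_needed(A, b, tol=1e-8, max_iter=10_000):
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            return k
        x = x + ((r @ r) / (r @ (A @ r))) * r
    return max_iter

b = np.array([1.0, 1.0])
print(iterations_needed(np.diag([2.0, 2.0]), b))    # equal eigenvalues: 1 step
print(iterations_needed(np.diag([2.0, 200.0]), b))  # spread eigenvalues: many steps
```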

SLIDE 25

Conjugate Gradient Overview

  • Orthogonal Directions
  • Conjugate Vectors
  • Conjugate Directions
  • Gram-Schmidt Algorithm
  • Gradient and Error Optimality
  • Conjugate Gradient

SLIDE 26

Orthogonal Directions

Steepest descent often steps in the same direction many times.

If instead we have $n$ orthogonal search directions and choose the best step size each time, after $n$ steps we are at the goal!

SLIDE 27
SLIDE 28

Orthogonal Directions

We need the error at every step to be orthogonal to the previous direction.
SLIDE 29

Conjugate vectors

SLIDE 30

Conjugate vectors

Two vectors $d_i$ and $d_j$ are A-orthogonal (or conjugate) if

$$d_i^T A d_j = 0$$

Being conjugate in the scaled space means being orthogonal in the unscaled space.
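A small check of the definition (the matrix and vectors are made up for illustration):

```python
# d0 and d1 are A-orthogonal (d0^T A d1 = 0) even though they
# are not orthogonal in the usual sense.
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
d0 = np.array([1.0, 0.0])
d1 = np.array([-2.0, 3.0])   # chosen so that d0^T A d1 = 0

print(d0 @ A @ d1)   # 0.0  -> A-orthogonal (conjugate)
print(d0 @ d1)       # -2.0 -> not orthogonal
```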

SLIDE 31

Conjugate Directions

If we have $n$ conjugate search directions and, as with orthogonal directions, choose the best step size each time, after $n$ steps we are at the goal!

SLIDE 32

Conjugate Directions

SLIDE 33

Orthogonal Directions

SLIDE 34

Conjugate Directions

We need the error at every step to be A-orthogonal to the previous direction.

SLIDE 35

Conjugate Directions

$$e_i = x_i - x, \qquad r_i = b - A x_i = -A e_i$$

SLIDE 36
SLIDE 37

Gram-Schmidt algorithm

So it only remains to find $n$ conjugate directions.

The Gram-Schmidt algorithm does it: given $n$ independent vectors, it gives $n$ conjugate directions.
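A minimal sketch of Gram-Schmidt conjugation (the matrix and input vectors are made up): from each input vector, subtract its A-projections onto the directions built so far.

```python
import numpy as np

def conjugate_gram_schmidt(A, U):
    """U: list of independent vectors -> list of A-orthogonal directions."""
    D = []
    for u in U:
        d = u.copy()
        for dj in D:
            beta = (u @ A @ dj) / (dj @ A @ dj)  # A-projection coefficient
            d = d - beta * dj                    # remove component along dj
        D.append(d)
    return D

A = np.array([[3.0, 2.0], [2.0, 6.0]])
d0, d1 = conjugate_gram_schmidt(A, [np.eye(2)[0], np.eye(2)[1]])
print(d0 @ A @ d1)   # ~ 0: the directions are conjugate
```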

SLIDE 38

Gram-Schmidt algorithm

SLIDE 39

Gram-Schmidt algorithm

SLIDE 40

Conjugate Directions

So the algorithm is complete, but it's expensive: keeping all previous directions for Gram-Schmidt costs $O(n^3)$!

We already had the Gaussian elimination algorithm at that cost.

SLIDE 41

Conjugate Directions with axial unit vectors

SLIDE 42

Gradient and error optimality

For every $j < i$ we have $d_j^T r_i = 0$. It means the gradient (residual) at step $i$ is orthogonal to all previous search directions, and the error $e_i$ is A-orthogonal to them.

SLIDE 43
SLIDE 44
SLIDE 45

Conjugate Gradient

Use the residuals $r_i$ as the independent vectors for Gram-Schmidt conjugation.

This makes the equations very simple: the complexity per iteration reduces from $O(n^2)$ to $O(m)$, where $m$ is the number of nonzero entries of $A$.
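Putting the pieces together, a minimal sketch of the resulting conjugate gradient iteration (A, b, and the tolerance are made up; this is the textbook recurrence, not code from the slides):

```python
# CG: each iteration needs one matrix-vector product, so the cost is
# O(m) for a sparse A with m nonzero entries.
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    x = x0.astype(float)
    r = b - A @ x           # residual
    d = r.copy()            # first search direction = residual
    for _ in range(max_iter or len(b)):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)        # optimal step along d
        x = x + alpha * d
        r_new = r - alpha * Ad            # updated residual
        beta = (r_new @ r_new) / (r @ r)  # Gram-Schmidt collapses to one term
        d = r_new + beta * d
        r = r_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, np.zeros(2)))  # ~ np.linalg.solve(A, b)
```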

SLIDE 46
SLIDE 47
SLIDE 48
SLIDE 49

Line Search

Finding the step size: compute the best step size

$$\alpha_i = \arg\min_{\alpha \in \mathbb{R}} f(x_i + \alpha \cdot d_i)$$
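For a general non-quadratic $f$ there is no closed-form step size, so the one-dimensional minimization is done numerically. A sketch using SciPy's scalar minimizer (the objective, iterate, and direction are made up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: (x[0] - 1)**4 + x[1]**2       # toy non-quadratic objective
x_i = np.array([3.0, 2.0])                  # current iterate
d_i = np.array([-1.0, -0.5])                # current search direction

# minimize f(x_i + alpha * d_i) over the scalar alpha
alpha_i = minimize_scalar(lambda a: f(x_i + a * d_i)).x
print(alpha_i, f(x_i + alpha_i * d_i))      # best step along d_i
```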

SLIDE 50

End

Thanks for your patience!