Mathematical Background Lijun Zhang zlj@nju.edu.cn - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Mathematical Background

Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj

slide-2
SLIDE 2

Outline

• Norms
• Analysis
• Functions
• Derivatives
• Linear Algebra

slide-3
SLIDE 3

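Inner product

• Standard inner product on $\mathbf{R}^n$: $\langle x, y \rangle = x^T y = \sum_{i=1}^n x_i y_i$, for $x, y \in \mathbf{R}^n$
  - Euclidean norm, or $\ell_2$-norm: $\|x\|_2 = (x^T x)^{1/2} = (x_1^2 + \cdots + x_n^2)^{1/2}$
  - Cauchy-Schwarz inequality: $|x^T y| \le \|x\|_2 \, \|y\|_2$
  - Angle between nonzero vectors $x, y \in \mathbf{R}^n$: $\angle(x, y) = \cos^{-1}\left( \dfrac{x^T y}{\|x\|_2 \, \|y\|_2} \right)$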
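
The definitions on this slide can be checked numerically. The following is a minimal sketch in plain Python; the function names and the sample vectors are illustrative choices, not part of the original slides.

```python
import math

def inner(x, y):
    """Standard inner product x^T y on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm2(x):
    """Euclidean (l2) norm: (x^T x)^(1/2)."""
    return math.sqrt(inner(x, x))

def angle(x, y):
    """Angle between two nonzero vectors, in radians."""
    c = inner(x, y) / (norm2(x) * norm2(y))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, c)))

# Hypothetical sample vectors, chosen to be orthogonal.
x = [3.0, 4.0]
y = [4.0, -3.0]

# Cauchy-Schwarz: |x^T y| <= ||x||_2 ||y||_2
assert abs(inner(x, y)) <= norm2(x) * norm2(y)
print(norm2(x), inner(x, y), angle(x, y))
```

Because the two sample vectors are orthogonal, the computed angle is a right angle and the Cauchy-Schwarz bound holds with plenty of slack.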

slide-4
SLIDE 4

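Inner product

• Standard inner product on $\mathbf{R}^{m \times n}$: $\langle X, Y \rangle = \mathbf{tr}(X^T Y) = \sum_{i=1}^m \sum_{j=1}^n X_{ij} Y_{ij}$, for $X, Y \in \mathbf{R}^{m \times n}$
  - Here $\mathbf{tr}$ denotes the trace of a matrix.
• Frobenius norm of a matrix $X \in \mathbf{R}^{m \times n}$: $\|X\|_F = (\mathbf{tr}(X^T X))^{1/2} = \left( \sum_{i=1}^m \sum_{j=1}^n X_{ij}^2 \right)^{1/2}$
• Inner product on $\mathbf{S}^n$: $\langle X, Y \rangle = \mathbf{tr}(XY) = \sum_{i=1}^n \sum_{j=1}^n X_{ij} Y_{ij} = \sum_{i=1}^n X_{ii} Y_{ii} + 2 \sum_{i<j} X_{ij} Y_{ij}$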
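
As a small check that the trace form of the matrix inner product agrees with the entrywise sum, here is a sketch using nested lists as matrices. The helper names and the sample matrices are assumptions made for illustration.

```python
import math

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def inner(X, Y):
    """Matrix inner product tr(X^T Y)."""
    return trace(matmul(transpose(X), Y))

def frobenius(X):
    """Frobenius norm (tr(X^T X))^(1/2)."""
    return math.sqrt(inner(X, X))

# Hypothetical 3x2 sample matrices.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Entrywise sum of X_ij * Y_ij, which should equal tr(X^T Y).
entrywise = sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))
print(inner(X, Y), entrywise, frobenius(X))
```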
slide-5
SLIDE 5

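Norms

• A function $f: \mathbf{R}^n \to \mathbf{R}$ with $\mathbf{dom}\, f = \mathbf{R}^n$ is called a norm if
  - $f$ is nonnegative: $f(x) \ge 0$ for all $x \in \mathbf{R}^n$
  - $f$ is definite: $f(x) = 0$ only if $x = 0$
  - $f$ is homogeneous: $f(tx) = |t| f(x)$, for all $x \in \mathbf{R}^n$ and $t \in \mathbf{R}$
  - $f$ satisfies the triangle inequality: $f(x + y) \le f(x) + f(y)$, for all $x, y \in \mathbf{R}^n$
• Distance
  - The distance between vectors $x$ and $y$ is the length of their difference, i.e., $\mathbf{dist}(x, y) = \|x - y\|$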

slide-6
SLIDE 6

Norms

• Unit ball
  - The set of all vectors with norm less than or equal to one, $\mathcal{B} = \{ x \in \mathbf{R}^n \mid \|x\| \le 1 \}$, is called the unit ball of the norm $\|\cdot\|$.
  - The unit ball satisfies the following properties:
    - $\mathcal{B}$ is symmetric about the origin, i.e., $x \in \mathcal{B}$ if and only if $-x \in \mathcal{B}$
    - $\mathcal{B}$ is convex
    - $\mathcal{B}$ is closed, bounded, and has nonempty interior
  - Conversely, if $C \subseteq \mathbf{R}^n$ is any set satisfying these three conditions, then it is the unit ball of a norm: $\|x\| = \left( \sup\{ t \ge 0 \mid tx \in C \} \right)^{-1}$

slide-7
SLIDE 7

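Norms

• Some common norms on $\mathbf{R}^n$
  - Sum-absolute-value, or $\ell_1$-norm: $\|x\|_1 = |x_1| + \cdots + |x_n|$
  - Chebyshev or $\ell_\infty$-norm: $\|x\|_\infty = \max\{ |x_1|, \ldots, |x_n| \}$
  - $\ell_p$-norm, $p \ge 1$: $\|x\|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}$
• For $P \in \mathbf{S}^n_{++}$, the $P$-quadratic norm is $\|x\|_P = (x^T P x)^{1/2} = \|P^{1/2} x\|_2$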
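
The vector norms listed on this slide are straightforward to compute. The sketch below uses plain Python lists; the function names, the sample vector, and the diagonal matrix `P` are illustrative assumptions.

```python
import math

def norm_1(x):
    """Sum-absolute-value (l1) norm."""
    return sum(abs(v) for v in x)

def norm_inf(x):
    """Chebyshev (l-infinity) norm."""
    return max(abs(v) for v in x)

def norm_p(x, p):
    """l_p norm for p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def norm_quadratic(x, P):
    """P-quadratic norm (x^T P x)^(1/2); P is assumed symmetric positive definite."""
    Px = [sum(pij * xj for pij, xj in zip(row, x)) for row in P]
    return math.sqrt(sum(xi * pxi for xi, pxi in zip(x, Px)))

# Hypothetical sample data.
x = [3.0, -4.0]
P = [[2.0, 0.0], [0.0, 1.0]]

print(norm_1(x), norm_inf(x), norm_p(x, 2), norm_quadratic(x, P))
```

With `P` equal to the identity, the quadratic norm reduces to the Euclidean norm, which is a quick sanity check on any implementation.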

slide-8
SLIDE 8

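Norms

• Some common norms on $\mathbf{R}^{m \times n}$
  - Sum-absolute-value norm: $\|X\|_{\mathrm{sav}} = \sum_{i=1}^m \sum_{j=1}^n |X_{ij}|$
  - Maximum-absolute-value norm: $\|X\|_{\mathrm{mav}} = \max\{ |X_{ij}| \mid i = 1, \ldots, m, \ j = 1, \ldots, n \}$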
slide-9
SLIDE 9

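Norms

• Equivalence of norms
  - Suppose that $\|\cdot\|_a$ and $\|\cdot\|_b$ are norms on $\mathbf{R}^n$. Then there exist positive constants $\alpha$ and $\beta$ such that, for all $x \in \mathbf{R}^n$, $\alpha \|x\|_a \le \|x\|_b \le \beta \|x\|_a$
  - If $\|\cdot\|$ is any norm on $\mathbf{R}^n$, then there exists a quadratic norm $\|\cdot\|_P$ for which $\|x\|_P \le \|x\| \le \sqrt{n} \, \|x\|_P$ holds for all $x$.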
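
As a concrete instance of norm equivalence, the standard bounds relating the $\ell_\infty$-, $\ell_2$-, and $\ell_1$-norms on $\mathbf{R}^n$ can be written out explicitly. These particular constants are well-known facts rather than something stated on the slide.

```latex
\|x\|_\infty \;\le\; \|x\|_2 \;\le\; \sqrt{n}\,\|x\|_\infty ,
\qquad
\|x\|_2 \;\le\; \|x\|_1 \;\le\; \sqrt{n}\,\|x\|_2 ,
\qquad x \in \mathbf{R}^n .
```

The left-hand inequalities hold with equality for a unit coordinate vector such as $x = e_1$, and the right-hand ones hold with equality for the all-ones vector, so the constants cannot be improved.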
slide-10
SLIDE 10

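Norms

• Operator norms
  - Suppose $\|\cdot\|_a$ and $\|\cdot\|_b$ are norms on $\mathbf{R}^m$ and $\mathbf{R}^n$, respectively. The operator norm of $X \in \mathbf{R}^{m \times n}$ induced by $\|\cdot\|_a$ and $\|\cdot\|_b$ is $\|X\|_{a,b} = \sup\{ \|Xu\|_a \mid \|u\|_b \le 1 \}$
  - When $\|\cdot\|_a$ and $\|\cdot\|_b$ are both Euclidean norms, the operator norm of $X$ is its maximum singular value, and is denoted $\|X\|_2$: $\|X\|_2 = \sigma_{\max}(X) = \left( \lambda_{\max}(X^T X) \right)^{1/2}$
  - This norm is also called the spectral norm or $\ell_2$-norm of $X$.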

slide-11
SLIDE 11

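Norms

• Operator norms
  - The norm induced by the $\ell_\infty$-norm on $\mathbf{R}^m$ and $\mathbf{R}^n$, denoted $\|X\|_\infty$, is the max-row-sum norm: $\|X\|_\infty = \sup\{ \|Xu\|_\infty \mid \|u\|_\infty \le 1 \} = \max_{i = 1, \ldots, m} \sum_{j=1}^n |X_{ij}|$
  - The norm induced by the $\ell_1$-norm on $\mathbf{R}^m$ and $\mathbf{R}^n$, denoted $\|X\|_1$, is the max-column-sum norm: $\|X\|_1 = \max_{j = 1, \ldots, n} \sum_{i=1}^m |X_{ij}|$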
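
The row-sum and column-sum formulas can be compared against the defining supremum. The sketch below relies on the fact that a convex function attains its maximum over a polytope at a vertex, so it is enough to enumerate the vertices of the $\ell_\infty$ and $\ell_1$ unit balls. All names and the sample matrix are illustrative.

```python
from itertools import product

def matvec(X, u):
    return [sum(a * b for a, b in zip(row, u)) for row in X]

def max_row_sum(X):
    return max(sum(abs(v) for v in row) for row in X)

def max_col_sum(X):
    return max(sum(abs(v) for v in col) for col in zip(*X))

def induced_inf_norm(X):
    """sup ||Xu||_inf over ||u||_inf <= 1, via the vertices {-1, +1}^n."""
    n = len(X[0])
    return max(max(abs(v) for v in matvec(X, u))
               for u in product([-1.0, 1.0], repeat=n))

def induced_1_norm(X):
    """sup ||Xu||_1 over ||u||_1 <= 1, via the vertices e_j (the sign of e_j does not matter)."""
    n = len(X[0])
    vertices = [[1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    return max(sum(abs(v) for v in matvec(X, e)) for e in vertices)

# Hypothetical sample matrix.
X = [[1.0, -2.0], [3.0, 4.0]]
print(max_row_sum(X), induced_inf_norm(X))
print(max_col_sum(X), induced_1_norm(X))
```

The vertex enumeration for the $\ell_\infty$ ball grows as $2^n$, so this is only a check for small examples; the closed-form row and column sums are what one would use in practice.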

slide-12
SLIDE 12

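Norms

• Dual norm
  - Let $\|\cdot\|$ be a norm on $\mathbf{R}^n$.
  - The associated dual norm, denoted $\|\cdot\|_*$, is defined as $\|z\|_* = \sup\{ z^T x \mid \|x\| \le 1 \}$
  - We have the inequality $z^T x \le \|x\| \, \|z\|_*$
  - The dual of the Euclidean norm: $\sup\{ z^T x \mid \|x\|_2 \le 1 \} = \|z\|_2$
  - The dual of the $\ell_\infty$-norm: $\sup\{ z^T x \mid \|x\|_\infty \le 1 \} = \|z\|_1$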

slide-13
SLIDE 13

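Norms

• Dual norm
  - The dual of the $\ell_p$-norm is the $\ell_q$-norm such that $1/p + 1/q = 1$
  - The dual of the $\ell_2$-norm on $\mathbf{R}^{m \times n}$ is the nuclear norm: $\|Z\|_{2*} = \sup\{ \mathbf{tr}(Z^T X) \mid \|X\|_2 \le 1 \} = \sigma_1(Z) + \cdots + \sigma_r(Z) = \mathbf{tr}\left( (Z^T Z)^{1/2} \right)$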
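
For the vector case, the duality between the $\ell_\infty$- and $\ell_1$-norms can be verified directly from the definition of the dual norm as a supremum over the unit ball. Since the objective is linear, the supremum is attained at a vertex of the ball, which the sketch below enumerates. Function names and the sample vector are illustrative.

```python
from itertools import product

def dual_of_inf_norm(z):
    """sup { z^T x : ||x||_inf <= 1 }, via the vertices {-1, +1}^n."""
    return max(sum(zi * xi for zi, xi in zip(z, x))
               for x in product([-1.0, 1.0], repeat=len(z)))

def dual_of_1_norm(z):
    """sup { z^T x : ||x||_1 <= 1 }, via the vertices +e_i and -e_i."""
    return max(max(zi, -zi) for zi in z)

# Hypothetical sample vector.
z = [1.0, -2.0, 3.0]

norm_1 = sum(abs(v) for v in z)
norm_inf = max(abs(v) for v in z)
print(dual_of_inf_norm(z), norm_1)   # dual of l-infinity equals l1
print(dual_of_1_norm(z), norm_inf)   # dual of l1 equals l-infinity
```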
slide-14
SLIDE 14

Outline

• Norms
• Analysis
• Functions
• Derivatives
• Linear Algebra

slide-15
SLIDE 15

Analysis

• Interior and Open Set
  - An element $x \in C \subseteq \mathbf{R}^n$ is called an interior point of $C$ if there exists an $\epsilon > 0$ for which $\{ y \mid \|y - x\|_2 \le \epsilon \} \subseteq C$, i.e., there exists a ball centered at $x$ that lies entirely in $C$.
  - The set of all points interior to $C$ is called the interior of $C$ and is denoted $\mathbf{int}\, C$.
  - A set $C$ is open if $\mathbf{int}\, C = C$.

slide-16
SLIDE 16

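Analysis

• Closed Set and Boundary
  - A set $C \subseteq \mathbf{R}^n$ is closed if its complement $\mathbf{R}^n \setminus C = \{ x \in \mathbf{R}^n \mid x \notin C \}$ is open.
  - The closure of a set $C$ is defined as $\mathbf{cl}\, C = \mathbf{R}^n \setminus \mathbf{int}(\mathbf{R}^n \setminus C)$
  - The boundary of the set $C$ is defined as $\mathbf{bd}\, C = \mathbf{cl}\, C \setminus \mathbf{int}\, C$
  - $C$ is closed if it contains its boundary. It is open if it contains no boundary points.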

slide-17
SLIDE 17

Analysis

• Supremum and infimum
  - The least upper bound or supremum of the set $C \subseteq \mathbf{R}$ is denoted $\sup C$.
  - The greatest lower bound or infimum of the set $C \subseteq \mathbf{R}$ is denoted $\inf C$.

slide-18
SLIDE 18

Outline

• Norms
• Analysis
• Functions
• Derivatives
• Linear Algebra

slide-19
SLIDE 19

Functions

• Notation
• An example

slide-20
SLIDE 20

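Functions

• Continuity
  - A function $f: \mathbf{R}^n \to \mathbf{R}^m$ is continuous at $x \in \mathbf{dom}\, f$ if for all $\epsilon > 0$ there exists a $\delta > 0$ such that $y \in \mathbf{dom}\, f$, $\|y - x\|_2 \le \delta$ implies $\|f(y) - f(x)\|_2 \le \epsilon$
• Closed functions
  - A function $f: \mathbf{R}^n \to \mathbf{R}$ is closed if, for each $\alpha \in \mathbf{R}$, the sublevel set $\{ x \in \mathbf{dom}\, f \mid f(x) \le \alpha \}$ is closed. This is equivalent to the epigraph $\mathbf{epi}\, f = \{ (x, t) \in \mathbf{R}^{n+1} \mid x \in \mathbf{dom}\, f, \ f(x) \le t \}$ being closed.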

slide-21
SLIDE 21

Outline

• Norms
• Analysis
• Functions
• Derivatives
• Linear Algebra

slide-22
SLIDE 22

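Derivatives

• Definition
  - Suppose $f: \mathbf{R}^n \to \mathbf{R}^m$ and $x \in \mathbf{int}\, \mathbf{dom}\, f$. The function $f$ is differentiable at $x$ if there exists a matrix $Df(x) \in \mathbf{R}^{m \times n}$ that satisfies
    $\lim_{z \in \mathbf{dom}\, f, \ z \ne x, \ z \to x} \dfrac{\| f(z) - f(x) - Df(x)(z - x) \|_2}{\| z - x \|_2} = 0$,
    in which case we refer to $Df(x)$ as the derivative (or Jacobian) of $f$ at $x$.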
slide-23
SLIDE 23

Derivatives

• Definition
  - The affine function of $z$ given by $f(x) + Df(x)(z - x)$ is called the first-order approximation of $f$ at (or near) $x$.

slide-24
SLIDE 24

Derivatives

• Gradient
  - When $f$ is real-valued (i.e., $f: \mathbf{R}^n \to \mathbf{R}$) the derivative $Df(x)$ is a $1 \times n$ matrix (it is a row vector). Its transpose is called the gradient of the function: $\nabla f(x) = Df(x)^T$, which is a column vector (in $\mathbf{R}^n$). Its components are the partial derivatives of $f$: $\nabla f(x)_i = \dfrac{\partial f(x)}{\partial x_i}$, $i = 1, \ldots, n$
  - The first-order approximation of $f$ at a point $x \in \mathbf{int}\, \mathbf{dom}\, f$ can be expressed as (the affine function of $z$) $f(x) + \nabla f(x)^T (z - x)$
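
To see what "first-order" means in practice, one can compare a function with its first-order approximation at points approaching the expansion point. The sketch below uses a made-up test function $f(x) = x_1^2 + e^{x_2}$, which is an assumption for illustration and does not come from the slides.

```python
import math

def f(x):
    # Illustrative function f(x) = x1^2 + exp(x2).
    return x[0] ** 2 + math.exp(x[1])

def grad_f(x):
    # Gradient of the illustrative function, computed by hand.
    return [2.0 * x[0], math.exp(x[1])]

def first_order(x, z):
    """First-order approximation f(x) + grad f(x)^T (z - x)."""
    g = grad_f(x)
    return f(x) + sum(gi * (zi - xi) for gi, zi, xi in zip(g, z, x))

x = [1.0, 0.0]
errors = []
for h in (1e-1, 1e-2, 1e-3):
    z = [x[0] + h, x[1] + h]
    errors.append(abs(f(z) - first_order(x, z)))
    print(h, errors[-1])
```

Shrinking the step by a factor of ten shrinks the approximation error by roughly a factor of one hundred, which is the quadratic decay expected of a first-order approximation to a smooth function.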

slide-25
SLIDE 25

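Derivatives

• Examples
  - Quadratic function: $f(x) = \frac{1}{2} x^T P x + q^T x + r$, with $P \in \mathbf{S}^n$; $\nabla f(x) = Px + q$
  - Log-determinant: $f(X) = \log \det X$, $\mathbf{dom}\, f = \mathbf{S}^n_{++}$; $\nabla f(X) = X^{-1}$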
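
The gradient formula for the quadratic example can be checked against central finite differences. This is a minimal sketch; the helper names and the data `P`, `q`, `r`, `x` are hypothetical, and `P` is taken to be symmetric as the formula requires.

```python
def quad(x, P, q, r):
    """f(x) = (1/2) x^T P x + q^T x + r."""
    Px = [sum(pij * xj for pij, xj in zip(row, x)) for row in P]
    return (0.5 * sum(xi * pxi for xi, pxi in zip(x, Px))
            + sum(qi * xi for qi, xi in zip(q, x)) + r)

def quad_grad(x, P, q):
    """Closed-form gradient P x + q (valid for symmetric P)."""
    return [sum(pij * xj for pij, xj in zip(row, x)) + qi for row, qi in zip(P, q)]

def numeric_grad(fun, x, h=1e-6):
    """Central-difference estimate of the gradient of fun at x."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((fun(xp) - fun(xm)) / (2.0 * h))
    return g

# Hypothetical sample data with symmetric P.
P = [[2.0, 1.0], [1.0, 3.0]]
q = [1.0, -1.0]
r = 0.5
x = [1.0, 2.0]

g_exact = quad_grad(x, P, q)
g_numeric = numeric_grad(lambda v: quad(v, P, q, r), x)
print(g_exact, g_numeric)
```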
slide-26
SLIDE 26

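Derivatives

• Chain rule
  - Suppose $f: \mathbf{R}^n \to \mathbf{R}^m$ is differentiable at $x \in \mathbf{int}\, \mathbf{dom}\, f$ and $g: \mathbf{R}^m \to \mathbf{R}^p$ is differentiable at $f(x) \in \mathbf{int}\, \mathbf{dom}\, g$. Define the composition $h: \mathbf{R}^n \to \mathbf{R}^p$ by $h(z) = g(f(z))$. Then $h$ is differentiable at $x$, with derivative $Dh(x) = Dg(f(x)) \, Df(x)$
  - Suppose $f: \mathbf{R}^n \to \mathbf{R}$, $g: \mathbf{R} \to \mathbf{R}$, and $h(x) = g(f(x))$. Then $\nabla h(x) = g'(f(x)) \, \nabla f(x)$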

slide-27
SLIDE 27

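Derivatives

• Composition with Affine Function
  - Suppose $f: \mathbf{R}^n \to \mathbf{R}$, $A \in \mathbf{R}^{n \times p}$, $b \in \mathbf{R}^n$, and define $g: \mathbf{R}^p \to \mathbf{R}$ by $g(x) = f(Ax + b)$. Then $\nabla g(x) = A^T \nabla f(Ax + b)$
  - Suppose $f: \mathbf{R}^n \to \mathbf{R}$ and define $g: \mathbf{R} \to \mathbf{R}$ by $g(t) = f(x + tv)$, with $x, v \in \mathbf{R}^n$. Then $g'(t) = v^T \nabla f(x + tv)$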

slide-28
SLIDE 28

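Example 1

• Consider the function $f(x) = \log \sum_{i=1}^m \exp(a_i^T x + b_i)$, where $a_1, \ldots, a_m \in \mathbf{R}^n$ and $b_1, \ldots, b_m \in \mathbf{R}$
• Let $g(y) = \log \sum_{i=1}^m \exp y_i$, so that $f(x) = g(Ax + b)$, where $A$ has rows $a_1^T, \ldots, a_m^T$
• $\nabla g(y) = \dfrac{1}{\sum_{i=1}^m \exp y_i} \begin{bmatrix} \exp y_1 \\ \vdots \\ \exp y_m \end{bmatrix}$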

slide-29
SLIDE 29

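Example 1

• Consider the function $f(x) = \log \sum_{i=1}^m \exp(a_i^T x + b_i)$, where $a_1, \ldots, a_m \in \mathbf{R}^n$ and $b_1, \ldots, b_m \in \mathbf{R}$
• $\nabla f(x) = A^T \nabla g(Ax + b) = \dfrac{1}{\mathbf{1}^T z} A^T z$, where $z = \begin{bmatrix} \exp(a_1^T x + b_1) \\ \vdots \\ \exp(a_m^T x + b_m) \end{bmatrix}$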
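
The gradient formula for the log-sum-exp example can be sketched in code and compared against finite differences. The shift by the largest exponent is a common numerical-stability device added here as an assumption; it does not change the value of $f$ or of the gradient. The names and sample data are illustrative.

```python
import math

def affine(A, b, x):
    """Returns y = A x + b, with A given as a list of rows a_i^T."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + bi for row, bi in zip(A, b)]

def f(x, A, b):
    """f(x) = log sum_i exp(a_i^T x + b_i)."""
    y = affine(A, b, x)
    shift = max(y)  # subtracting max(y) avoids overflow and leaves the value unchanged
    return shift + math.log(sum(math.exp(yi - shift) for yi in y))

def grad_f(x, A, b):
    """Gradient (1 / 1^T z) A^T z with z_i = exp(a_i^T x + b_i)."""
    y = affine(A, b, x)
    shift = max(y)
    z = [math.exp(yi - shift) for yi in y]  # z scaled by exp(-shift); the scale cancels below
    total = sum(z)                          # 1^T z (with the same scaling)
    return [sum(A[i][j] * z[i] for i in range(len(A))) / total for j in range(len(x))]

def numeric_grad(fun, x, h=1e-6):
    """Central-difference estimate of the gradient of fun at x."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((fun(xp) - fun(xm)) / (2.0 * h))
    return g

# Hypothetical sample data: m = 3 terms, n = 2 variables.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, -1.0]
x = [0.5, -0.5]

g_exact = grad_f(x, A, b)
g_numeric = numeric_grad(lambda v: f(v, A, b), x)
print(g_exact, g_numeric)
```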

slide-30
SLIDE 30

Example 2

• Consider the function
• where

slide-31
SLIDE 31

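Second Derivative

• Definition
  - Suppose $f: \mathbf{R}^n \to \mathbf{R}$. The second derivative or Hessian matrix of $f$ at $x \in \mathbf{int}\, \mathbf{dom}\, f$, denoted $\nabla^2 f(x)$, is given by $\nabla^2 f(x)_{ij} = \dfrac{\partial^2 f(x)}{\partial x_i \partial x_j}$, $i = 1, \ldots, n$, $j = 1, \ldots, n$.
• Second-order Approximation
  - $f(x) + \nabla f(x)^T (z - x) + \frac{1}{2} (z - x)^T \nabla^2 f(x) (z - x)$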

slide-32
SLIDE 32

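Derivatives

• Examples
  - Quadratic function: $f(x) = \frac{1}{2} x^T P x + q^T x + r$, with $P \in \mathbf{S}^n$; $\nabla f(x) = Px + q$, $\nabla^2 f(x) = P$
  - Log-determinant: $f(X) = \log \det X$, $\mathbf{dom}\, f = \mathbf{S}^n_{++}$; $\nabla f(X) = X^{-1}$
  - Second-order approximation of $\log \det$ near $X$: $f(Z) \approx f(X) + \mathbf{tr}\left( X^{-1} (Z - X) \right) - \frac{1}{2} \mathbf{tr}\left( X^{-1} (Z - X) X^{-1} (Z - X) \right)$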

slide-33
SLIDE 33

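Second Derivative

• Chain rule
  - Suppose $f: \mathbf{R}^n \to \mathbf{R}$, $g: \mathbf{R} \to \mathbf{R}$, and $h(x) = g(f(x))$. Then $\nabla^2 h(x) = g'(f(x)) \nabla^2 f(x) + g''(f(x)) \nabla f(x) \nabla f(x)^T$
  - Composition with affine function: for $g(x) = f(Ax + b)$, $\nabla^2 g(x) = A^T \nabla^2 f(Ax + b) A$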

slide-34
SLIDE 34

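Example 1

• Consider the function $f(x) = \log \sum_{i=1}^m \exp(a_i^T x + b_i)$, where $a_1, \ldots, a_m \in \mathbf{R}^n$ and $b_1, \ldots, b_m \in \mathbf{R}$
• Let $g(y) = \log \sum_{i=1}^m \exp y_i$, so that $f(x) = g(Ax + b)$, where $A$ has rows $a_1^T, \ldots, a_m^T$
• $\nabla g(y) = \dfrac{1}{\sum_{i=1}^m \exp y_i} \begin{bmatrix} \exp y_1 \\ \vdots \\ \exp y_m \end{bmatrix}$
• $\nabla^2 g(y) = \mathbf{diag}(\nabla g(y)) - \nabla g(y) \nabla g(y)^T$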

slide-35
SLIDE 35

Example 1

• Consider the function $f(x) = \log \sum_{i=1}^m \exp(a_i^T x + b_i)$, where $a_1, \ldots, a_m \in \mathbf{R}^n$ and $b_1, \ldots, b_m \in \mathbf{R}$
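
The Hessian of this log-sum-exp function follows by combining two rules from the preceding slides: the Hessian of $g(y) = \log \sum_i \exp y_i$ and the rule for composition with an affine function. The worked derivation below is a reconstruction under those rules, with $A$ denoting the matrix whose rows are $a_i^T$ and $z_i = \exp(a_i^T x + b_i)$, so that $\nabla g(Ax + b) = z / (\mathbf{1}^T z)$.

```latex
\nabla^2 g(y) = \operatorname{diag}\bigl(\nabla g(y)\bigr) - \nabla g(y)\,\nabla g(y)^T ,
\qquad
\nabla^2 f(x) = A^T \,\nabla^2 g(Ax + b)\, A
= A^T \left( \frac{1}{\mathbf{1}^T z}\,\operatorname{diag}(z)
           - \frac{1}{(\mathbf{1}^T z)^2}\, z z^T \right) A ,
\qquad z_i = \exp(a_i^T x + b_i) .
```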

slide-36
SLIDE 36

Outline

• Norms
• Analysis
• Functions
• Derivatives
• Linear Algebra

slide-37
SLIDE 37

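Linear algebra

• Range and nullspace
  - Let $A \in \mathbf{R}^{m \times n}$. The range of $A$, denoted $\mathcal{R}(A)$, is the set of all vectors in $\mathbf{R}^m$ that can be written as linear combinations of the columns of $A$: $\mathcal{R}(A) = \{ Ax \mid x \in \mathbf{R}^n \} \subseteq \mathbf{R}^m$
  - The nullspace (or kernel) of $A$, denoted $\mathcal{N}(A)$, is the set of all vectors $x$ mapped into zero by $A$: $\mathcal{N}(A) = \{ x \mid Ax = 0 \} \subseteq \mathbf{R}^n$
  - If $\mathcal{V}$ is a subspace of $\mathbf{R}^n$, its orthogonal complement, denoted $\mathcal{V}^\perp$, is defined as: $\mathcal{V}^\perp = \{ x \mid z^T x = 0 \text{ for all } z \in \mathcal{V} \}$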

slide-38
SLIDE 38

Linear algebra

• Range and nullspace
  - For any $A \in \mathbf{R}^{m \times n}$, the nullspace of $A$ is the orthogonal complement of the range of $A^T$: $\mathcal{N}(A) = \mathcal{R}(A^T)^\perp$

slide-39
SLIDE 39

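Linear algebra

• Symmetric eigenvalue decomposition
  - Suppose $A \in \mathbf{S}^n$, i.e., $A$ is a real symmetric $n \times n$ matrix. Then $A$ can be factored as $A = Q \Lambda Q^T$, where $Q \in \mathbf{R}^{n \times n}$ is orthogonal, i.e., satisfies $Q^T Q = I$, and $\Lambda = \mathbf{diag}(\lambda_1, \ldots, \lambda_n)$
  - The determinant and trace can be expressed in terms of the eigenvalues: $\det A = \prod_{i=1}^n \lambda_i$, $\mathbf{tr}\, A = \sum_{i=1}^n \lambda_i$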
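
For a $2 \times 2$ symmetric matrix the eigenvalue decomposition has a closed form, which makes the determinant and trace identities easy to check by hand or in code. The sketch below assumes a nonzero off-diagonal entry so that the eigenvector formula is well defined; the function name and the sample matrix are illustrative.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigen-decomposition of [[a, b], [b, c]], assuming b != 0.

    Returns (lams, Q): eigenvalues in decreasing order and a matrix Q
    whose columns are the corresponding orthonormal eigenvectors.
    """
    mean = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    lams = [mean + rad, mean - rad]
    cols = []
    for lam in lams:
        v = [b, lam - a]  # solves (A - lam I) v = 0 when b != 0
        nv = math.hypot(v[0], v[1])
        cols.append([v[0] / nv, v[1] / nv])
    Q = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    return lams, Q

# Hypothetical sample matrix A = [[2, 1], [1, 2]].
a, b, c = 2.0, 1.0, 2.0
lams, Q = eig_sym_2x2(a, b, c)

# Rebuild A as Q diag(lams) Q^T.
A_rebuilt = [[sum(Q[i][k] * lams[k] * Q[j][k] for k in range(2)) for j in range(2)]
             for i in range(2)]

det_A = a * c - b * b
tr_A = a + c
print(lams, lams[0] * lams[1], det_A, lams[0] + lams[1], tr_A)
```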

slide-40
SLIDE 40

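Linear algebra

• Norms of a symmetric matrix $A \in \mathbf{S}^n$ with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$
  - $\|A\|_2 = \max_{i = 1, \ldots, n} |\lambda_i| = \max\{ \lambda_1, -\lambda_n \}$
  - $\|A\|_F = \left( \sum_{i=1}^n \lambda_i^2 \right)^{1/2}$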
slide-41
SLIDE 41

Linear algebra

• Positive definite Matrix
  - A matrix $A \in \mathbf{S}^n$ is called positive definite if, for all $x \ne 0$, $x^T A x > 0$, denoted as $A \succ 0$.
  - If $-A$ is positive definite, we say $A$ is negative definite, denoted as $A \prec 0$.
  - We use $\mathbf{S}^n_{++}$ to denote the set of positive definite matrices in $\mathbf{S}^n$.
  - We use $\mathbf{S}^n_{+}$ to denote the set of positive semidefinite matrices in $\mathbf{S}^n$.

slide-42
SLIDE 42

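Linear algebra

• Singular value decomposition (SVD)
  - Suppose $A \in \mathbf{R}^{m \times n}$ with $\mathbf{rank}\, A = r$. Then $A$ can be factored as $A = U \Sigma V^T$, where $U \in \mathbf{R}^{m \times r}$ satisfies $U^T U = I$, $V \in \mathbf{R}^{n \times r}$ satisfies $V^T V = I$, and $\Sigma = \mathbf{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$
  - The singular value decomposition can be written $A = \sum_{i=1}^r \sigma_i u_i v_i^T$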

slide-43
SLIDE 43

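Linear algebra

• Norms of $A \in \mathbf{R}^{m \times n}$ in terms of its singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0$
  - $\|A\|_2 = \sigma_1$
  - $\|A\|_F = \left( \sum_{i=1}^r \sigma_i^2 \right)^{1/2}$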
slide-44
SLIDE 44

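Linear algebra

• Pseudo-inverse
  - Let $A = U \Sigma V^T$ be the singular value decomposition of $A \in \mathbf{R}^{m \times n}$, with $\mathbf{rank}\, A = r$. The pseudo-inverse or Moore-Penrose inverse of $A$ is $A^\dagger = V \Sigma^{-1} U^T \in \mathbf{R}^{n \times m}$
• Schur complement
  - Consider $A \in \mathbf{S}^k$ and a matrix $X \in \mathbf{S}^n$ partitioned as $X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$
  - If $\det A \ne 0$, the matrix $S = C - B^T A^{-1} B$ is called the Schur complement of $A$ in $X$.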

slide-45
SLIDE 45

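Application of Schur complement

• PD Matrices
  - $X \succ 0$ if and only if $A \succ 0$ and $S \succ 0$
  - If $A \succ 0$, then $X \succeq 0$ if and only if $S \succeq 0$
• PSD Matrices
  - $X \succeq 0 \iff A \succeq 0, \ (I - A A^\dagger) B = 0, \ C - B^T A^\dagger B \succeq 0$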
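
The positive definiteness test via the Schur complement can be illustrated on a $3 \times 3$ matrix whose leading block $A$ is a single scalar. In the sketch below, positive definiteness of small matrices is decided with Sylvester's criterion (all leading principal minors positive), which is a standard fact brought in for the check rather than something from the slides. All names and sample matrices are illustrative.

```python
def schur_complement_scalar_block(X):
    """For X = [[a, b^T], [b, C]] with scalar a != 0, returns S = C - b b^T / a."""
    a = X[0][0]
    n = len(X)
    return [[X[i][j] - X[i][0] * X[0][j] / a for j in range(1, n)] for i in range(1, n)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def is_pd_2x2(M):
    # Sylvester's criterion for a symmetric 2x2 matrix.
    return M[0][0] > 0 and det2(M) > 0

def is_pd_3x3(M):
    # Sylvester's criterion for a symmetric 3x3 matrix.
    top_left = [row[:2] for row in M[:2]]
    return M[0][0] > 0 and det2(top_left) > 0 and det3(M) > 0

def is_pd_via_schur(X):
    """X > 0 if and only if A > 0 and S > 0, here with A the scalar X[0][0]."""
    return X[0][0] > 0 and is_pd_2x2(schur_complement_scalar_block(X))

# Hypothetical sample matrices: the first is positive definite, the second is not.
X_pd = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
X_not_pd = [[1.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(schur_complement_scalar_block(X_pd))
print(is_pd_via_schur(X_pd), is_pd_3x3(X_pd))
print(is_pd_via_schur(X_not_pd), is_pd_3x3(X_not_pd))
```

Both tests agree on the two samples, as the equivalence on this slide predicts.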

slide-46
SLIDE 46

Summary

• Norms of vectors
  - $\ell_1$-norm, $\ell_2$-norm, $\ell_\infty$-norm, $P$-quadratic norm
• Norms of Matrices
  - Frobenius norm, spectral norm, nuclear norm
• Gradients of Common Functions
  - The Matrix Cookbook
• Eigendecomposition vs SVD
• PSD matrices