Mathematical Background
Lijun Zhang
zlj@nju.edu.cn
http://cs.nju.edu.cn/zlj
Outline
- Norms
- Analysis
- Functions
- Derivatives
- Linear Algebra
Inner product
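Inner product on $\mathbf{R}^n$: $\langle x, y\rangle = x^T y = \sum_{i=1}^n x_i y_i$
- Euclidean norm, or $\ell_2$-norm: $\|x\|_2 = (x^T x)^{1/2} = (x_1^2 + \cdots + x_n^2)^{1/2}$
- Cauchy-Schwarz inequality: $|x^T y| \le \|x\|_2 \|y\|_2$
- Angle between nonzero vectors $x, y \in \mathbf{R}^n$: $\angle(x, y) = \cos^{-1}\left(\dfrac{x^T y}{\|x\|_2 \|y\|_2}\right)$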
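A minimal sketch of these definitions, assuming NumPy is available; the helper names `inner`, `norm2` and `angle` are illustrative and not part of the slides. It also checks the Cauchy-Schwarz inequality on one pair of vectors.

```python
import numpy as np

def inner(x, y):
    """Standard inner product <x, y> = x^T y on R^n."""
    return float(np.dot(x, y))

def norm2(x):
    """Euclidean (l2) norm ||x||_2 = (x^T x)^{1/2}."""
    return float(np.sqrt(inner(x, x)))

def angle(x, y):
    """Angle between nonzero x and y, in radians."""
    c = inner(x, y) / (norm2(x) * norm2(y))
    # Clipping guards against round-off pushing c slightly outside [-1, 1].
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])
print(inner(x, y))                                      # 24.0
print(norm2(x))                                         # 5.0
print(abs(inner(x, y)) <= norm2(x) * norm2(y))          # True (Cauchy-Schwarz)
print(angle(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 1.5707963267948966 (pi/2)
```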
Inner product
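Inner product on $\mathbf{R}^{m\times n}$: $\langle X, Y\rangle = \mathbf{tr}(X^T Y) = \sum_{i=1}^m \sum_{j=1}^n X_{ij} Y_{ij}$
- Here $\mathbf{tr}$ denotes the trace of a matrix.
Frobenius norm of a matrix $X \in \mathbf{R}^{m\times n}$: $\|X\|_F = \left(\mathbf{tr}(X^T X)\right)^{1/2} = \left(\sum_{i=1}^m \sum_{j=1}^n X_{ij}^2\right)^{1/2}$
- Inner product on $\mathbf{S}^n$ (the symmetric $n \times n$ matrices): $\langle X, Y\rangle = \mathbf{tr}(XY) = \sum_{i=1}^n \sum_{j=1}^n X_{ij} Y_{ij} = \sum_{i=1}^n X_{ii} Y_{ii} + 2 \sum_{i<j} X_{ij} Y_{ij}$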
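The trace form and the entrywise form of these inner products can be compared numerically. The sketch below assumes NumPy and uses small matrices chosen for illustration; `inner_mat` and `frobenius` are hypothetical helper names.

```python
import numpy as np

def inner_mat(X, Y):
    """<X, Y> = tr(X^T Y) on R^{m x n}."""
    return float(np.trace(X.T @ Y))

def frobenius(X):
    """||X||_F = (tr(X^T X))^{1/2}."""
    return float(np.sqrt(inner_mat(X, X)))

X = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])
Y = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
# The trace form agrees with the elementwise sum of products.
print(inner_mat(X, Y), float(np.sum(X * Y)))        # 4.0 4.0
print(frobenius(X), float(np.linalg.norm(X, "fro")))

# On S^n: tr(ST) = sum_i S_ii T_ii + 2 sum_{i<j} S_ij T_ij
S = np.array([[2.0, 1.0], [1.0, 3.0]])
T = np.array([[1.0, -1.0], [-1.0, 4.0]])
diag_part = float(np.sum(np.diag(S) * np.diag(T)))
upper = np.triu_indices(2, k=1)                     # index pairs with i < j
off_part = 2.0 * float(np.sum(S[upper] * T[upper]))
print(float(np.trace(S @ T)), diag_part + off_part)  # 12.0 12.0
```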
Norms
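A function $f: \mathbf{R}^n \to \mathbf{R}$ with $\mathbf{dom}\, f = \mathbf{R}^n$ is called a norm if
- $f$ is nonnegative: $f(x) \ge 0$ for all $x \in \mathbf{R}^n$
- $f$ is definite: $f(x) = 0$ only if $x = 0$
- $f$ is homogeneous: $f(tx) = |t| f(x)$ for all $x \in \mathbf{R}^n$ and $t \in \mathbf{R}$
- $f$ satisfies the triangle inequality: $f(x + y) \le f(x) + f(y)$ for all $x, y \in \mathbf{R}^n$
Distance: the distance between vectors $x$ and $y$ is defined as the length of their difference, i.e., $\mathbf{dist}(x, y) = \|x - y\|$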
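The norm axioms can be spot-checked on random vectors. This is only a sanity check under the assumption that random samples are representative: a `False` result exhibits a genuine violation, while `True` proves nothing. Definiteness is not sampled, since random vectors are almost surely nonzero. `looks_like_norm` is a hypothetical helper built on NumPy.

```python
import numpy as np

def looks_like_norm(f, n=3, trials=1000, tol=1e-9, seed=0):
    """Spot-check nonnegativity, homogeneity and the triangle inequality
    for f: R^n -> R on random samples. Definiteness is NOT checked."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        t = rng.standard_normal()
        fx, fy = f(x), f(y)
        if fx < -tol:                                         # nonnegative
            return False
        if abs(f(t * x) - abs(t) * fx) > tol * (1.0 + fx):    # homogeneous
            return False
        if f(x + y) > fx + fy + tol * (1.0 + fx + fy):        # triangle inequality
            return False
    return True

print(looks_like_norm(lambda v: np.sum(np.abs(v))))   # True  (l1-norm)
print(looks_like_norm(lambda v: np.sum(v ** 2)))       # False (not homogeneous)
```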
Norms
Unit ball
The set of all vectors with norm less than or equal to one, $\mathcal{B} = \{x \in \mathbf{R}^n \mid \|x\| \le 1\}$, is called the unit ball of the norm $\|\cdot\|$. The unit ball satisfies the following properties:
- $\mathcal{B}$ is symmetric about the origin, i.e., $x \in \mathcal{B}$ if and only if $-x \in \mathcal{B}$
- $\mathcal{B}$ is convex
- $\mathcal{B}$ is closed, bounded, and has nonempty interior
Conversely, if $C \subseteq \mathbf{R}^n$ is any set satisfying these three conditions, then it is the unit ball of a norm:
$$\|x\| = \left(\sup\{t \ge 0 \mid tx \in C\}\right)^{-1}$$
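The converse statement gives a recipe for evaluating a norm from its unit ball alone. The hypothetical `gauge_norm` helper below assumes only a membership oracle `in_C` for a set $C$ with the three properties, plus an assumed upper bound `t_hi` on $\sup\{t \ge 0 \mid tx \in C\}$. Because $C$ is convex, closed and contains the origin in its interior, $\{t \ge 0 \mid tx \in C\}$ is an interval $[0, t^\star]$, so bisection on $t$ applies.

```python
import numpy as np

def gauge_norm(x, in_C, t_hi=1e6, iters=200):
    """Evaluate ||x|| = (sup{t >= 0 | t x in C})^{-1} by bisection on t.

    in_C(v) -> bool is a membership oracle for the unit ball C.
    t_hi is an assumed upper bound on the supremum (i.e. ||x|| >= 1 / t_hi).
    """
    x = np.asarray(x, dtype=float)
    if not np.any(x):
        return 0.0                      # ||0|| = 0
    lo, hi = 0.0, t_hi                  # invariant: lo*x in C, hi*x not in C
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_C(mid * x):
            lo = mid
        else:
            hi = mid
    return 1.0 / lo

box = lambda v: np.max(np.abs(v)) <= 1.0   # unit ball of the l_inf-norm
disk = lambda v: float(v @ v) <= 1.0       # unit ball of the l2-norm
print(gauge_norm([3.0, -4.0], box))        # close to 4 = ||(3, -4)||_inf
print(gauge_norm([3.0, -4.0], disk))       # close to 5 = ||(3, -4)||_2
```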
Norms
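Some common norms on $\mathbf{R}^n$
- Sum-absolute-value, or $\ell_1$-norm: $\|x\|_1 = |x_1| + \cdots + |x_n|$
- Chebyshev or $\ell_\infty$-norm: $\|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}$
- $\ell_p$-norm ($p \ge 1$): $\|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$
- For $P \in \mathbf{S}^n_{++}$, the $P$-quadratic norm is $\|x\|_P = (x^T P x)^{1/2} = \|P^{1/2} x\|_2$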
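A small NumPy sketch evaluating each of these norms on one vector. The matrix `P` is an illustrative element of $\mathbf{S}^3_{++}$ chosen here, and $P^{1/2}$ is formed from its eigendecomposition, which is one way (not the only one) to get the symmetric square root.

```python
import numpy as np

x = np.array([1.0, -2.0, 2.0])

l1 = np.sum(np.abs(x))            # sum-absolute-value norm
linf = np.max(np.abs(x))          # Chebyshev norm

def lp(x, p):
    """l_p-norm for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# An example P in S^3_{++} (eigenvalues 2, 1.5, 0.5), chosen for illustration
P = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
quad = np.sqrt(x @ P @ x)         # P-quadratic norm (x^T P x)^{1/2}

# Same value via ||P^{1/2} x||_2, using the symmetric square root from eigh
lam, Q = np.linalg.eigh(P)
P_half = Q @ np.diag(np.sqrt(lam)) @ Q.T

print(l1, linf, lp(x, 2))                         # 5.0 2.0 3.0
print(quad, np.linalg.norm(P_half @ x))           # both equal sqrt(6)
print([round(float(lp(x, p)), 4) for p in (1, 2, 4, 16)])  # nonincreasing in p, tending to linf
```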
Norms
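Some common norms on $\mathbf{R}^{m\times n}$
- Sum-absolute-value norm: $\|X\|_{\mathrm{sav}} = \sum_{i=1}^m \sum_{j=1}^n |X_{ij}|$
- Maximum-absolute-value norm: $\|X\|_{\mathrm{mav}} = \max\{|X_{ij}| \mid i = 1, \ldots, m,\ j = 1, \ldots, n\}$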
Norms
Equivalence of norms
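Suppose that $\|\cdot\|_a$ and $\|\cdot\|_b$ are norms on $\mathbf{R}^n$. Then there exist positive constants $\alpha$ and $\beta$ such that, for all $x \in \mathbf{R}^n$,
$$\alpha \|x\|_a \le \|x\|_b \le \beta \|x\|_a.$$
- If $\|\cdot\|$ is any norm on $\mathbf{R}^n$, then there exists a quadratic norm $\|\cdot\|_P$ for which
$$\|x\|_P \le \|x\| \le \sqrt{n}\, \|x\|_P$$
holds for all $x$.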
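For concrete pairs of norms the constants are known; for example, on $\mathbf{R}^n$, $\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$ and $\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2$. The NumPy experiment below (random sampling, so it illustrates rather than proves) confirms the sampled ratios stay in $[1, \sqrt{n}]$ and that both ends are attained at $e_1$ and at the all-ones vector.

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
samples = rng.standard_normal((10000, n))
l1 = np.linalg.norm(samples, 1, axis=1)          # row-wise vector norms
l2 = np.linalg.norm(samples, 2, axis=1)
linf = np.linalg.norm(samples, np.inf, axis=1)

# Ratios ||x||_2/||x||_inf and ||x||_1/||x||_2 must lie in [1, sqrt(n)]
r_2_inf = l2 / linf
r_1_2 = l1 / l2
print(r_2_inf.min(), r_2_inf.max(), np.sqrt(n))
print(r_1_2.min(), r_1_2.max(), np.sqrt(n))

# The bounds are tight: ratio 1 at e_1, ratio sqrt(n) at the all-ones vector
e1, ones = np.eye(n)[0], np.ones(n)
print(np.linalg.norm(e1) / np.linalg.norm(e1, np.inf),
      np.linalg.norm(ones) / np.linalg.norm(ones, np.inf))
```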
Norms
Operator norms
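Suppose $\|\cdot\|_a$ and $\|\cdot\|_b$ are norms on $\mathbf{R}^m$ and $\mathbf{R}^n$, respectively. The operator norm of $X \in \mathbf{R}^{m\times n}$ induced by $\|\cdot\|_a$ and $\|\cdot\|_b$ is
$$\|X\|_{a,b} = \sup\{\|Xu\|_a \mid \|u\|_b \le 1\}.$$
- When $\|\cdot\|_a$ and $\|\cdot\|_b$ are Euclidean norms, the operator norm of $X$ is its maximum singular value, and is denoted $\|X\|_2$.
- Spectral norm or $\ell_2$-norm: $\|X\|_2 = \sigma_{\max}(X) = \left(\lambda_{\max}(X^T X)\right)^{1/2}$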
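The three expressions for the spectral norm can be compared numerically. The sketch below assumes NumPy; `spectral_norm_power` is a hypothetical helper that approximates the supremum in the definition by power iteration on $X^T X$. For the example matrix, $X^T X$ has eigenvalues 45 and 5, so every value printed should be close to $\sqrt{45}$.

```python
import numpy as np

def spectral_norm_power(X, iters=500, seed=0):
    """Estimate sup{||X u||_2 : ||u||_2 <= 1} by power iteration on X^T X."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(X.shape[1])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = X.T @ (X @ u)          # one power-iteration step on X^T X
        u /= np.linalg.norm(u)
    return np.linalg.norm(X @ u)   # ||X u||_2 at the approximate maximizer

X = np.array([[3.0, 0.0],
              [4.0, 5.0]])
sigma_max = np.linalg.svd(X, compute_uv=False)[0]   # largest singular value
lam_max = np.linalg.eigvalsh(X.T @ X)[-1]           # eigvalsh sorts ascending
print(spectral_norm_power(X), sigma_max, np.sqrt(lam_max), np.linalg.norm(X, 2))
```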
Norms
Operator norms
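The norm induced by the $\ell_\infty$-norm on $\mathbf{R}^m$ and $\mathbf{R}^n$, denoted $\|X\|_\infty$, is the max-row-sum norm,
$$\|X\|_\infty = \sup\{\|Xu\|_\infty \mid \|u\|_\infty \le 1\} = \max_{i=1,\ldots,m} \sum_{j=1}^n |X_{ij}|.$$
- The norm induced by the $\ell_1$-norm on $\mathbf{R}^m$ and $\mathbf{R}^n$, denoted $\|X\|_1$, is the max-column-sum norm,
$$\|X\|_1 = \max_{j=1,\ldots,n} \sum_{i=1}^m |X_{ij}|.$$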
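Both closed forms can be checked against the supremum definition by brute force on a small matrix. Since $\|Xu\|$ is convex in $u$, its supremum over a polytope ball is attained at a vertex: $u \in \{-1, +1\}^n$ for the $\ell_\infty$ ball and $u = \pm e_j$ for the $\ell_1$ ball. This NumPy sketch enumerates those vertices, which is only practical for tiny $n$.

```python
import itertools
import numpy as np

X = np.array([[1.0, -2.0, 3.0],
              [-4.0, 5.0, -6.0]])

max_row_sum = np.max(np.sum(np.abs(X), axis=1))   # ||X||_inf
max_col_sum = np.max(np.sum(np.abs(X), axis=0))   # ||X||_1

# Sup over the l_inf ball: enumerate the 2^n vertices u in {-1, +1}^n
brute_inf = max(np.max(np.abs(X @ np.array(u)))
                for u in itertools.product([-1.0, 1.0], repeat=X.shape[1]))
# Sup over the l1 ball: vertices are +/- e_j, and ||X(-e_j)||_1 = ||X e_j||_1
brute_1 = max(np.sum(np.abs(X @ e)) for e in np.eye(X.shape[1]))

print(max_row_sum, brute_inf, np.linalg.norm(X, np.inf))   # 15.0 15.0 15.0
print(max_col_sum, brute_1, np.linalg.norm(X, 1))          # 9.0 9.0 9.0
```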
Norms
Dual norm
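Let $\|\cdot\|$ be a norm on $\mathbf{R}^n$. The associated dual norm, denoted $\|\cdot\|_*$, is defined as
$$\|z\|_* = \sup\{z^T x \mid \|x\| \le 1\}.$$
- We have the inequality $z^T x \le \|x\| \, \|z\|_*$.
The dual of the Euclidean norm is the Euclidean norm: $\sup\{z^T x \mid \|x\|_2 \le 1\} = \|z\|_2$
The dual of the $\ell_\infty$-norm is the $\ell_1$-norm: $\sup\{z^T x \mid \|x\|_\infty \le 1\} = \|z\|_1$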
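The maximizers behind these two dual pairs are explicit: $x = z/\|z\|_2$ for the Euclidean ball and $x = \mathrm{sign}(z)$ for the $\ell_\infty$ ball. A short NumPy sketch evaluates them and spot-checks the inequality $z^T x \le \|x\|_\infty \|z\|_1$ on random vectors (an illustration, not a proof).

```python
import numpy as np

z = np.array([2.0, -3.0, 1.0])

# Dual of l2: the sup over the l2 unit ball is attained at x = z / ||z||_2
dual_l2 = z @ (z / np.linalg.norm(z))    # equals ||z||_2 = sqrt(14)
# Dual of l_inf: the sup over the l_inf unit ball is attained at x = sign(z)
dual_linf = z @ np.sign(z)               # equals ||z||_1 = 6

# Spot-check z^T x <= ||x||_inf ||z||_1 on random x
rng = np.random.default_rng(1)
holds = all(z @ x <= np.linalg.norm(x, np.inf) * np.linalg.norm(z, 1) + 1e-12
            for x in rng.standard_normal((1000, 3)))
print(dual_l2, dual_linf, holds)
```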
Norms
Dual Norm
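The dual of the $\ell_p$-norm is the $\ell_q$-norm such that $1/p + 1/q = 1$.
The dual of the $\ell_2$-norm on $\mathbf{R}^{m\times n}$ is the nuclear norm
$$\|Z\|_{2*} = \sup\{\mathbf{tr}(Z^T X) \mid \|X\|_2 \le 1\} = \sigma_1(Z) + \cdots + \sigma_r(Z) = \mathbf{tr}\left((Z^T Z)^{1/2}\right),$$
where $r = \mathbf{rank}\, Z$.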
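Both statements can be checked numerically with NumPy. For the $\ell_p$/$\ell_q$ pair, the maximizer over the $\ell_p$ ball is proportional to $\mathrm{sign}(w_i)|w_i|^{q-1}$ (the equality case of Hölder's inequality). For the nuclear norm, the supremum is attained at $X = UV^T$ built from the SVD of $Z$. The vectors and matrices are illustrative choices.

```python
import numpy as np

# l_p / l_q duality: the sup over the l_p unit ball is attained at
# x proportional to sign(w) * |w|^(q - 1)
p, q = 3.0, 1.5                          # 1/p + 1/q = 1
w = np.array([1.0, -2.0, 0.5])
x = np.sign(w) * np.abs(w) ** (q - 1.0)
x /= np.linalg.norm(x, p)                # now ||x||_p = 1
print(np.isclose(w @ x, np.linalg.norm(w, q)))            # True

# Nuclear norm: sum of singular values, also tr((Z^T Z)^{1/2})
Z = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
nuclear = s.sum()
lam = np.linalg.eigvalsh(Z.T @ Z)
trace_sqrt = np.sum(np.sqrt(np.clip(lam, 0.0, None)))  # clip guards tiny negative round-off
# The sup over {X : ||X||_2 <= 1} is attained at X = U V^T, whose spectral norm is 1
X_star = U @ Vt
print(np.isclose(nuclear, trace_sqrt), np.isclose(np.trace(Z.T @ X_star), nuclear))
```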
Analysis
Interior and Open Set
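An element $x \in C \subseteq \mathbf{R}^n$ is called an interior point of $C$ if there exists an $\epsilon > 0$ for which
$$\{y \mid \|y - x\|_2 \le \epsilon\} \subseteq C,$$
i.e., there exists a ball centered at $x$ that lies entirely in $C$. The set of all points interior to $C$ is called the interior of $C$ and is denoted $\mathbf{int}\, C$.
A set $C$ is open if $\mathbf{int}\, C = C$, i.e., every point in $C$ is an interior point.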
Analysis
Closed Set and Boundary
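A set $C \subseteq \mathbf{R}^n$ is closed if its complement $\mathbf{R}^n \setminus C = \{x \in \mathbf{R}^n \mid x \notin C\}$ is open.
The closure of a set $C$ is defined as $\mathbf{cl}\, C = \mathbf{R}^n \setminus \mathbf{int}(\mathbf{R}^n \setminus C)$.
The boundary of the set $C$ is defined as $\mathbf{bd}\, C = \mathbf{cl}\, C \setminus \mathbf{int}\, C$.
$C$ is closed if it contains its boundary. It is open if it contains no boundary points.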
Analysis
Supremum and infimum
The least upper bound or supremum of the set $C \subseteq \mathbf{R}$ is denoted $\sup C$. The greatest lower bound or infimum of the set $C$ is denoted $\inf C$.
Functions
Notation
An example
-
Functions
Continuity
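A function $f: \mathbf{R}^n \to \mathbf{R}^m$ is continuous at $x \in \mathbf{dom}\, f$ if for all $\epsilon > 0$ there exists a $\delta > 0$ such that
$$y \in \mathbf{dom}\, f,\ \|y - x\|_2 \le \delta \implies \|f(y) - f(x)\|_2 \le \epsilon.$$
Closed functions
A function $f: \mathbf{R}^n \to \mathbf{R}$ is closed if, for each $\alpha \in \mathbf{R}$, the sublevel set $\{x \in \mathbf{dom}\, f \mid f(x) \le \alpha\}$ is closed. This is equivalent to the requirement that the epigraph $\mathbf{epi}\, f = \{(x, t) \in \mathbf{R}^{n+1} \mid x \in \mathbf{dom}\, f,\ f(x) \le t\}$ is closed.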
Derivatives
Definition
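Suppose $f: \mathbf{R}^n \to \mathbf{R}^m$ and $x \in \mathbf{int}\,\mathbf{dom}\, f$. The function $f$ is differentiable at $x$ if there exists a matrix $Df(x) \in \mathbf{R}^{m\times n}$ that satisfies
$$\lim_{z \in \mathbf{dom}\, f,\ z \ne x,\ z \to x} \frac{\|f(z) - f(x) - Df(x)(z - x)\|_2}{\|z - x\|_2} = 0,$$
in which case we refer to $Df(x)$ as the derivative (or Jacobian) of $f$ at $x$.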
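The definition can be explored numerically: a central-difference Jacobian should match $Df(x)$, and the quotient in the limit should shrink as $z \to x$. The map `f`, its hand-derived Jacobian `Df`, and the helper `jacobian_fd` are illustrative choices for this NumPy sketch, not taken from the slides.

```python
import numpy as np

def f(x):
    """Illustrative map f: R^2 -> R^2."""
    return np.array([x[0] ** 2 * x[1], np.sin(x[0]) + x[1]])

def Df(x):
    """Its Jacobian: row i holds the partial derivatives of f_i."""
    return np.array([[2.0 * x[0] * x[1], x[0] ** 2],
                     [np.cos(x[0]), 1.0]])

def jacobian_fd(f, x, h=1e-6):
    """Central-difference approximation of Df(x), built one column at a time."""
    x = np.asarray(x, dtype=float)
    cols = [(f(x + h * e) - f(x - h * e)) / (2.0 * h) for e in np.eye(x.size)]
    return np.column_stack(cols)

x = np.array([1.0, 2.0])
print(np.allclose(jacobian_fd(f, x), Df(x), atol=1e-6))   # True

# The quotient in the definition of differentiability shrinks as z -> x
ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    z = x + t * np.array([1.0, -1.0])
    ratios.append(np.linalg.norm(f(z) - f(x) - Df(x) @ (z - x)) / np.linalg.norm(z - x))
print(ratios)
```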
Derivatives
Definition
The affine function of $z$ given by $f(x) + Df(x)(z - x)$ is called the first-order approximation of $f$ at (or near) $x$.
Derivatives
Gradient
When $f$ is real-valued (i.e., $f: \mathbf{R}^n \to \mathbf{R}$) the derivative $Df(x)$ is a $1 \times n$ matrix (a row vector). Its transpose is called the gradient of the function:
$$\nabla f(x) = Df(x)^T,$$
which is a column vector (in $\mathbf{R}^n$). Its components are the partial derivatives of $f$:
$$\nabla f(x)_i = \frac{\partial f(x)}{\partial x_i}, \quad i = 1, \ldots, n.$$
The first-order approximation of $f$ at a point $x \in \mathbf{int}\,\mathbf{dom}\, f$ can be expressed as (the affine function of $z$)
$$f(x) + \nabla f(x)^T (z - x).$$
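A worked example of the first-order approximation, assuming NumPy and the illustrative function $f(x) = x_1^2 + 3x_1x_2$ with gradient $(2x_1 + 3x_2,\ 3x_1)$. Because this $f$ is quadratic, the error along $z = x + t(1, 1)$ is exactly the second-order term $\tfrac12 t^2 d^T \nabla^2 f\, d = 4t^2$, so it shrinks quadratically.

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]          # illustrative f: R^2 -> R

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

def first_order(x, z):
    """Affine approximation f(x) + grad f(x)^T (z - x)."""
    return f(x) + grad_f(x) @ (z - x)

x = np.array([1.0, -1.0])
for t in [1e-1, 1e-2]:
    z = x + t * np.array([1.0, 1.0])
    print(t, f(z) - first_order(x, z))            # approximately 4 t^2
```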
Derivatives
Examples
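$f(x) = \frac{1}{2} x^T P x + q^T x + r$ with $P \in \mathbf{S}^n$: $\nabla f(x) = Px + q$
$f(X) = \log\det X$, $\mathbf{dom}\, f = \mathbf{S}^n_{++}$: $\nabla f(X) = X^{-1}$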
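Both gradient formulas can be checked with central differences in NumPy. For $\log\det$, the gradient $X^{-1}$ predicts the directional derivative along a symmetric direction $V$ as $\langle X^{-1}, V\rangle = \mathbf{tr}(X^{-1}V)$. The random data and the `logdet` helper (built on `np.linalg.slogdet`) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 3, 1e-6
A = rng.standard_normal((n, n))
P = A + A.T                          # P must be symmetric for grad = P x + q
q, r = rng.standard_normal(n), 1.5

def f_quad(x):
    return 0.5 * x @ P @ x + q @ x + r

x = rng.standard_normal(n)
fd_grad = np.array([(f_quad(x + h * e) - f_quad(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.allclose(fd_grad, P @ x + q, atol=1e-6))            # True

def logdet(X):
    """log det X for positive definite X, via a numerically stable slogdet."""
    sign, val = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("X must be positive definite")
    return val

X = A @ A.T + n * np.eye(n)          # a positive definite X
V = rng.standard_normal((n, n))
V = V + V.T                          # symmetric direction in S^n
fd_dir = (logdet(X + h * V) - logdet(X - h * V)) / (2 * h)
pred_dir = np.trace(np.linalg.solve(X, V))    # tr(X^{-1} V)
print(np.isclose(fd_dir, pred_dir, atol=1e-6))               # True
```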
Derivatives
Chain rule
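Suppose $f: \mathbf{R}^n \to \mathbf{R}^m$ is differentiable at $x \in \mathbf{int}\,\mathbf{dom}\, f$ and $g: \mathbf{R}^m \to \mathbf{R}^p$ is differentiable at $f(x) \in \mathbf{int}\,\mathbf{dom}\, g$. Define the composition $h: \mathbf{R}^n \to \mathbf{R}^p$ by $h(z) = g(f(z))$. Then $h$ is differentiable at $x$, with derivative
$$Dh(x) = Dg(f(x))\, Df(x).$$
Suppose $f: \mathbf{R}^n \to \mathbf{R}$, $g: \mathbf{R} \to \mathbf{R}$, and $h(x) = g(f(x))$. Then
$$\nabla h(x) = g'(f(x))\, \nabla f(x).$$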
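A quick check of the scalar chain rule, using the illustrative choices $f(x) = \|x\|_2^2$ (so $\nabla f(x) = 2x$) and $g = \exp$ (so $g' = \exp$), which give $\nabla h(x) = e^{\|x\|_2^2}\, 2x$. The NumPy sketch compares this with central differences.

```python
import numpy as np

def f(x):
    return x @ x                    # f(x) = ||x||_2^2, grad f(x) = 2x

def g(u):
    return np.exp(u)                # g' = exp

def h(x):
    return g(f(x))

def grad_h(x):
    # chain rule: grad h(x) = g'(f(x)) grad f(x)
    return np.exp(f(x)) * 2.0 * x

x = np.array([0.3, -0.5])
eps = 1e-6
fd = np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(fd, grad_h(x), atol=1e-8))   # True
```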
Derivatives
Composition of Affine Function
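Suppose $f: \mathbf{R}^n \to \mathbf{R}$, $A \in \mathbf{R}^{n\times m}$, and $b \in \mathbf{R}^n$. Then $g(x) = f(Ax + b)$ satisfies
$$\nabla g(x) = A^T \nabla f(Ax + b).$$
Restriction to a line: for $f: \mathbf{R}^n \to \mathbf{R}$ and $g: \mathbf{R} \to \mathbf{R}$ with $g(t) = f(x + tv)$, where $x, v \in \mathbf{R}^n$,
$$g'(t) = v^T \nabla f(x + tv).$$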
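Both rules can be verified with central differences. In this NumPy sketch the outer function is the illustrative $f(y) = \sum_i \cosh y_i$ with $\nabla f(y) = \sinh(y)$ elementwise; `A` is $4 \times 3$, so $f$ acts on $\mathbf{R}^4$ and $g$ on $\mathbf{R}^3$. The name `phi` plays the role of the slide's $g(t)$ for the restriction to a line.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
h = 1e-6

def f(y):                         # illustrative f: R^4 -> R
    return np.sum(np.cosh(y))

def grad_f(y):
    return np.sinh(y)

def g(x):                         # g(x) = f(Ax + b)
    return f(A @ x + b)

def grad_g(x):                    # A^T grad f(Ax + b)
    return A.T @ grad_f(A @ x + b)

x = rng.standard_normal(3)
fd = np.array([(g(x + h * e) - g(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(fd, grad_g(x), atol=1e-6))            # True

# Restriction to a line: phi(t) = f(y0 + t v), phi'(t) = v^T grad f(y0 + t v)
y0, v, t = rng.standard_normal(4), rng.standard_normal(4), 0.7
phi = lambda s: f(y0 + s * v)
phi_prime_fd = (phi(t + h) - phi(t - h)) / (2 * h)
phi_prime = v @ grad_f(y0 + t * v)
print(np.isclose(phi_prime_fd, phi_prime, atol=1e-6))   # True
```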
Example 1
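Consider the function
$$f(x) = \log \sum_{i=1}^m \exp(a_i^T x + b_i),$$
where $a_1, \ldots, a_m \in \mathbf{R}^n$ and $b_1, \ldots, b_m \in \mathbf{R}$. Then $f(x) = g(Ax + b)$, where $A \in \mathbf{R}^{m\times n}$ has rows $a_1^T, \ldots, a_m^T$ and
$$g(y) = \log \sum_{i=1}^m \exp y_i, \qquad \nabla g(y) = \frac{1}{\sum_{i=1}^m \exp y_i} \begin{bmatrix} \exp y_1 \\ \vdots \\ \exp y_m \end{bmatrix}.$$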
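Evaluating $g$ and $\nabla g$ literally overflows for large entries of $y$. A common remedy, assumed here and not stated on the slide, is to subtract $c = \max_i y_i$: $g(y) = c + \log\sum_i \exp(y_i - c)$, and the shift cancels in the ratio defining $\nabla g$. A NumPy sketch with hypothetical names `lse` and `grad_lse`:

```python
import numpy as np

def lse(y):
    """g(y) = log(sum_i exp(y_i)), evaluated with a max-shift to avoid overflow."""
    c = np.max(y)
    return c + np.log(np.sum(np.exp(y - c)))

def grad_lse(y):
    """grad g(y) = exp(y) / sum_i exp(y_i); the shift by max(y) cancels in the ratio."""
    w = np.exp(y - np.max(y))
    return w / np.sum(w)

y = np.array([1000.0, 1000.0])   # np.exp(1000.0) alone would overflow to inf
print(lse(y))                    # 1000 + log 2
print(grad_lse(y))               # [0.5 0.5]
```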
Example 1
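For $f(x) = \log \sum_{i=1}^m \exp(a_i^T x + b_i) = g(Ax + b)$, the composition rule with an affine function gives
$$\nabla f(x) = A^T \nabla g(Ax + b) = \frac{1}{\mathbf{1}^T z} A^T z, \qquad z = \begin{bmatrix} \exp(a_1^T x + b_1) \\ \vdots \\ \exp(a_m^T x + b_m) \end{bmatrix}.$$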
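A NumPy check of $\nabla f(x) = \frac{1}{\mathbf{1}^T z} A^T z$ against central differences, with random `A` and `b` as illustrative data. The sketch shifts the exponent by its maximum before exponentiating; this multiplies every $z_i$ by the same factor, which cancels in the ratio.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n))       # row i is a_i^T
b = rng.standard_normal(m)

def f(x):
    y = A @ x + b
    c = np.max(y)                     # max-shift for numerical stability
    return c + np.log(np.sum(np.exp(y - c)))

def grad_f(x):
    y = A @ x + b
    z = np.exp(y - np.max(y))         # proportional to z_i = exp(a_i^T x + b_i)
    return A.T @ z / np.sum(z)        # (1 / 1^T z) A^T z; the common factor cancels

x = rng.standard_normal(n)
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.allclose(fd, grad_f(x), atol=1e-6))   # True
```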
Example 2
Consider the function
where
-
Second Derivative
Definition
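Suppose $f: \mathbf{R}^n \to \mathbf{R}$. The second derivative or Hessian matrix of $f$ at $x \in \mathbf{int}\,\mathbf{dom}\, f$, denoted $\nabla^2 f(x)$, is given by
$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, n.$$
Second-order Approximation
$$f(x) + \nabla f(x)^T (z - x) + \frac{1}{2} (z - x)^T \nabla^2 f(x) (z - x)$$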
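A NumPy sketch of the Hessian and the second-order approximation for the illustrative cubic $f(x) = x_1^3 + x_1 x_2^2$. The Hessian is checked by central differences of the analytic gradient. Since $f$ is cubic, the remainder along $z = x + t(1, 1)$ is exactly the third-order Taylor term, here $2t^3$.

```python
import numpy as np

def f(x):
    return x[0] ** 3 + x[0] * x[1] ** 2     # illustrative cubic

def grad_f(x):
    return np.array([3 * x[0] ** 2 + x[1] ** 2, 2 * x[0] * x[1]])

def hess_f(x):
    # entries d^2 f / (dx_i dx_j)
    return np.array([[6 * x[0], 2 * x[1]],
                     [2 * x[1], 2 * x[0]]])

def second_order(x, z):
    """f(x) + grad f(x)^T (z - x) + 1/2 (z - x)^T hess f(x) (z - x)."""
    d = z - x
    return f(x) + grad_f(x) @ d + 0.5 * d @ hess_f(x) @ d

x = np.array([1.0, 2.0])
h = 1e-5
H_fd = np.column_stack([(grad_f(x + h * e) - grad_f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(H_fd, hess_f(x), atol=1e-6))   # True

# For this cubic the remainder along d = t(1, 1) is exactly 2 t^3
for t in (1e-1, 1e-2):
    z = x + t * np.ones(2)
    print(t, f(z) - second_order(x, z))
```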
Derivatives
Examples
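$f(x) = \frac{1}{2} x^T P x + q^T x + r$ with $P \in \mathbf{S}^n$: $\nabla f(x) = Px + q$, $\nabla^2 f(x) = P$
$f(X) = \log\det X$, $\mathbf{dom}\, f = \mathbf{S}^n_{++}$: $\nabla f(X) = X^{-1}$, and the second-order approximation near $X$ is
$$f(Z) \approx \log\det X + \mathbf{tr}\left(X^{-1}(Z - X)\right) - \frac{1}{2} \mathbf{tr}\left(X^{-1}(Z - X) X^{-1}(Z - X)\right)$$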
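The $\log\det$ approximation can be tested along a line $Z = X + tD$: if it is correct to second order, the error should fall roughly like $t^3$ (about a factor of 1000 per decade of $t$). `X` and `D` are illustrative matrices, and `logdet` is a hypothetical helper built on `np.linalg.slogdet`.

```python
import numpy as np

def logdet(X):
    sign, val = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("X must be positive definite")
    return val

def logdet_second_order(X, Z):
    """log det X + tr(X^{-1}(Z - X)) - 1/2 tr(X^{-1}(Z - X) X^{-1}(Z - X))."""
    M = np.linalg.solve(X, Z - X)            # X^{-1}(Z - X)
    return logdet(X) + np.trace(M) - 0.5 * np.trace(M @ M)

X = np.array([[2.0, 0.5], [0.5, 1.0]])       # positive definite
D = np.array([[1.0, -0.3], [-0.3, 0.5]])     # symmetric perturbation direction
errs = [abs(logdet(X + t * D) - logdet_second_order(X, X + t * D)) for t in (1e-1, 1e-2)]
print(errs, errs[0] / errs[1])               # ratio on the order of 1000
```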
Second Derivative
Chain rule
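Suppose $f: \mathbf{R}^n \to \mathbf{R}$, $g: \mathbf{R} \to \mathbf{R}$, and $h(x) = g(f(x))$. Then
$$\nabla^2 h(x) = g'(f(x))\, \nabla^2 f(x) + g''(f(x))\, \nabla f(x) \nabla f(x)^T.$$
Composition with affine function: $g(x) = f(Ax + b)$ satisfies
$$\nabla^2 g(x) = A^T \nabla^2 f(Ax + b) A.$$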
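Both second-order rules can be checked by differentiating an analytic gradient numerically. This NumPy sketch uses two illustrative cases: $h(x) = \exp(\|x\|_2^2)$, where the rule gives $e^{\|x\|^2}(2I + 4xx^T)$, and $\varphi(x) = F(Ax + b)$ with $F(y) = \sum_i \cosh y_i$ (named `phi` and `F` to avoid clashing with $f$ and $g$ on the slide).

```python
import numpy as np

def fd_hessian(grad, x, h=1e-5):
    """Central differences of an analytic gradient, symmetrized."""
    H = np.column_stack([(grad(x + h * e) - grad(x - h * e)) / (2 * h)
                         for e in np.eye(x.size)])
    return 0.5 * (H + H.T)

# h(x) = g(f(x)) with f(x) = ||x||_2^2 and g = exp
def grad_h(x):
    return np.exp(x @ x) * 2.0 * x
def hess_h(x):
    # g'(f) hess f + g''(f) grad f grad f^T, with g' = g'' = exp and hess f = 2I
    fx, gf = x @ x, 2.0 * x
    return np.exp(fx) * (2.0 * np.eye(x.size) + np.outer(gf, gf))

# Affine composition: phi(x) = F(Ax + b) with F(y) = sum_i cosh(y_i)
rng = np.random.default_rng(4)
A, b = rng.standard_normal((3, 2)), rng.standard_normal(3)
def grad_phi(x):
    return A.T @ np.sinh(A @ x + b)
def hess_phi(x):
    return A.T @ np.diag(np.cosh(A @ x + b)) @ A   # A^T hess F(Ax + b) A

x = np.array([0.2, -0.4])
print(np.allclose(fd_hessian(grad_h, x), hess_h(x), atol=1e-6))       # True
print(np.allclose(fd_hessian(grad_phi, x), hess_phi(x), atol=1e-6))   # True
```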
Example 1
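Consider the function
$$f(x) = \log \sum_{i=1}^m \exp(a_i^T x + b_i) = g(Ax + b), \qquad g(y) = \log \sum_{i=1}^m \exp y_i.$$
The gradient and Hessian of $g$ are
$$\nabla g(y) = \frac{1}{\sum_{i=1}^m \exp y_i} \begin{bmatrix} \exp y_1 \\ \vdots \\ \exp y_m \end{bmatrix}, \qquad \nabla^2 g(y) = \mathbf{diag}(\nabla g(y)) - \nabla g(y) \nabla g(y)^T.$$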
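Combining $\nabla^2 g$ with the affine rule gives $\nabla^2 f(x) = A^T\left(\frac{1}{\mathbf{1}^T z}\mathbf{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T\right)A$ with $z_i = \exp(a_i^T x + b_i)$. The NumPy sketch below (random `A`, `b` as illustrative data; `hess_lse` is a hypothetical name) compares this with central differences of the gradient and checks that the result is positive semidefinite.

```python
import numpy as np

def hess_lse(y):
    """Hessian of g(y) = log sum_i exp(y_i): diag(p) - p p^T with p = grad g(y)."""
    w = np.exp(y - np.max(y))
    p = w / np.sum(w)
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(5)
m, n = 4, 3
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

def grad_f(x):
    y = A @ x + b
    w = np.exp(y - np.max(y))
    return A.T @ (w / np.sum(w))           # A^T grad g(Ax + b)

def hess_f(x):
    return A.T @ hess_lse(A @ x + b) @ A   # A^T hess g(Ax + b) A

x = rng.standard_normal(n)
H = hess_f(x)
h = 1e-5
H_fd = np.column_stack([(grad_f(x + h * e) - grad_f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.allclose(H, H_fd, atol=1e-6))              # True
print(np.linalg.eigvalsh(H).min() >= -1e-12)        # True: H is positive semidefinite
```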
Linear algebra
Range and nullspace
Let $A \in \mathbf{R}^{m\times n}$. The range of $A$, denoted $\mathcal{R}(A)$, is the set of all vectors in $\mathbf{R}^m$ that can be written as linear combinations of the columns of $A$:
$$\mathcal{R}(A) = \{Ax \mid x \in \mathbf{R}^n\} \subseteq \mathbf{R}^m.$$
The nullspace (or kernel) of $A$, denoted $\mathcal{N}(A)$, is the set of all vectors $x$ mapped into zero by $A$:
$$\mathcal{N}(A) = \{x \mid Ax = 0\} \subseteq \mathbf{R}^n.$$
If $\mathcal{V}$ is a subspace of $\mathbf{R}^n$, its orthogonal complement, denoted $\mathcal{V}^\perp$, is defined as
$$\mathcal{V}^\perp = \{x \mid z^T x = 0 \text{ for all } z \in \mathcal{V}\}.$$
The nullspace and range are related by $\mathcal{N}(A) = \mathcal{R}(A^T)^\perp$ and $\mathcal{N}(A)^\perp = \mathcal{R}(A^T)$.
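Orthonormal bases for these subspaces can be read off a full SVD $A = U\Sigma V^T$ of rank $r$: the first $r$ columns of $U$ span $\mathcal{R}(A)$, the first $r$ columns of $V$ span $\mathcal{R}(A^T)$, and the remaining columns of $V$ span $\mathcal{N}(A)$. A NumPy sketch with the hypothetical helper `fundamental_subspaces`, relying on `np.linalg.matrix_rank`'s default tolerance:

```python
import numpy as np

def fundamental_subspaces(A):
    """Orthonormal bases for R(A), R(A^T) and N(A) from a full SVD."""
    U, s, Vt = np.linalg.svd(A)             # full_matrices=True by default
    r = np.linalg.matrix_rank(A)
    range_A = U[:, :r]                      # basis of R(A)
    range_AT = Vt[:r].T                     # basis of R(A^T)
    null_A = Vt[r:].T                       # basis of N(A)
    return range_A, range_AT, null_A

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])             # rank 1
R, RT, N = fundamental_subspaces(A)
print(R.shape[1], N.shape[1])               # 1 2 (rank 1, nullity 2)
print(np.allclose(A @ N, 0.0))              # True: N(A) is mapped to zero
print(np.allclose(RT.T @ N, 0.0))           # True: N(A) is orthogonal to R(A^T)
```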
Linear algebra
Symmetric eigenvalue decomposition
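Suppose $A \in \mathbf{S}^n$, i.e., $A$ is a real symmetric $n \times n$ matrix. Then $A$ can be factored as
$$A = Q \Lambda Q^T,$$
- where $Q \in \mathbf{R}^{n\times n}$ is orthogonal, i.e., satisfies $Q^T Q = I$, and $\Lambda = \mathbf{diag}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.
- The determinant and trace can be expressed in terms of the eigenvalues:
$$\det A = \prod_{i=1}^n \lambda_i, \qquad \mathbf{tr}\, A = \sum_{i=1}^n \lambda_i.$$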
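NumPy's `np.linalg.eigh` computes this factorization for symmetric input (returning eigenvalues in ascending order). The sketch checks the reconstruction, the orthogonality of $Q$, and the determinant and trace identities on an illustrative matrix with $\det A = 18$ and $\mathbf{tr}\, A = 9$.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])       # real symmetric
lam, Q = np.linalg.eigh(A)            # eigenvalues ascending, orthonormal eigenvectors
Lam = np.diag(lam)

print(np.allclose(Q @ Lam @ Q.T, A))              # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))            # True: Q is orthogonal
print(np.prod(lam), np.linalg.det(A))             # both approximately 18
print(np.sum(lam), np.trace(A))                   # both approximately 9
```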
Linear algebra
Norms
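For $A \in \mathbf{S}^n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$:
- $\|A\|_2 = \max_{i=1,\ldots,n} |\lambda_i|$
- $\|A\|_F = \left(\sum_{i=1}^n \lambda_i^2\right)^{1/2}$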
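An indefinite example shows why the spectral norm uses $\max_i |\lambda_i|$ rather than $\lambda_{\max}$: for the matrix below the eigenvalues are $-1 \pm 2\sqrt{2}$, so $\|A\|_2 = 1 + 2\sqrt{2}$ while $\lambda_{\max} = 2\sqrt{2} - 1$. A NumPy check:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -3.0]])          # symmetric but indefinite
lam = np.linalg.eigvalsh(A)          # -1 - 2*sqrt(2), -1 + 2*sqrt(2)
spec = np.max(np.abs(lam))           # ||A||_2 = max_i |lambda_i|
frob = np.sqrt(np.sum(lam ** 2))     # ||A||_F = (sum_i lambda_i^2)^{1/2}
print(np.isclose(spec, np.linalg.norm(A, 2)))      # True
print(np.isclose(frob, np.linalg.norm(A, "fro")))  # True
```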
Linear algebra
Positive definite Matrix
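A matrix $A \in \mathbf{S}^n$ is called positive definite if $x^T A x > 0$ for all $x \ne 0$, denoted as $A \succ 0$. If $-A$ is positive definite, we say $A$ is negative definite, denoted as $A \prec 0$. We use $\mathbf{S}^n_{++}$ to denote the set of positive definite matrices in $\mathbf{S}^n$.
Similarly, $A$ is positive semidefinite, denoted $A \succeq 0$, if $x^T A x \ge 0$ for all $x$. We use $\mathbf{S}^n_{+}$ to denote the set of positive semidefinite matrices in $\mathbf{S}^n$.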
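Two common numerical tests, sketched with NumPy under the assumption that round-off is small: a symmetric matrix is positive definite exactly when its Cholesky factorization exists (`np.linalg.cholesky` raises `LinAlgError` otherwise), and positive semidefiniteness can be tested by requiring the smallest eigenvalue to be at least `-tol`. `is_pd` and `is_psd` are hypothetical helper names.

```python
import numpy as np

def is_pd(A):
    """Test A in S^n_{++}: symmetric, and Cholesky succeeds."""
    if not np.allclose(A, A.T):
        return False
    try:
        np.linalg.cholesky(A)
    except np.linalg.LinAlgError:
        return False
    return True

def is_psd(A, tol=1e-10):
    """Test A in S^n_+: symmetric with smallest eigenvalue >= -tol."""
    return bool(np.allclose(A, A.T) and np.linalg.eigvalsh(A)[0] >= -tol)

K = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1, 3: positive definite
J = np.array([[1.0, 1.0], [1.0, 1.0]])     # eigenvalues 0, 2: PSD but not PD
print(is_pd(K), is_psd(K))     # True True
print(is_pd(J), is_psd(J))     # False True
print(is_pd(-K), is_psd(-K))   # False False  (-K is negative definite)
```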
Linear algebra
Singular value decomposition (SVD)
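Suppose $A \in \mathbf{R}^{m\times n}$ with $\mathbf{rank}\, A = r$. Then $A$ can be factored as
$$A = U \Sigma V^T,$$
- where $U \in \mathbf{R}^{m\times r}$ satisfies $U^T U = I$,
- $V \in \mathbf{R}^{n\times r}$ satisfies $V^T V = I$, and
- $\Sigma = \mathbf{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
- The singular value decomposition can be written
$$A = \sum_{i=1}^r \sigma_i u_i v_i^T,$$
where $u_i$ and $v_i$ are the $i$th columns of $U$ and $V$.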
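The compact form above can be obtained from `np.linalg.svd(..., full_matrices=False)` by keeping the $r$ singular values above a relative tolerance (the `1e-12` cutoff is an assumption of this sketch). For the illustrative matrix, $AA^T$ has eigenvalues 25 and 9, so the singular values are 5 and 3.

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])
U_, s_, Vt_ = np.linalg.svd(A, full_matrices=False)   # s_ sorted in decreasing order
r = int(np.sum(s_ > 1e-12 * s_[0]))                    # numerical rank
U, s, V = U_[:, :r], s_[:r], Vt_[:r].T                 # U: m x r, V: n x r

print(s)                                               # singular values 5 and 3
print(np.allclose(U @ np.diag(s) @ V.T, A))            # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(r)), np.allclose(V.T @ V, np.eye(r)))
# Dyadic form: A = sum_i sigma_i u_i v_i^T
A_dyads = sum(s[i] * np.outer(U[:, i], V[:, i]) for i in range(r))
print(np.allclose(A_dyads, A))                         # True
```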
Linear algebra
Norms
- $\|A\|_2 = \sigma_1 = \sigma_{\max}(A)$, $\|A\|_F = \left(\sum_{i=1}^r \sigma_i^2\right)^{1/2}$
Linear algebra
Pseudo-inverse
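Let $A = U \Sigma V^T$ be the singular value decomposition of $A \in \mathbf{R}^{m\times n}$, with $\mathbf{rank}\, A = r$. The pseudo-inverse or Moore-Penrose inverse of $A$ is
$$A^\dagger = V \Sigma^{-1} U^T \in \mathbf{R}^{n\times m}.$$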
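A NumPy sketch of the formula $A^\dagger = V\Sigma^{-1}U^T$, using the compact SVD with a relative cutoff that decides the numerical rank (an assumption; it also assumes $A \ne 0$). The example is the rank-one matrix $A = ab^T$ with $a = (1, 2, 3)$, $b = (1, 2)$, for which $A^\dagger = ba^T/(\|a\|_2^2\|b\|_2^2) = ba^T/70$; the result is compared with `np.linalg.pinv` and two of the Moore-Penrose conditions.

```python
import numpy as np

def pinv_svd(A, rtol=1e-12):
    """A^dagger = V Sigma^{-1} U^T from the compact SVD.

    Singular values below rtol * sigma_1 are treated as zero, which fixes
    the numerical rank r. Assumes A is not the zero matrix.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rtol * s[0]
    return Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])           # rank 1, so A^T A is singular
A_dag = pinv_svd(A)
print(np.allclose(A_dag, np.linalg.pinv(A)))     # True
print(np.allclose(A @ A_dag @ A, A))              # True
print(np.allclose(A_dag @ A @ A_dag, A_dag))      # True
```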
Schur complement
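Consider a matrix $X \in \mathbf{S}^n$ partitioned as
$$X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix},$$
where $A \in \mathbf{S}^k$. If $\det A \ne 0$, the matrix $S = C - B^T A^{-1} B$ is called the Schur complement of $A$ in $X$.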
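Application of Schur complement
PD matrices: $X \succ 0$ if and only if $A \succ 0$ and $S \succ 0$. If $A \succ 0$, then $X \succeq 0$ if and only if $S \succeq 0$.
PSD matrices:
$$X \succeq 0 \iff A \succeq 0,\quad (I - A A^\dagger) B = 0,\quad C - B^T A^\dagger B \succeq 0$$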
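These characterizations can be checked numerically. The NumPy sketch below (hypothetical helpers `blocks`, `schur_complement`, `is_pd`, `psd_via_schur`; eigenvalue tests with a small tolerance are an assumption) verifies the PD criterion on an illustrative $3 \times 3$ matrix for both block sizes, along with the standard identity $\det X = \det A \cdot \det S$ for invertible $A$. It then applies the pseudo-inverse conditions to two $2 \times 2$ matrices whose leading block $A = 0$ is singular.

```python
import numpy as np

def blocks(X, k):
    """Split X = [[A, B], [B^T, C]] with A the leading k x k block."""
    return X[:k, :k], X[:k, k:], X[k:, k:]

def schur_complement(X, k):
    """S = C - B^T A^{-1} B; requires A to be invertible."""
    A, B, C = blocks(X, k)
    return C - B.T @ np.linalg.solve(A, B)

def is_pd(M):
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

def psd_via_schur(X, k, tol=1e-10):
    """X >= 0 iff A >= 0, (I - A A^+) B = 0 and C - B^T A^+ B >= 0."""
    A, B, C = blocks(X, k)
    A_pinv = np.linalg.pinv(A)
    cond_A = np.linalg.eigvalsh(A)[0] >= -tol
    cond_B = np.allclose((np.eye(k) - A @ A_pinv) @ B, 0.0, atol=tol)
    cond_S = np.linalg.eigvalsh(C - B.T @ A_pinv @ B)[0] >= -tol
    return bool(cond_A and cond_B and cond_S)

X = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
for k in (1, 2):
    A, _, _ = blocks(X, k)
    S = schur_complement(X, k)
    print(is_pd(X) == (is_pd(A) and is_pd(S)))                                # True
    print(np.isclose(np.linalg.det(X), np.linalg.det(A) * np.linalg.det(S)))  # True

# Singular A: the pseudo-inverse conditions still decide positive semidefiniteness
print(psd_via_schur(np.array([[0.0, 0.0], [0.0, 1.0]]), 1))   # True
print(psd_via_schur(np.array([[0.0, 1.0], [1.0, 1.0]]), 1))   # False
```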
Summary
Norms of vectors
- norm, -norm, -norm, -quadratic