SLIDE 1 NLA Reading Group Spring’13
by İsmail Arı
SLIDE 2
In the product 𝐴𝑥 = 𝑏, the vector 𝑏 is a linear combination of the columns of 𝐴.
SLIDE 3
Let us re-write the matrix-vector multiplication 𝑏 = 𝐴𝑥 as 𝑏 = Σ𝑗 𝑥𝑗𝑎𝑗, a linear combination of the columns 𝑎𝑗 of 𝐴. “As mathematicians, we are used to viewing the formula 𝐴𝑥 = 𝑏 as a statement that 𝐴 acts on 𝑥 to produce 𝑏. The new formula, by contrast, suggests the interpretation that 𝑥 acts on 𝐴 to produce 𝑏.”
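To make the column-oriented view concrete, here is a minimal NumPy sketch (the matrix and vector are made up for illustration) checking that the entrywise product 𝐴𝑥 agrees with the single vector summation Σ𝑗 𝑥𝑗𝑎𝑗:

```python
import numpy as np

# Illustrative data: a 3x2 matrix A and a coefficient vector x.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

# Entrywise view: m distinct scalar summations.
b_entrywise = A @ x

# Column view: one vector summation, sum_j x_j * a_j.
b_columns = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(b_entrywise, b_columns)
```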
SLIDE 4
The map from vectors of coefficients of polynomials 𝑝 of degree < 𝑛 to vectors (𝑝(𝑥1), 𝑝(𝑥2), …, 𝑝(𝑥𝑚)) of sampled polynomial values is linear. The product 𝐴𝑐 gives the sampled polynomial values.
SLIDE 5
Do not see 𝐴𝑐 as 𝑚 distinct scalar summations. Instead, see 𝐴 as a matrix of columns, each giving the sampled values of a monomial*. Thus, 𝐴𝑐 is a single vector summation that at once gives a linear combination of these monomials.
*In mathematics, a monomial is, roughly speaking, a polynomial which has only one term.
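As a quick illustration of this viewpoint (sample points and coefficients chosen arbitrarily), the sketch below builds the Vandermonde matrix with NumPy's np.vander; each column holds a sampled monomial, and 𝐴𝑐 yields all sampled polynomial values at once:

```python
import numpy as np

xs = np.array([0.0, 0.5, 1.0, 1.5])   # m = 4 sample points
c = np.array([1.0, -2.0, 3.0])        # p(t) = 1 - 2t + 3t^2, degree < n = 3

# Vandermonde matrix: column j holds the sampled monomial t^j.
A = np.vander(xs, N=c.size, increasing=True)

# A @ c gives the sampled polynomial values p(x_i) in one vector summation.
assert np.allclose(A @ c, 1 - 2 * xs + 3 * xs**2)
```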
SLIDE 6
In the matrix-matrix product 𝐵 = 𝐴𝐶, each column of 𝐵 is a linear combination of the columns of 𝐴. Thus 𝑏𝑗 is a linear combination of the columns 𝑎𝑘 with coefficients 𝑐𝑘𝑗.
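A small NumPy check of this column interpretation (random matrices, sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 2))
B = A @ C

for j in range(C.shape[1]):
    # Column j of B is A applied to column j of C ...
    assert np.allclose(B[:, j], A @ C[:, j])
    # ... i.e. a combination of the columns a_k with coefficients c_kj.
    assert np.allclose(B[:, j], sum(C[k, j] * A[:, k] for k in range(3)))
```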
SLIDE 7
SLIDE 8
The matrix 𝑅 is a discrete analogue of an indefinite integral operator.
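The matrix itself is not reproduced in this transcript, so the sketch below assumes the standard example (an assumption on my part): a lower-triangular matrix of ones, which acts as a running sum; its inverse is then a discrete difference operator:

```python
import numpy as np

m = 5
# Assumed form: lower-triangular ones, a discrete "indefinite integral"
# (running-sum) operator.
R = np.tril(np.ones((m, m)))
x = np.arange(1.0, m + 1)
assert np.allclose(R @ x, np.cumsum(x))

# Its inverse is a discrete difference operator: 1's on the diagonal,
# -1's on the first subdiagonal.
D = np.linalg.inv(R)
assert np.allclose(D, np.eye(m) - np.eye(m, k=-1))
```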
SLIDE 9
null(𝐴) is the set of vectors 𝑥 that satisfy 𝐴𝑥 = 0, where 0 is the 0-vector in ℂ𝑚. range(𝐴) is the space spanned by the columns of 𝐴. The column/row rank of a matrix is the dimension of its column/row space. Column rank always equals row rank, so we simply call it the rank of the matrix. An m-by-n matrix 𝐴 with m ≥ n has full rank iff it maps no two distinct vectors to the same vector.
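A brief NumPy illustration of these notions (the matrix is an arbitrary full-rank example; NumPy computes the rank via the SVD):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])   # m = 3, n = 2

# Column rank equals row rank; here it is 2, i.e. full rank.
assert np.linalg.matrix_rank(A) == 2

# Full rank means injectivity: Ax = Ay would force A(x - y) = 0,
# and a full-rank A has only the trivial null vector.
x, y = np.array([1.0, 1.0]), np.array([1.0, 2.0])
assert not np.allclose(A @ x, A @ y)
```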
SLIDE 10
A nonsingular or invertible matrix is a square matrix of full rank. 𝐼 is the m-by-m identity. The matrix 𝑍 with 𝐴𝑍 = 𝐼 is the inverse of 𝐴, written 𝐴−1.
SLIDE 11
For an m-by-m matrix 𝐴, the following conditions are equivalent: 𝐴 has an inverse 𝐴−1; rank(𝐴) = 𝑚; range(𝐴) = ℂ𝑚; null(𝐴) = {0}; 0 is not an eigenvalue of 𝐴; 0 is not a singular value of 𝐴; det(𝐴) ≠ 0. We mention the determinant: though a convenient notion theoretically, it rarely finds a useful role in numerical algorithms.
SLIDE 12
Do not think of 𝑥 as the result of applying 𝐴−1 to 𝑏. Instead, think of it as the unique vector that satisfies the equation 𝐴𝑥 = 𝑏. 𝐴−1𝑏 is the vector of coefficients of the expansion of 𝑏 in the basis of the columns of 𝐴. Multiplication by 𝐴−1 is a change-of-basis operation.
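A minimal sketch of this "expansion coefficients" reading, using a random system (assumed nonsingular); np.linalg.solve finds 𝑥 without forming 𝐴−1 explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))   # random square A, almost surely nonsingular
b = rng.standard_normal(3)

# x = A^{-1} b, computed without explicitly inverting A.
x = np.linalg.solve(A, b)

# x holds the coefficients of b expanded in the basis of A's columns.
assert np.allclose(A @ x, b)
assert np.allclose(sum(x[j] * A[:, j] for j in range(3)), b)
```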
SLIDE 13 NLA Reading Group Spring’13
by İsmail Arı
SLIDE 14
The complex conjugate of a scalar 𝑧, written 𝑧̄ or 𝑧∗, is obtained by negating its imaginary part. The hermitian conjugate or adjoint of an m-by-n matrix 𝐴, written 𝐴∗, is the n-by-m matrix whose 𝑖, 𝑗 entry is the complex conjugate of the 𝑗, 𝑖 entry of 𝐴. If 𝐴 = 𝐴∗, then 𝐴 is hermitian. For a real matrix 𝐴, the adjoint is known as the transpose and written 𝐴𝑇. If 𝐴 = 𝐴𝑇, then 𝐴 is symmetric.
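A short NumPy check of these definitions (the complex matrix is arbitrary; .conj().T forms the adjoint):

```python
import numpy as np

A = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 2 + 0j]])

# Hermitian conjugate (adjoint): the conjugate-transpose.
A_star = A.conj().T
assert A_star[0, 1] == np.conj(A[1, 0])

# A + A* is always hermitian: (A + A*)* = A* + A.
H = A + A_star
assert np.allclose(H, H.conj().T)
```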
SLIDE 15
The Euclidean length of 𝑥 is ‖𝑥‖ = √(𝑥∗𝑥). The inner product is bilinear, i.e., linear in each vector separately:
(𝑥1 + 𝑥2)∗𝑦 = 𝑥1∗𝑦 + 𝑥2∗𝑦
𝑥∗(𝑦1 + 𝑦2) = 𝑥∗𝑦1 + 𝑥∗𝑦2
(𝛼𝑥)∗(𝛽𝑦) = 𝛼∗𝛽(𝑥∗𝑦)
SLIDE 16
A pair of vectors 𝑥 and 𝑦 are orthogonal if 𝑥∗𝑦 = 0. Two sets of vectors 𝑋 and 𝑌 are orthogonal if every 𝑥 ∈ 𝑋 is orthogonal to every 𝑦 ∈ 𝑌. A set of nonzero vectors 𝑆 is orthogonal if its elements are pairwise orthogonal. A set of nonzero vectors 𝑆 is orthonormal if it is orthogonal and, in addition, every 𝑥 ∈ 𝑆 has ‖𝑥‖ = 1.
SLIDE 17
The vectors in an orthogonal set 𝑆 are linearly independent. Sketch of the proof: assume they are not independent; then some nontrivial linear combination of the members of 𝑆 equals the zero vector, so its length is 0. But the bilinearity of inner products and the orthogonality of 𝑆 show that the squared length of any nontrivial combination is a sum of positive terms, hence greater than 0, contradicting the assumption.
⇒ If an orthogonal set 𝑆 ⊆ ℂ𝑚 contains 𝑚 vectors, then it is a basis for ℂ𝑚.
SLIDE 18
Inner products can be used to decompose arbitrary vectors into orthogonal components. Assume 𝑞1, 𝑞2, …, 𝑞𝑛 is an orthonormal set and 𝑣 is an arbitrary vector. Utilizing the scalars 𝑞𝑗∗𝑣 as coordinates in an expansion, we find that
𝑟 = 𝑣 − (𝑞1∗𝑣)𝑞1 − (𝑞2∗𝑣)𝑞2 − ⋯ − (𝑞𝑛∗𝑣)𝑞𝑛
is orthogonal to 𝑞1, 𝑞2, …, 𝑞𝑛. Thus we see that 𝑣 can be decomposed into 𝑛 + 1 orthogonal components:
𝑣 = 𝑟 + (𝑞1∗𝑣)𝑞1 + (𝑞2∗𝑣)𝑞2 + ⋯ + (𝑞𝑛∗𝑣)𝑞𝑛
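The decomposition is easy to verify numerically. The sketch below uses real vectors (so 𝑞∗ reduces to the transpose) and takes the orthonormal set from a QR factorization of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.standard_normal(5)

# An orthonormal set {q_1, q_2}: the columns of Q from a reduced QR.
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))

# r = v - sum_j (q_j^* v) q_j is orthogonal to every q_j.
r = v - Q @ (Q.T @ v)
assert np.allclose(Q.T @ r, 0)

# v decomposes into n + 1 mutually orthogonal components.
assert np.allclose(r + Q @ (Q.T @ v), v)
```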
SLIDE 19
We can view 𝑣 as a sum of coefficients 𝑞𝑗∗𝑣 times vectors 𝑞𝑗:
𝑣 = Σ𝑗 (𝑞𝑗∗𝑣)𝑞𝑗
Alternatively, we can view 𝑣 as a sum of orthogonal projections of 𝑣 onto the various directions 𝑞𝑗. The 𝑗th projection operation is achieved by the very special rank-one matrix 𝑞𝑗𝑞𝑗∗:
𝑣 = Σ𝑗 (𝑞𝑗𝑞𝑗∗)𝑣
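A sketch of the projection view (a random real orthonormal basis from QR; with a full basis of 𝑚 vectors, the rank-one projections sum back to 𝑣):

```python
import numpy as np

rng = np.random.default_rng(3)
# A full orthonormal basis q_1..q_m: columns of a square orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
v = rng.standard_normal(4)

# The jth projection of v is the rank-one matrix q_j q_j^* applied to v.
projections = [np.outer(Q[:, j], Q[:, j]) @ v for j in range(4)]

# Since the set is a full basis here, the projections reconstruct v.
assert np.allclose(sum(projections), v)
```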
SLIDE 20
If 𝑄∗ = 𝑄−1, 𝑄 is unitary.
SLIDE 21
𝑄∗𝑏 is the vector of coefficients of the expansion of 𝑏 in the basis of the columns of 𝑄.
SLIDE 22
Multiplication by a unitary matrix or its adjoint preserves geometric structure in the Euclidean sense, because inner products are preserved: (𝑄𝑥)∗(𝑄𝑦) = 𝑥∗𝑦. The invariance of inner products means that angles between vectors are preserved, and so are their lengths: ‖𝑄𝑥‖ = ‖𝑥‖. In the real case, multiplication by an orthogonal matrix 𝑄 corresponds to a rigid rotation (if det 𝑄 = 1) or reflection (if det 𝑄 = −1) of the vector space.
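A numerical check of these invariances (a random real orthogonal 𝑄 from a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # square orthogonal Q
x = rng.standard_normal(4)
y = rng.standard_normal(4)

# Inner products, hence angles and lengths, are preserved.
assert np.isclose((Q @ x) @ (Q @ y), x @ y)
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))

# det Q is +1 (rotation) or -1 (reflection) in the real case.
print(np.linalg.det(Q))
```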
SLIDE 23 NLA Reading Group Spring’13
by İsmail Arı
SLIDE 24
The essential notions of size and distance in a vector space are captured by norms. In order to conform to a reasonable notion of length, a norm must satisfy the following conditions for all vectors 𝑥 and 𝑦 and for all scalars 𝛼 ∈ ℂ:
(1) ‖𝑥‖ ≥ 0, and ‖𝑥‖ = 0 only if 𝑥 = 0
(2) ‖𝑥 + 𝑦‖ ≤ ‖𝑥‖ + ‖𝑦‖
(3) ‖𝛼𝑥‖ = |𝛼| ‖𝑥‖
SLIDE 25
The closed unit ball {𝑥 ∈ ℂ𝑚 : ‖𝑥‖ ≤ 1} corresponding to each norm is illustrated to the right for the case 𝑚 = 2.
SLIDE 26
Example: a weighted 2-norm. Introduce the diagonal matrix 𝑊 whose 𝑖th diagonal entry is the weight 𝑤𝑖 ≠ 0; then ‖𝑥‖𝑊 = ‖𝑊𝑥‖. The most important norms in this book are the unweighted 2-norm and its induced matrix form.
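A minimal sketch of the weighted 2-norm, with arbitrary example weights 𝑤𝑖:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
w = np.array([2.0, 1.0, 0.5])   # nonzero weights w_i
W = np.diag(w)

# Weighted 2-norm: ||x||_W = ||W x||_2.
norm_W = np.linalg.norm(W @ x)
assert np.isclose(norm_W, np.sqrt(np.sum((w * x) ** 2)))
```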
SLIDE 27
An 𝑚 × 𝑛 matrix can be viewed as a vector in an 𝑚𝑛-dimensional space: each of the 𝑚𝑛 entries of the matrix is an independent coordinate. ⇒ Any 𝑚𝑛-dimensional norm can be used for measuring the “size” of such a matrix. However, certain special matrix norms are more useful than the vector norms. These are the induced matrix norms, defined in terms of the behavior of a matrix as an operator between its normed domain and range spaces.
SLIDE 28
Given vector norms ‖·‖(𝑛) and ‖·‖(𝑚) on the domain and range of 𝐴 ∈ ℂ𝑚×𝑛, respectively, the induced matrix norm ‖𝐴‖(𝑚,𝑛) is the smallest number 𝐶 for which ‖𝐴𝑥‖(𝑚) ≤ 𝐶 ‖𝑥‖(𝑛) for all 𝑥 ∈ ℂ𝑛. In other words, it is the maximum factor by which 𝐴 can stretch a vector 𝑥.
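The "maximum stretch" reading can be probed numerically: random unit vectors never stretch by more than the induced 2-norm, which NumPy computes as the largest singular value. The sizes and sample count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 2))

# Exact induced 2-norm: the largest singular value of A.
exact = np.linalg.norm(A, 2)

# Random unit vectors realize stretch factors up to (but never above) it.
xs = rng.standard_normal((2, 10000))
xs /= np.linalg.norm(xs, axis=0)
sampled = np.linalg.norm(A @ xs, axis=0).max()
assert sampled <= exact + 1e-12
print(sampled, "<=", exact)
```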
SLIDE 29
SLIDE 30
SLIDE 31
For any 𝑚 × 𝑛 matrix 𝐴, ‖𝐴‖1 is equal to the maximum column sum of 𝐴: ‖𝐴‖1 = max𝑗 ‖𝑎𝑗‖1. Writing 𝐴 in terms of its columns 𝑎𝑗, any 𝑥 with ‖𝑥‖1 = 1 gives ‖𝐴𝑥‖1 = ‖Σ𝑗 𝑥𝑗𝑎𝑗‖1 ≤ Σ𝑗 |𝑥𝑗| ‖𝑎𝑗‖1 ≤ max𝑗 ‖𝑎𝑗‖1. By choosing 𝑥 = 𝑒𝑗, where 𝑗 maximizes ‖𝑎𝑗‖1, we attain this bound.
SLIDE 32
For any 𝑚 × 𝑛 matrix 𝐴, ‖𝐴‖∞ is equal to the maximum row sum of 𝐴.
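Both formulas are easy to confirm with NumPy on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

# ||A||_1 is the maximum column sum of absolute values ...
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())

# ... and ||A||_inf is the maximum row sum.
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())
```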
SLIDE 33
Let 𝑝 and 𝑞 satisfy 1/𝑝 + 1/𝑞 = 1, with 1 ≤ 𝑝, 𝑞 ≤ ∞. Then the Hölder inequality states that, for any vectors 𝑥 and 𝑦, |𝑥∗𝑦| ≤ ‖𝑥‖𝑝 ‖𝑦‖𝑞. The Cauchy–Schwarz inequality is the special case 𝑝 = 𝑞 = 2: |𝑥∗𝑦| ≤ ‖𝑥‖2 ‖𝑦‖2.
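A quick numerical check of Hölder's inequality for several conjugate pairs (𝑝, 𝑞), including the Cauchy–Schwarz case:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(6)
y = rng.standard_normal(6)

# Holder: |x^* y| <= ||x||_p ||y||_q whenever 1/p + 1/q = 1.
for p, q in [(1, np.inf), (2, 2), (3, 1.5)]:
    lhs = abs(x @ y)
    rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
    assert lhs <= rhs + 1e-12
```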
SLIDE 34
Consider 𝐴 = 𝑎∗, where 𝑎 is a column vector; that is, 𝐴 is a matrix with a single row. For any 𝑥, we have ‖𝐴𝑥‖2 = |𝑎∗𝑥| ≤ ‖𝑎‖2 ‖𝑥‖2 by the Cauchy–Schwarz inequality. This bound is tight: observe that ‖𝐴𝑎‖2 = ‖𝑎‖2². Therefore, we have ‖𝐴‖2 = ‖𝑎‖2.
SLIDE 35
Consider 𝐴 = 𝑢𝑣∗, where 𝑢 is an 𝑚-vector and 𝑣 is an 𝑛-vector. For any 𝑛-vector 𝑥, we can bound ‖𝐴𝑥‖2 = ‖𝑢(𝑣∗𝑥)‖2 = |𝑣∗𝑥| ‖𝑢‖2 ≤ ‖𝑢‖2 ‖𝑣‖2 ‖𝑥‖2. Therefore, we have ‖𝐴‖2 ≤ ‖𝑢‖2 ‖𝑣‖2. This inequality is an equality for the case 𝑥 = 𝑣.
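A sketch verifying the rank-one bound numerically (random 𝑢 and 𝑣; equality is attained at 𝑥 = 𝑣):

```python
import numpy as np

rng = np.random.default_rng(8)
u = rng.standard_normal(4)   # m-vector
v = rng.standard_normal(3)   # n-vector
A = np.outer(u, v)           # A = u v^*

# ||A||_2 = ||u||_2 ||v||_2, attained at x = v.
norm_uv = np.linalg.norm(u) * np.linalg.norm(v)
assert np.isclose(np.linalg.norm(A, 2), norm_uv)
assert np.isclose(np.linalg.norm(A @ v) / np.linalg.norm(v), norm_uv)
```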
SLIDE 36
Since ‖𝐴𝐵𝑥‖ ≤ ‖𝐴‖ ‖𝐵𝑥‖ ≤ ‖𝐴‖ ‖𝐵‖ ‖𝑥‖ for every 𝑥, the induced norm of 𝐴𝐵 must satisfy ‖𝐴𝐵‖ ≤ ‖𝐴‖ ‖𝐵‖.
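A one-line numerical check of this submultiplicativity for the induced 2-norm (random sizes):

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# Submultiplicativity of induced norms: ||AB|| <= ||A|| ||B||.
assert (np.linalg.norm(A @ B, 2)
        <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2) + 1e-12)
```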
SLIDE 37
SLIDE 38
The most important matrix norm which is not induced by a vector norm is the Hilbert–Schmidt or Frobenius norm, defined by
‖𝐴‖𝐹 = (Σ𝑖 Σ𝑗 |𝑎𝑖𝑗|²)^(1/2)
Observe that this is the same as the 2-norm of the matrix when viewed as an 𝑚𝑛-dimensional vector. Alternatively, we can write
‖𝐴‖𝐹 = √tr(𝐴∗𝐴) = √tr(𝐴𝐴∗)
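A short check of both characterizations on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 4))

fro = np.linalg.norm(A, 'fro')
# Same as the 2-norm of A flattened into an mn-vector ...
assert np.isclose(fro, np.linalg.norm(A.ravel()))
# ... and expressible via traces: sqrt(tr(A^* A)).
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))
```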
SLIDE 39
Let 𝐶 = 𝐴𝐵. Each entry 𝑐𝑖𝑗 is the inner product of a row of 𝐴 with a column of 𝐵, so the Cauchy–Schwarz inequality bounds |𝑐𝑖𝑗| by the product of their 2-norms; summing over all 𝑖, 𝑗 yields ‖𝐴𝐵‖𝐹 ≤ ‖𝐴‖𝐹 ‖𝐵‖𝐹.
SLIDE 40
The matrix 2-norm and Frobenius norm are invariant under multiplication by unitary matrices: for unitary 𝑄, ‖𝑄𝐴‖2 = ‖𝐴‖2 and ‖𝑄𝐴‖𝐹 = ‖𝐴‖𝐹. This fact remains valid if 𝑄 is generalized to a rectangular matrix with orthonormal columns. Recall the transformation used in PCA.
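A numerical check of this invariance, including the rectangular orthonormal-columns case mentioned above (all matrices random):

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((4, 3))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # square orthogonal Q

assert np.isclose(np.linalg.norm(Q @ A, 2), np.linalg.norm(A, 2))
assert np.isclose(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro'))

# Still valid for rectangular Q with orthonormal columns (Q^* Q = I),
# as in the projections used in PCA.
Q_rect, _ = np.linalg.qr(rng.standard_normal((6, 4)))   # 6x4, Q^T Q = I
assert np.isclose(np.linalg.norm(Q_rect @ A, 'fro'), np.linalg.norm(A, 'fro'))
```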