SLIDE 1

Quiz

  • 1. Write a procedure check_least_squares(A, u, b) with the following spec:
    input: Mat A, Vec u, Vec b
    output: True if u is the solution to the least-squares problem $Ax \approx b$, i.e. if u minimizes $\|b - Au\|^2$.
    Assume that the vectors are legal, i.e. the domain of u equals the column label set of A and the domain of b equals the row label set of A. Also assume that there is no floating-point error, i.e. that all calculations are precisely correct. Do not assume that the columns of A are linearly independent. Your procedure should not explicitly use any other procedures. (Of course, it can use the usual operations on matrices and vectors.) A sketch of one possible implementation appears after this list.
  • 2. Suppose U and V are subspaces of W. What does it mean to say that V is the orthogonal complement of U in W? Give the definition.
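A minimal sketch of one solution to problem 1, assuming NumPy arrays in place of the course's Mat and Vec classes (the same logic translates to Mat/Vec operations). It uses the fact that u minimizes $\|b - Au\|^2$ exactly when the residual $b - Au$ is orthogonal to every column of A, i.e. when $A^T(b - Au) = 0$; this criterion does not require the columns of A to be linearly independent.

    import numpy as np

    def check_least_squares(A, u, b):
        # u is a least-squares solution iff the residual b - A u is
        # orthogonal to the column space of A:  A^T (b - A u) = 0.
        residual = b - A @ u
        # Under the quiz's no-floating-point-error assumption this would be
        # an exact equality test; allclose tolerates real rounding error.
        return bool(np.allclose(A.T @ residual, 0))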

SLIDE 2

[11] The Singular Value Decomposition

SLIDE 3

The Singular Value Decomposition

Gene Golub’s license plate, photographed by Professor P. M. Kroonenberg of Leiden University.

SLIDE 4

Frobenius norm for matrices

We have defined a norm for vectors over $\mathbb{R}$:

    $\|[x_1, x_2, \ldots, x_n]\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$

Now we define a norm for matrices: interpret the matrix as a vector.

    $\|A\|_F = \sqrt{\text{sum of squares of elements of } A}$

This is called the Frobenius norm of a matrix. The squared norm is just the sum of squares of the elements. Example:

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2$

Can group in terms of rows ... or of columns:

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = (1^2 + 2^2 + 3^2) + (4^2 + 5^2 + 6^2) = \|[1, 2, 3]\|^2 + \|[4, 5, 6]\|^2$

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = (1^2 + 4^2) + (2^2 + 5^2) + (3^2 + 6^2) = \|[1, 4]\|^2 + \|[2, 5]\|^2 + \|[3, 6]\|^2$
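A quick numerical check of the grouping identities above, as a sketch assuming NumPy:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])
    total   = (A ** 2).sum()                  # squared Frobenius norm = 91
    by_rows = sum(np.dot(r, r) for r in A)    # ||[1,2,3]||^2 + ||[4,5,6]||^2
    by_cols = sum(np.dot(c, c) for c in A.T)  # ||[1,4]||^2 + ||[2,5]||^2 + ||[3,6]||^2
    assert total == by_rows == by_cols == 91
    assert np.isclose(np.linalg.norm(A, 'fro') ** 2, total)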

SLIDE 5

Frobenius norm for matrices

Example:

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2$

Can group in terms of rows ... or of columns:

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = (1^2 + 2^2 + 3^2) + (4^2 + 5^2 + 6^2) = \|[1, 2, 3]\|^2 + \|[4, 5, 6]\|^2$

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = (1^2 + 4^2) + (2^2 + 5^2) + (3^2 + 6^2) = \|[1, 4]\|^2 + \|[2, 5]\|^2 + \|[3, 6]\|^2$

Proposition: The squared Frobenius norm of a matrix is the sum of the squared norms of its rows ...

    $\left\| \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} \right\|_F^2 = \|a_1\|^2 + \cdots + \|a_m\|^2$

SLIDE 6

Frobenius norm for matrices

Example:

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2$

Can group in terms of rows ... or of columns:

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = (1^2 + 2^2 + 3^2) + (4^2 + 5^2 + 6^2) = \|[1, 2, 3]\|^2 + \|[4, 5, 6]\|^2$

    $\left\| \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \right\|_F^2 = (1^2 + 4^2) + (2^2 + 5^2) + (3^2 + 6^2) = \|[1, 4]\|^2 + \|[2, 5]\|^2 + \|[3, 6]\|^2$

Proposition: The squared Frobenius norm of a matrix is the sum of the squared norms of its rows ... or of its columns.

    $\left\| \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \right\|_F^2 = \|v_1\|^2 + \cdots + \|v_n\|^2$

SLIDE 7

Low-rank matrices

Saving space and saving time:

    $\left( u\, v^T \right) w = u \left( v^T w \right)$

The right-hand side never forms the matrix $u v^T$: multiplying a vector by a rank-one matrix reduces to a dot product followed by a scalar-vector product. A rank-two matrix factors the same way:

    $\begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix} = u_1 v_1^T + u_2 v_2^T$
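A sketch of the time saving, assuming NumPy: associativity lets us apply the rank-one matrix $u v^T$ without ever materializing it.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 1000, 800
    u, v, w = rng.standard_normal(m), rng.standard_normal(n), rng.standard_normal(n)

    slow = np.outer(u, v) @ w   # forms the m-by-n matrix: O(mn) time and space
    fast = u * (v @ w)          # dot product, then scalar-vector product: O(m + n)
    assert np.allclose(slow, fast)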

SLIDE 8

Silly compression

Represent a grayscale $m \times n$ image by an $m \times n$ matrix A. (Requires mn numbers to represent.) Find a low-rank matrix $\tilde{A}$ that is as close as possible to A. (For rank r, requires only r(m + n) numbers to represent.) Original image ($625 \times 1024$, so about 625k numbers).

SLIDE 9

Silly compression

Represent a grayscale $m \times n$ image by an $m \times n$ matrix A. (Requires mn numbers to represent.) Find a low-rank matrix $\tilde{A}$ that is as close as possible to A. (For rank r, requires only r(m + n) numbers to represent.) Rank-50 approximation (so about 82k numbers).
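A sketch of the compression step, assuming NumPy's SVD routine (the deck has not yet explained how to compute the factorization; the best-rank-r property is developed below for r = 1):

    import numpy as np

    def rank_r_approximation(A, r):
        # Keep the r largest singular values and the corresponding
        # singular vectors; the result is the best rank-r approximation
        # to A in the Frobenius norm.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

    # For a 625 x 1024 grayscale image stored as a matrix `img`:
    #   img50 = rank_r_approximation(img, 50)   # 50*(625+1024), about 82k numbers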

SLIDE 10

The trolley-line-location problem

Given the locations of m houses a1, . . . , am, we must choose where to run a trolley line. The trolley line must go through downtown (origin) and must be a straight line. The goal is to locate the trolley line so that it is as close as possible to the m houses.

[Figure: houses $a_1, a_2, a_3, a_4$ plotted around the origin.]

Specify the line by a unit-norm vector v: the line is $\text{Span}\,\{v\}$. In measuring the objective, how do we combine the individual objectives? As in least squares, we minimize the 2-norm of the vector $[d_1, \ldots, d_m]$ of distances. This is equivalent to minimizing the square of the 2-norm of this vector, i.e. $d_1^2 + \cdots + d_m^2$.

SLIDE 11

The trolley-line-location problem

Given the locations of m houses a1, . . . , am, we must choose where to run a trolley line. The trolley line must go through downtown (origin) and must be a straight line. The goal is to locate the trolley line so that it is as close as possible to the m houses.

[Figure: houses $a_1, a_2, a_3, a_4$ and the line $\text{Span}\,\{v\}$ through the origin.]

Specify the line by a unit-norm vector v: the line is $\text{Span}\,\{v\}$. In measuring the objective, how do we combine the individual objectives? As in least squares, we minimize the 2-norm of the vector $[d_1, \ldots, d_m]$ of distances. This is equivalent to minimizing the square of the 2-norm of this vector, i.e. $d_1^2 + \cdots + d_m^2$.

SLIDE 12

The trolley-line-location problem

Given the locations of m houses a1, . . . , am, we must choose where to run a trolley line. The trolley line must go through downtown (origin) and must be a straight line. The goal is to locate the trolley line so that it is as close as possible to the m houses.

[Figure: houses $a_1, \ldots, a_4$, the line $\text{Span}\,\{v\}$, and the distance from each house to the line.]

Specify the line by a unit-norm vector v: the line is $\text{Span}\,\{v\}$. In measuring the objective, how do we combine the individual objectives? As in least squares, we minimize the 2-norm of the vector $[d_1, \ldots, d_m]$ of distances. This is equivalent to minimizing the square of the 2-norm of this vector, i.e. $d_1^2 + \cdots + d_m^2$.

SLIDE 13

Solution to the trolley-line-location problem

For each vector $a_i$, write $a_i = a_i^{\parallel v} + a_i^{\perp v}$, where $a_i^{\parallel v}$ is the projection of $a_i$ along $v$ and $a_i^{\perp v}$ is the projection orthogonal to $v$:

    $a_1^{\perp v} = a_1 - a_1^{\parallel v} \quad \ldots \quad a_m^{\perp v} = a_m - a_m^{\parallel v}$

By the Pythagorean Theorem,

    $\|a_1^{\perp v}\|^2 = \|a_1\|^2 - \|a_1^{\parallel v}\|^2 \quad \ldots \quad \|a_m^{\perp v}\|^2 = \|a_m\|^2 - \|a_m^{\parallel v}\|^2$

Since the distance from $a_i$ to $\text{Span}\,\{v\}$ is $\|a_i^{\perp v}\|$, we have

    $(\text{dist from } a_1 \text{ to Span}\,\{v\})^2 = \|a_1\|^2 - \|a_1^{\parallel v}\|^2 \quad \ldots \quad (\text{dist from } a_m \text{ to Span}\,\{v\})^2 = \|a_m\|^2 - \|a_m^{\parallel v}\|^2$

SLIDE 14

Solution to the trolley-line-location problem

    $a_1^{\perp v} = a_1 - a_1^{\parallel v} \quad \ldots \quad a_m^{\perp v} = a_m - a_m^{\parallel v}$

By the Pythagorean Theorem,

    $\|a_i^{\perp v}\|^2 = \|a_i\|^2 - \|a_i^{\parallel v}\|^2 \quad \text{for } i = 1, \ldots, m$

Since the distance from $a_i$ to $\text{Span}\,\{v\}$ is $\|a_i^{\perp v}\|$, we have

    $(\text{dist from } a_i \text{ to Span}\,\{v\})^2 = \|a_i\|^2 - \|a_i^{\parallel v}\|^2 \quad \text{for } i = 1, \ldots, m$

Summing over i,

    $\sum_i (\text{dist from } a_i \text{ to Span}\,\{v\})^2 = \|a_1\|^2 + \cdots + \|a_m\|^2 - \left( \|a_1^{\parallel v}\|^2 + \cdots + \|a_m^{\parallel v}\|^2 \right) = \|A\|_F^2 - \left( \langle a_1, v \rangle^2 + \cdots + \langle a_m, v \rangle^2 \right)$

using $a_i^{\parallel v} = \langle a_i, v \rangle\, v$ and hence $\|a_i^{\parallel v}\|^2 = \langle a_i, v \rangle^2 \|v\|^2 = \langle a_i, v \rangle^2$.

SLIDE 15

Solution to the trolley-line-location problem, continued

By the dot-product interpretation of matrix-vector multiplication,

    $\begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} v = \begin{bmatrix} \langle a_1, v \rangle \\ \vdots \\ \langle a_m, v \rangle \end{bmatrix} \qquad (1)$

so $\|Av\|^2 = \langle a_1, v \rangle^2 + \langle a_2, v \rangle^2 + \cdots + \langle a_m, v \rangle^2$. We get

    $\sum_i (\text{distance from } a_i \text{ to Span}\,\{v\})^2 = \|A\|_F^2 - \|Av\|^2$

Therefore the best vector v is a unit vector that maximizes $\|Av\|^2$ (equivalently, maximizes $\|Av\|$).

SLIDE 16

Solution to the trolley-line-location problem, continued

    $\sum_i (\text{dist from } a_i \text{ to Span}\,\{v\})^2 = \|A\|_F^2 - \left( \langle a_1, v \rangle^2 + \cdots + \langle a_m, v \rangle^2 \right)$

By the dot-product interpretation of matrix-vector multiplication,

    $\begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} v = \begin{bmatrix} \langle a_1, v \rangle \\ \vdots \\ \langle a_m, v \rangle \end{bmatrix} \qquad (1)$

so $\|Av\|^2 = \langle a_1, v \rangle^2 + \langle a_2, v \rangle^2 + \cdots + \langle a_m, v \rangle^2$. We get

    $\sum_i (\text{distance from } a_i \text{ to Span}\,\{v\})^2 = \|A\|_F^2 - \|Av\|^2$

Therefore the best vector v is a unit vector that maximizes $\|Av\|^2$ (equivalently, maximizes $\|Av\|$).

SLIDE 17

Solution to the trolley-line-location problem, continued

    $\sum_i (\text{distance from } a_i \text{ to Span}\,\{v\})^2 = \|A\|_F^2 - \|Av\|^2$

Therefore the best vector v is a unit vector that maximizes $\|Av\|^2$ (equivalently, maximizes $\|Av\|$).

    def trolley_line_location(A):
        v1 = arg max {‖Av‖ : ‖v‖ = 1}
        σ1 = ‖A v1‖
        return v1

So far, this is a solution only in principle, since we have not specified how to actually compute v1. Definition: σ1 is the first singular value of A, and v1 is the first right singular vector. A runnable sketch appears below.
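A runnable version, assuming NumPy: np.linalg.svd returns the singular values in decreasing order, so the first right singular vector is the first row of Vt (determined only up to sign).

    import numpy as np

    def trolley_line_location(A):
        U, s, Vt = np.linalg.svd(A)
        return Vt[0]                   # v1; the corresponding s[0] is sigma1

    A = np.array([[1.0, 4.0],
                  [5.0, 2.0]])         # the example on the next slide
    v1 = trolley_line_location(A)
    print(v1, np.linalg.norm(A @ v1))  # v1 is ±[.78, .63]; sigma1 is about 6.1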

SLIDE 18

Trolley-line-location problem, example

Example: Let A =  1 4 5 2

  • , so a1 = [1, 4] and a2 = [5, 2].

A unit vector maximizing ||Av|| is v1 ⇡  .78 .63

  • .
  • 1

1 2 3 4 5 6 1 2 3 4 5 6

a1=[1,4] a2=[5,2] v1=[.777, .629]

σ1 = ||Av1||, which is about 6.1

SLIDE 19

Theorem

    def trolley_line_location(A):
        v1 = arg max {‖Av‖ : ‖v‖ = 1}
        σ1 = ‖A v1‖
        return v1

Definition: σ1 is the first singular value of A, and v1 is the first right singular vector.

Theorem: Let A be an $m \times n$ matrix over $\mathbb{R}$ with rows $a_1, \ldots, a_m$. Let $v_1$ be the first right singular vector of A. Then $\text{Span}\,\{v_1\}$ is the one-dimensional vector space $\mathcal{V}$ that minimizes

    $(\text{distance from } a_1 \text{ to } \mathcal{V})^2 + \cdots + (\text{distance from } a_m \text{ to } \mathcal{V})^2$

How close is the closest vector space to the rows of A?

Lemma: The minimum sum of squared distances is $\|A\|_F^2 - \sigma_1^2$.

Proof: The sum of squared distances is $\sum_i \|a_i\|^2 - \sum_i \|a_i^{\parallel v_1}\|^2$. The first sum is $\|A\|_F^2$. The second sum is the square of $\|A v_1\|$, i.e. the square of $\sigma_1$. QED

SLIDE 20

Example, continued

Let A =  1 4 5 2

  • ) a1 = [1, 4], a2 = [5, 2]. Solution: v1 ⇡

 .78 .63

  • . Sum of squared

distances? Projection of a1 orthogonal to v1:

a1 ha1, v1i v1

⇡ [1, 4] (1 · .78 + 4 · .63)[.78, .63] ⇡ [1, 4] 3.3 [.78, .63] ⇡ [1.6, 1.9] Norm, about 2.5, is distance from a1 to Span {v1}. Projection of a2 orthogonal to v1:

a2 ha1, v1i v1

⇡ [5, 2] (5 · .78 + 2 · .63)[.78, .63] ⇡ [5, 2] 5.1 [.78, .63] ⇡ [1, 1.2] Norm, about 1.6, is distance from a2 to Span {v1}. Thus the sum of squared distances is about 2.52 + 1.62, which is about 8.7. Lemma says sum of squared distances should be ||A||2

F σ2 1 ⇡ (12 + 42 + 52 + 22) 6.12 ⇡ 46 6.12 ⇡ 8.7. X
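The arithmetic on this slide can be checked numerically; a sketch assuming NumPy (the SVD may flip the sign of v1, which does not affect distances):

    import numpy as np

    A = np.array([[1.0, 4.0],
                  [5.0, 2.0]])
    v1 = np.linalg.svd(A)[2][0]           # first right singular vector
    perp = A - np.outer(A @ v1, v1)       # rows: a_i - <a_i, v1> v1
    dists = np.linalg.norm(perp, axis=1)  # about [2.5, 1.6]
    print((dists ** 2).sum())             # about 8.7
    print(np.linalg.norm(A, 'fro')**2 - np.linalg.norm(A @ v1)**2)  # Lemma: about 8.7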

SLIDE 21

Visualization of data in one dimension

Projections of high-dimensional data points $a_1, \ldots, a_m$ onto a line: a visualization technique.

SLIDE 22

Visualization of data in one dimension

Projections of high-dimensional data points $a_1, \ldots, a_m$ onto a line: a visualization technique. Each datapoint $a_i$ is represented by a single number: $\sigma_i = \langle a_i, v_1 \rangle$. What do we know about these numbers? $v_1$ is chosen among norm-1 vectors to maximize the sum of squares of these numbers. That is, we are choosing a line through the origin so as to maximally spread out those numbers.

[Figure: points $a_1, \ldots, a_4$ projected onto the line $\text{Span}\,\{v_1\}$, giving coordinates $\sigma_1, \ldots, \sigma_4$.]
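A sketch of this visualization step, assuming NumPy; `points` is a hypothetical m-by-n array with one datapoint per row.

    import numpy as np

    def line_coordinates(points):
        v1 = np.linalg.svd(points)[2][0]  # first right singular vector
        return points @ v1                # the numbers <a_i, v1>, one per datapoint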

SLIDE 23

Application to voting data

Let $a_1, \ldots, a_{100}$ be the voting records of US Senators, the same data as in the politics lab. These are 46-vectors with ±1 entries. Find the unit-norm vector v that minimizes the sum of squared distances from $a_1, \ldots, a_{100}$ to $\text{Span}\,\{v\}$, and look at the projection along v of each of these vectors. Not so meaningful:

    Snowe     0.106605199   moderate Republican from Maine
    Lincoln   0.106694552   moderate Republican from Rhode Island
    Collins   0.107039376   moderate Republican from Maine
    Crapo     0.107259689   not so moderate Republican from Idaho
    Vitter    0.108031374   not so moderate Republican from Louisiana

We'll have to come back to this data.

SLIDE 24

Best rank-one approximation to a matrix

A rank-one matrix is a matrix whose row space is one-dimensional. All rows must lie in $\text{Span}\,\{v\}$ for some vector v; that is, every row is a scalar multiple of v. Such a matrix can be written as an outer product $u\, v^T$.

Goal: Given a matrix A, find the rank-one matrix $\tilde{A}$ that minimizes $\|A - \tilde{A}\|_F$.

    $\tilde{A} = \begin{bmatrix} \text{vector in Span}\,\{v\} \text{ closest to } a_1 \\ \vdots \\ \text{vector in Span}\,\{v\} \text{ closest to } a_m \end{bmatrix}$

How close is $\tilde{A}$ to A?

    $\|A - \tilde{A}\|_F^2 = \sum_i \|\text{row } i \text{ of } A - \tilde{A}\|^2 = \sum_i (\text{distance from } a_i \text{ to Span}\,\{v\})^2$

To minimize the sum of squared distances, choose v to be the first right singular vector. The sum of squared distances is then $\|A\|_F^2 - \sigma_1^2$, and $\tilde{A}$ is the closest rank-one matrix.

SLIDE 25

An expression for the best rank-one approximation

Using the formula $a_i^{\parallel v_1} = \langle a_i, v_1 \rangle v_1$, we obtain

    $\tilde{A} = \begin{bmatrix} \langle a_1, v_1 \rangle v_1^T \\ \vdots \\ \langle a_m, v_1 \rangle v_1^T \end{bmatrix}$

Using the linear-combinations interpretation of vector-matrix multiplication, we can write this as an outer product of two vectors:

    $\tilde{A} = \begin{bmatrix} \langle a_1, v_1 \rangle \\ \vdots \\ \langle a_m, v_1 \rangle \end{bmatrix} \begin{bmatrix} v_1^T \end{bmatrix}$

The first vector in the outer product can be written as $A v_1$. We obtain

    $\tilde{A} = \begin{bmatrix} A v_1 \end{bmatrix} \begin{bmatrix} v_1^T \end{bmatrix}$

Remember $\sigma_1 = \|A v_1\|$. Define $u_1$ to be the norm-one vector such that $\sigma_1 u_1 = A v_1$. Then

    $\tilde{A} = \sigma_1\, u_1\, v_1^T$
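The closed form $\tilde{A} = \sigma_1 u_1 v_1^T$ as a sketch, assuming NumPy, together with a check of the Lemma's error formula:

    import numpy as np

    def best_rank_one(A):
        U, s, Vt = np.linalg.svd(A)
        return s[0] * np.outer(U[:, 0], Vt[0])  # sigma1 * u1 * v1^T

    A = np.array([[1.0, 4.0],
                  [5.0, 2.0]])
    A1 = best_rank_one(A)
    # ||A - A1||_F^2 should equal ||A||_F^2 - sigma1^2 (about 8.7 here)
    print(np.linalg.norm(A - A1, 'fro')**2,
          np.linalg.norm(A, 'fro')**2 - np.linalg.svd(A)[1][0]**2)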