Numerical Linear Algebra
EECS 442 – David Fouhey Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/
Administrivia
HW 1 is out and due in two weeks. Follow the submission format (wrong format = 0).
Some questions are open-ended: harder to do, but it mirrors life. Explain what you think and why in your homework, and move on.
I guarantee that checking the syllabus thoroughly will help boost your grade.
This Week – Math
Two goals for the next two classes:
1. Review math you have probably seen before but may have forgotten.
2. Cover material that may not have been taught.
This Week – Goal
Covering everything in full depth is impossible; the goal is to point out any gaps so you can fill them in.
Adding Numbers
Suppose $x_1, \dots, x_N$ are distributed with mean $\mu$ and standard deviation $\sigma$ for $1 \le i \le N$.
How is the average $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$ distributed (qualitatively), in terms of variance?
$\hat{\mu}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{N}$.
Free Drinks in Vegas
Each game/variable has mean $0.10 and standard deviation $2.
100 games is uncertain and fun!
100K games is guaranteed profit: with 99.999999% confidence the average take per game is at least $0.064, of which $0.01 covers the drinks and $0.054 is profit.
Let’s Make It Big
Suppose I average 50 million normally distributed numbers (mean: 31, standard deviation: 1), e.g. the average error (|predicted - actual|) over 200 480x640 images.
Naive implementation:

def mean(xs):
    numerator = 0
    for x in xs:
        numerator += x
    return numerator / len(xs)
Let’s Make It Big
The average of 50M numbers with mean 31 and standard deviation 1 should have mean 31 and standard deviation $1/\sqrt{5\times 10^7} \approx 10^{-4}$.
Trying it Out
Hmm. With 32-bit floats the computed average comes out noticeably wrong; with 64-bit floats it looks fine. Why?
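Not in the original slides: a minimal numpy sketch of the failure mode. The constant 1.5e9 stands in for the running sum after roughly 50M additions of numbers near 31; the exact printed values are illustrative.

import numpy as np

# 1.5e9 lies between 2^30 and 2^31, where adjacent float32 values are 128 apart.
# Adding 31 (less than half that gap) rounds straight back to the same number:
total = np.float32(1.5e9)
print(total + np.float32(31) == total)   # True: the running sum is stuck

# numpy's own mean is fine if you accumulate at higher precision:
xs = np.random.normal(31, 1, size=1_000_000).astype(np.float32)
print(np.mean(xs, dtype=np.float64))     # ~31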
What’s a Number?
With 8 bits and place values $2^7, 2^6, \dots, 2^0$:
$10111001_2 = 128 + 32 + 16 + 8 + 1 = 185$
Adding Two Numbers
“Integers” on a computer are integers modulo $2^k$.
Example with $k = 8$: $10111001_2\,(185) + 01101001_2\,(105) = 00100010_2\,(34)$, since $290 \bmod 256 = 34$. The bit that falls off the top sets the carry flag.
Some Gotchas
Why? With integer division, 3 / 4 = 0:
32 + (3 / 4) x 40 = 32 + 0 x 40 = 32 + 0 = 32 (underflow in the division)
32 + (3 x 40) / 4 = 32 + 120 / 4 = 32 + 30 = 62 (no underflow)
Ok – you have to multiply before dividing
Some Gotchas
In uint8 arithmetic:
32 + 9 x 40 / 10 = 32 + 104 / 10 = 32 + 10 = 42 (overflow!)
Why 104? 9 x 40 = 360, and 360 % 256 = 104.
In ordinary math it should be: 32 + (9 x 40) / 10 = 32 + 360 / 10 = 32 + 36 = 68.
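Not in the original slides: a quick numpy reproduction of the wraparound (array values arbitrary; numpy wraps uint8 array arithmetic silently):

import numpy as np

x = np.array([9], dtype=np.uint8)
print(x * 40)               # [104]: 9*40 = 360 wraps to 360 % 256 = 104
print(32 + (x * 40) // 10)  # [42], not the mathematically correct 68

# And plain Python integer division truncates before the multiply:
print(32 + 3 // 4 * 40)     # 32, because 3 // 4 = 0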
What’s a Number?
As before: $10111001_2 = 185$.
How can we do fractions?
Extend the place values below $2^0$: with weights $2^5, \dots, 2^0, 2^{-1}, 2^{-2}$,
$101101.01_2 = 45.25$ (the integer bits give 45, the $2^{-2}$ bit gives 0.25).
Fixed-Point Arithmetic
Example: $101101.01_2 = 45.25$.
What’s the largest number we can represent? $111111.11_2$ = 63.75.
How precisely can we measure at 63? 0.25. And at 0? Also 0.25.
Fine for many purposes, but for science, the same absolute precision everywhere seems silly.
Floating Point
A toy 8-bit float: Sign (S), 4-bit Exponent (E), 3-bit Fraction (F), with value
$(-1)^S \times 2^{E - \text{bias}} \times (1 + F/8)$, bias = 7.
Example: S = 0, E = 0111 (7), F = 001: $2^{7-7} = 2^0 = 1$ and $1 + 1/8 = 1.125$, so the value is 1.125.
Bias: allows the exponent to be negative. Note: fraction = significand = mantissa. Exponents of all ones or all zeros are reserved for special numbers.
Floating Point
With sign 0 and exponent 0111 ($2^{7-7} = 2^0 = 1$), the fractions step by 1/8:

Sign  Exponent  Fraction  Value
0     0111      000 (0/8)  1
0     0111      001 (1/8)  1.125
0     0111      010 (2/8)  1.25
...
0     0111      110 (6/8)  1.75
0     0111      111 (7/8)  1.875
Floating Point
With sign 0 and exponent 1001 ($2^{9-7} = 2^2 = 4$), the same fractions step by 1/2:

Sign  Exponent  Fraction  Value
0     1001      000 (0/8)  4
0     1001      001 (1/8)  4.5
0     1001      010 (2/8)  5
...
0     1001      110 (6/8)  7
0     1001      111 (7/8)  7.5
Floating Point
With exponent 0111, adjacent representable numbers are 1/8 apart (1, 1.125, 1.25, ...).
With exponent 1001, adjacent representable numbers are 1/2 apart (4, 4.5, 5, ...).
Gap between numbers is relative, not absolute
Revisiting Adding Numbers
Sign  Exponent  Fraction
1     1001      000    = -4
1     0110      000    = -0.5
Result: 1 1001 001 = $-2^2 \times 1.125 = -4.5$ ✓
Actual implementation is complex
Revisiting Adding Numbers
Sign  Exponent  Fraction
1     1001      000    = -4
1     0100      000    = -0.125
True sum: -4.125. Candidate results: 1 1001 000 (-4), or 1 1001 001 ($-2^2 \times 1.125 = -4.5$)?
Revisiting Adding Numbers
Sign  Exponent  Fraction
1     1001      000    = -4
1     0100      000    = -0.125
Result: 1 1001 000 = -4. The true sum -4.125 isn't representable at this exponent (the gap is 1/2), so it rounds back to -4.
For floats a and b, these things can happen: a + b = a, and (a + b) - a ≠ b.
Revisiting Adding Numbers
IEEE 754 Single Precision / Single:
S: 1 bit; Exponent: 8 bits ($2^{127} \approx 10^{38}$); Fraction: 23 bits (≈ 7 decimal digits)
IEEE 754 Double Precision / Double:
S: 1 bit; Exponent: 11 bits ($2^{1023} \approx 10^{308}$); Fraction: 52 bits (≈ 15 decimal digits)
Trying it Out
In the 50M-number average, a + b = a is exactly what happens: the numerator (the running sum) gets stuck, while the denominator isn't. Roundoff error occurs.
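Not in the original slides: a small sketch of both failure modes with float32 (values chosen for illustration):

import numpy as np

a = np.float32(1e8)    # between 2^26 and 2^27: adjacent float32s are 8 apart
b = np.float32(3)
print(a + b == a)      # True: 3 is under half the gap, so it rounds away
print(a + b - a)       # 0.0, not 3.0
print(np.spacing(a))   # 8.0: the gap between a and the next float32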
Take-homes
Computers store numbers with finite precision, so basic equalities like (a + b) - a = b are almost certainly incorrect for some values.
Floating point is usually good enough, but don't assume it will always save you.
Rely on the carefully written functions in numpy (not necessarily in other libraries!).
Vectors
x = [2,3] = 2 x [1,0] + 3 x [0,1] = 2e1 + 3e2
A vector can have an arbitrary number of dimensions; the set of n-dimensional vectors is typically denoted $\mathbb{R}^n$.
Vectors
x = [2,3]
$\mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, with $x_1 = 2$ and $x_2 = 3$.
Just an array! Get in the habit of thinking of them as columns.
Scaling Vectors
x = [2,3], 2x = [4,6]
Scaling a vector multiplies each element independently; it does not change the direction.
Adding Vectors
x = [2,3], y = [3,1], x + y = [5,4]
Scaling and Adding
x = [2,3], y = [3,1], 2x + y = [7,7]. Can do both at the same time.
Measuring Length
x = [2,3], y = [3,1]
Magnitude / length / (L2) norm of a vector:
$\|\mathbf{x}\| = \|\mathbf{x}\|_2 = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$
There are other norms; assume L2 unless told otherwise.
Here $\|\mathbf{x}\|_2 = \sqrt{13}$ and $\|\mathbf{y}\|_2 = \sqrt{10}$. Why? The Pythagorean theorem: e.g. $\sqrt{2^2 + 3^2} = \sqrt{13}$.
Normalizing a Vector
x = [2,3], y = [3,1], $\mathbf{x}' = \mathbf{x}/\|\mathbf{x}\|_2$, $\mathbf{y}' = \mathbf{y}/\|\mathbf{y}\|_2$
Dividing by the norm gives something on the unit sphere (the set of all vectors with length 1).
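Not in the original slides: the norms above, checked with numpy:

import numpy as np

x = np.array([2.0, 3.0])
y = np.array([3.0, 1.0])
print(np.linalg.norm(x))         # sqrt(13) ≈ 3.606
print(np.linalg.norm(y))         # sqrt(10) ≈ 3.162

x_unit = x / np.linalg.norm(x)   # normalize: now on the unit circle
print(np.linalg.norm(x_unit))    # 1.0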
Dot Products
$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i = \mathbf{x}^T\mathbf{y}$
$\mathbf{x} \cdot \mathbf{y} = \cos(\theta)\,\|\mathbf{x}\|\,\|\mathbf{y}\|$, where $\theta$ is the angle between x and y.
What happens with normalized / unit vectors? The dot product is just $\cos(\theta)$.
Dot Products
$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i$, with x = [2,3] and the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2$.
What's $\mathbf{x} \cdot \mathbf{e}_1$ and $\mathbf{x} \cdot \mathbf{e}_2$? Ans: $\mathbf{x} \cdot \mathbf{e}_1 = 2$; $\mathbf{x} \cdot \mathbf{e}_2 = 3$.
Dotting with a unit vector reads off the component of x pointing in the direction of that vector.
Dot Products
$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i$, with x = [2,3].
What's $\mathbf{x} \cdot \mathbf{x}$? Ans: $\mathbf{x} \cdot \mathbf{x} = \sum_i x_i x_i = \|\mathbf{x}\|_2^2$: the dot product of a vector with itself is its squared L2 norm (here, 13).
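Not in the original slides: the dot-product facts above in numpy (the angle value is approximate):

import numpy as np

x = np.array([2.0, 3.0])
y = np.array([3.0, 1.0])
print(np.dot(x, y))    # 2*3 + 3*1 = 9
print(np.dot(x, x))    # 13, the squared L2 norm of x

# x . y = cos(theta) ||x|| ||y||, so the angle between x and y is:
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))   # ≈ 37.9 degrees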
Special Angles
$\begin{bmatrix}1\\0\end{bmatrix} \cdot \begin{bmatrix}0\\1\end{bmatrix} = 1 \cdot 0 + 0 \cdot 1 = 0$
Perpendicular / orthogonal vectors have dot product 0 irrespective of their magnitude.
Special Angles
In general: $\begin{bmatrix}x_1\\x_2\end{bmatrix} \cdot \begin{bmatrix}y_1\\y_2\end{bmatrix} = x_1 y_1 + x_2 y_2 = 0$ exactly when the vectors are perpendicular / orthogonal, irrespective of their magnitude.
Orthogonal Vectors
Given x = [2,3], what's the set of vectors that are orthogonal to it? A line through the origin.
In 3D: any vector [0, b, c] is orthogonal to [a, 0, 0]; a vector [0, 0, c] is orthogonal to both [a, 0, 0] and [0, b, 0].
Cross Product
The cross product is only defined in 3D (elsewhere there is ambiguity in sign and magnitude).
$\mathbf{x} \times \mathbf{y}$ is a vector that (1) is perpendicular to both x and y, (2) has direction given by the right-hand rule, and (3) has magnitude given by the area of the parallelogram spanned by x and y.
If x and y point in the same direction, or either is 0, then $\mathbf{x} \times \mathbf{y} = \mathbf{0}$.
Image credit: Wikipedia.org
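Not in the original slides: the cross-product properties above, checked in numpy:

import numpy as np

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
print(np.cross(e1, e2))    # [0, 0, 1]: perpendicular to both, right-hand rule
print(np.cross(e1, e1))    # [0, 0, 0]: parallel vectors give the zero vector

# magnitude = area of the parallelogram spanned by the inputs:
print(np.linalg.norm(np.cross(e1, 2 * e2)))   # 2.0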
Operations You Should Know
Scale ($a\mathbf{v}$), add ($\mathbf{u} + \mathbf{v}$), magnitude ($\|\mathbf{v}\|_2$), dot product ($\mathbf{u} \cdot \mathbf{v}$), cross product ($\mathbf{u} \times \mathbf{v}$, 3D only).
Matrices
Horizontally concatenate n m-dimensional column vectors and you get an m x n matrix A (here 2 x 3):
$\mathbf{A} = [\mathbf{v}_1, \cdots, \mathbf{v}_n] = \begin{bmatrix} v_{11} & v_{21} & v_{31} \\ v_{12} & v_{22} & v_{32} \end{bmatrix}$
Notation: a (scalar): lowercase, undecorated; a (vector): lowercase bold or with an arrow; A (matrix): uppercase bold.
Matrices
Vertically concatenate m n-dimensional row vectors and you get an m x n matrix A (here 2 x 3):
$\mathbf{A} = \begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_m^T \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ u_{21} & u_{22} & u_{23} \end{bmatrix}$
Transpose: flip rows / columns
$\begin{bmatrix} a \\ b \\ c \end{bmatrix}^T = \begin{bmatrix} a & b & c \end{bmatrix}$, i.e. (3x1)ᵀ = 1x3.
Matrix-Vector Product
$\mathbf{y}_{2\times 1} = \mathbf{A}_{2\times 3}\,\mathbf{x}_{3\times 1}$
$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad \mathbf{y} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + x_3\mathbf{c}_3$
A matrix-vector product is a linear combination of the columns of A.
Matrix-Vector Product
$\mathbf{y}_{2\times 1} = \mathbf{A}_{2\times 3}\,\mathbf{x}_{3\times 1}$
$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{r}_1^T \\ \mathbf{r}_2^T \end{bmatrix}\mathbf{x}, \qquad y_1 = \mathbf{r}_1^T\mathbf{x}, \quad y_2 = \mathbf{r}_2^T\mathbf{x}$
Equivalently: each entry of y is a dot product between a row of A and x.
Matrix Multiplication
Let A have rows $\mathbf{a}_1^T, \dots, \mathbf{a}_m^T$ and B have columns $\mathbf{b}_1, \dots, \mathbf{b}_p$:
$\mathbf{AB} = \begin{bmatrix} -\ \mathbf{a}_1^T\ - \\ \vdots \\ -\ \mathbf{a}_m^T\ - \end{bmatrix}\begin{bmatrix} | & & | \\ \mathbf{b}_1 & \cdots & \mathbf{b}_p \\ | & & | \end{bmatrix}$
Generally: $\mathbf{A}_{mn}$ and $\mathbf{B}_{np}$ yield product $(\mathbf{AB})_{mp}$.
(Yes: in A, the $\mathbf{a}_i^T$ are the rows, and in B, the $\mathbf{b}_j$ are the columns.)
Matrix Multiplication
$\mathbf{AB} = \begin{bmatrix} -\ \mathbf{a}_1^T\ - \\ \vdots \\ -\ \mathbf{a}_m^T\ - \end{bmatrix}\begin{bmatrix} | & & | \\ \mathbf{b}_1 & \cdots & \mathbf{b}_p \\ | & & | \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1^T\mathbf{b}_1 & \cdots & \mathbf{a}_1^T\mathbf{b}_p \\ \vdots & \ddots & \vdots \\ \mathbf{a}_m^T\mathbf{b}_1 & \cdots & \mathbf{a}_m^T\mathbf{b}_p \end{bmatrix}$
$(\mathbf{AB})_{ij} = \mathbf{a}_i^T\mathbf{b}_j$
Generally: $\mathbf{A}_{mn}$ and $\mathbf{B}_{np}$ yield product $(\mathbf{AB})_{mp}$.
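Not in the original slides: a quick numpy check of the shape rule and the row-dot-column view (shapes arbitrary):

import numpy as np

A = np.random.rand(3, 4)   # m x n
B = np.random.rand(4, 5)   # n x p
C = A @ B                  # m x p
print(C.shape)             # (3, 5)

# entry (i, j) is the dot product of row i of A with column j of B:
print(np.isclose(C[1, 2], np.dot(A[1, :], B[:, 2])))   # True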
Operations They Don’t Teach
You Probably Saw Matrix Addition
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a+e & b+f \\ c+g & d+h \end{bmatrix}$
But what is this? (FYI: e here is a scalar)
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = ?$
Broadcasting
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & e \\ e & e \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \mathbf{1}_{2\times 2}\,e$
If you want to be pedantic and proper, you expand e by multiplying by a matrix of 1s (denoted $\mathbf{1}$). Many smart matrix libraries do this automatically; this is the source of many bugs.
Broadcasting Example
Given: an n x 2 matrix P and a 2D column vector v. Want: the n x 2 difference matrix D.
$\mathbf{P} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \mathbf{D} = \begin{bmatrix} x_1 - a & y_1 - b \\ \vdots & \vdots \\ x_n - a & y_n - b \end{bmatrix}$
$\mathbf{P} - \mathbf{v}^T = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix} - \begin{bmatrix} a & b \\ \vdots & \vdots \\ a & b \end{bmatrix}$, where the repeated rows of $\mathbf{v}^T$ are assumed / broadcast.
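Not in the original slides: the same broadcast in numpy (point values arbitrary), plus a classic bug it enables:

import numpy as np

P = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # n x 2 points
v = np.array([[0.5],
              [1.5]])          # 2 x 1 column vector

D = P - v.T   # v.T has shape (1, 2); numpy broadcasts it across all n rows
print(D)

# The bug side of broadcasting: an (n,) vector minus an (n,1) matrix
# silently produces an (n,n) matrix instead of an error.
print((np.zeros(3) - np.zeros((3, 1))).shape)   # (3, 3)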
Two Uses for Matrices
1. Storing things in a rectangular array of numbers (images, maps).
2. A linear operator that maps vectors to another space (Ax). This is the focus of a math linear algebra class, and it includes convolution (which we'll cover next).
Images as Matrices
Suppose someone hands you this matrix. What’s wrong with it?
Contrast – Gamma curve
A typical way to change the contrast is to apply a nonlinear correction: pixelvalue$^{\gamma}$. The quantity $\gamma$ controls how much contrast gets added.
Contrast – Gamma curve
Now the darkest regions (10th percentile) are much darker than the moderately dark regions (50th percentile). (Figure: old vs. new 10% / 50% / 90% intensity levels.)
Implementation
Python+Numpy (right way):

imNew = im**0.25

Python+Numpy (slow way; why? every pixel goes through the Python interpreter instead of one optimized C loop):

imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y, x] = im[y, x]**0.25
Results
Phew! Much Better.
Element-wise Operations
Hadamard product / element-wise multiplication: $(\mathbf{A} \odot \mathbf{B})_{ij} = \mathbf{A}_{ij}\,\mathbf{B}_{ij}$
Element-wise division: $(\mathbf{A}/\mathbf{B})_{ij} = \mathbf{A}_{ij}/\mathbf{B}_{ij}$
Element-wise power (beware the notation): $(\mathbf{A}^p)_{ij} = \mathbf{A}_{ij}^p$
Sums Across Axes
Suppose we have an N x 2 matrix A:
$\mathbf{A} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_N & y_N \end{bmatrix}$
$\Sigma(\mathbf{A}, 1) = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_N + y_N \end{bmatrix}$ (an N-dimensional column vector)
$\Sigma(\mathbf{A}, 0) = \left[\sum_{i=1}^{N} x_i,\ \sum_{i=1}^{N} y_i\right]$ (a 2-dimensional row vector)
Note: libraries distinguish between an N-dimensional vector and an N x 1 matrix.
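Not in the original slides: the axis conventions in numpy (matrix values arbitrary):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                       # N x 2

print(np.sum(A, axis=1))                         # [ 3.  7. 11.]: shape (3,), one sum per row
print(np.sum(A, axis=0))                         # [ 9. 12.]: shape (2,), one sum per column
print(np.sum(A, axis=1, keepdims=True).shape)    # (3, 1): an Nx1 matrix, not an (N,) vector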
Vectorizing Example
Suppose each datum is represented as a 128-dimensional vector, and I want to compute all pairwise distances between {x1, …, xN} and {y1, …, yM} so I can find, for every xi, the nearest yj.
Key identity: $\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y}$,
so $\|\mathbf{x} - \mathbf{y}\| = \left(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y}\right)^{1/2}$
Vectorizing Example
Stack the vectors as rows:
$\mathbf{X} = \begin{bmatrix} -\ \mathbf{x}_1\ - \\ \vdots \\ -\ \mathbf{x}_N\ - \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} -\ \mathbf{y}_1\ - \\ \vdots \\ -\ \mathbf{y}_M\ - \end{bmatrix}$
Compute an N x 1 vector of squared norms, $\Sigma(\mathbf{X}^2, 1) = [\|\mathbf{x}_1\|^2, \dots, \|\mathbf{x}_N\|^2]^T$ (can also do M x 1 for Y), and an N x M matrix of dot products, $(\mathbf{X}\mathbf{Y}^T)_{ij} = \mathbf{x}_i^T\mathbf{y}_j$.
Vectorizing Example
$\mathbf{D} = \left(\Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T - 2\mathbf{X}\mathbf{Y}^T\right)^{1/2}$
Why? Broadcasting the N x 1 column of norms against the 1 x M row gives
$\left(\Sigma(\mathbf{X}^2,1) + \Sigma(\mathbf{Y}^2,1)^T\right)_{ij} = \|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2$, i.e. the matrix
$\begin{bmatrix} \|\mathbf{x}_1\|^2 + \|\mathbf{y}_1\|^2 & \cdots & \|\mathbf{x}_1\|^2 + \|\mathbf{y}_M\|^2 \\ \vdots & \ddots & \vdots \\ \|\mathbf{x}_N\|^2 + \|\mathbf{y}_1\|^2 & \cdots & \|\mathbf{x}_N\|^2 + \|\mathbf{y}_M\|^2 \end{bmatrix}$
Vectorizing Example
$\mathbf{D}_{ij} = \left(\|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2 - 2\mathbf{x}_i^T\mathbf{y}_j\right)^{1/2}$
Numpy code:

XNorm = np.sum(X**2, axis=1, keepdims=True)
YNorm = np.sum(Y**2, axis=1, keepdims=True)
D = (XNorm + YNorm.T - 2*np.dot(X, Y.T))**0.5

*May have to make sure the quantity under the square root is at least 0 (sometimes roundoff issues happen), e.g. by wrapping it in np.maximum(..., 0).
Does it Make a Difference?
Computing pairwise distances between 300 and 400 128-dimensional vectors:
Looping over entries: roughly 0.8 s.
Fully vectorized: about 175x faster.
Expressing things in primitives that are heavily optimized is usually faster.
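Not in the original slides: a sketch of how one might reproduce the comparison; the absolute times depend on the machine and numpy build:

import numpy as np
import time

X = np.random.rand(300, 128)
Y = np.random.rand(400, 128)

t0 = time.time()
D_loop = np.zeros((X.shape[0], Y.shape[0]))
for i in range(X.shape[0]):
    for j in range(Y.shape[0]):
        D_loop[i, j] = np.linalg.norm(X[i] - Y[j])
t1 = time.time()

D_vec = np.sqrt(np.maximum(
    np.sum(X**2, axis=1, keepdims=True)
    + np.sum(Y**2, axis=1) - 2 * (X @ Y.T), 0))
t2 = time.time()

print("loop: %.4fs  vectorized: %.4fs" % (t1 - t0, t2 - t1))
print(np.allclose(D_loop, D_vec))   # same answer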
Linear Independence
Suppose we are given vectors a, b, c, and:
$\mathbf{x} = 2\mathbf{a}, \qquad \mathbf{y} = \begin{bmatrix} -2 \\ 1 \end{bmatrix} = \frac{1}{2}\mathbf{a} - \frac{1}{3}\mathbf{b}$
Then x and y are linear combinations of a and b, so neither is independent of them.
A set of vectors is linearly independent if you can’t write one as a linear combination of the others.
Span
Span: all linear combinations of a set of vectors.
Span({[0,1]}) = ? The vertical line through the origin: $\{\lambda[0,1] : \lambda \in \mathbb{R}\}$.
Is blue in {red}'s span? (referring to the figure)
Span
Span: all linear combinations of a set of vectors. Span({v1, v2}) = ? (two figure examples with pairs of vectors)
Matrix-Vector Product
$\mathbf{A}\mathbf{x} = \begin{bmatrix} | & & | \\ \mathbf{c}_1 & \cdots & \mathbf{c}_n \\ | & & | \end{bmatrix}\mathbf{x}$
Right-multiplying A by x mixes the columns of A according to the entries of x.
The output space of f(x) = Ax is the span of the columns of A: you can only make things out of your columns.
An Intuition
Think of y = Ax as a machine:
x: the knobs on the machine (e.g., fuel, brakes)
y: the state of the world (e.g., where you are)
A: the machine (e.g., your car)
$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 \\ | & | & | \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$
Linear Independence
Suppose the columns of a 3x3 matrix A are not linearly independent (c1, αc1, c2 for instance):
$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c}_1 & \alpha\mathbf{c}_1 & \mathbf{c}_2 \\ | & | & | \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$
$\mathbf{y} = x_1\mathbf{c}_1 + \alpha x_2\mathbf{c}_1 + x_3\mathbf{c}_2 = (x_1 + \alpha x_2)\,\mathbf{c}_1 + x_3\mathbf{c}_2$
Linear Independence Intuition
$\mathbf{y} = (x_1 + \alpha x_2)\,\mathbf{c}_1 + x_3\mathbf{c}_2$
The knobs of x are redundant: even though y has 3 dimensions, the machine's outputs only span two directions.
Linear Independence
Recall: $\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\,\mathbf{c}_1 + x_3\mathbf{c}_2$
A given output y can be made by more than one vector x with y = Ax: adding $\gamma$ to $x_1$ and subtracting $\gamma/\alpha$ from $x_2$ leaves the output unchanged.
$\mathbf{y} = \mathbf{A}\begin{bmatrix} x_1 + \gamma \\ x_2 - \gamma/\alpha \\ x_3 \end{bmatrix} = \left(x_1 + \gamma + \alpha x_2 - \alpha\frac{\gamma}{\alpha}\right)\mathbf{c}_1 + x_3\mathbf{c}_2 = (x_1 + \alpha x_2)\,\mathbf{c}_1 + x_3\mathbf{c}_2$
Linear Independence
Recall: $\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\,\mathbf{c}_1 + x_3\mathbf{c}_2$
Nonzero vectors x can also map to the zero vector:
$\mathbf{A}\begin{bmatrix} \gamma \\ -\gamma/\alpha \\ 0 \end{bmatrix} = \left(\gamma - \alpha\frac{\gamma}{\alpha}\right)\mathbf{c}_1 + 0\,\mathbf{c}_2 = \mathbf{0}$
Rank
Rank of A: the number of linearly independent columns (or rows) of A, i.e. the dimension of the span of the columns.
Full-rank matrices behave nicely: they can be inverted, span the full output space, and are one-to-one.
Intuition: every knob is useful, and every output state can be made by the machine.
Inverses
Recall: y = Ax mixes the columns of A proportional to the entries of x. If A is full-rank, we should be able to invert this mapping: given y, which x produced it? That x is $\mathbf{A}^{-1}\mathbf{y}$.
$\mathbf{A}^{-1}$ is mainly a mathematical object; you should (almost) never actually compute it. Solving the linear system for x is much faster and more stable than obtaining $\mathbf{A}^{-1}$.
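Not in the original slides: the "solve, don't invert" advice in numpy (random full-rank matrix for illustration):

import numpy as np

A = np.random.rand(100, 100)       # almost surely full rank
x_true = np.random.rand(100)
y = A @ x_true

x = np.linalg.solve(A, y)          # solves Ax = y directly
x_via_inv = np.linalg.inv(A) @ y   # forms A^-1 explicitly: slower, less stable
print(np.allclose(x, x_true))      # True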
Symmetric Matrices
A matrix is symmetric if $\mathbf{A}_{ij} = \mathbf{A}_{ji}$, e.g.
$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}$
Symmetric matrices have lots of special properties.
Any matrix of the form $\mathbf{A} = \mathbf{X}^T\mathbf{X}$ is symmetric. Quick check:
$\mathbf{A}^T = (\mathbf{X}^T\mathbf{X})^T = \mathbf{X}^T(\mathbf{X}^T)^T = \mathbf{X}^T\mathbf{X} = \mathbf{A}$
Special Matrices – Rotations
$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$
Rotation matrices are orthogonal ($\mathbf{R}^T\mathbf{R} = \mathbf{I}$).
They do not change vector L2 norms ($\|\mathbf{R}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$).
Their determinant is 1 (no flip/reflection), and their eigenvalues all have magnitude 1.
Eigensystems
An eigenvector $\mathbf{v}_i$ and eigenvalue $\lambda_i$ of a matrix $\mathbf{A}$ satisfy $\mathbf{A}\mathbf{v}_i = \lambda_i\mathbf{v}_i$ ($\mathbf{A}\mathbf{v}_i$ is a scaled copy of $\mathbf{v}_i$).
Eigenvectors and eigenvalues come in pairs, and typically you assume $\|\mathbf{v}_i\|_2 = 1$.
The eigenvalue tells you how much $f(\mathbf{x}) = \mathbf{A}\mathbf{x}$ stretches a vector pointing along $\mathbf{v}_i$.
Suppose I have points in a grid, and I apply f(x) = Ax to each point (pointy end of each arrow: Ax; non-pointy end: x), with
$\mathbf{A} = \begin{bmatrix} 1.1 & 0 \\ 0 & 1.1 \end{bmatrix}$
Red box: unit square. Blue box: after f(x) = Ax. What are the yellow lines in the figure, and why?
$\mathbf{A} = \begin{bmatrix} 0.8 & 0 \\ 0 & 1.25 \end{bmatrix}$
Now I apply f(x) = Ax to these points (pointy end: Ax; non-pointy end: x). Red box: unit square. Blue box: after f(x) = Ax. What are the yellow lines, and why? (They trace the eigenvector directions: vectors along them are only stretched, not turned.)
$\mathbf{A} = \begin{bmatrix} \cos(t) & -\sin(t) \\ \sin(t) & \cos(t) \end{bmatrix}$
Red box: unit square. Blue box: after f(x) = Ax. Can we draw any yellow lines? (No: a rotation turns every vector, so there are no real eigenvectors.)
Eigenvectors of Symmetric Matrices
Symmetric matrices have orthogonal eigenvectors and n (not necessarily distinct) eigenvalues.
The eigenvector with the largest eigenvalue maximizes $\frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$; the eigenvector with the smallest eigenvalue minimizes it.
Lots of problems rely on maximizing or minimizing this quantity.
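Not in the original slides: these properties checked in numpy (random PSD matrix for illustration):

import numpy as np

X = np.random.rand(5, 3)
A = X.T @ X                        # symmetric (in fact positive semi-definite)

vals, vecs = np.linalg.eigh(A)     # eigh is for symmetric matrices; vals ascending
print(np.allclose(vecs.T @ vecs, np.eye(3)))   # True: orthogonal eigenvectors

# The top eigenvector attains the max of the quadratic form over unit vectors:
v = vecs[:, -1]
print(v @ A @ v, vals[-1])         # equal up to roundoff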
The Singular Value Decomposition
Can always write an m x n matrix A as $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$:
U: a rotation; its columns are eigenvectors (of $\mathbf{A}\mathbf{A}^T$).
$\mathbf{\Sigma}$: a scale, i.e. a diagonal matrix of singular values $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots$, the square roots of eigenvalues (of $\mathbf{A}^T\mathbf{A}$).
$\mathbf{V}^T$: a rotation; its rows are eigenvectors (of $\mathbf{A}^T\mathbf{A}$).
Singular Value Decomposition
The number of nonzero singular values is the number of linearly independent columns, i.e. the rank of A ($\sigma_1, \sigma_2, \sigma_3$ all nonzero means rank 3).
Zeroing out the small singular values (keeping only, say, $\sigma_1$ and $\sigma_2$) gives $\hat{\mathbf{A}} = \mathbf{U}\hat{\mathbf{\Sigma}}\mathbf{V}^T$, a low-rank approximation of A.
Rotation, scale, rotation is a generally useful way of thinking when working with matrices.
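Not in the original slides: the SVD facts above in numpy (random matrix for illustration):

import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, (U * s) @ Vt))   # A = U Sigma V^T (U*s scales U's columns)
print(np.sum(s > 1e-10))              # rank: number of nonzero singular values

# Best rank-2 approximation: keep only the two largest singular values
A_hat = (U[:, :2] * s[:2]) @ Vt[:2, :]
print(np.linalg.norm(A - A_hat))      # small if sigma_3, sigma_4 are small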
Solving Least-Squares
Start with two points (xi, yi):
$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} mx_1 + b \\ mx_2 + b \end{bmatrix}$, written $\mathbf{y} = \mathbf{A}\mathbf{v}$.
We know how to solve this: invert A to find v, i.e. the (m, b) that fits the points (x1, y1), (x2, y2).
Solving Least-Squares
With the same two points, measure the fit with the residual:
$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|^2 = \left(y_1 - (mx_1 + b)\right)^2 + \left(y_2 - (mx_2 + b)\right)^2$
This is the sum of squared differences between the actual value of y and what the model says y should be.
Solving Least-Squares
Suppose there are n > 2 points:
$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix}$, i.e. $\mathbf{y} = \mathbf{A}\mathbf{v}$
Compute the residual again:
$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|^2 = \sum_{i=1}^{n}\left(y_i - (mx_i + b)\right)^2$
Solving Least-Squares
Given y and A with y = Av overdetermined (A tall: more equations than unknowns), we want to minimize $\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2$, or find:
$\arg\min_{\mathbf{v}} \|\mathbf{y} - \mathbf{A}\mathbf{v}\|^2$
(the value of v that makes the expression smallest)
The solution satisfies the normal equations $(\mathbf{A}^T\mathbf{A})\,\mathbf{v}^* = \mathbf{A}^T\mathbf{y}$, i.e.
$\mathbf{v}^* = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
(Don't actually compute the inverse!)
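Not in the original slides: fitting the line with numpy's least-squares solver instead of forming the inverse (synthetic data for illustration):

import numpy as np

# fit y = m*x + b to noisy points
xs = np.linspace(0, 10, 50)
ys = 2.0 * xs + 1.0 + 0.1 * np.random.randn(50)

A = np.stack([xs, np.ones_like(xs)], axis=1)   # n x 2, rows [x_i, 1]
v, *_ = np.linalg.lstsq(A, ys, rcond=None)     # solves argmin ||y - Av||^2
print(v)   # approximately [2.0, 1.0], i.e. (m, b)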
When is Least-Squares Possible?
Given y and A, want v such that y = Av.
Square A: we want n outputs and have n knobs to fiddle with, and every knob is useful if A is full rank. Solvable exactly.
Tall A: rows (outputs) > columns (knobs). Thus we can't hit the output precisely with the knobs we have, so we settle for the "closest" knob setting.
When is Least-Squares Possible?
Given y and A, want v such that y = Av.
Wide A: columns (knobs) > rows (outputs). Thus any output can be expressed in infinitely many ways, so the problem is underdetermined.
Homogeneous Least-Squares
Given a set of unit vectors (aka directions) $\mathbf{x}_1, \dots, \mathbf{x}_n$, I want the vector $\mathbf{v}$ that is as orthogonal to all the $\mathbf{x}_i$ as possible (for some definition of orthogonal).
Stack the $\mathbf{x}_i$ into the rows of A and compute Av:
$\mathbf{A}\mathbf{v} = \begin{bmatrix} -\ \mathbf{x}_1^T\ - \\ \vdots \\ -\ \mathbf{x}_n^T\ - \end{bmatrix}\mathbf{v} = \begin{bmatrix} \mathbf{x}_1^T\mathbf{v} \\ \vdots \\ \mathbf{x}_n^T\mathbf{v} \end{bmatrix}, \qquad \|\mathbf{A}\mathbf{v}\|_2^2 = \sum_{i=1}^{n}\left(\mathbf{x}_i^T\mathbf{v}\right)^2$
Each term is 0 if v is orthogonal to the corresponding $\mathbf{x}_i$; the sum measures how orthogonal v is to each x.
Homogeneous Least-Squares
We want the v that minimizes $\|\mathbf{A}\mathbf{v}\|_2^2$. Since v = 0 makes this trivially zero, constrain v to unit length:
$\arg\min_{\|\mathbf{v}\|_2 = 1} \|\mathbf{A}\mathbf{v}\|_2^2$
Homogeneous Least-Squares
Let's look at $\|\mathbf{A}\mathbf{v}\|_2^2$:
$\|\mathbf{A}\mathbf{v}\|_2^2 = (\mathbf{A}\mathbf{v})^T(\mathbf{A}\mathbf{v})$  (rewrite as a dot product)
$= \mathbf{v}^T\mathbf{A}^T\mathbf{A}\mathbf{v} = \mathbf{v}^T(\mathbf{A}^T\mathbf{A})\mathbf{v}$  (distribute the transpose)
We want the unit vector minimizing this quadratic form. Where have we seen this?
Homogeneous Least-Squares
$\arg\min_{\|\mathbf{v}\|_2 = 1} \|\mathbf{A}\mathbf{v}\|_2^2$ is given by:
(1) the "smallest"* eigenvector of $\mathbf{A}^T\mathbf{A}$, or equivalently
(2) the "smallest" right singular vector of $\mathbf{A}$.
*Note: $\mathbf{A}^T\mathbf{A}$ is positive semi-definite, so it has all non-negative eigenvalues.
This is a ubiquitous tool in vision. For min → max, switch smallest → largest.
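Not in the original slides: extracting the smallest right singular vector with numpy (random directions for illustration):

import numpy as np

A = np.random.rand(10, 3)        # rows are the directions x_i
U, s, Vt = np.linalg.svd(A)
v = Vt[-1]                       # right singular vector of the smallest singular value
print(np.linalg.norm(v))         # 1.0: unit length by construction
print(np.linalg.norm(A @ v))     # as small as any unit vector can make it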
Derivatives
Remember derivatives? The derivative is the rate at which a function f(x) changes at a point, as well as the direction that increases the function.
Given the quadratic function $f(x) = (x-2)^2 + 5$, the derivative is the function $f'(x)$, aka $\frac{d}{dx}f(x)$.
What's special about x = 2? f(x) is minimized at 2, and $f'(x) = 0$ at 2.
a is a minimum of f → $f'(a) = 0$. The reverse is not true.
Rates of change
Suppose I want to increase $f(x) = (x-2)^2 + 5$ by changing x. (In the plot: blue area, move left; red area, move right.) The derivative tells you the direction of ascent and the rate.
What Calculus Should I Know
Mostly: intuition. For specific derivatives you can use a computer algebra system / use a cookbook.
You should know partial derivatives and gradients (multivariable calculus).
Partial Derivatives
$f_1(x, y) = (x-2)^2 + 5 + (y+1)^2$
Compare with the 1D case:
$f(x) = (x-2)^2 + 5, \qquad \frac{\partial}{\partial x}f(x) = 2(x-2)\cdot 1 = 2(x-2)$
$\frac{\partial}{\partial x}f_1(x, y) = 2(x-2)$
For the $(y+1)^2$ term, pretend y is constant → its derivative is 0.
Zooming Out
$f_1(x, y) = (x-2)^2 + 5 + (y+1)^2$
(Figure: dark = f(x, y) low; bright = f(x, y) high.)
Taking a Slice
$f_1(x, y) = (x-2)^2 + 5 + (y+1)^2$
The slice at y = 0 has the same shape as the function from before, $f(x) = (x-2)^2 + 5$ (shifted by a constant), so its derivative in x is $f'(x) = 2(x-2)$.
Taking a Slice
$f_1(x, y) = (x-2)^2 + 5 + (y+1)^2$
$\frac{\partial}{\partial x}f_1(x, y)$ is the rate of change and the direction of change in the x dimension.
Zooming Out
$f_1(x, y) = (x-2)^2 + 5 + (y+1)^2$
$\frac{\partial}{\partial y}f_1(x, y) = 2(y+1)$ is the rate of change and the direction of change in the y dimension.
Zooming Out
$f_1(x, y) = (x-2)^2 + 5 + (y+1)^2$
Gradient / Jacobian: stacking the partial derivatives into a vector, $\nabla f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]$, gives the rate and direction of change. Arrows point OUT of a minimum / basin.
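Not in the original slides: a quick sanity check of the gradient above against finite differences (evaluation point arbitrary):

import numpy as np

def f(x, y):
    return (x - 2)**2 + 5 + (y + 1)**2

def grad_f(x, y):
    return np.array([2 * (x - 2), 2 * (y + 1)])

# central finite differences approximate each partial derivative:
x0, y0, eps = 0.5, 0.25, 1e-6
numeric = np.array([
    (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps),
    (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps)])
print(grad_f(x0, y0), numeric)   # should match closely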
What Should I Know?
Gradients are simply per-dimension partial derivatives: if $\mathbf{x}$ in $f(\mathbf{x})$ has n dimensions, $\nabla f(\mathbf{x})$ has n dimensions.
The gradient points in the direction of ascent and gives the rate of ascent.
The same machinery works in high-dimensional spaces.