SLIDE 13 Training for F-dim, with-bias LR
Training objective: minimize squared error ("least squares" estimation).

Formula for parameters that minimize the objective: below.

When can you use this formula? When you observe at least F+1 examples that are linearly independent. Otherwise, infinitely many w, b will yield the lowest possible training error; e.g., with F = 1 and a single training point, every line through that point achieves zero training error.
How to derive the formula (see notes; a matrix-form sketch follows the objective below):
1. Compute the gradient of the objective w.r.t. each entry of w, and w.r.t. the scalar b (F+1 total expressions)
2. Set all gradients equal to zero and solve for w and b (F+1 equations, F+1 unknowns)
$$\min_{w \in \mathbb{R}^F,\, b \in \mathbb{R}} \;\sum_{n=1}^{N} \left( y_n - \hat{y}(x_n, w, b) \right)^2$$
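The notes derive this entry-by-entry, following the two steps above. As a sketch, the same result falls out in matrix form: stack the parameters as $\theta = [w_1 \;\cdots\; w_F \;\; b]^T$ and use $\tilde{X}$ as defined below, so the F+1 gradient equations become a single vector equation (the normal equations):

$$\nabla_\theta \left\| \tilde{X}\theta - y \right\|_2^2 = 2\,\tilde{X}^T(\tilde{X}\theta - y) = 0 \;\Rightarrow\; \tilde{X}^T \tilde{X}\,\theta = \tilde{X}^T y \;\Rightarrow\; \theta = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$$

The inverse exists exactly when $\tilde{X}$ has full column rank F+1, which is the linear-independence condition stated above.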
$$[w_1 \;\cdots\; w_F \;\; b]^T = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$$

$$\tilde{X} = \begin{bmatrix} x_{11} & \cdots & x_{1F} & 1 \\ x_{21} & \cdots & x_{2F} & 1 \\ \vdots & & \vdots & \vdots \\ x_{N1} & \cdots & x_{NF} & 1 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$
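A minimal NumPy sketch of this estimator (the function name fit_least_squares, the rank check, and the use of np.linalg.lstsq rather than an explicit matrix inverse are my additions, not from the slide):

import numpy as np

def fit_least_squares(X, y):
    """Least-squares fit of F-dim linear regression with bias.

    X : (N, F) array of features; y : (N,) array of targets.
    Returns (w, b) minimizing sum_n (y_n - (w @ x_n + b))^2.
    """
    N, F = X.shape
    # Append a constant-1 column to form X-tilde, so b is just one more weight.
    X_tilde = np.hstack([X, np.ones((N, 1))])
    # A unique minimizer requires X-tilde to have full column rank F+1,
    # i.e. at least F+1 linearly independent examples.
    if np.linalg.matrix_rank(X_tilde) < F + 1:
        raise ValueError("Need at least F+1 linearly independent examples.")
    # Solve the normal equations; lstsq is numerically preferable to
    # forming (X^T X)^{-1} explicitly.
    theta, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
    return theta[:-1], theta[-1]  # w, b

# Usage: fit noisy data from a known 2-feature linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 0.5 + 0.01 * rng.normal(size=100)
w, b = fit_least_squares(X, y)
print(w, b)  # approximately [3.0, -2.0] and 0.5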