SLIDE 14 Ridge regularization
Consider the linear model $Y = \beta^\top X + \epsilon$ with $\epsilon \sim N(0, \sigma^2)$. Facts:
- 1. The maximum likelihood solution is the same as the solution of the following optimization problem:
  $\min_\beta \sum_{i=1}^N (y_i - \beta^\top x_i)^2$
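A minimal numerical sketch of this fact, on simulated data (the slide gives no dataset, so the design matrix, true coefficients, and noise scale below are assumptions): under Gaussian noise the negative log-likelihood is proportional to the sum of squared residuals, so the least-squares solution is also the likelihood maximizer.

```python
import numpy as np

# Simulated data (assumed for illustration; not from the slide).
rng = np.random.default_rng(0)
N, p = 200, 3
X = rng.normal(size=(N, p))
beta_true = np.array([1.5, -2.0, 0.5])
sigma = 0.3
y = X @ beta_true + rng.normal(scale=sigma, size=N)

# Least-squares solution via the normal equations:
# argmin_beta sum_i (y_i - beta.x_i)^2
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

def neg_log_lik(beta):
    # Gaussian negative log-likelihood, up to an additive constant:
    # sum_i (y_i - beta.x_i)^2 / (2 sigma^2)
    r = y - X @ beta
    return r @ r / (2 * sigma**2)

# Perturbing beta_ls in random directions only increases the negative
# log-likelihood, illustrating that the MLE is the least-squares minimizer.
for _ in range(5):
    assert neg_log_lik(beta_ls + 0.01 * rng.normal(size=p)) > neg_log_lik(beta_ls)
```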
- 2. Putting a Gaussian prior $\beta_i \sim N(0, \eta^2)$ on the parameters is the same as solving the following optimization problem (ridge regularization):
  $\min_\beta \sum_{i=1}^N (y_i - \beta^\top x_i)^2 + \frac{\sigma^2}{\eta^2} \|\beta\|_2^2$
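A sketch of the MAP/ridge correspondence, again on simulated data with assumed values of $\sigma$ and $\eta$: with $\lambda = \sigma^2/\eta^2$, the MAP estimate has the closed form $(X^\top X + \lambda I)^{-1} X^\top y$, and it minimizes the penalized objective above.

```python
import numpy as np

# Simulated data (assumed for illustration; not from the slide).
rng = np.random.default_rng(1)
N, p = 100, 4
X = rng.normal(size=(N, p))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=N)

sigma, eta = 0.5, 1.0      # assumed noise and prior standard deviations
lam = sigma**2 / eta**2    # ridge strength from the slide's formula

# Closed-form ridge / MAP estimate: (X^T X + lam I)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def objective(beta):
    # Penalized least squares: sum of squares + (sigma^2/eta^2) ||beta||^2
    r = y - X @ beta
    return r @ r + lam * beta @ beta

# Random perturbations of the closed-form solution always increase the
# penalized objective, so it is indeed the minimizer.
for _ in range(5):
    assert objective(beta_ridge + 0.01 * rng.normal(size=p)) > objective(beta_ridge)
```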
- 3. The penalty tells the model to avoid large parameter values. It is equivalent to augmenting the data with $p$ fake points, one per coordinate, each with response $y = 0$:
  $x = (\tfrac{\sigma}{\eta}, 0, \ldots, 0),\ (0, \tfrac{\sigma}{\eta}, \ldots, 0),\ \ldots,\ (0, \ldots, 0, \tfrac{\sigma}{\eta}), \quad y = 0$
Application of Artificial Intelligence March 31, 2020 7 / 17