Lecture 7: Cross-Validation
Instructor: Prof. Shuai Huang Industrial and Systems Engineering University of Washington
Underfit, Good fit, and Overfit
A simple linear model (may underfit):

f(x) = β0 + β1 x1 + β2 x2

A quadratic model with an interaction term (good fit):

f(x) = β0 + β1 x1 + β2 x2 + β11 x1^2 + β22 x2^2 + β12 x1 x2

A higher-order model (prone to overfit):

f(x) = β0 + β1 x1 + β2 x2 + β11 x1^2 + β22 x2^2 + β12 x1 x2 + β112 x1^2 x2 + β122 x1 x2^2 + ...
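The effect of adding ever-higher-order terms can be seen on synthetic data: the training error of a least-squares fit never goes up as the model grows, even when the extra terms only chase noise. The sketch below is illustrative and not from the lecture; the data, seed, and degrees are arbitrary choices.

```python
import numpy as np

# Hypothetical data: the true relationship is linear, plus noise.
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)

# Fit polynomials of increasing degree and record the training SSE.
sses = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    sses[degree] = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    print(f"degree {degree}: training SSE = {sses[degree]:.3f}")
```

Because the models are nested, the degree-9 fit always attains a training SSE at least as small as the degree-1 fit; this is exactly why training error alone cannot detect overfitting.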
R-squared is defined as

R^2 = 1 - SSE/SST.

R-squared never decreases if more variables are put into the model, even if these newly added variables have no relationship with the outcome variable; in practice, it almost always increases. Thus, R-squared is not a good metric for taking model complexity into consideration.
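This "never decreases" property is easy to check numerically. The sketch below (synthetic data; the variable names, seed, and helper function are illustrative, not from the lecture) adds a predictor of pure noise to an OLS fit and compares R-squared before and after.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)    # a genuinely predictive variable
junk = rng.normal(size=n)  # pure noise, unrelated to y
y = 3.0 * x1 + rng.normal(size=n)

def r_squared(predictors, y):
    """R^2 = 1 - SSE/SST for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

r2_without = r_squared([x1], y)
r2_with = r_squared([x1, junk], y)
print(f"R^2 without junk: {r2_without:.4f}")
print(f"R^2 with junk:    {r2_with:.4f}")
```

Since the two models are nested, the larger model's SSE can only be smaller or equal, so R-squared with the junk variable is never lower, despite the junk variable carrying no information about y.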
The theoretical R-squared depends on the variance of the noise as well. As the underlying regression model is y = βx + ε, the theoretical R-squared takes the form

R^2 = β^2 var(x) / (β^2 var(x) + var(ε)).

In other words, R-squared is determined not only by how well x can predict y, but also by the variances of x and ε as well.
An overfit model is mathematically more complex (and therefore more able to shape itself to make its predictions correct even on data points that are due to noise).
Two common criteria that penalize model complexity are AIC and BIC:

AIC = 2K - 2 ln(L̂)
BIC = ln(n) K - 2 ln(L̂)

where K is the number of model parameters, n is the sample size, and L̂ is the maximized likelihood of the model.
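For a regression with Gaussian errors, the maximized log-likelihood has a closed form in terms of the SSE, so AIC and BIC can be computed directly from the residuals. The function below is a sketch under that Gaussian assumption (the name `gaussian_aic_bic`, the data, and the seed are illustrative, not from the lecture).

```python
import numpy as np

def gaussian_aic_bic(y, y_hat, k):
    """AIC and BIC for a regression with Gaussian errors and k parameters.

    Uses the maximized log-likelihood with the MLE sigma^2 = SSE / n,
    so that AIC = 2K - 2 ln(L) and BIC = ln(n) K - 2 ln(L).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    sse = float(np.sum((y - y_hat) ** 2))
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(sse / n) + 1.0)
    aic = 2 * k - 2 * log_lik
    bic = np.log(n) * k - 2 * log_lik
    return aic, bic

# Hypothetical example: compare a linear and a cubic fit on noisy linear data.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
for degree in (1, 3):
    coeffs = np.polyfit(x, y, degree)
    aic, bic = gaussian_aic_bic(y, np.polyval(coeffs, x), k=degree + 1)
    print(f"degree {degree}: AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Note that whenever n > e^2 (about 7.4), the BIC penalty ln(n) per parameter exceeds the AIC penalty of 2, so BIC favors simpler models more strongly.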
How can we evaluate a model when all we have access to is one dataset (usually we take this dataset as "training data", a concept taken for granted)? What we really care about is performance on an unseen testing dataset (one that represents the future data, which is of course unseen in the model training stage).
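Cross-validation answers this by repeatedly holding out part of the single available dataset to play the role of the unseen testing data. A minimal k-fold sketch is below; the function name, synthetic data, and seed are illustrative choices, not from the lecture.

```python
import numpy as np

def k_fold_cv_mse(x, y, degree, k=5, seed=0):
    """Estimate out-of-sample MSE of a polynomial fit via k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))      # shuffle once, then split into k folds
    folds = np.array_split(idx, k)
    fold_mse = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)  # fit on k-1 folds
        pred = np.polyval(coeffs, x[test])               # evaluate on the held-out fold
        fold_mse.append(float(np.mean((pred - y[test]) ** 2)))
    return float(np.mean(fold_mse))

# Hypothetical data: the truth is linear, so higher-degree fits typically show
# larger cross-validated error even though their training error is smaller.
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=60)
y = 1.0 + 2.0 * x + rng.normal(size=60)
for degree in (1, 3, 9):
    print(f"degree {degree}: CV MSE = {k_fold_cv_mse(x, y, degree):.3f}")
```

Because every point is held out exactly once, the averaged fold errors approximate performance on future data without needing a second dataset; choosing the model with the smallest cross-validated error guards against overfitting.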