Overfitting and Regularization
March 31, 2020 Data Science CSCI 1951A Brown University Instructor: Ellie Pavlick HTAs: Josh Levin, Diane Mutako, Sol Zitter
Announcements
Office Hours: watch calendar
ML assignment out later today
they don’t see
train set and a test set—and assess performance
Train
Train MSE = 6
Test
Test MSE = 12
Train MSE = 4
Problem gets worse as models get more powerful/flexible
MSE = 14
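The gap between train and test MSE can be reproduced on toy data. The sketch below is only an illustration: the data (y = 2x plus noise), the helper names, and the "memorizer" model are assumptions made up for this example, not the models from the plots. It compares a least-squares line with a maximally flexible model that returns the label of the nearest training point.

```python
import random

random.seed(0)

def make_data(n):
    # Hypothetical data: y = 2x + Gaussian noise
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 1)) for x in xs]

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

def fit_line(data):
    # Ordinary least squares with a single feature
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    # Very flexible model: return the label of the nearest training point
    return lambda x: min(data, key=lambda pt: abs(pt[0] - x))[1]

train, test = make_data(30), make_data(30)
line, memo = fit_line(train), fit_memorizer(train)

results = {
    "line": (mse(line, train), mse(line, test)),
    "memorizer": (mse(memo, train), mse(memo, test)),
}
for name, (train_mse, test_mse) in results.items():
    print(f"{name}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

The memorizer fits the training points exactly, so its train MSE is 0, but its test MSE is larger because it has also memorized the noise.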
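    from sklearn.model_selection import train_test_split

    # Assumes X, y, clf, and num_folds are defined elsewhere
    accs = []
    for i in range(num_folds):
        # Draw a fresh random train/test split each round
        X_train, X_test, y_train, y_test = train_test_split(X, y)
        clf.fit(X_train, y_train)
        accs.append(clf.score(X_test, y_test))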
performance, we can use cross validation
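A common variant is k-fold cross validation, where the data is cut into k disjoint folds and each fold serves as the test set exactly once. The following is a minimal standard-library sketch, not the course's implementation: `k_fold_indices`, `cross_validate`, and the toy mean-predictor model are hypothetical names introduced for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    # Shuffle the indices once, then deal them into k disjoint folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, fit, score):
    scores = []
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        train = [data[j] for j in range(len(data)) if j not in held_out]
        test = [data[j] for j in test_idx]
        model = fit(train)                 # fit on the other k-1 folds
        scores.append(score(model, test))  # evaluate on the held-out fold
    return scores

# Hypothetical toy model: always predict the mean training label
def fit_mean(train):
    return sum(y for _, y in train) / len(train)

def neg_mse(mean, test):
    return -sum((mean - y) ** 2 for _, y in test) / len(test)

data = [(x, 2 * x + 1) for x in range(20)]
scores = cross_validate(data, k=5, fit=fit_mean, score=neg_mse)
print(sum(scores) / len(scores))  # average score across the 5 folds
```

Unlike repeated random splits, every example is used for testing exactly once, so no example is over- or under-represented in the estimate.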
“complex” than is needed to explain the variation we care about
parameters (i.e. features) is high
training data, often without learning anything generalizable to test time
weights), or for assuming features are very important (higher weights)
gradient descent) stop before the model has fully converged (i.e. you assume the final steps are spent memorizing noise)
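Early stopping can be sketched as gradient descent that watches the loss on a held-out dev set and halts once that loss stops improving. This is an illustrative sketch under assumptions: the one-weight linear model, the synthetic data, and the `patience` parameter are all choices made for this example.

```python
import random

random.seed(0)

def make_data(n):
    # Hypothetical data: y = 3x + Gaussian noise
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + random.gauss(0, 0.5)) for x in xs]

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    # Derivative of the MSE with respect to the single weight w
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def train_with_early_stopping(train, dev, lr=0.1, max_steps=1000, patience=5):
    w = 0.0
    best_w, best_dev, bad_steps = w, mse(w, dev), 0
    for step in range(1, max_steps + 1):
        w -= lr * grad(w, train)      # one gradient descent step on train
        dev_loss = mse(w, dev)        # monitor loss on held-out data
        if dev_loss < best_dev:
            best_w, best_dev, bad_steps = w, dev_loss, 0
        else:
            bad_steps += 1
            if bad_steps >= patience:
                break                 # dev loss has stopped improving
    return best_w, best_dev, step

train, dev = make_data(20), make_data(20)
best_w, best_dev, steps_run = train_with_early_stopping(train, dev)
print(best_w, best_dev, steps_run)
```

The function returns the weight with the best dev loss seen so far rather than the final weight, so any late steps that only fit training noise are discarded.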
during training…
about)
how much you penalize
parameters), E.g.
you use test to set these parameters, you are “peeking” at unseen data in order to fit the model, and thus test performance is no longer actually representative of how you would do in the real world
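One way to avoid peeking is to hold out a third split (a dev or validation set) for choosing how much to penalize, and to score on test only once at the end. The sketch below assumes a one-feature ridge regression with no intercept, synthetic data, and a hypothetical grid of penalty strengths `candidates`; none of these come from the slides.

```python
import random

random.seed(1)

def make_data(n):
    # Hypothetical data: y = 3x + Gaussian noise
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + random.gauss(0, 1)) for x in xs]

def fit_ridge(data, lam):
    # Closed-form 1-D ridge regression (no intercept): w = sum(xy) / (sum(x^2) + lam)
    return sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + lam)

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

data = make_data(60)
train, dev, test = data[:40], data[40:50], data[50:]

# Choose the penalty strength using the dev set only
candidates = [0.0, 0.1, 1.0, 10.0]
dev_scores = {lam: mse(fit_ridge(train, lam), dev) for lam in candidates}
best_lam = min(dev_scores, key=dev_scores.get)

# Touch the test set exactly once, after every choice has been made
test_mse = mse(fit_ridge(train, best_lam), test)
print(best_lam, test_mse)
```

Because the test split played no part in picking `best_lam`, `test_mse` remains an honest estimate of performance on unseen data.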
l_1 = \sum_i |x_i|
l_2 = \sqrt{\sum_i x_i^2}
l_p = \sqrt[p]{\sum_i |x_i|^p}
sklearn)
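The norms above can be computed directly. A minimal sketch, where `lp_norm` and the example vector `weights` are hypothetical names chosen for illustration:

```python
def lp_norm(xs, p):
    # p-th root of the sum of |x_i|^p; p=1 gives l1, p=2 gives l2
    return sum(abs(x) ** p for x in xs) ** (1 / p)

weights = [3.0, -4.0]
l1 = lp_norm(weights, 1)  # |3| + |-4| = 7
l2 = lp_norm(weights, 2)  # sqrt(9 + 16) = 5
print(l1, l2)
```

For the same vector, the l1 norm is at least as large as the l2 norm, which is one reason the two penalties push weights in different ways.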
minw
minw
minw