Lifelong Learning (CS 330)
Logistics
2
Project milestone due Wednesday. Two guest lectures next week! Jeff Clune Sergey Levine
Plan for Today
3
The lifelong learning problem statement Basic approaches to lifelong learning Can we do better than the basics? Revisiting the problem statement from the meta-learning perspective
4
A brief review of problem statements.
Meta-Learning
Given an i.i.d. task distribution, learn a new task efficiently
quickly learn new task learn to learn tasks
Multi-Task Learning
Learn to solve a set of tasks.
perform tasks learn tasks
5
In contrast, many real-world settings look like:
Meta-Learning
learn to learn tasks quickly learn new task
Multi-Task Learning
perform tasks learn tasks
time
- a student learning concepts in school
- a deployed image classification system learning from a
stream of images from users
- a robot acquiring an increasingly large set of skills in
different environments
- a virtual assistant learning to help different users with
different tasks at different points in time
- a doctor’s assistant aiding in medical decision-making
Some examples:
Our agents may not be given a large batch of data/tasks right off the bat!
Sequential learning settings
- online learning, lifelong learning, continual learning, incremental learning, streaming data
distinct from sequence data and sequential decision-making
Some Terminology
6
- 1. Pick an example setting.
- 2. Discuss the problem statement with your neighbor:
(a) How would you set up an experiment to develop & test your algorithm? (b) What are desirable/required properties of the algorithm? (c) How do you evaluate such a system?
What is the lifelong learning problem statement?
- A. a student learning concepts in school
- B. a deployed image classification system learning from a
stream of images from users
- C. a robot acquiring an increasingly large set of skills in
different environments
- D. a virtual assistant learning to help different users with
different tasks at different points in time
- E. a doctor’s assistant aiding in medical decision-making
Example settings. Exercise:
7
Some considerations:
- computational resources
- memory
- model performance
- data efficiency
Problem variations:
- task/data order: i.i.d. vs. predictable vs. curriculum vs. adversarial
- others: privacy, interpretability, fairness, test-time compute & memory
- discrete task boundaries vs. continuous shifts (vs. both)
- known task boundaries/shifts vs. unknown
Substantial variety in problem statement!
What is the lifelong learning problem statement?
8
General [supervised] online learning problem:
What is the lifelong learning problem statement?
for t = 1, …, n:
- observe x_t
- predict ŷ_t
- observe label y_t
i.i.d. setting: x_t ∼ p(x), y_t ∼ p(y|x), where p is not a function of t
(true in some cases, but not in many cases!)
otherwise: x_t ∼ p_t(x), y_t ∼ p_t(y|x)
<— if observable task boundaries: also observe a task variable z_t
streaming setting: cannot store (x_t, y_t), e.g. due to
- lack of memory
- lack of computational resources
- privacy considerations
- want to study neural memory mechanisms
- recall: replay buffers
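The observe/predict/observe-label loop above can be sketched concretely. The scalar least-squares learner, the squared loss, and the learning rate below are illustrative assumptions, not part of the lecture:

```python
def online_learning_loop(stream, predict, update):
    """Generic supervised online learning protocol: at each step t,
    observe x_t, commit to a prediction y_hat_t, then observe y_t."""
    losses = []
    for x_t, y_t in stream:
        y_hat = predict(x_t)        # predict before seeing the label
        losses.append((y_hat - y_t) ** 2)
        update(x_t, y_t)            # only then learn from the label
    return losses

# Toy instantiation: one scalar weight, one SGD step per observation.
w = 0.0

def predict(x):
    return w * x

def sgd_step(x, y, lr=0.1):
    global w
    w -= lr * 2 * (w * x - y) * x   # gradient of the squared loss

stream = [(x, 3.0 * x) for x in [1.0, 0.5, 2.0, 1.5]]
losses = online_learning_loop(stream, predict, sgd_step)
```

On this stream (generated by y = 3x) the per-step loss falls as the weight converges toward 3.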
9
What do you want from your lifelong learning algorithm?
minimal regret (that grows slowly with t)

regret: cumulative loss of learner minus cumulative loss of best learner in hindsight (cannot be evaluated in practice; useful for analysis)

Regret_T := ∑_{t=1}^T ℒ_t(θ_t) − min_θ ∑_{t=1}^T ℒ_t(θ)
10
Regret that grows linearly in T is trivial. Why? It means the learner's average loss never approaches that of the best fixed model in hindsight.
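As a sanity check on the definition, static regret can be computed directly for small problems. The toy quadratic losses are my own, and a grid search stands in for the exact min over θ:

```python
def static_regret(learner_losses, loss_fn, theta_grid):
    """Regret_T = sum_t L_t(theta_t) - min_theta sum_t L_t(theta).

    learner_losses: realized per-step losses L_t(theta_t).
    loss_fn(theta, t): loss of a fixed comparator theta at step t.
    theta_grid: candidate comparators (stands in for the exact min).
    """
    T = len(learner_losses)
    best_fixed = min(sum(loss_fn(th, t) for t in range(T))
                     for th in theta_grid)
    return sum(learner_losses) - best_fixed

# Quadratic losses L_t(theta) = (theta - y_t)^2 with targets 1, 2, 3.
# A learner that always plays theta_t = 0 pays 1 + 4 + 9 = 14; the best
# fixed comparator in hindsight (theta = 2) pays 1 + 0 + 1 = 2.
targets = [1.0, 2.0, 3.0]
learner_losses = [(0.0 - y) ** 2 for y in targets]
regret = static_regret(learner_losses,
                       lambda th, t: (th - targets[t]) ** 2,
                       theta_grid=[0.0, 1.0, 2.0, 3.0])   # -> 12.0
```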
positive & negative transfer
What do you want from your lifelong learning algorithm?
positive forward transfer: previous tasks cause you to do better on future tasks, compared to learning future tasks from scratch
positive backward transfer: current tasks cause you to do better on previous tasks, compared to learning past tasks from scratch
negative transfer: as above, with "better" replaced by "worse"
11
Plan for Today
12
The lifelong learning problem statement Basic approaches to lifelong learning Can we do better than the basics? Revisiting the problem statement from the meta-learning perspective
Approaches

Store all the data you've seen so far, and train on it. —> follow the leader algorithm
+ will achieve very strong performance
- computation intensive
- can be memory intensive
—> continuous fine-tuning can help [depends on the application]

Take a gradient step on the datapoint you observe. —> stochastic gradient descent
+ computationally cheap
+ requires 0 memory
- subject to negative backward transfer ("forgetting", sometimes referred to as catastrophic forgetting)
- slow learning
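The trade-off between these two baselines shows up even in a toy scalar regression stream. The closed-form least-squares "leader" and the step size below are illustrative choices, not the lecture's code:

```python
def ftl_weight(history):
    """Follow the leader: refit to ALL data seen so far (1-D least squares).
    Strong performance, but compute and memory grow with the stream."""
    num = sum(x * y for x, y in history)
    den = sum(x * x for x, _ in history)
    return num / den if den else 0.0

def sgd_step(w, x, y, lr=0.1):
    """One gradient step on the newest point: cheap and constant-memory,
    but slow to converge and able to forget earlier data."""
    return w - lr * 2 * (w * x - y) * x

history, w = [], 0.0
for x, y in [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]:   # stream with y = 2x
    history.append((x, y))
    w_ftl = ftl_weight(history)   # exact fit to everything so far
    w = sgd_step(w, x, y)         # single cheap update per point
```

After three points, FTL recovers the true weight 2 exactly, while SGD is still partway there.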
13
Can we do better?
Plan for Today
14
The lifelong learning problem statement Basic approaches to lifelong learning Can we do better than the basics? Revisiting the problem statement from the meta-learning perspective
Case Study: Can we use meta-learning to accelerate online learning?
15
[figure: motor malfunction, gradual terrain change, over time]
- online adaptation = few-shot learning
tasks are temporal slices of experience
Recall: model-based meta-RL
16
example online learning problem
[figure: motor malfunction, gradual terrain change, icy terrain, over time]
k time steps are not sufficient to learn an entirely new terrain
Continue to run SGD?
+ will be fast with a MAML initialization
- what if the ice goes away? (subject to forgetting)
17
Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR ‘19
Online inference problem: infer the latent "task" variable at each time step.

Mixture of neural networks over task variable T, adapted continually. Alternate between:
E-step: estimate the latent "task" variable at each time step, given the data
M-step: update the mixture of network parameters (gradient step on each mixture element, weighted by task probability)

P(T_t = T_i | x_t, y_t) ∝ p_{θ(T_i)}(y_t | x_t, T_t = T_i) · P(T_t = T_i)
(likelihood of the data under task T_i, times the prior)

Note: if the neural net is randomly initialized, this procedure would be too slow.
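A toy version of this alternation, with scalar models standing in for the paper's neural networks; the Gaussian likelihood and step size are my assumptions, not the paper's implementation:

```python
import numpy as np

def e_step(weights, priors, x, y, noise=1.0):
    """Posterior over the latent task: P(T_i | x_t, y_t) is proportional
    to the likelihood of (x_t, y_t) under model i, times the prior."""
    lik = np.exp(-0.5 * ((y - weights * x) / noise) ** 2)
    post = lik * priors
    return post / post.sum()

def m_step(weights, resp, x, y, lr=0.1):
    """Gradient step on each mixture element, weighted by task probability."""
    grads = 2.0 * (weights * x - y) * x     # dL/dw_i for each scalar model
    return weights - lr * resp * grads

weights = np.array([0.0, 5.0])              # two task-specific models
priors = np.array([0.5, 0.5])
for x, y in [(1.0, 4.8), (1.0, 5.1)]:       # data generated by "task" 1
    resp = e_step(weights, priors, x, y)
    weights = m_step(weights, resp, x, y)
```

Because the second model explains the data far better, it receives nearly all of the responsibility and nearly all of the update, while the first model is left almost untouched: the mixture avoids overwriting an unrelated task.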
18
Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR ‘19
Crawler with crippled legs
Does it work?
- online learning w. MAML initialization
- SGD w. MAML initialization
- MAML (always reset to prior + 1 grad step)
- model-based, no adaptation
- model-based, grad steps
Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR ‘19
no meta-learning
19
Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR '19
Latent task distribution during online learning
Does it work?
Crawler with crippled legs
20
Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning. ICLR ‘19
Case Study: Can we modify vanilla SGD to avoid negative backward transfer?
21
Idea:
22
Lopez-Paz & Ranzato. Gradient Episodic Memory for Continual Learning. NeurIPS ‘17
(1) store small amount of data per task in memory (2) when making updates for new tasks, ensure that they don’t unlearn previous tasks
How do we accomplish (2)?

memory ℳ_k for task z_k
learning predictor y_t = f_θ(x_t, z_t)

For t = 0, …, T:
minimize ℒ( f_θ( ⋅ , z_t), (x_t, y_t) )
subject to ℒ( f_θ, ℳ_k ) ≤ ℒ( f_θ^{t−1}, ℳ_k ) for all z_k < z_t
(i.e. s.t. loss on previous tasks doesn't get worse)

Can formulate & solve as a QP. Assume local linearity:
⟨g_t, g_k⟩ := ⟨ ∂ℒ( f_θ, (x_t, y_t) ) / ∂θ , ∂ℒ( f_θ, ℳ_k ) / ∂θ ⟩ ≥ 0 for all z_k < z_t
23
Lopez-Paz & Ranzato. Gradient Episodic Memory for Continual Learning. NeurIPS ‘17
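For intuition, here is what enforcing the inner-product constraint looks like in the single-memory-task case. The full method solves a QP over all past tasks; this closed-form one-constraint projection is a simplification (the one used by the later A-GEM variant), not GEM itself:

```python
import numpy as np

def project_gradient(g_t, g_k):
    """If the current-task gradient g_t conflicts with the memory
    gradient g_k (i.e. <g_t, g_k> < 0), remove the conflicting
    component so the update no longer increases the memory loss
    to first order."""
    dot = g_t @ g_k
    if dot >= 0:
        return g_t                            # constraint already satisfied
    return g_t - (dot / (g_k @ g_k)) * g_k    # project onto the constraint

g_task = np.array([1.0, -1.0])
g_memory = np.array([0.0, 1.0])               # <g_task, g_memory> = -1 < 0
g_proj = project_gradient(g_task, g_memory)   # -> array([1., 0.])
```

The projected gradient keeps the component that is harmless to the memory task and drops the component that would unlearn it.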
Experiments

Problems:
- MNIST permutations
- MNIST rotations
- CIFAR-100 (5 new classes/task)
Total memory size: 5012 examples
BWT: backward transfer, FWT: forward transfer

If we take a step back… do these experimental domains make sense?
Can we meta-learn how to avoid negative backward transfer?
24
Javed & White. Meta-Learning Representations for Continual Learning. NeurIPS '19
Plan for Today
25
The lifelong learning problem statement Basic approaches to lifelong learning Can we do better than the basics? Revisiting the problem statement from the meta-learning perspective
More realistically:
[timeline figure: "learn" at every step, with slow learning early and rapid learning later]
What might be wrong with the online learning formulation?
Online Learning
(Hannan ’57, Zinkevich ’03)
Perform a sequence of tasks while minimizing static regret.
[timeline figure: "perform" at every step; zero-shot performance]
26
Online Learning
(Hannan ’57, Zinkevich ’03)
Perform a sequence of tasks while minimizing static regret.
(Finn*, Rajeswaran*, Kakade, Levine ICML ’18)
Online Meta-Learning
Efficiently learn a sequence of tasks from a non-stationary distribution.
[timeline figures: online learning performs each task zero-shot; online meta-learning learns each task, then evaluates performance after seeing a small amount of data]
What might be wrong with the online learning formulation?
27
Primarily a difference in evaluation, rather than the data stream.
The Online Meta-Learning Setting
Goal: a learning algorithm with sub-linear regret

Regret_T := ∑_{t=1}^T ℓ_t(Φ_t(θ_t)) − min_{θ∈Θ} ∑_{t=1}^T ℓ_t(Φ_t(θ))

(loss of the algorithm, minus the loss of the best algorithm in hindsight)
28
for task t = 1, …, n
- observe training data D_t^tr
- use the update procedure to produce parameters ϕ_t = Φ(θ_t, D_t^tr)
- observe x_t
- predict ŷ_t = f_{ϕ_t}(x_t)
- observe label y_t
(the standard online learning setting is the special case with no update procedure)
(Finn*, Rajeswaran*, Kakade, Levine ICML ’18)
29
Can we apply meta-learning in lifelong learning settings?

Recall the follow the leader (FTL) algorithm: store all the data you've seen so far, and train on it.

Follow the meta-leader (FTML) algorithm:
- Store all the data you've seen so far, and meta-train on it.
- Run the update procedure on the current task.
- Deploy the model on the current task.

What meta-learning algorithms are well-suited for FTML? What if p_t(𝒰) is non-stationary?
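The three FTML steps can be sketched at a high level. The toy "meta-learner" below (a running mean used as a prior) and the helper names are placeholders standing in for a real algorithm such as MAML:

```python
def ftml(task_stream, meta_train, adapt, deploy):
    """Follow the meta-leader: for each incoming task, meta-train on all
    data seen so far, adapt with the update procedure, then deploy."""
    seen = []
    for support, task_id in task_stream:
        seen.append(support)
        theta = meta_train(seen)       # meta-train on everything so far
        phi = adapt(theta, support)    # run the update procedure (inner loop)
        deploy(phi, task_id)           # evaluate on the current task

# Toy instantiation on label-prediction "tasks":
deployed = []

def meta_train(datasets):
    ys = [y for d in datasets for _, y in d]
    return sum(ys) / len(ys)           # prior = mean label across all tasks

def adapt(theta, support):
    task_mean = sum(y for _, y in support) / len(support)
    return theta + 0.5 * (task_mean - theta)   # partial move toward the task

def deploy(phi, task_id):
    deployed.append((task_id, phi))

tasks = [([(0, 1.0), (0, 1.0)], "A"), ([(0, 3.0), (0, 3.0)], "B")]
ftml(tasks, meta_train, adapt, deploy)
```

Each deployed model blends the meta-learned prior with a small amount of adaptation on the current task, mirroring the store/meta-train/adapt/deploy loop above.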
Experiment with sequences of tasks:
- Colored, rotated, scaled MNIST
- 3D object pose prediction
- CIFAR-100 classification
Example pose prediction tasks: plane, car, chair
Experiments
30
Experiments
[figures: learning efficiency (# datapoints) and learning proficiency (error) vs. task index, on Rainbow MNIST and Pose Prediction]

Comparisons:
- TOE (train on everything): train on all data so far
- FTL (follow the leader): train on all data so far, fine-tune on current task
- From Scratch: train from scratch on each task
31
Follow The Meta-Leader learns each new task faster & with greater proficiency, approaching the few-shot learning regime.
32
Takeaways
- Many flavors of lifelong learning, all under the same name.
- Defining the problem statement is often the hardest part.
- Meta-learning can be viewed as a slice of the lifelong learning problem.
- A very open area of research.
33
Reminders
Project milestone due Wednesday. Two guest lectures next week! Jeff Clune Sergey Levine