Unifying Perspectives on Knowledge Sharing: From Atomic to Parameterised Domains and Tasks

Task-CV @ ECCV 2016. Timothy Hospedales, University of Edinburgh & Queen Mary University of London, with Yongxin Yang, Queen Mary University of London.


  1. Unifying Perspectives on Knowledge Sharing: From Atomic to Parameterised Domains and Tasks. Task-CV @ ECCV 2016. Timothy Hospedales, University of Edinburgh & Queen Mary University of London, with Yongxin Yang, Queen Mary University of London.

  2. Today's Topics
     - Distributed definitions of tasks/domains, and the different problem settings that arise.
     - A flexible approach to task/domain transfer that:
       - generalizes existing approaches;
       - generalizes multiple problem settings;
       - covers shallow and deep models.

  3. Why Transfer Learning?
     - [Figure: left, the standard setting trains a separate model per dataset (Data 1 -> Model 1, Data 2 -> Model 2, Data 3 -> Model 3), treating tasks or domains as IID; right, a single lifelong learning model built from all the data.]
     - But... humans seem to generalize across tasks, e.g. crawl => walk => run => scooter => bike => motorbike => driving.

  4. Taxonomy of Research Issues
     - Sharing Setting: sequential / one-way, multi-task, life-long learning.
     - Labeling assumption: supervised, unsupervised.
     - Transfer Across: task transfer, domain transfer.
     - Sharing Approach: model-based, instance-based, feature-based.
     - Feature/Label Space: homogeneous, heterogeneous.
     - Balancing Challenge: positive transfer strength, negative transfer robustness.

  5. Taxonomy of Research Issues (same taxonomy as slide 4, revisited).

  6. Overview
     - A review of some classic methods
     - A general framework
     - Example problems and settings
     - Going deeper
     - Open questions

  7. Some Classic Methods – 1: Model Adaptation
     An example of simple sequential transfer:
     - Learn a source task $y = f_s(x, w_s)$: $\min_{w_s} \sum_i (y_i - w_s^T x_i)^2 + \lambda\, w_s^T w_s$
     - Learn a new target task $y = f_t(x, w)$: $\min_{w} \sum_i (y_i - w^T x_i)^2 + \lambda\, (w - w_s)^T (w - w_s)$
     - Regularize the new task toward the old task (rather than toward zero). See the sketch below.
     [Figure: source and target weight vectors in $(w_1, w_2)$ space, with the target pulled toward the source.]
     E.g., Yang, ACM MM, 2007.
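
A minimal sketch (not from the talk; data, sizes, and lambda are synthetic and illustrative) of this model-adaptation idea for linear regression: the target weights are ridge-regularised toward the learned source weights w_s instead of toward zero.

```python
import numpy as np

def ridge(X, y, lam, w_prior):
    """Solve min_w ||y - Xw||^2 + lam * ||w - w_prior||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_prior)

rng = np.random.default_rng(0)
d = 10
w_true_src = rng.normal(size=d)
w_true_tgt = w_true_src + 0.1 * rng.normal(size=d)    # related target task

# Plenty of source data, very little target data.
X_s, X_t = rng.normal(size=(500, d)), rng.normal(size=(20, d))
y_s = X_s @ w_true_src + 0.1 * rng.normal(size=500)
y_t = X_t @ w_true_tgt + 0.1 * rng.normal(size=20)

w_s = ridge(X_s, y_s, lam=1.0, w_prior=np.zeros(d))   # source task: shrink toward 0
w_t = ridge(X_t, y_t, lam=1.0, w_prior=w_s)           # target task: shrink toward w_s
```

With little target data, pulling w toward w_s typically beats fitting the target task from scratch, which is exactly what the $\lambda (w - w_s)^T (w - w_s)$ term buys.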

  8. Some Classic Methods – 1: Model Adaptation
     An example of simple sequential transfer:
     - Learn a new target task $y = f_t(x, w)$: $\min_{w} \sum_i (y_i - w^T x_i)^2 + \lambda\, (w - w_s)^T (w - w_s)$
     - Limitations:
       ✗ Assumes the source task is related.
       ✗ Only sequential, one-way transfer.
     E.g., Yang, ACM MM, 2007.

  9. Some Classic Methods – 2: Regularized Multi-Task
     An example of simple multi-task transfer:
     - Learn a set of tasks $\{y = f_t(x, w_t)\}$ from data $\{x_{i,t}, y_{i,t}\}$, $t = 1..T$.
     - Regularize each task towards the mean of all tasks (sketch below): $\min_{w_0, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \lambda\, (w_t - w_0)^T (w_t - w_0)$
     [Figure: task weight vectors in $(w_1, w_2)$ space clustered around a shared mean.]
     E.g., Evgeniou & Pontil, KDD'04; Salakhutdinov, CVPR'11; Khosla, ECCV'12.
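
A minimal sketch of the regularised multi-task objective above, with synthetic data and plain full-batch gradient descent (all hyperparameters are illustrative): every task vector w_t is pulled toward a jointly learned centre w_0.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, n = 5, 8, 30                        # tasks, features, examples per task
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
w_shared = rng.normal(size=d)
ys = [X @ (w_shared + 0.2 * rng.normal(size=d)) for X in Xs]   # related tasks

lam, lr = 1.0, 1e-3
W = np.zeros((T, d))                      # per-task weights w_t
w0 = np.zeros(d)                          # shared "mean" task w_0

for _ in range(5000):
    grad_W = np.zeros_like(W)
    grad_w0 = np.zeros_like(w0)
    for t in range(T):
        resid = Xs[t] @ W[t] - ys[t]
        grad_W[t] = 2 * Xs[t].T @ resid + 2 * lam * (W[t] - w0)
        grad_w0 -= 2 * lam * (W[t] - w0)
    W -= lr * grad_W
    w0 -= lr * grad_w0
```

Note that W and w0 are updated together: the tasks and their mean are inter-dependent, which is exactly the joint-optimisation point made on the next slide.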

  10. Some Classic Methods – 2: Regularized Multi-Task
      An example of simple multi-task transfer:
      - Learn a set of tasks $\{y = f_t(x, w_t)\}$ from data $\{x_{i,t}, y_{i,t}\}$, $t = 1..T$.
      - $\min_{w_0, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \lambda\, (w_t - w_0)^T (w_t - w_0)$, or: $\min_{w_0, w_t} \sum_{i,t} (y_{i,t} - (w_t + w_0)^T x_{i,t})^2$.
      - Summary:
        ✔ Now multi-task.
        ✗ Tasks and their mean are inter-dependent: must be jointly optimised.
        ✗ Still assumes all tasks are (equally) related.
      [Figure: task weight vectors in $(w_1, w_2)$ space around a shared mean.]

  11. Some Classic Methods – 3: Task Clustering
      Relaxing the relatedness assumption through task clustering:
      - Learn a set of tasks $\{y = f_t(x, w_t)\}$ from data $\{x_{i,t}, y_{i,t}\}$.
      - Assume the tasks form K similar groups; regularize each task towards its nearest group (sketch below): $\min_{w_k, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \min_{k'} \lambda\, (w_t - w_{k'})^T (w_t - w_{k'})$, $k = 1..K$, $t = 1..T$.
      [Figure: task weight vectors in $(w_1, w_2)$ space forming clusters.]
      E.g., Evgeniou et al., JMLR, 2005; Kang et al., ICML, 2011.
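
A rough sketch of the clustered variant. The alternating scheme and every setting here are my own simplification (not necessarily the cited algorithms): alternate between fitting each task with a ridge penalty toward its current cluster centre, reassigning each task to its nearest centre, and updating the centres.

```python
import numpy as np

def ridge(X, y, lam, w_prior):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_prior)

rng = np.random.default_rng(2)
T, K, d, n, lam = 6, 2, 5, 40, 1.0
# Two genuine groups of related tasks.
group_w = [rng.normal(size=d) for _ in range(K)]
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [X @ (group_w[t % K] + 0.1 * rng.normal(size=d)) for t, X in enumerate(Xs)]

W = np.stack([ridge(Xs[t], ys[t], lam, np.zeros(d)) for t in range(T)])
centres = W[rng.choice(T, K, replace=False)]           # initialise cluster centres

for _ in range(10):
    # (a) assign each task to its nearest centre
    assign = np.array([np.argmin(((w - centres) ** 2).sum(1)) for w in W])
    # (b) refit each task, regularised toward its assigned centre
    W = np.stack([ridge(Xs[t], ys[t], lam, centres[assign[t]]) for t in range(T)])
    # (c) update centres as the mean of their assigned tasks
    centres = np.stack([W[assign == k].mean(0) if (assign == k).any() else centres[k]
                        for k in range(K)])
```

The alternation makes the non-convexity visible: the assignment step is discrete, which is one reason the optimisation is hard and K must be chosen up front.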

  12. Some Classic Methods – 3: Task Clustering
      Multi-task transfer without assuming all tasks are related:
      - Assume the tasks form similar groups: $\min_{w_k, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \min_{k'} \lambda\, (w_t - w_{k'})^T (w_t - w_{k'})$, $k = 1..K$, $t = 1..T$.
      - Summary:
        ✔ Doesn't require all tasks to be related => more robust to negative transfer.
        ✔ Benefits from "more specific" transfer.
        ✗ What about task-specific vs. task-independent knowledge?
        ✗ How to determine the number of clusters K?
        ✗ What if tasks share at the level of "parts"?
        ✗ Optimization is hard.

  13. Some Classic Methods – 4: Task Factoring
      - Learn a set of tasks $\{y = f_t(x, w_t)\}$; assume they are related by a factor-analysis / latent-task structure.
      - Notation: inputs are now triples $\{x_i, y_i, z_i\}$, where $z_i$ is a binary (1-hot) task indicator vector.
      - Single-task learning (STL) in weight-stacking notation: $y = f_t(x, W) = w_t^T x = (Wz)^T x$, $\min_W \sum_i (y_i - (Wz_i)^T x_i)^2 + \lambda \|W\|^2$.
      - Factor-Analysis MTL: $y = (Wz)^T x = (PQz)^T x$, $\min_{P,Q} \sum_i (y_i - (PQz_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|$. (Sketch below.)
      E.g., Kumar, ICML'12; Passos, ICML'12.
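
A small sketch of the factored objective W = PQ fitted by gradient descent on synthetic data. The squared-Frobenius penalties on both P and Q are a simplifying assumption of mine; a sparsity penalty on Q (as in GO-MTL) is a common alternative, and all sizes and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K, d, n = 5, 3, 10, 30                 # tasks, latent tasks, features, examples
P_true = rng.normal(size=(d, K))
Q_true = rng.normal(size=(K, T))
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [Xs[t] @ (P_true @ Q_true[:, t]) + 0.05 * rng.normal(size=n) for t in range(T)]

lam, om, lr = 0.1, 0.1, 2e-4
P = 0.1 * rng.normal(size=(d, K))         # basis / latent tasks
Q = 0.1 * rng.normal(size=(K, T))         # each task's coordinates on the basis

for _ in range(20000):
    gP = 2 * lam * P                      # gradients of the penalty terms
    gQ = 2 * om * Q
    for t in range(T):
        resid = Xs[t] @ (P @ Q[:, t]) - ys[t]
        gP += 2 * np.outer(Xs[t].T @ resid, Q[:, t])
        gQ[:, t] += 2 * P.T @ (Xs[t].T @ resid)
    P -= lr * gP
    Q -= lr * gQ

loss = sum(np.sum((Xs[t] @ (P @ Q[:, t]) - ys[t]) ** 2) for t in range(T))
print(f"final training squared error: {loss:.3f}")
```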

  14. Some Classic Methods – 4: Task Factoring
      - Learn a set of tasks $y = f_t(x, W)$ from triples $\{x_i, y_i, z_i\}$; assume a factor-analysis / latent-task structure: $y = w_t^T x = (Wz)^T x = (PQz)^T x$, $\min_{P,Q} \sum_i (y_i - (PQz_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|$.
      - What does it mean?
        - W: DxT matrix of all task parameters.
        - P: DxK matrix of basis / latent tasks.
        - Q: KxT matrix of low-dimensional task models.
        - => Each task is a low-dimensional linear combination of basis tasks.

  15. Some Classic Methods – 4: Task Factoring
      - Learn a set of tasks $y = f_t(x, W)$ from triples $\{x_i, y_i, z_i\}$; assume a factor-analysis / latent-task structure: $y = w_t^T x = (Wz)^T x = (PQz)^T x$, $\min_{P,Q} \sum_i (y_i - (PQz_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|$.
      - What does it mean?
        - z: (1-hot binary) activates a column of Q.
        - P: DxK matrix of basis / latent tasks.
        - Q: KxT matrix of task models.
        - => Tasks lie on a low-dimensional manifold; knowledge is shared by jointly learning that manifold. P specifies the manifold, Q gives each task's position on it.
      [Figure: task weight vectors $(w_1, w_2, w_3)$ lying on a manifold defined by P, with positions given by Q.]

  16. Some Classic Methods – 4: Task Factoring
      - Summary: tasks lie on a low-dimensional manifold; each task is a low-dimensional linear combination of basis tasks. $y = w_t^T x = (Wz)^T x = (PQz)^T x$, $\min_{P,Q} \sum_i (y_i - (PQz_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|$.
        ✔ Can flexibly share or not share: similarity between two columns (tasks) of Q.
        ✔ Can share piecewise: two columns (tasks) of Q similar in some rows only.
        ✔ Can represent globally shared knowledge: a uniform row in Q => all tasks activate the same basis column of P. (Worked example below.)
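
A tiny worked example (the numbers are made up) of the sharing patterns the factorisation can express: selecting a task with a 1-hot z picks out a column of W = PQ, and a uniform row of Q makes every task use the corresponding basis column of P equally, i.e. globally shared knowledge.

```python
import numpy as np

d, K, T = 4, 3, 3
P = np.arange(d * K, dtype=float).reshape(d, K)    # basis / latent tasks
Q = np.array([[1.0, 0.0, 0.5],                     # task-specific coefficients
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 1.0]])                    # uniform row: shared knowledge

W = P @ Q                 # column t holds task t's weight vector
z = np.eye(T)[1]          # 1-hot indicator for the second task
x = np.ones(d)

# Selecting a task via z is exactly picking the matching column of W = PQ.
assert np.isclose((P @ Q @ z) @ x, W[:, 1] @ x)
# Every task uses P's last basis column with the same coefficient (1.0),
# so that basis column carries globally shared knowledge.
```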

  17. Overview
      - A review of some classic methods
      - A general framework
      - Example problems and settings
      - Going deeper
      - Open questions

  18. MTL Transfer as a Neural Network
      - Consider a two-sided neural network:
        - Left: data input x.
        - Right: task indicator z.
        - Output unit y: inner product of the two representations.
      - Equivalent to Task Regularization [Evgeniou, KDD'04] if:
        - Q = W: (trainable) FC layer; P: (fixed) identity matrix.
        - z: 1-hot task encoding plus a bias bit => the shared knowledge.
        - Linear activation.
      - $\min_{w_0, w_t} \sum_{i,t} (y_{i,t} - (w_t + w_0)^T x_{i,t})^2$, $t = 1..T$; $y = (w_t + w_0)^T x$. (Sketch below.)
      [Yang & Hospedales, ICLR'15]
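
A minimal PyTorch sketch of this two-sided reading (my own construction, not the authors' code): the data side is a fixed identity map, the task side is a trainable linear layer Q over a 1-hot task code plus a bias bit, and y is the inner product of the two sides. Sizes and data are illustrative.

```python
import torch
import torch.nn as nn

class TwoSidedMTL(nn.Module):
    """Data side: fixed identity P.  Task side: trainable FC layer Q."""
    def __init__(self, d, T):
        super().__init__()
        # Columns of Q are w_1..w_T plus a final column acting as shared w_0.
        self.Q = nn.Linear(T + 1, d, bias=False)
        self.T = T

    def forward(self, x, task_id):
        z = torch.zeros(x.shape[0], self.T + 1)
        z[torch.arange(x.shape[0]), task_id] = 1.0   # 1-hot task indicator
        z[:, -1] = 1.0                               # bias bit -> shared knowledge
        return (x * self.Q(z)).sum(dim=1)            # inner product: (w_t + w_0)^T x

# Hypothetical usage on random data.
model = TwoSidedMTL(d=16, T=4)
x = torch.randn(8, 16)
task_id = torch.randint(0, 4, (8,))
loss = nn.functional.mse_loss(model(x, task_id), torch.randn(8))
loss.backward()
```

Adding weight decay on Q then plays roughly the role of the regularisation penalty on the task vectors.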

  19. MTL Transfer as a Neural Network
      - Consider a two-sided neural network:
        - Left: data input x.
        - Right: task indicator z.
        - Output unit y: inner product of the representation on each side.
      - Equivalent to Task Factor Analysis [Kumar, ICML'12, GO-MTL] if:
        - Both FC layers P & Q are trained (constraining the task description/parameters).
        - z: 1-hot task encoding.
        - Linear activation.
      - $y = (Wz)^T x$; $\min_{P,Q} \sum_i (y_i - (PQz_i)^T x_i)^2 = \min_{P,Q} \sum_i (y_i - (P^T x_i)^T (Qz_i))^2$. (Sketch below.)
      - Encompasses 5+ classic MTL/MDL approaches!
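
The same two-sided network with both FC layers trainable recovers the factored model y = (P^T x)^T (Q z); a sketch with assumed sizes follows, where P maps data into a K-dimensional latent-task space and Q places each task in that space.

```python
import torch
import torch.nn as nn

class FactoredMTL(nn.Module):
    """Two-sided network with both FC layers trainable: y = (P^T x) . (Q z)."""
    def __init__(self, d, T, K):
        super().__init__()
        self.P = nn.Linear(d, K, bias=False)   # data side: K-dim latent space
        self.Q = nn.Linear(T, K, bias=False)   # task side: task's latent position
        self.T = T

    def forward(self, x, task_id):
        z = nn.functional.one_hot(task_id, self.T).float()
        return (self.P(x) * self.Q(z)).sum(dim=1)

model = FactoredMTL(d=16, T=4, K=3)
x, task_id = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.mse_loss(model(x, task_id), torch.randn(8))
loss.backward()
```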

  20. MTL Transfer as a Neural Network: Interesting Things
      - Generalizes many existing frameworks.
      - Can do regression & classification (choice of activation on y).
      - Can do multi-task and multi-domain.
      - As a neural network, the left (data) side x can be any CNN and be trained end-to-end. (Sketch below.)
      - $y = (Wz)^T x$; $\min_{P,Q} \sum_i (y_i - (P^T x_i)^T (Qz_i))^2$. z: task/domain ID; x: data.
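
A sketch of the "left side can be any CNN" point, with a deliberately tiny and entirely illustrative convolutional feature extractor standing in for a real backbone; everything trains end-to-end together with the task side.

```python
import torch
import torch.nn as nn

class ConvTwoSided(nn.Module):
    """Two-sided network whose data side is a (toy) CNN feature extractor."""
    def __init__(self, T, K=64):
        super().__init__()
        self.features = nn.Sequential(          # any CNN backbone would do here
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, K))
        self.Q = nn.Linear(T, K, bias=False)    # task/domain side
        self.T = T

    def forward(self, img, task_id):
        z = nn.functional.one_hot(task_id, self.T).float()
        return (self.features(img) * self.Q(z)).sum(dim=1)

model = ConvTwoSided(T=3)
y = model(torch.randn(4, 3, 32, 32), torch.tensor([0, 1, 2, 0]))
```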

  21. MTL Transfer as a Neural Network: Interesting Things
      - Non-linear activation on the hidden layers:
        - Representation learning on both the task and the data side.
        - Exploits a non-linear task subspace (cf. GO-MTL's linear task subspace).
        - The final classifier can be non-linear in feature space.
      - $y = \sigma(P^T x)^T \sigma(Qz)$; $\min_{P,Q} \sum_i (y_i - \sigma(P^T x_i)^T \sigma(Qz_i))^2$. z: task/domain ID; x: data. (Sketch below.)
      [Figure: task weight vectors $(w_1, w_2, w_3)$ on a non-linear task manifold.]
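
A sketch of the non-linear variant; the choice of sigmoid and the layer sizes are my assumptions. The point is simply that a non-linearity on both sides gives a non-linear task subspace as well as non-linear features.

```python
import torch
import torch.nn as nn

class NonLinearTwoSided(nn.Module):
    """y = sigma(P^T x)^T sigma(Q z): non-linear data and task representations."""
    def __init__(self, d, T, K):
        super().__init__()
        self.P = nn.Linear(d, K, bias=False)
        self.Q = nn.Linear(T, K, bias=False)
        self.T = T

    def forward(self, x, task_id):
        z = nn.functional.one_hot(task_id, self.T).float()
        return (torch.sigmoid(self.P(x)) * torch.sigmoid(self.Q(z))).sum(dim=1)

model = NonLinearTwoSided(d=16, T=4, K=8)
y = model(torch.randn(8, 16), torch.randint(0, 4, (8,)))
```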

  22. Overview
      - A review of some classic methods
      - A general framework
      - Example problems and settings
      - Going deeper
      - Open questions
