Learning Data Representations: Hierarchies and Invariance
Joachim M. Buhmann, Computer Science Department, ETH Zurich
MIT Workshop, 23 November 2013



  1. Learning Data Representations: Hierarchies and Invariance
     Joachim M. Buhmann
     Computer Science Department, ETH Zurich
     23 November 2013

  2. Value chain of IT: Personalized Medicine
     [Figure: Activation of the mTOR Signaling Pathway in Renal Clear Cell Carcinoma. Robb et al., J Urology 177:346 (2007)]
     Value chain: my Data → my Information → my Knowledge → my Value → happy (alive) patients

  3. Learning features and representations
     § What are representations good for?
       § Task-specific data reduction
       § Decision making
       § Efficient computation
     § Unfavorable properties of representations
       § Strongly statistically dependent features:
         D_KL( p(x_1, …, x_n) ‖ ∏_i p(x_i) ) ≥ 0
         The joint p(x_1, …, x_n) is difficult to estimate and hard to compute; the factorized model ∏_i p(x_i) is easy to estimate and simple to compute. (A numerical sketch follows below.)
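A minimal numerical sketch of this quantity, under assumptions of my own (the toy joint distribution below is hypothetical, not from the talk): for two binary features, the KL divergence between the joint and the product of its marginals is zero exactly when the features are independent, and grows with their dependence.

```python
# Sketch: D_KL(joint || product of marginals) for a small discrete joint,
# illustrating why strongly dependent features make factorized models lossy.
import numpy as np

# Hypothetical joint distribution of two binary features (rows: x1, cols: x2).
p_joint = np.array([[0.4, 0.1],
                    [0.1, 0.4]])

p_x1 = p_joint.sum(axis=1)       # marginal of x1
p_x2 = p_joint.sum(axis=0)       # marginal of x2
p_prod = np.outer(p_x1, p_x2)    # factorized (independence) model

# KL divergence in nats; it is 0 iff the features are independent.
kl = np.sum(p_joint * np.log(p_joint / p_prod))
print(f"D_KL(joint || product of marginals) = {kl:.4f} nats")
```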

  4. Design principles for representations
     § Decoupling (statistical & computational): find epistemic atoms (symbols), e.g., grandmother cells
     § Example: a chain of Boolean variables x_i ∈ {0, 1}, e.g., 0 1 0 0 1
       Consider the coefficients ξ_k = Σ_{i=1}^{n} (2 x_i − 1) exp(i 2π k i / n)   (a sketch follows below)
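A minimal sketch of these coefficients, assuming (as the formula suggests) that ξ_k is a discrete Fourier transform of the ±1-recoded chain; the index convention i = 1, …, n inside the exponent is my reading of the slide:

```python
# Sketch: Fourier coefficients of a Boolean chain, recoded {0,1} -> {-1,+1}.
import numpy as np

x = np.array([0, 1, 0, 0, 1])     # the Boolean chain from the slide
s = 2 * x - 1                     # recode {0,1} -> {-1,+1}
n = len(s)
i = np.arange(1, n + 1)           # positions 1..n (assumed convention)

# xi_k = sum_{i=1}^{n} (2 x_i - 1) * exp(1j * 2*pi * k * i / n)
xi = np.array([np.sum(s * np.exp(1j * 2 * np.pi * k * i / n)) for k in range(n)])
print(np.round(xi, 3))
```

The point of the decoupling principle is that such global coefficients can be far less statistically dependent than the raw chain entries.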

  5. Design principles for representations (cont.)
     § Conditional decoupling
     § Infer tree structures
     § Modular structures
     § Latent variable discovery
     K-means: sum of average cluster distortions = sum of average pairwise distances (verified in the sketch below)
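A minimal numerical check of the identity behind this remark (the data and cluster assignment below are hypothetical): for each cluster C with centroid μ, Σ_{i∈C} ‖x_i − μ‖² = (1 / (2|C|)) Σ_{i,j∈C} ‖x_i − x_j‖², which is what lets k-means be phrased purely in terms of pairwise distances.

```python
# Sketch: within each cluster, the sum of squared distances to the centroid
# equals the (suitably normalized) sum of pairwise squared distances.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))              # hypothetical data
labels = rng.integers(0, 3, size=12)      # hypothetical cluster assignment

for c in np.unique(labels):
    Xc = X[labels == c]
    mu = Xc.mean(axis=0)
    distortion = np.sum((Xc - mu) ** 2)
    # Pairwise form: (1 / (2|C|)) * sum_{i,j} ||x_i - x_j||^2
    diffs = Xc[:, None, :] - Xc[None, :, :]
    pairwise = np.sum(diffs ** 2) / (2 * len(Xc))
    assert np.isclose(distortion, pairwise)
    print(f"cluster {c}: distortion {distortion:.4f} == pairwise form {pairwise:.4f}")
```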

  6. Challenge for learning representations
     § Learning representations explores the space of structures
     § Combinatorial search in spaces with VC dimension dim_VC = ∞
     § Data-adaptive coarsening is required, i.e., in the asymptotic limit we derive a distribution over structures and not a single best one. (One way to make this concrete is sketched below.)
     Current learning theory is insufficient to handle this constraint! ⇒ Information / rate distortion theory
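One concrete reading of "a distribution over structures, not a single best one" — my assumption, not spelled out on the slide — is a Gibbs (maximum-entropy) distribution over candidate structures c with empirical costs R(c), where an inverse temperature β controls the coarsening:

```python
# Sketch: p(c) ∝ exp(-beta * R(c)) over candidate structures.
# Small beta treats all structures alike (coarse); large beta concentrates
# on the empirical minimizer (a single "best" structure).
import numpy as np

costs = np.array([2.0, 2.1, 2.3, 5.0])   # hypothetical costs R(c) of 4 structures

def gibbs(costs, beta):
    w = np.exp(-beta * (costs - costs.min()))  # shift by min for stability
    return w / w.sum()

for beta in (0.1, 1.0, 10.0):
    print(f"beta={beta}: p = {np.round(gibbs(costs, beta), 3)}")
```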

  7. Goal: Theory for learning algorithms
     [Figure: an algorithm A maps the data to sets of candidate structures (groupings of items 1–12).]
     § Modeling in pattern recognition requires
       § quantization: given the data, identify a set of good hypotheses
       § learning: find an A that specifies an informative set!
     (A toy version of the quantization step is sketched below.)
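A toy illustration of "identify a set of good hypotheses" (the tolerance parameter gamma and the costs are hypothetical): keep every candidate whose cost is within gamma of the empirical optimum, rather than the single minimizer.

```python
# Sketch: quantization as selecting the set of near-optimal hypotheses.
import numpy as np

costs = np.array([2.0, 2.1, 2.3, 5.0])   # hypothetical costs of candidate hypotheses
gamma = 0.25                              # hypothetical approximation tolerance
approx_set = np.flatnonzero(costs <= costs.min() + gamma)
print("informative hypothesis set:", approx_set)   # -> [0 1]
```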

  8. Low-Energy Computing
     § Novel low-power architectures operate near transistor threshold voltage (NTV), e.g., Intel Claremont: 1.5 mW @ 10 MHz (x86)
     § NTV promises 10x more energy efficiency at 10x more parallelism! (source: Intel)
     § 10^5 times more soft errors (bits flip stochastically)
     § Hard to correct in hardware → expose to programmer? (A toy simulation follows below.)
     (credit: Thorsten Höffler)
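A toy simulation of what "expose to programmer" could mean for numerical code (the flip rate is exaggerated for illustration and is my assumption, not a figure from the talk): flip one random bit in the binary representation of a few float64 values and observe the effect on a simple reduction.

```python
# Sketch: stochastic soft errors as random single-bit flips in float64 data.
import numpy as np

rng = np.random.default_rng(42)
x = np.ones(1000)

def flip_random_bit(value, rng):
    # Reinterpret the float as 64 bits, flip one uniformly chosen bit.
    bits = np.frombuffer(np.float64(value).tobytes(), dtype=np.uint64)[0]
    bits ^= np.uint64(1) << np.uint64(rng.integers(0, 64))
    return np.frombuffer(np.uint64(bits).tobytes(), dtype=np.float64)[0]

# Corrupt ~1% of the entries (exaggerated rate), then compare the sums.
x_faulty = x.copy()
for i in np.flatnonzero(rng.random(x.size) < 0.01):
    x_faulty[i] = flip_random_bit(x_faulty[i], rng)

print("exact sum:", x.sum(), " faulty sum:", x_faulty.sum())
```

Depending on which bit flips, the faulty sum can be off by a rounding error or by many orders of magnitude, which is why error handling at this rate is hard to hide entirely in hardware.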
