23 November 2013
Learning Data Representations: Hierarchies and Invariance
Joachim M. Buhmann
Computer Science Department, ETH Zurich
23 Nov 2013 Joachim M. Buhmann MIT Workshop 2
Value chain of IT: Personalized Medicine
my Data → my Information → my Knowledge → my Value: happy (alive) patients
Example: Activation of the mTOR Signaling Pathway in Renal Clear Cell Carcinoma. Robb et al., J. Urology 177:346 (2007)
Learning features and representations
§ What are representations good for?
§ Task-specific data reduction
§ Decision making
§ Efficient computation
§ Unfavorable properties of representations
§ Strongly statistically dependent features
$D_{KL}\big(\, p(x_1, \dots, x_n) \,\big\|\, \textstyle\prod_i p(x_i) \,\big)$

The joint $p(x_1, \dots, x_n)$ is difficult to estimate and hard to compute; the factorized model $\prod_i p(x_i)$ is easy to estimate and simple to compute.
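This trade-off can be made concrete on a toy example: the KL divergence between a joint distribution and the product of its marginals (for two variables this is exactly the mutual information, and it vanishes iff the features are independent). A minimal sketch in Python; the joint table below is an invented example, not data from the talk:

```python
import numpy as np

# Invented joint distribution over two binary features x1, x2
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])

px1 = p.sum(axis=1)        # marginal of x1
px2 = p.sum(axis=0)        # marginal of x2
q = np.outer(px1, px2)     # fully factorized model prod_i p(x_i)

# D_KL(p || q) in nats; positive, since the features are dependent
dkl = float(np.sum(p * np.log(p / q)))
```

For strongly dependent features this divergence is large, which is precisely the "unfavorable property" flagged above: the factorized model is cheap but loses information.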
Design principles for representations
§ Decoupling (statistical & computational)
find epistemic atoms (symbols), e.g., grandmother cells
Example: a chain of boolean variables. Consider
$\xi_k = \frac{1}{n} \sum_{i=1}^{n} (2x_i - 1)\, \exp(\mathrm{i}\, k\, 2\pi i / n), \qquad x_i \in \{0, 1\}$
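The feature $\xi_k$ can be read as a discrete Fourier transform of the spin chain $2x_i - 1$; since the slide's formula is partially garbled, the per-site phase $\exp(\mathrm{i} k 2\pi i/n)$ used below is a plausible reading rather than a confirmed one. A small sketch:

```python
import numpy as np

def xi(x, k):
    """Fourier-type feature of a boolean chain x in {0,1}^n:
    map bits to spins (2x_i - 1), then project onto the k-th
    complex phase — a DFT-style reading of the slide's formula."""
    x = np.asarray(x)
    n = len(x)
    sites = np.arange(1, n + 1)  # site index i = 1..n
    return np.sum((2 * x - 1) * np.exp(1j * k * 2 * np.pi * sites / n)) / n
```

For the all-ones chain, $\xi_0 = 1$ and all higher modes vanish, illustrating how the transform decouples the chain into (approximately) independent coordinates.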
Design principles for representations (cont.)
§ Conditional decoupling
§ Infer tree structures
§ Modular structures
§ Latent variable discovery
K-means: the sum of cluster distortions (squared distances to the centroids) equals the sum of average pairwise squared distances within the clusters.
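This identity can be checked numerically: for squared Euclidean distortion, the within-cluster sum of squared distances to the centroid equals the sum of pairwise squared distances divided by twice the cluster size. A small sketch with invented random data and an arbitrary cluster assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))          # toy data points
labels = rng.integers(0, 3, size=30)  # arbitrary 3-cluster assignment

centroid_cost = 0.0   # sum over clusters of ||x - mu_C||^2
pairwise_cost = 0.0   # sum over clusters of pairwise d^2 / (2|C|)
for c in range(3):
    C = X[labels == c]
    centroid_cost += ((C - C.mean(axis=0)) ** 2).sum()
    d2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    pairwise_cost += d2.sum() / (2 * len(C))
```

The two costs agree to machine precision, which is why the K-means objective can equivalently be written without any explicit centroids, purely in terms of pairwise distances.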
Challenge for learning representations
§ Learning representations explores the space of structures
§ Combinatorial search in spaces with $\dim_{VC} = \infty$
§ Data-adaptive coarsening is required, i.e., in the asymptotic limit we derive a distribution over structures and not a single best one. Current learning theory is insufficient to handle this constraint! ⇒ information / rate-distortion theory
Goal: Theory for learning algorithms
§ Modeling in pattern recognition requires
§ quantization: given data, identify a set of good hypotheses,
§ learning: find an algorithm A that specifies an informative set!
[Figure: an algorithm A applied repeatedly to permuted orderings of the data indices 1–12]
Low-Energy Computing
§ Novel low-power architectures operate near the transistor threshold voltage (NTV)
§ e.g., Intel Claremont: 1.5 mW @ 10 MHz (x86)
§ NTV promises 10× more energy efficiency at 10× more parallelism!
§ But ~10^5 times more soft errors (bits flip stochastically)
§ Hard to correct in hardware → expose them to the programmer?
source: Intel
© Torsten Hoefler
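What "exposing soft errors to the programmer" means can be illustrated by simulating a single stochastic bit flip in a floating-point value; this is a hypothetical illustration of the failure mode, not a model of any specific NTV hardware:

```python
import random
import struct

def flip_random_bit(x: float) -> float:
    """Flip one random bit in the 64-bit IEEE-754 encoding of x,
    mimicking a memory soft error."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    bits ^= 1 << random.randrange(64)  # toggle one of the 64 bits
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

corrupted = flip_random_bit(1.0)
```

Depending on which bit flips, the corruption ranges from a negligible mantissa perturbation to a sign change, a huge exponent shift, or a NaN — which is why software-level error tolerance (e.g., algorithms robust to perturbed values) becomes attractive when hardware correction is too expensive.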