#82 Adaptive Neural Trees (Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori)


SLIDE 1

Adaptive Neural Trees

Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori

#82

SLIDE 2

Two Paradigms of Machine Learning

Deep Neural Networks: hierarchical representation of data
Decision Trees: hierarchical clustering of data

[Figure: super-resolution of dMR brain images with a DT (Alexander et al., NeuroImage 2017); tissue labels: water, grey matter, white matter]

[Figure: ImageNet classifiers with CNNs (Zeiler and Fergus, ECCV 2014): low-level features (oriented edges & colours) → mid-level features (textures & patterns) → high-level features (object parts) → trainable classifier]

SLIDE 3

Two Paradigms of Machine Learning

Deep Neural Networks: hierarchical representation of data
Decision Trees: hierarchical clustering of data

SLIDE 4

Two Paradigms of Machine Learning

Deep Neural Networks: hierarchical representation of data
+ learn features of data
+ scalable learning with stochastic optimisation
- architectures are hand-designed
- heavy-weight inference, engaging every parameter of the model for each input

Decision Trees: hierarchical clustering of data

SLIDE 5

Two Paradigms of Machine Learning

Deep Neural Networks: hierarchical representation of data
+ learn features of data
+ scalable learning with stochastic optimisation
- architectures are hand-designed
- heavy-weight inference, engaging every parameter of the model for each input

Decision Trees: hierarchical clustering of data
+ architectures are learned from data
+ lightweight inference, activating only a fraction of the model per input
- operate on hand-designed features
- limited expressivity with simple splitting functions

SLIDE 6

Joining the Paradigms

Deep Neural Networks: hierarchical representation of data
+ learn features of data
+ scalable learning with stochastic optimisation

Decision Trees: hierarchical clustering of data
+ architectures are learned from data
+ lightweight inference, activating only a fraction of the model per input

Adaptive Neural Trees

ANTs unify the two paradigms and generalise previous work
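The unification can be made concrete with a toy sketch of the three kinds of module an ANT composes: transformers on edges, routers at internal nodes, and solvers at leaves. This is an illustrative stand-in, not the paper's code: each module here is a 1-D affine/nonlinear function where a real ANT would use a small neural network, and all function names are assumptions.

```python
import math

def transformer(h, scale=2.0, shift=0.0):
    """Edge module: a learnable nonlinear feature transform
    (stand-in for a conv/MLP block in a real ANT)."""
    return math.tanh(scale * h + shift)

def router(h, w=1.0, b=0.0):
    """Internal-node module: probability of routing h to the left child."""
    return 1.0 / (1.0 + math.exp(-(w * h + b)))

def solver(h):
    """Leaf module: turns routed features into a prediction
    (here a toy 2-class distribution)."""
    return [0.5 + 0.5 * h, 0.5 - 0.5 * h]

def forward(x):
    """One root-to-leaf pass: transform, route, transform again, solve."""
    h = transformer(x)
    p_left = router(h)
    h = transformer(h, scale=1.0)  # edge module on the chosen branch
    return p_left, solver(h)
```

Stacking transformers along a root-to-leaf path is what gives each leaf its own learned representation, while the routers provide the DT-style hierarchical partitioning of the input space.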


SLIDE 8
What are ANTs?

ANTs consist of two key design elements:

SLIDE 9
What are ANTs?

ANTs consist of two key design elements:

(1) DTs that use NNs for every path transformation and routing decision.

[Figure: a tree routing an input x through NN edges and routers]

SLIDE 10
What are ANTs?

ANTs consist of two key design elements:

(1) DTs that use NNs for every path transformation and routing decision.
(2) DT-like architecture growth using SGD.

[Figure: growth operations on a target node: (a) split, or (b) deepen]

SLIDE 11

What are ANTs?

ANTs consist of two key design elements:

(1) DTs that use NNs for every path transformation and routing decision.
(2) DT-like architecture growth using SGD.
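The growth procedure in (2) can be sketched as a greedy loop: each leaf is either kept, split (a new router with two fresh leaves), or deepened (an extra transformer on its edge), and the option with the lowest validation loss wins. The sketch below is a hedged illustration with dict-based stand-in models and a caller-supplied `val_loss`; in the paper each candidate's new parameters are first optimised by SGD before the comparison is made.

```python
def split(model, leaf):
    """Candidate: replace the leaf with a router and two fresh leaves."""
    new = dict(model)
    new[leaf] = ("split", leaf + "L", leaf + "R")
    new[leaf + "L"] = ("leaf",)
    new[leaf + "R"] = ("leaf",)
    return new

def deepen(model, leaf):
    """Candidate: add an extra transformer module on the leaf's edge."""
    new = dict(model)
    new[leaf] = ("deepen", leaf + "T")
    new[leaf + "T"] = ("leaf",)
    return new

def grow_leaf(model, leaf, val_loss):
    """One greedy growth step: keep, split, or deepen the leaf,
    whichever yields the lowest validation loss."""
    candidates = {
        "keep": model,
        "split": split(model, leaf),
        "deepen": deepen(model, leaf),
    }
    best = min(candidates, key=lambda name: val_loss(candidates[name]))
    return candidates[best]
```

In the full algorithm this step is applied to each leaf in turn, and growth stops once no operation improves the validation loss, so architecture search reduces to ordinary gradient-based training plus local greedy choices.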

SLIDE 12

Conditional Computation

[Figure: errors (MNIST %, CIFAR10 %, SARCOS mse) and number of parameters for ANTs 1-3 under multi-path vs single-path inference; model size drops under single-path inference]

Single-path inference enables efficient inference without compromising accuracy.
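The multi-path vs single-path distinction can be illustrated with a toy soft-routing tree (pure Python, with illustrative names; not the paper's code): multi-path inference mixes every leaf weighted by routing probability, while single-path inference follows only the most probable branch and so touches only a fraction of the parameters.

```python
import math

class Leaf:
    """Solver stub: stores a fixed class distribution."""
    def __init__(self, dist):
        self.dist = dist

class Node:
    """Router stub: a 1-D linear score gives the left-branch probability."""
    def __init__(self, w, b, left, right):
        self.w, self.b, self.left, self.right = w, b, left, right

    def p_left(self, x):
        return 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))

def predict_multipath(node, x):
    """Soft inference: every leaf contributes, weighted by its path probability."""
    if isinstance(node, Leaf):
        return node.dist
    p = node.p_left(x)
    left = predict_multipath(node.left, x)
    right = predict_multipath(node.right, x)
    return [p * a + (1.0 - p) * b for a, b in zip(left, right)]

def predict_singlepath(node, x):
    """Hard inference: follow only the most probable branch to one leaf."""
    while isinstance(node, Node):
        node = node.left if node.p_left(x) >= 0.5 else node.right
    return node.dist
```

With `tree = Node(2.0, 0.0, Leaf([0.9, 0.1]), Leaf([0.2, 0.8]))`, single-path inference visits one leaf while multi-path blends both; when the routers are confident, the two modes nearly agree, which is why dropping to a single path costs little accuracy.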

SLIDE 13

Adaptive Model Complexity

Models are trained on subsets of 50, 250, 500, 2.5k, 5k, 25k, and 45k examples.

ANTs can tune their architecture to the amount of available training data.
SLIDE 14

Please come & see me at poster #82 for details!

[Figure: unsupervised hierarchical clustering]