

slide-1
SLIDE 1


Deep neural networks and structured output problems

Presentation of my current PhD work. ISP seminar, UCL, Louvain-la-Neuve, 2016.

Soufiane Belharbi Romain Hérault Clément Chatelain Sébastien Adam

soufiane.belharbi@insa-rouen.fr

LITIS lab., Apprentissage team - INSA de Rouen, France. Dec. 12th, 2016.

slide-2
SLIDE 2

Introduction

My PhD work

1. S. Belharbi, R. Hérault, C. Chatelain, S. Adam. Deep multi-task learning with evolving weights. European Symposium on Artificial Neural Networks (ESANN), 2016.

2. S. Belharbi, C. Chatelain, R. Hérault, S. Adam. A regularization scheme for structured output problems: an application to facial landmark detection. Submitted to the Pattern Recognition journal (PR), 2016. ArXiv: arxiv.org/abs/1504.07550

3. S. Belharbi, R. Hérault, C. Chatelain, R. Modzelewski, S. Adam, M. Chastan, S. Thureau. Spotting L3 slice in CT scans using deep convolutional network and transfer learning. To be submitted to the Medical Image Analysis journal (MIA), 2016.

slide-3
SLIDE 3

Introduction

Quick, informal introduction to Machine Learning

What is Machine Learning (ML)? ML is programming computers (algorithms) to optimize a performance criterion using example data or past experience.

Learning a task: learn general models from data to perform a specific task f.
fw : x → y, where x is the input, y the output (target, label), and w the parameters of f, so that f(x; w) = y.

From training to predicting the future (learn to predict), as sketched below:

1. Train the model using data examples (x, y).

2. Predict y_new for the newly arriving x_new.
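To make the train-then-predict loop concrete, here is a minimal NumPy sketch (not from the slides; the toy data and the linear form of f are illustrative) that fits f(x; w) = y by least squares and then predicts on a new input:

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise (purely illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal((100, 1))

# Train: estimate w of f(x; w) = y by least squares on a bias-augmented input.
X = np.hstack([x, np.ones_like(x)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict y_new for a newly arriving x_new.
x_new = np.array([[0.5, 1.0]])   # input 0.5, plus the bias term
y_new = x_new @ w
print(y_new)                     # close to 2 * 0.5 + 1 = 2
```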



slide-6
SLIDE 6

Introduction

Machine Learning applications

Face detection/recognition; image classification; handwriting recognition (postal address recognition, signature verification, writer verification, historical document analysis, e.g. DocExplore http://www.docexplore.eu); speech recognition and voice synthesis; natural language processing (sentiment/intent analysis, statistical machine translation, question answering (Watson), text understanding/summarizing, text generation); anti-virus and anti-spam; weather forecasting; fraud detection at banks; mail targeting/advertising; pricing insurance premiums; predicting house prices in real-estate companies; wine-tasting ratings; self-driving cars and autonomous robots; factory maintenance diagnostics; developing pharmaceutical drugs (combinatorial chemistry); predicting tastes in music (Pandora) and in movies/shows (Netflix); search engines (Google); predicting interests (Facebook); web exploring; biometrics (fingerprints, iris); medical analysis (image segmentation, disease detection from symptoms); advertisement/recommendation engines, predicting other books/products you may like (Amazon); computational neuroscience, bioinformatics/computational biology, genetics; content (image, video, text) categorization; suspicious-activity detection; frequent-pattern mining (supermarkets); satellite/astronomical image analysis.

slide-7
SLIDE 7

Introduction

ML in physics

Event detection at CERN (The European Organization for Nuclear Research) ⇒ Use ML models to determine the probability of the event being of interest. ⇒ Higgs Boson Machine Learning Challenge (https://www.kaggle.com/c/higgs-boson)


slide-8
SLIDE 8

Introduction

ML in quantum chemistry

Computing the electronic density of a molecule ⇒ instead of using physics laws, use ML (fast). See the work of Stéphane Mallat et al.: https://matthewhirn.files.wordpress.com/2016/01/hirn_pasc15.pdf


slide-9
SLIDE 9

Introduction

How to estimate fw?

Models: parametric (w) vs. non-parametric. Estimating fw = training the model using data. Training: supervised (uses (x, y)) vs. unsupervised (uses only x). Training = optimizing an objective cost.

Different models to learn fw: kernel models (support vector machines (SVM)), decision trees, random forests, linear regression, k-nearest neighbors, graphical models (Bayesian networks, Hidden Markov Models (HMM), Conditional Random Fields (CRF)), and neural networks (deep learning): DNN, CNN, RBM, DBN, RNN.


slide-11
SLIDE 11

Introduction

Optimization using Stochastic Gradient Descent (SGD)

wt ← wt−1 − ∂J(D; w)/∂w, where D is a set of data.
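As a concrete sketch of this rule: the learning rate `lr` and the least-squares cost below are assumptions added for the example (the slide's formula omits them).

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """One gradient step: w_t = w_{t-1} - lr * dJ(D; w)/dw."""
    return w - lr * grad

# Illustrative cost: J(D; w) = ||Xw - y||^2 / (2n) on a toy dataset D = (X, y).
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
for t in range(200):
    grad = X.T @ (X @ w - y) / len(X)   # dJ/dw of the squared error
    w = sgd_step(w, grad)
print(w)                                 # converges toward w_true
```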



slide-13
SLIDE 13

Deep multi-task learning with evolving weights

My PhD work, contribution 1:

S. Belharbi, R. Hérault, C. Chatelain, S. Adam. Deep multi-task learning with evolving weights. European Symposium on Artificial Neural Networks (ESANN), 2016.

slide-14
SLIDE 14

Deep multi-task learning with evolving weights

Deep learning today

Deep learning state of the art: what is new today? Large data; computational power (GPUs, clouds). ⇒ Optimization: dropout; momentum, AdaDelta, AdaGrad, RMSProp, Adam, Adamax; maxout, local response normalization, local contrast normalization, batch normalization; ReLU; CNN, RBM, RNN.

slide-15
SLIDE 15

Deep multi-task learning with evolving weights

Deep neural networks (DNN)

[Diagram: feed-forward neural network with inputs x1…x6 and outputs ŷ1, ŷ2; error back-propagation]

Training deep neural networks is difficult ⇒ vanishing gradient ⇒ pre-training technique [Y. Bengio et al. 06, G. E. Hinton et al. 06] ⇒ more parameters ⇒ need more data ⇒ use unlabeled data.


slide-17
SLIDE 17

Deep multi-task learning with evolving weights

Semi-supervised learning

General case: Data = { labeled data (x, y): expensive (money, time), few; unlabeled data (x, −−): cheap, abundant }.

E.g., images collected from the internet, medical images.

⇒ Semi-supervised learning: exploit the unlabeled data to improve generalization.


slide-19
SLIDE 19

Deep multi-task learning with evolving weights

Pre-training and semi-supervised learning

The pre-training technique can exploit unlabeled data. It is a sequential transfer learning performed in 2 steps:

1. Unsupervised task (x: labeled and unlabeled data).

2. Supervised task ((x, y): labeled data).

slide-20
SLIDE 20

Deep multi-task learning with evolving weights

Layer-wise pre-training: auto-encoders

[Diagram: a DNN to train, with inputs x1…x6 and outputs ŷ1, ŷ2]

slide-21
SLIDE 21

Deep multi-task learning with evolving weights

Layer-wise pre-training: auto-encoders

1) Step 1: Unsupervised layer-wise pre-training. Train layer by layer sequentially, using only x (labeled or unlabeled).

[Diagrams: each layer in turn is trained as an auto-encoder on the codes of the previous one: first the input reconstruction x1…x6 → x̂1…x̂6, then the hidden layers h1, h2, h3]

slide-27
SLIDE 27

images/logos Deep multi-task learning with evolving weights

Layer-wise pre-training: auto-encoders

x1 x2 x3 x4 x5 x6

LITIS lab., Apprentissage team - INSA de Rouen, France Deep learning 15/71

1) Step 1: Unsupervised layer-wise pre-training

Train layer by layer sequentially using only x (labeled or unlabeled)

At each layer: ⇒ What hyper-parameters to use? When to stop training? ⇒ How to make sure that the pre-training improves the supervised task?

slide-28
SLIDE 28

Deep multi-task learning with evolving weights

Layer-wise pre-training: auto-encoders

[Diagram: the pre-trained stack with the supervised output layer ŷ1, ŷ2 on top]

2) Step 2: Supervised training. Train the whole network on (x, y) using back-propagation, as in the sketch below.
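A compact sketch of the two steps: greedy layer-wise auto-encoder pre-training, then supervised fine-tuning. PyTorch, the layer sizes, and the toy data are illustrative stand-ins (the original work predates this tooling):

```python
import torch
import torch.nn as nn

sizes = [6, 5, 3, 3]                  # input dim, then hidden dims (illustrative)
x = torch.randn(256, sizes[0])        # inputs, labeled or not
y = torch.randint(0, 2, (256,))       # labels, used only in step 2

# Step 1: unsupervised layer-wise pre-training. Each layer is trained as an
# auto-encoder on the codes produced by the already-trained layers below it.
layers, h = [], x
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=0.1)
    for _ in range(100):
        opt.zero_grad()
        nn.functional.mse_loss(dec(torch.sigmoid(enc(h))), h).backward()
        opt.step()
    layers.append(enc)
    h = torch.sigmoid(enc(h)).detach()  # codes that feed the next layer

# Step 2: supervised training. Stack the pre-trained encoders, add an output
# layer, and fine-tune the whole network on (x, y) with back-propagation.
net = nn.Sequential(*[nn.Sequential(l, nn.Sigmoid()) for l in layers],
                    nn.Linear(sizes[-1], 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
for _ in range(100):
    opt.zero_grad()
    nn.functional.cross_entropy(net(x), y).backward()
    opt.step()
```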

slide-29
SLIDE 29

Deep multi-task learning with evolving weights

Pre-training technique: pros and cons

Pros: improves generalization; can exploit unlabeled data; provides a better initialization than random; trains deep networks ⇒ circumvents the vanishing gradient problem.

Cons: adds more hyper-parameters; no good stopping criterion during the pre-training phase: a good criterion for the unsupervised task may not be good for the supervised task.


slide-31
SLIDE 31

Deep multi-task learning with evolving weights

Proposed solution

Why is pre-training difficult in practice? ⇒ It is sequential transfer learning.

Possible solution: ⇒ parallel transfer learning.

Why in parallel? Interaction between the tasks; fewer hyper-parameters to tune; a single stopping criterion.


slide-34
SLIDE 34

Deep multi-task learning with evolving weights

Parallel transfer learning: tasks combination

Train cost = supervised task + unsupervised (reconstruction) task.
l: number of labeled samples, u: number of unlabeled samples, wsh: shared parameters.

Reconstruction (auto-encoder) task:
Jr(D; w′ = {wsh, wr}) = Σ_{i=1}^{l+u} Cr(R(xi; w′), xi) .

Supervised task:
Js(D; w = {wsh, ws}) = Σ_{i=1}^{l} Cs(M(xi; w), yi) .

Weighted tasks combination:
J(D; {wsh, ws, wr}) = λs · Js(D; {wsh, ws}) + λr · Jr(D; {wsh, wr}) ,
λs, λr ∈ [0, 1]: importance weights, λs + λr = 1.


slide-38
SLIDE 38

Deep multi-task learning with evolving weights

Tasks combination with evolving weights

Weighted tasks combination:
J(D; {wsh, ws, wr}) = λs · Js(D; {wsh, ws}) + λr · Jr(D; {wsh, wr}) ,
λs, λr ∈ [0, 1]: importance weights, λs + λr = 1.

Problem: how to fix λs and λr?
Intuition: at the end of the training, only Js should matter.

Tasks combination with evolving weights (our contribution):
J(D; {wsh, ws, wr}) = λs(t) · Js(D; {wsh, ws}) + λr(t) · Jr(D; {wsh, wr}) ,
t: learning epoch, λs(t), λr(t) ∈ [0, 1]: importance weights, λs(t) + λr(t) = 1.


slide-41
SLIDE 41

Deep multi-task learning with evolving weights

Tasks combination with evolving weights

J(D; {wsh, ws, wr}) = λs(t) · Js(D; {wsh, ws}) + λr(t) · Jr(D; {wsh, wr}) .

[Plot: exponential schedule of the importance weights λr(t) and λs(t) over the training epochs t]

Exponential schedule: λr(t) = exp(−t/σ), σ: slope; λs(t) = 1 − λr(t).
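These schedules take only a few lines of code. A sketch: `exp` follows the formula above, while the stairs and linear variants follow the experimental protocol described below (their exact parameterization here is assumed):

```python
import numpy as np

def lambda_r(t, schedule="exp", sigma=40.0, t1=100.0):
    """Importance weight of the reconstruction task at epoch t."""
    if schedule == "exp":        # lambda_r(t) = exp(-t / sigma)
        return float(np.exp(-t / sigma))
    if schedule == "stairs":     # classical pre-training: 1 until t1, then 0
        return 1.0 if t < t1 else 0.0
    if schedule == "linear":     # linear decay until t1, then 0 (assumed form)
        return max(0.0, 1.0 - t / t1)
    raise ValueError(schedule)

def lambda_s(t, **kwargs):
    """Supervised weight; by construction lambda_s(t) + lambda_r(t) = 1."""
    return 1.0 - lambda_r(t, **kwargs)
```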

slide-42
SLIDE 42

Deep multi-task learning with evolving weights

Tasks combination with evolving weights: optimization

Tasks combination with evolving weights (our contribution):
J(D; {wsh, ws, wr}) = λs(t) · Js(D; {wsh, ws}) + λr(t) · Jr(D; {wsh, wr}) ,
t: learning epoch, λs(t), λr(t) ∈ [0, 1]: importance weights, λs(t) + λr(t) = 1.

Algorithm 1: Training our model for one epoch
1: D is the shuffled training set; B a mini-batch.
2: for B in D do
3:   Make a gradient step toward Jr using B (update w′)
4:   Bs ⇐ labeled examples of B
5:   Make a gradient step toward Js using Bs (update w)
6: end for

[R. Caruana 97, J. Weston 08, R. Collobert 08, Z. Zhang 15]
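One way Algorithm 1 might look in code: a PyTorch sketch with an assumed shared encoder, a reconstruction head, and a supervised head (MNIST-like sizes; marking unlabeled examples with y = −1 is an interface assumption, not the paper's code):

```python
import torch
import torch.nn as nn

# Shared layers w_sh, reconstruction head w_r and supervised head w_s.
shared = nn.Sequential(nn.Linear(784, 256), nn.Sigmoid())
recon_head = nn.Linear(256, 784)   # R(.; w') with w' = {w_sh, w_r}
sup_head = nn.Linear(256, 10)      # M(.; w)  with w  = {w_sh, w_s}

opt_r = torch.optim.SGD(list(shared.parameters()) + list(recon_head.parameters()), lr=0.1)
opt_s = torch.optim.SGD(list(shared.parameters()) + list(sup_head.parameters()), lr=0.1)

def train_one_epoch(loader, t, sigma=40.0):
    lam_r = torch.tensor(-t / sigma).exp()   # exponential schedule
    lam_s = 1.0 - lam_r
    for x, y in loader:                      # B in the shuffled D; y = -1: unlabeled
        opt_r.zero_grad()                    # gradient step toward J_r using B
        (lam_r * nn.functional.mse_loss(recon_head(shared(x)), x)).backward()
        opt_r.step()
        labeled = y >= 0                     # B_s <= labeled examples of B
        if labeled.any():                    # gradient step toward J_s using B_s
            opt_s.zero_grad()
            (lam_s * nn.functional.cross_entropy(
                sup_head(shared(x[labeled])), y[labeled])).backward()
            opt_s.step()
```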


slide-43
SLIDE 43

Deep multi-task learning with evolving weights

Experimental protocol

Objective: compare training a DNN using different approaches: no pre-training (baseline); with pre-training (stairs schedule); parallel transfer learning (the proposed approach).

Studied evolving-weight schedules:
1. Stairs (pre-training)
2. Linear
3. Linear until t1
4. Exponential

[Plot: importance weights λr and λs over the training epochs for each schedule]

slide-44
SLIDE 44

Deep multi-task learning with evolving weights

Experimental protocol

Task: classification (MNIST). Number of hidden layers K: 1, 2, 3, 4.
Optimization: 5000 epochs, batch size 600; no regularization, no adaptive learning rate.
Hyper-parameters of the evolving schedules: t1 = 100, σ = 40.

slide-45
SLIDE 45

Deep multi-task learning with evolving weights

Shallow networks: (K = 1, l = 1E2)

[Plot: MNIST test classification error (%) vs. size of unlabeled data u (1E3 to 49900), evaluating the evolving weight schedules (baseline, stairs100, lin100, lin, exp40) for l = 100, K = 1]

slide-46
SLIDE 46

Deep multi-task learning with evolving weights

Shallow networks: (K = 1, l = 1E3)

[Plot: MNIST test classification error (%) vs. size of unlabeled data u (1E3 to 49900), evaluating the evolving weight schedules (baseline, stairs100, lin100, lin, exp40) for l = 1000, K = 1]

slide-47
SLIDE 47

Deep multi-task learning with evolving weights

Deep networks: exponential schedule (l = 1E3)

[Plot: MNIST test classification error (%) vs. size of unlabeled data u, evaluating the exp40 evolving weight schedule for K = 2, 3, 4 (l = 1000)]

slide-48
SLIDE 48

Deep multi-task learning with evolving weights

Conclusion

An alternative to pre-training: parallel transfer learning with evolving weights. It improves generalization easily and reduces the number of hyper-parameters (t1, σ).

slide-49
SLIDE 49

Deep multi-task learning with evolving weights

Perspectives

Optimization. Extension to structured output problems: train cost = supervised task + input unsupervised task + output unsupervised task.

slide-50
SLIDE 50

A regularization scheme for structured output problems

My PhD work, contribution 2:

S. Belharbi, C. Chatelain, R. Hérault, S. Adam. A regularization scheme for structured output problems: an application to facial landmark detection. Submitted to the Pattern Recognition journal (PR), 2016. ArXiv: arxiv.org/abs/1504.07550

slide-51
SLIDE 51

A regularization scheme for structured output problems

Traditional machine learning problems: f : X → y. Inputs X ∈ R^d: any type of input. Outputs y ∈ R, for the task at hand: classification, regression, …

Machine learning for structured output problems: f : X → Y. Inputs X ∈ R^d: any type of input. Outputs Y ∈ R^d′ with d′ > 1: a structured object (dependencies).

See C. Lampert's slides.


slide-53
SLIDE 53

A regularization scheme for structured output problems

Structured data

Data = representation (values) + structure (dependencies).
Examples: text (part-of-speech tagging, translation, speech ⇄ text), protein folding, images.

slide-54
SLIDE 54

A regularization scheme for structured output problems

Approaches that deal with structured output data

◮ Kernel-based methods: Kernel Density Estimation (KDE)
◮ Discriminative methods: structured output SVM
◮ Graphical methods: HMM, CRF, MRF, …

Drawbacks: they perform one single data transformation and are difficult to apply to high-dimensional data.

The ideal approach handles:
◮ structured output problems
◮ high-dimensional data
◮ multiple data transformations (complex mapping functions)

Deep neural networks?


slide-56
SLIDE 56

A regularization scheme for structured output problems

Traditional deep neural network

[Diagram: input layer x1…x7, hidden layers 1 to 4, output layer y1…y3 (e.g., car / bus / bike)]

◮ High-dimensional data: OK
◮ Multiple data transformations (complex mapping functions): OK
◮ Structured output problems: NO

slide-57
SLIDE 57

A regularization scheme for structured output problems

[Diagram: a deep network with input layer x1…x7, four hidden layers, and a high-dimensional output layer y1…y7: a structured object]

slide-58
SLIDE 58

images/logos A regularization scheme for structured output problems

Proposed framework

x ~ x ^ x y ~ y ^ y

Pin(., wcin) P′

i n(., w d i n)

Rin(.; win)

Pout(., wcout)

Rout(.; wout)

P′

  • u

t(., w d

  • u

t)

m(., ws)

M(.; wsup)

LITIS lab., Apprentissage team - INSA de Rouen, France Deep learning 36/71

slide-59
SLIDE 59

A regularization scheme for structured output problems

Proposed framework

F: all the x; L: all the y; S: all the supervised data.

Input task:
x̂ = Rin(x; win) = P′in(x̃ = Pin(x; wcin); wdin) ,
Jin(F; win) = (1 / card F) Σ_{x∈F} Cin(Rin(x; win), x) .

Output task:
ŷ = Rout(y; wout) = P′out(ỹ = Pout(y; wcout); wdout) ,
Jout(L; wout) = (1 / card L) Σ_{y∈L} Cout(Rout(y; wout), y) .

Main task:
ŷ = M(x; wsup) = P′out(m(Pin(x; wcin); ws); wdout) ,
Js(S; wsup) = (1 / card S) Σ_{(x,y)∈S} Cs(M(x; wsup), y) .

slide-60
SLIDE 60

A regularization scheme for structured output problems

Tasks combination

J(D; w) = λsup(t) · Js(S; wsup) + λin(t) · Jin(F; win) + λout(t) · Jout(L; wout) .

[Plot (Figure 5): linear evolution of the importance weights λsup, λin, λout throughout the training epochs]

slide-61
SLIDE 61

A regularization scheme for structured output problems

Framework training

Algorithm 2: Training our framework for one epoch
1: D is the shuffled training set; B a mini-batch.
2: for B in D do
3:   BS ⇐ examples of B that contain both (x, y)
4:   BF ⇐ all the x samples of B
5:   BL ⇐ all the y samples of B
6:   Update win: make a gradient step toward Jin using BF
7:   Update wout: make a gradient step toward Jout using BL
8:   Update wsup: make a gradient step toward Js using BS
9:   Update λsup, λin and λout
10: end for
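Algorithm 2 is the same alternating pattern with three costs. A schematic PyTorch sketch: the module sizes loosely echo the setup slide below, and the data interface (a boolean mask selecting the supervised pairs) is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_x, d_y, h = 2500, 136, 64     # 50x50 input, 68x2 output; sizes are illustrative
P_in,  P_in_d  = nn.Linear(d_x, h), nn.Linear(h, d_x)   # input coder / decoder
P_out, P_out_d = nn.Linear(d_y, h), nn.Linear(h, d_y)   # output coder / decoder
m = nn.Linear(h, h)                                     # central link m(.; w_s)

opt_in  = torch.optim.SGD(list(P_in.parameters()) + list(P_in_d.parameters()), lr=0.01)
opt_out = torch.optim.SGD(list(P_out.parameters()) + list(P_out_d.parameters()), lr=0.01)
opt_sup = torch.optim.SGD(list(P_in.parameters()) + list(m.parameters())
                          + list(P_out_d.parameters()), lr=0.01)

def train_one_epoch(loader, lam_sup, lam_in, lam_out):
    for x, y, labeled in loader:           # labeled: bool mask marking B_S
        opt_in.zero_grad()                 # J_in on B_F (all x of B)
        (lam_in * F.mse_loss(P_in_d(torch.sigmoid(P_in(x))), x)).backward()
        opt_in.step()
        opt_out.zero_grad()                # J_out on B_L (all y of B)
        (lam_out * F.mse_loss(P_out_d(torch.sigmoid(P_out(y))), y)).backward()
        opt_out.step()
        if labeled.any():                  # J_s on B_S (the (x, y) pairs)
            opt_sup.zero_grad()
            pred = P_out_d(torch.sigmoid(m(torch.sigmoid(P_in(x[labeled])))))
            (lam_sup * F.mse_loss(pred, y[labeled])).backward()
            opt_sup.step()
```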


slide-62
SLIDE 62

A regularization scheme for structured output problems

Framework evaluation

Task: facial landmark detection. Localize 68 (x, y) points.

slide-63
SLIDE 63

A regularization scheme for structured output problems

Experiments: setup

Datasets: LFPW (1035 images), HELEN (2330 images).
Architecture: MLP with 4 hidden layers: 1025, 2500, 136, 64. Input: 50×50. Output: 68×2.
Experiments run with and without data augmentation.

slide-64
SLIDE 64

A regularization scheme for structured output problems

Experiments: results (no data augmentation)

[Plot (Figure 7): MSE over the HELEN train set during training epochs, for the different training setups of the MLP: MLP, MLP + in1, MLP + out, MLP + in1 + out (no augmentation)]

slide-65
SLIDE 65

A regularization scheme for structured output problems

Experiments: results (no data augmentation)

[Plot (Figure 8): MSE over the HELEN valid set during training epochs, for the different training setups of the MLP: MLP, MLP + in1, MLP + out, MLP + in1 + out (no augmentation)]

slide-66
SLIDE 66

A regularization scheme for structured output problems

Experiments: results (no data augmentation)

[Plot (Figure 9): cumulative distribution function (CDF) of the NRMSE over the LFPW test set.
mean shape: CDF(0.1) = 30.804%, AUC = 68.787%;
MLP: CDF(0.1) = 46.875%, AUC = 76.346%;
MLP + in1: CDF(0.1) = 54.464%, AUC = 77.131%;
MLP + out: CDF(0.1) = 66.518%, AUC = 80.939%;
MLP + in1 + out: CDF(0.1) = 69.643%, AUC = 81.514%]
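For reference: CDF(0.1) is the fraction of test images whose NRMSE falls below 0.1, and the AUC is the area under the CDF curve. A sketch of how such numbers can be computed (the integration range is an assumption read off the plot axis):

```python
import numpy as np

def cdf_at(nrmse, thresh=0.1):
    """Fraction of test images with NRMSE <= thresh: the CDF(0.1) above."""
    return float(np.mean(np.asarray(nrmse) <= thresh))

def auc_cdf(nrmse, lo=0.0, hi=0.45, steps=200):
    """Area under the CDF curve over [lo, hi], normalized to [0, 1].
    The range [0, 0.45] is assumed from the plotted NRMSE axis."""
    ts = np.linspace(lo, hi, steps)
    cdf = np.array([cdf_at(nrmse, t) for t in ts])
    return float(np.sum((cdf[1:] + cdf[:-1]) / 2 * np.diff(ts)) / (hi - lo))
```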


slide-67
SLIDE 67

A regularization scheme for structured output problems

Experiments: results (no data augmentation)

[Plot (Figure 10): cumulative distribution function (CDF) of the NRMSE over the HELEN test set.
mean shape: CDF(0.1) = 23.636%, AUC = 64.609%;
MLP: CDF(0.1) = 52.727%, AUC = 76.261%;
MLP + in1: CDF(0.1) = 54.848%, AUC = 77.082%;
MLP + out: CDF(0.1) = 66.061%, AUC = 79.633%;
MLP + in1 + out: CDF(0.1) = 66.667%, AUC = 80.408%]

slide-68
SLIDE 68

A regularization scheme for structured output problems

Experiments: results (with data augmentation)

Table 1: MSE over the LFPW train and valid sets at the end of training, with and without data augmentation. (The MLP row's augmented values did not survive extraction.)

| Setup | No aug.: MSE train | No aug.: MSE valid | With aug.: MSE train | With aug.: MSE valid |
| Mean shape | 7.74 × 10−3 | 8.07 × 10−3 | 7.78 × 10−3 | 8.14 × 10−3 |
| MLP | 3.96 × 10−3 | 4.28 × 10−3 | – | – |
| MLP + in | 3.64 × 10−3 | 3.80 × 10−3 | 1.44 × 10−3 | 2.62 × 10−3 |
| MLP + out | 2.31 × 10−3 | 2.99 × 10−3 | 1.51 × 10−3 | 2.79 × 10−3 |
| MLP + in + out | 2.12 × 10−3 | 2.56 × 10−3 | 1.10 × 10−3 | 2.23 × 10−3 |

slide-69
SLIDE 69

A regularization scheme for structured output problems

Experiments: results (with data augmentation)

Table 2: AUC and CDF0.1 performance over the LFPW test set, with and without data augmentation.

| Setup | No aug.: AUC | No aug.: CDF0.1 | With aug.: AUC | With aug.: CDF0.1 |
| Mean shape | 68.78% | 30.80% | 77.81% | 22.33% |
| MLP | 76.34% | 46.87% | – | – |
| MLP + in | 77.13% | 54.46% | 80.78% | 67.85% |
| MLP + out | 80.93% | 66.51% | 81.77% | 67.85% |
| MLP + in + out | 81.51% | 69.64% | 82.48% | 71.87% |

Table 3: AUC and CDF0.1 performance over the HELEN test set, with and without data augmentation.

| Setup | No aug.: AUC | No aug.: CDF0.1 | With aug.: AUC | With aug.: CDF0.1 |
| Mean shape | 64.60% | 23.63% | 64.76% | 23.23% |
| MLP | 76.26% | 52.72% | – | – |
| MLP + in | 77.08% | 54.84% | 79.25% | 63.33% |
| MLP + out | 79.63% | 66.60% | 80.48% | 65.15% |
| MLP + in + out | 80.40% | 66.66% | 81.27% | 71.51% |

slide-70
SLIDE 70

A regularization scheme for structured output problems

Experiments: visual results

Figure 11: Examples of predictions on the LFPW test set. To visualize the errors, red segments are drawn between the ground truth and the predicted landmarks. Top row: MLP. Bottom row: MLP + in + out. (No data augmentation.)

slide-71
SLIDE 71

A regularization scheme for structured output problems

Experiments: visual results

Figure 12: Examples of predictions on the HELEN test set. Top row: MLP. Bottom row: MLP + in + out. (No data augmentation.)

slide-72
SLIDE 72

A regularization scheme for structured output problems

Conclusion

A generic regularization scheme for structured output problems, based on transfer learning. It exploits input/output unlabeled data, speeds up convergence, and improves generalization. Code on GitHub:

https://github.com/sbelharbi/structured-output-ae

slide-73
SLIDE 73

A regularization scheme for structured output problems

Perspectives

Evolve the importance weights according to the train/validation error. Explore other evolving schedules (toward an automatic schedule).

slide-74
SLIDE 74

Spotting L3 slice in CT scans using convolutional network

My PhD work, contribution 3:

S. Belharbi, R. Hérault, C. Chatelain, R. Modzelewski, S. Adam, M. Chastan, S. Thureau. Spotting L3 slice in CT scans using deep convolutional network and transfer learning. To be submitted to the Medical Image Analysis journal (MIA), 2016.

slide-75
SLIDE 75

Spotting L3 slice in CT scans using convolutional network

The problem: L3 slice localization

[Figure 13: finding the L3 slice within a whole CT scan]

→ Over a dataset of 642 CT scans, we obtained an average localization error of 1.82 slices (< 5 mm).

slide-76
SLIDE 76

Spotting L3 slice in CT scans using convolutional network

The problem: L3 slice localization

Informal statement: given a CT scan of part of a body, find, among thousands of slices, the one which corresponds to the L3 slice (the slice containing the 3rd lumbar vertebra).

Difficulties: inter-patient variability; visual similarity of the L3 slice to its neighbors; the need to use context to localize the L3 slice. ⇒ Machine learning.

slide-77
SLIDE 77

Spotting L3 slice in CT scans using convolutional network

Possible approaches

Classification (discrete value): classify each slice as "L3" or "not L3". Simple, but uses no context.
Sequence labeling: label all the slices (vertebrae) L1, L2, L3, … Global analysis (context), and existing work with promising results, but requires labeling every slice.
Regression (real value): predict the height (position) of the L3 slice inside the CT scan. Global analysis (context), and requires labeling only the L3 slice position.

slide-78
SLIDE 78

Spotting L3 slice in CT scans using convolutional network

Possible approaches: difficulties

Figure 14: Two slices from the same patient: an L3 (top) and a non-L3, L2 (bottom). The similar shapes of the two vertebrae prevent a robust decision from a single slice.

slide-79
SLIDE 79

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection

Which model? Deep learning: a convolutional neural network (CNN). No manual feature extraction; state of the art in vision; requires a fixed input size (when using dense layers).

Some numbers:
Input space: 1 scan = N × 512 × 512, with 400 < N < 1200 → Problem 1: large input space.
Dataset with annotated L3 position: 642 patients (the L3CT1 dataset) → Problem 2: few data.
Variability of the height of each scan → Problem 3: different input sizes.

slide-80
SLIDE 80

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection

Problem 1: large input dimension. 131M inputs for one example.
⇒ Frontal or lateral Maximum Intensity Projection (MIP): 512 × 512 × N ⇒ 512 × N. It conserves the pertinent information (the skeletal structure), as in the sketch below.
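The MIP reduction itself is a one-liner on the CT volume; a NumPy sketch, assuming the scan is stored as an (N, 512, 512) array and that the frontal projection runs along the second axis:

```python
import numpy as np

def frontal_mip(scan):
    """Frontal Maximum Intensity Projection: (N, 512, 512) -> (N, 512).

    Each output pixel keeps the maximum intensity along the projection axis,
    which preserves the bright skeletal structure used to spot the L3.
    The axis choice depends on the volume's orientation (assumed here)."""
    return scan.max(axis=1)

scan = np.random.rand(600, 512, 512).astype(np.float32)  # fake CT volume
mip = frontal_mip(scan)                                  # shape (600, 512)
```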


slide-81
SLIDE 81

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection

Problem 2: few data (642 patients) [1]. Training a CNN from scratch → poor results.
⇒ Use CNNs pre-trained on large datasets: AlexNet, GoogLeNet, VGG16, VGG19, … for classification. Models pre-trained on ImageNet: 14 million natural images [Fei-Fei and Russakovsky 2013].

slide-82
SLIDE 82

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection

Problem 2: few data (642 patients) [2]. ⇒ Transfer learning: exploit filters pre-trained on natural images, then refine them on the L3 detection task.

Figure 15: System overview. Layers Ci are convolutional layers, FCi are fully connected layers. The convolution parameters of a previously learnt ImageNet classifier are used as initial values of the corresponding L3 regressor layers, to overcome the lack of CT examples.
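The original work used ImageNet-pretrained classifiers of its era. Purely as a modern illustration of the same initialize-then-refine idea, with torchvision's VGG16 (the API is current torchvision, not the paper's code):

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from convolution filters learnt on ImageNet (millions of natural images).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Replace the 1000-way classifier by a single-output regression head that
# predicts the relative L3 position inside the input window. (Grayscale MIP
# crops would also need replicating to 3 channels to fit VGG's input.)
vgg.classifier[-1] = nn.Linear(vgg.classifier[-1].in_features, 1)

# Refine the whole network on the L3 task, optionally freezing early blocks.
optimizer = torch.optim.SGD(vgg.parameters(), lr=1e-4)
```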

slide-83
SLIDE 83

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection

Problem 3: different input sizes. A classical problem: use a sliding-window technique plus post-processing.

Figure 16: Examples of normalized frontal MIP images with the L3 slice position.

slide-84
SLIDE 84

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection

Figure 17: System overview describing the three important stages of our approach: MIP transformation, TL-CNN prediction, and post-processing.

slide-85
SLIDE 85

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection

Problem 3: different input sizes. Post-processing: correlation.

Figure 18 (left): the CNN output sequence obtained with H = 400 and a = 50 on a test CT scan. The sequence contains the typical straight line of slope −1 centered on the L3 (the theoretical line is plotted in green), surrounded by random values. (Right): correlation between the CNN output sequence and the theoretical line; the maximum of correlation indicates the position of the L3.
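A NumPy sketch of this sliding-window plus correlation post-processing; `predict` stands in for the trained CNN regressor, and the template construction is an assumption consistent with the figure (each stride of a pixels shifts the predicted relative position by −a):

```python
import numpy as np

def locate_l3(mip, predict, H=400, a=50):
    """Slide a window of height H (stride a) down the MIP image, collect the
    CNN's relative-L3 predictions, then correlate the sequence with the
    theoretical slope -1 line; the correlation peak gives the L3 height."""
    centers = np.arange(H // 2, mip.shape[0] - H // 2, a)
    outputs = np.array([predict(mip[c - H // 2: c + H // 2]) for c in centers])
    k = max(3, H // a)                        # template spans the expected line
    template = -a * np.arange(k, dtype=float)
    template -= template.mean()
    scores = np.correlate(outputs - outputs.mean(), template, mode="same")
    return centers[np.argmax(scores)]         # estimated L3 position (pixels)
```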


slide-86
SLIDE 86

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection: quantitative results

Cross-validation:

| Fold | CNN4 | Alexnet | VGG16 | VGG19 | Googlenet |
| fold 0 | 2.85 ± 2.37 | 2.21 ± 2.11 | 2.06 ± 4.39 | 1.89 ± 1.77 | 1.81 ± 1.74 |
| fold 1 | 3.12 ± 2.90 | 2.44 ± 2.41 | 1.78 ± 2.09 | 1.96 ± 2.10 | 3.84 ± 12.86 |
| fold 2 | 3.12 ± 3.20 | 2.47 ± 2.38 | 1.54 ± 1.54 | 1.65 ± 1.73 | 2.62 ± 2.52 |
| fold 3 | 2.98 ± 2.38 | 2.42 ± 2.23 | 1.96 ± 1.62 | 1.76 ± 1.75 | 2.22 ± 1.79 |
| fold 4 | 1.87 ± 1.58 | 2.69 ± 2.41 | 1.74 ± 1.96 | 1.90 ± 1.83 | 2.20 ± 2.20 |
| Average | 2.78 ± 2.48 | 2.45 ± 2.42 | 1.82 ± 2.32 | 1.83 ± 1.83 | 2.54 ± 4.22 |

Table 4: Error expressed in slices over all the folds, using different models: CNN4 (homemade model) and Alexnet/VGG16/VGG19/GoogleNet (pre-trained models).

slide-87
SLIDE 87

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection: qualitative results

Localization error: 0 slices.

slide-88
SLIDE 88

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection: qualitative results

Localization error: 6 slices.

slide-89
SLIDE 89

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection: evaluation time

| Model | Number of parameters | Average processing time (seconds/CT scan) |
| CNN4 | 55 K | 4.46 |
| Alexnet | 2 M | 6.37 |
| VGG16 | 14 M | 13.28 |
| VGG19 | 20 M | 16.02 |
| GoogleNet | 6 M | 17.75 |

Table 5: Number of parameters vs. evaluation time on a GPU (K40).

Evaluation can be sped up further by increasing the window stride (without losing performance). VGG16: stride = 1: ~13 seconds/CT scan with an error of 1.82 ± 2.32; stride = 4: ~2 seconds/CT scan with an error of 1.91 ± 2.69.

slide-90
SLIDE 90

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection: CNN vs. radiologists

Setup:

1. New evaluation set: 43 CT scans annotated by the same reference radiologist (who annotated the L3CT1 dataset).

2. Ask 3 other radiologists to localize the L3 slice.

3. Perform this experiment twice.

| Errors (slices) / operator | CNN4 | VGG16 | Radiologist #1 | Radiologist #2 | Radiologist #3 |
| Review 1 | 2.37 ± 2.30 | 1.70 ± 1.65 | 0.81 ± 0.97 | 0.72 ± 1.51 | 0.51 ± 0.62 |
| Review 2 | 2.53 ± 2.27 | 1.58 ± 1.83 | 0.77 ± 0.68 | 0.95 ± 1.61 | 0.86 ± 1.30 |

Table 6: Comparison of the performance of the automatic systems and the radiologists. The L3 annotations given by the reference radiologist vary between the two reviews.

slide-91
SLIDE 91

Spotting L3 slice in CT scans using convolutional network

Regression for L3 detection: conclusion

Interesting results. An adapted pipeline: pre-processing, CNN, post-processing. The use of transfer learning alleviates the need for a large training set. A generic framework: it can easily be adapted to detect other structures, given the required annotations.

slide-92
SLIDE 92

Questions

My PhD work

1. S. Belharbi, R. Hérault, C. Chatelain, S. Adam. Deep multi-task learning with evolving weights. European Symposium on Artificial Neural Networks (ESANN), 2016.

2. S. Belharbi, C. Chatelain, R. Hérault, S. Adam. A regularization scheme for structured output problems: an application to facial landmark detection. Submitted to the Pattern Recognition journal (PR), 2016. ArXiv: arxiv.org/abs/1504.07550

3. S. Belharbi, R. Hérault, C. Chatelain, R. Modzelewski, S. Adam, M. Chastan, S. Thureau. Spotting L3 slice in CT scans using deep convolutional network and transfer learning. To be submitted to the Medical Image Analysis journal (MIA), 2016.

slide-93
SLIDE 93

Questions

Thank you for your attention. Questions?

soufiane.belharbi@insa-rouen.fr