SLIDE 1
Unsupervised Learning and Inverse Problems with Deep Neural Networks
Joan Bruna, Stéphane Mallat, Ivan Dokmanic, Martin de Hoop
École Normale Supérieure, www.di.ens.fr/data
SLIDE 2
SLIDE 3
Multiscale Dimensionality Reduction
Interactions of d variables x(u): pixels, particles, agents...
SLIDE 4
Deep Convolutional Trees
Cascade of convolutions: no channel connections
x(u) → ρL1 → ... → ρLj → ... → ρLJ → Φ(x) → y = f̃(x)
SLIDE 5
- Wavelet filter ψ(u), rotated and dilated:
  ψ_{2^j,θ}(u) = 2^{-j} ψ(2^{-j} r_θ u)
  [Figure: real and imaginary parts of ψ; support of |ψ̂_λ(ω)|² in the (ω1, ω2) plane]
- Wavelet transform:
  Wx = ( x ⋆ φ_{2^J}(u), x ⋆ ψ_{2^j,θ}(u) )_{j≤J, θ}
  with x ⋆ ψ_{2^j,θ}(u) = ∫ x(v) ψ_{2^j,θ}(u − v) dv
  φ_{2^J}: average; ψ_{2^j,θ}: higher frequencies
- Preserves norm: ‖Wx‖² = ‖x‖²
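The transform above can be sketched in 1D with FFT-based circular convolutions; the Gaussian frequency profiles and scale parameters below are illustrative assumptions, not the Morlet filters of the slides:

```python
import numpy as np

def filter_bank(N, J):
    """Fourier-domain filters: one low-pass phi_{2^J} and J band-passes psi_{2^j}.
    Gaussian profiles are an illustrative stand-in for Morlet wavelets."""
    omega = np.fft.fftfreq(N) * 2 * np.pi
    phi = np.exp(-(omega ** 2) * 2 ** (2 * J) / 2)              # local average at scale 2^J
    psis = [np.exp(-((np.abs(omega) - np.pi / 2 ** j) ** 2) * 2 ** (2 * j) / 2)
            for j in range(J)]                                   # band-pass centered at pi/2^j
    return phi, psis

def wavelet_transform(x, J=4):
    """Wx = (x * phi_{2^J}, {x * psi_{2^j}}_{j<J}) via FFT (circular convolutions)."""
    phi, psis = filter_bank(len(x), J)
    xf = np.fft.fft(x)
    low = np.fft.ifft(xf * phi).real
    bands = [np.fft.ifft(xf * p).real for p in psis]
    return low, bands

low, bands = wavelet_transform(np.random.default_rng(0).standard_normal(256))
```

With these real, symmetric frequency profiles the outputs are real; a tight Littlewood-Paley frame would be needed for exact norm preservation.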
SLIDE 6
Fast Wavelet Filter Bank
[Figure: cascade of modulus filter banks |W1| computing |x ⋆ ψ_{2^j,θ}| from scale 2^0 to 2^J]
SLIDE 7
Wavelet Filter Bank
ρ(α) = |α|
[Figure: x(u) cascaded through filter banks |W1|, producing |x ⋆ ψ_{2^j,θ}| at scales 2^0, 2^1, 2^2, ..., 2^J]
SLIDE 8
Wavelet Scattering Network
x → ρW1 → ρW2 → ... → ρWJ
ρ(α) = |α| creates interactions across scales.
S_J x = { |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ...| ⋆ ψ_{λm}| ⋆ φ_{2^J} }_{λ1,...,λm}
Second order: ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}
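The modulus cascade can be sketched the same way in 1D. Here Gaussian band-pass filters and a global mean stand in for the actual wavelets and the averaging by φ_{2^J}; all parameters are illustrative:

```python
import numpy as np

def band_filters(N, J):
    # Illustrative Gaussian band-pass filters psi_{2^j} in the Fourier domain
    omega = np.fft.fftfreq(N) * 2 * np.pi
    return [np.exp(-((np.abs(omega) - np.pi / 2 ** j) ** 2) * 2 ** (2 * j) / 2)
            for j in range(J)]

def scattering(x, J=3):
    """Orders 0, 1, 2 of S_J x: iterate the wavelet modulus rho(a) = |a|,
    then average (a global mean stands in for convolution with phi_{2^J})."""
    psis = band_filters(len(x), J)
    wmod = lambda u, p: np.abs(np.fft.ifft(np.fft.fft(u) * p))   # |u * psi|
    S = [x.mean()]                                               # order 0
    for j1, p1 in enumerate(psis):
        u1 = wmod(x, p1)
        S.append(u1.mean())                                      # order 1: averaged |x * psi_{l1}|
        for p2 in psis[j1 + 1:]:                                 # only increasing scales l2 > l1
            S.append(wmod(u1, p2).mean())                        # order 2
    return np.array(S)

coeffs = scattering(np.random.default_rng(0).standard_normal(128))
```

For J = 3 this yields 1 + 3 + 3 = 7 coefficients; restricting to λ2 > λ1 reflects that the modulus pushes energy toward lower frequencies.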
SLIDE 9
Scattering Properties
S_J x = ... |W3| |W2| |W1| x (cascade of modulus filter banks)
S_J x = { x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}, ... }_{λ1,λ2,λ3,...}

Theorem: For appropriate wavelets, a scattering transform is
- contractive: ‖S_J x − S_J y‖ ≤ ‖x − y‖ (L² stability)
- norm preserving: ‖S_J x‖ = ‖x‖, since ‖W_k x‖ = ‖x‖ implies ‖|W_k x| − |W_k x′|‖ ≤ ‖x − x′‖
- translation invariant and stable to deformations: if D_τ x(u) = x(u − τ(u)), then
  lim_{J→∞} ‖S_J D_τ x − S_J x‖ ≤ C ‖∇τ‖_∞ ‖x‖
Lemma: ‖[W_k, D_τ]‖ = ‖W_k D_τ − D_τ W_k‖ ≤ C ‖∇τ‖_∞
SLIDE 10
Digit Classification: MNIST (LeCun et al.)
x → S_J x → supervised linear classifier → y = f(x)
- Invariant to translations; linearises small deformations
- Invariant to specific deformations; separates different patterns
- No learning in the representation
Classification errors (Joan Bruna):
Training size | Conv. Net. | Scattering
50000 | 0.4% | 0.4%
SLIDE 11
Part II- Unsupervised Learning
Unsupervised learning: approximate the probability distribution p(x) of X ∈ R^d, given P realisations {x_i}_{i≤P}, potentially with P = 1.
SLIDE 12
Stationary Processes
X(t): stationary vector
S_J X = { X ⋆ φ_{2^J}(t), |X ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t), ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}(t), |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}(t), ... }_{λ1,λ2,λ3,...}
SLIDE 13
Ergodicity and Moments
E(S_J X) = { E(X), E(|X ⋆ ψ_{λ1}|), E(||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|), E(|||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|), ... }_{λ1,λ2,λ3,...}
Estimated from a single realisation under "weak" ergodicity conditions (central limit theorem).
SLIDE 14
Generation of Random Processes
- Reconstruction: compute X̃ which satisfies S_J X̃ = S_J X, with random initialisation and gradient descent.
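A minimal sketch of this moment-matching synthesis, with a toy two-dimensional Φ (empirical mean and power) standing in for the scattering moments, so the gradient is analytic:

```python
import numpy as np

def phi(x):
    # Toy statistics standing in for the scattering moments E(S_J X):
    # the empirical mean and the empirical power.
    return np.array([x.mean(), (x ** 2).mean()])

def generate(mu, d=512, steps=3000, lr=2.0, seed=0):
    """Random initialisation, then gradient descent on ||phi(x) - mu||^2,
    as in the slide's reconstruction scheme (with a toy phi)."""
    x = np.random.default_rng(seed).standard_normal(d)
    for _ in range(steps):
        m, q = phi(x)
        # analytic gradient of (m - mu_0)^2 + (q - mu_1)^2 w.r.t. x
        grad = 2 * (m - mu[0]) / d + 4 * (q - mu[1]) * x / d
        x -= lr * grad
    return x

target = np.array([0.3, 2.0])
x_tilde = generate(target)
```

With a richer Φ (e.g. actual scattering coefficients) the same loop requires automatic differentiation, but the structure is identical.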
SLIDE 15
Texture Reconstructions
Joan Bruna
Ising-critical Turbulence 2D
SLIDE 16
Representation of Audio Textures
Joan Bruna
[Spectrogram panels (time t, frequency ω): original vs. reconstructions; examples include paper, cocktail party, applause; "Gaussian in time" model]
SLIDE 17
Max Entropy Canonical Models
- A representation Φ(x) = {φ_k(x)}_{k≤K}, with x ∈ R^d
- Moment constraints: μ_k = E(φ_k(X)) = ∫ φ_k(x) p(x) dx
- Maximum entropy H(p) = −∫ p(x) log p(x) dx ⇒ p(x) = Z^{−1} exp( −Σ_k θ_k φ_k(x) )
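A standard worked instance of this principle (textbook material, not from the slides): constraining the first two moments, φ_1(x) = x and φ_2(x) = x², yields a Gaussian maximum-entropy density:

```latex
% Constraints E(X) = m, E(X^2) = m^2 + \sigma^2 give
p(x) = Z^{-1} \exp\!\big(-\theta_1 x - \theta_2 x^2\big),
% a Gaussian; matching the two moments fixes the multipliers:
\theta_2 = \frac{1}{2\sigma^2}, \qquad \theta_1 = -\frac{m}{\sigma^2}, \qquad
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{(x-m)^2}{2\sigma^2}\Big).
```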
SLIDE 18
Ergodic Microcanonical Model
SLIDE 19
Uniform Distribution on Balls
- Sphere in R^d: Φx = d^{−1/2} ‖x‖_2 = ( d^{−1} Σ_{k=1}^d |x(k)|² )^{1/2} = μ
- Simplex in R^d: Φx = d^{−1} ‖x‖_1 = d^{−1} Σ_{k=1}^d |x(k)| = μ
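A quick numerical check of the sphere case (assuming i.i.d. standard Gaussian coordinates): d^{−1/2}‖x‖₂ concentrates around μ = 1 as d grows, so the Gaussian model and the microcanonical uniform-on-sphere model become close in high dimension:

```python
import numpy as np

# For X with d i.i.d. N(0,1) coordinates, Phi(x) = d^{-1/2} ||x||_2 concentrates
# around mu = 1: the Gaussian measure is then nearly supported on the
# microcanonical set {x : Phi(x) = mu}.
rng = np.random.default_rng(0)
vals = {d: np.linalg.norm(rng.standard_normal(d)) / np.sqrt(d)
        for d in (10, 1000, 100000)}
for d, v in vals.items():
    print(d, round(v, 4))
```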
SLIDE 20
Scattering Representation
- Scattering coefficients of order 0, 1 and 2:
Φx = { d^{−1} Σ_u x(u), d^{−1} ‖x ⋆ ψ_{λ1}‖_1, d^{−1} ‖|x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}‖_1 }
SLIDE 21
Microcanonical Scattering
SLIDE 22
Scattering Approximations
μ = E(ΦX): approximate X by X̃ satisfying Φ X̃ = μ.
SLIDE 23
Ergodic Microcanonical Model
SLIDE 24
Singular Ergodic Processes
SLIDE 25
Scattering Ising
SLIDE 26
Stochastic Geometry: Cox Process
SLIDE 27
Non-Ergodic Mixture
SLIDE 28
Non-Ergodic Microcanonical Mixture
SLIDE 29
Scattering Multifractal Processes
- Scattering coefficients of order 0, 1 and 2:
SLIDE 30
Scattering Ising at Critical Temperature
SLIDE 31
Failures of Audio Synthesis
Original Time Scattering
- J. Andén and V. Lostanlen
SLIDE 32
Time-Frequency Translation Group
|x ⋆ ψ_λ(t)| ⋆ φ_{2^J}  →  ||x ⋆ ψ_λ| ⋆ ψ_α(t) ⋆ ψ_β(log λ)| ⋆ φ_{2^J}
Time-frequency wavelet convolutions, along time t and log-frequency log λ.
- J. Andén and V. Lostanlen
SLIDE 33
Joint Time-Frequency Scattering
Original Time Scattering Time/Freq Scattering
- J. Andén and V. Lostanlen
SLIDE 34
Part III- Supervised Learning

x(u) → x_1(u, k_1) → x_2(u, k_2) → ... → x_J(u, k_J) → classification

- L_j is a linear combination of convolutions and subsampling:
  x_j = ρ L_j x_{j−1},  with  x_j(u, k_j) = ρ( Σ_k x_{j−1}(·, k) ⋆ h_{k_j,k}(u) )
  (sum across channels)
What is the role of channel connections ?
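A minimal sketch of one such layer in 1D, using FFT circular convolutions and ρ = |·|; the shapes and random filters are illustrative:

```python
import numpy as np

def conv_layer(x, h, rho=np.abs):
    """One layer x_j = rho(L_j x_{j-1}): 1D circular convolutions summed
    across input channels.
    x: (K_in, N) input channels; h: (K_out, K_in, N) filters."""
    xf = np.fft.fft(x, axis=-1)                      # (K_in, N)
    hf = np.fft.fft(h, axis=-1)                      # (K_out, K_in, N)
    y = np.fft.ifft((hf * xf[None]).sum(axis=1))     # sum across channels k
    return rho(y.real)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 64))        # 3 input channels
h = rng.standard_normal((5, 3, 64))      # 5 output channels
x1 = conv_layer(x0, h)                   # shape (5, 64)
```

Dropping the sum over k (one filter per channel) gives the channel-disconnected "convolutional tree" of the earlier slides; the cross-channel sum is exactly what the question asks about.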
SLIDE 35
Environmental Sound Classification
UrbanSound8k: 10 classes, 8k training examples; class-wise average error
(air conditioner, car horns, children playing, dog barks, drilling, engine at idle, ...)

MFCC audio descriptors: 0.39
time scattering: 0.27
ConvNet (Piczak, MLSP 2015): 0.26
time-frequency scattering: 0.20
- J. Andén and V. Lostanlen
x → S_J x → supervised linear classifier → y = f(x); no learning in the representation.
SLIDE 36
Inverse Scattering Transform

S_J x = { x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ..., |||x ⋆ ψ_{λ1}| ⋆ ...| ⋆ ψ_{λm}| ⋆ φ_{2^J} }_{λ1,...,λm}

- Given S_J x, we want to compute x̃ such that S_J x̃ = S_J x. We shall use m = 2.
- If x(u) is a Dirac, a straight edge or a sinusoid, then x̃ is equal to x up to a translation. (Joan Bruna)
SLIDE 37
Sparse Shape Reconstruction (Joan Bruna)
With a gradient descent algorithm. Original images of N² pixels:
- m = 1, 2^J = N: reconstruction from O(log₂ N) scattering coefficients
- m = 2, 2^J = N: reconstruction from O(log₂² N) scattering coefficients
SLIDE 38
Multiscale Scattering Reconstructions
[Original images of N² pixels; scattering reconstructions at 2^J = 16, 32, 64, 128 = N, with 1.4 N² and 0.5 N² coefficients]
SLIDE 39
III- Inverse Problems
y → F → x
- Best linear method: least squares estimate (linear interpolation): ŷ = ( Σ̂_x† Σ̂_{xy} ) x
SLIDE 40
Super-Resolution
- Best linear method: least squares estimate (linear interpolation): ŷ = ( Σ̂_x† Σ̂_{xy} ) x
- State-of-the-art methods:
  – Dictionary-learning super-resolution
  – CNN-based: just train a CNN to regress from low-res to high-res:
    Θ* = argmin_Θ Σ_i ‖F(x_i, Θ) − y_i‖²,  ŷ = F(x, Θ*)
  – They cleverly optimize a fundamentally unstable metric criterion.
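A toy numpy sketch of the least-squares baseline; the smooth-signal model, the sizes, and the 2x-averaging operator F are illustrative assumptions:

```python
import numpy as np

# Toy "super-resolution": y in R^8 are smooth high-res signals, x = F y their
# 2x low-res version (pairwise averages). The best linear estimate is the
# least-squares regression y_hat = Sigma_xy^T Sigma_x^+ x, with covariances
# estimated from training pairs.
rng = np.random.default_rng(1)
n, dy = 5000, 8
idx = np.arange(dy)
C = np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2) + 1e-6 * np.eye(dy)
Y = rng.standard_normal((n, dy)) @ np.linalg.cholesky(C).T   # smooth, zero-mean y
X = 0.5 * (Y[:, ::2] + Y[:, 1::2])                            # low-res x = F y

Sigma_x = X.T @ X / n                    # empirical Sigma_x
Sigma_xy = X.T @ Y / n                   # empirical Sigma_xy
A = np.linalg.pinv(Sigma_x) @ Sigma_xy   # y_hat = A^T x for a single x
Y_hat = X @ A                             # estimates for all samples at once
mse = np.mean((Y - Y_hat) ** 2)
```

The estimator interpolates optimally in mean-square, but (as the slide notes) such metric criteria are unstable: fine details uncorrelated with x cannot be recovered linearly.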
SLIDE 41
Scattering Super-Resolution
y → F → x

S_{L,J} x = { x ⋆ φ_{2^J}(u), |x ⋆ ψ_{j1,k1}| ⋆ φ_{2^J}(u), ||x ⋆ ψ_{j1,k1}| ⋆ ψ_{j2,k2}| ⋆ φ_{2^J}(u) }_{L ≤ j1, j2 ≤ J}

- Linear estimation in the scattering domain: S_{L,J} x → S_{L−α,J} x
- No phase estimation: potentially worse PSNR
- Good image quality because of deformation stability
SLIDE 42
Super-Resolution Results
Original Linear Estimate Scattering state-of-the-art
- J. Bruna, P. Sprechmann
SLIDE 43
Super-Resolution Results
Original Best Linear Estimate Scattering Estimate state-of-the-art
- J. Bruna, P. Sprechmann
SLIDE 44
Super-Resolution Results
Original Best Linear Estimate Scattering Estimate state-of-the-art
- J. Bruna, P. Sprechmann
SLIDE 45
Super-Resolution Results
[Panel A: Original, Low-Resolution, Scattering, TV Regularization, l1 Regularization]
- I. Dokmanic, J. Bruna, M. De Hoop
SLIDE 46
Tomography Results
[Panels B, C: Original, Low-Resolution, Scattering, TV Regularization]
- I. Dokmanic, J. Bruna, M. De Hoop
SLIDE 47
Conclusions
- Deep convolutional networks have spectacular high-dimensional and generic approximation capabilities.
- New stochastic models of images for inverse problems.
- Understanding deep networks remains an outstanding mathematical problem.