
Unsupervised Learning and Inverse Problems with Deep Neural Networks - PowerPoint PPT Presentation



  1. Unsupervised Learning and Inverse Problems with Deep Neural Networks. Joan Bruna, Stéphane Mallat, Ivan Dokmanic, Martin de Hoop. École Normale Supérieure, www.di.ens.fr/data

  2. Deep Convolutional Networks: a cascade of linear operators and pointwise non-linearities, $\Phi(x) = \rho L_J \cdots \rho L_1\, x(u)$, followed by a classifier $f_M(x)$. Here $\rho(u)$ is a scalar non-linearity: $\max(u, 0)$ or $|u|$ or ... (Part III: inverse problems.)

  3. Dimensionality Reduction: multiscale interactions of $d$ variables $x(u)$: pixels, particles, agents... [Figure: interactions between positions $u_1$ and $u_2$.]

  4. Deep Convolutional Trees: $y = \tilde f(x)$ computed by a cascade $\Phi(x) = \rho L_J \cdots \rho L_1\, x(u)$. A cascade of convolutions with no channel connections.

  5. Scale Separation with Wavelets. Wavelet filter $\psi(u)$, rotated and dilated: $\psi_{2^j,\theta}(u) = 2^{-j}\, \psi(2^{-j} r_\theta u)$. [Figure: real and imaginary parts of $\hat\psi_\lambda(\omega)$ in the frequency plane $(\omega_1, \omega_2)$.]
  Wavelet convolution: $x \star \psi_{2^j,\theta}(u) = \int x(v)\, \psi_{2^j,\theta}(u - v)\, dv$.
  Wavelet transform: $Wx = \big\{\, x \star \phi_{2^J}(u) \text{ (average)},\ \ x \star \psi_{2^j,\theta}(u) \text{ (higher frequencies)} \,\big\}_{j \le J,\, \theta}$. Preserves the norm: $\|Wx\|^2 = \|x\|^2$.
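A minimal sketch of how such a rotated and dilated filter bank could be generated. The Morlet-type mother wavelet and the parameters `xi`, `sigma`, `size` are illustrative assumptions, not the deck's actual filters:

```python
import numpy as np

def morlet(u1, u2, xi=3.0, sigma=0.85):
    """Illustrative oriented mother wavelet: Gaussian envelope times a
    complex exponential oscillating along u1."""
    return np.exp(-(u1**2 + u2**2) / (2 * sigma**2)) * np.exp(1j * xi * u1)

def wavelet_bank(size, J, n_angles):
    """Sample psi_{2^j,theta}(u) = 2^{-j} psi(2^{-j} r_theta u) on a grid."""
    grid = np.arange(size) - size // 2
    u1, u2 = np.meshgrid(grid, grid, indexing="ij")
    bank = {}
    for j in range(J):
        for t in range(n_angles):
            theta = t * np.pi / n_angles
            # Rotate the coordinates by r_theta, then dilate by 2^{-j}.
            v1 = (np.cos(theta) * u1 + np.sin(theta) * u2) / 2**j
            v2 = (-np.sin(theta) * u1 + np.cos(theta) * u2) / 2**j
            bank[(j, theta)] = 2.0**(-j) * morlet(v1, v2)
    return bank
```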

  6. Fast Wavelet Filter Bank. [Figure: cascaded filter bank from scale $2^0$ to $2^J$; the first stage $|W_1|$ outputs $|x \star \psi_{2^1,\theta}|$.]

  7. Wavelet Filter Bank. [Figure: $x(u)$ passed through $|W_1|, |W_2|, \dots$ with $\rho(\alpha) = |\alpha|$, producing $|x \star \psi_{2^j,\theta}|$ at each scale $2^j$ up to $2^J$.]

  8. Wavelet Scattering Network. Cascade of wavelet transforms and moduli, $\rho W_1, \rho W_2, \dots, \rho W_J$ with $\rho(\alpha) = |\alpha|$, across scales $2^0$ to $2^J$: $x \mapsto |x \star \psi_{\lambda_1}| \mapsto ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \mapsto \dots$, each path averaged by $\phi_J$. Scattering coefficients:
  $$S_J x = \big\{\, |\cdots ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \cdots \star \psi_{\lambda_m}| \star \phi_J \,\big\}_{\lambda_1, \dots, \lambda_m}$$
  capturing interactions across scales.
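A minimal 1-D sketch of this cascade, assuming FFT-based circular convolutions and simple Gaussian band-pass filters in place of the deck's 2-D wavelets (the filter centers and widths are illustrative):

```python
import numpy as np

def conv(x, h_hat):
    """Circular convolution of x with a filter given in the Fourier domain."""
    return np.fft.ifft(np.fft.fft(x) * h_hat)

def gaussian_hat(n, center, width):
    """Fourier transform of an illustrative band-pass centered at `center`
    (in cycles/sample); center=0 gives the low-pass phi_J."""
    omega = np.fft.fftfreq(n)
    return np.exp(-((omega - center) ** 2) / (2 * width**2))

def scattering(x, J):
    n = len(x)
    phi_hat = gaussian_hat(n, 0.0, 2.0 ** (-J))
    psi_hats = [gaussian_hat(n, 2.0 ** (-j), 2.0 ** (-j - 1))
                for j in range(1, J + 1)]
    coeffs = [np.real(conv(x, phi_hat))]                 # order 0: x * phi_J
    for j1, p1 in enumerate(psi_hats):
        u1 = np.abs(conv(x, p1))                         # |x * psi_{lambda_1}|
        coeffs.append(np.real(conv(u1, phi_hat)))        # order 1
        for p2 in psi_hats[j1 + 1:]:                     # lambda_2 > lambda_1
            u2 = np.abs(conv(u1, p2))                    # ||x*psi_1| * psi_2|
            coeffs.append(np.real(conv(u2, phi_hat)))    # order 2
    return np.concatenate(coeffs)

# Usage: S = scattering(np.random.randn(256), J=4)
```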

  9. Scattering Properties.
  $$S_J x = \begin{pmatrix} x \star \phi_{2^J} \\ |x \star \psi_{\lambda_1}| \star \phi_{2^J} \\ ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \phi_{2^J} \\ |||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}| \star \phi_{2^J} \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, \dots} = \cdots |W_3|\, |W_2|\, |W_1|\, x$$
  Since $\|W_k x\| = \|x\|$, it follows that $\big\|\, |W_k x| - |W_k x'| \,\big\| \le \|x - x'\|$.
  Lemma: $\|[W_k, D_\tau]\| = \|W_k D_\tau - D_\tau W_k\| \le C\, \|\nabla\tau\|_\infty$.
  Theorem: for appropriate wavelets, a scattering is contractive, $\|S_J x - S_J y\| \le \|x - y\|$ ($L^2$ stability), and preserves norms, $\|S_J x\| = \|x\|$. Translation invariance and deformation stability: if $D_\tau x(u) = x(u - \tau(u))$, then $\lim_{J \to \infty} \|S_J D_\tau x - S_J x\| \le C\, \|\nabla\tau\|_\infty\, \|x\|$.
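The contraction claim follows in one line from the two displayed facts; a sketch of the reasoning, spelled out:

```latex
% Each layer rho W_k is non-expansive: rho(alpha) = |alpha| is pointwise
% 1-Lipschitz and W_k preserves the norm, so
\[
\big\| \, |W_k x| - |W_k x'| \, \big\| \;\le\; \| W_k x - W_k x' \| \;=\; \| x - x' \| .
\]
% A composition of non-expansive maps is non-expansive, hence by
% induction over the layers
\[
\| S_J x - S_J x' \| \;\le\; \| x - x' \| .
\]
```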

  10. Digit Classification: MNIST (Joan Bruna). Supervised $y = f(x)$: $x \to S_J x \to$ linear classifier, with no learning in the representation. Invariant to translations and to specific deformations, separates different patterns, linearises small deformations. Classification errors (LeCun et al.):
  Training size | Conv. Net. | Scattering
  50000         | 0.4%       | 0.4%

  11. Part II: Unsupervised Learning. Approximate the probability distribution $p(x)$ of $X \in \mathbb{R}^d$ given $P$ realisations $\{x_i\}_{i \le P}$, with potentially $P = 1$.

  12. Stationary Processes.
  $$S_J X(t) = \begin{pmatrix} X \star \phi_{2^J}(t) \\ |X \star \psi_{\lambda_1}| \star \phi_{2^J}(t) \\ ||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \phi_{2^J}(t) \\ |||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}| \star \phi_{2^J}(t) \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, \dots} : \text{a stationary vector}$$

  13. Ergodicity and Moments. Central limit theorem under "weak" ergodicity conditions:
  $$E(SX) = \begin{pmatrix} E(X) \\ E(|X \star \psi_{\lambda_1}|) \\ E(||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|) \\ E(|||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}|) \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, \dots}$$

  14. Generation of Random Processes. Reconstruction: compute $\tilde X$ which satisfies $S_J \tilde X = S_J X$, with random initialisation and gradient descent.
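A minimal sketch of that reconstruction loop, assuming a differentiable `scattering` map (e.g. a torch port of the sketch above); the step count and learning rate are illustrative:

```python
import torch

def reconstruct(target_S, scattering, n, steps=500, lr=0.1):
    """Find x with scattering(x) close to target_S."""
    x = torch.randn(n, requires_grad=True)        # random initialisation
    opt = torch.optim.SGD([x], lr=lr)             # plain gradient descent
    for _ in range(steps):
        opt.zero_grad()
        loss = ((scattering(x) - target_S) ** 2).sum()
        loss.backward()
        opt.step()
    return x.detach()
```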

  15. Texture Reconstructions (Joan Bruna). [Figure: reconstructions of critical-Ising and 2D-turbulence textures.]

  16. Representation of Audio Textures (Joan Bruna). [Figure: time-frequency plots $(t, \omega)$ of originals and Gaussian-model syntheses for applause, crumpling paper, and cocktail-party textures.]

  17. Max Entropy Canonical Models. A representation $\Phi(x) = \{\phi_k(x)\}_{k \le K}$ with $x \in \mathbb{R}^d$, and moment constraints
  $$\mu_k = E(\phi_k(X)) = \int \phi_k(x)\, p(x)\, dx.$$
  Maximising the entropy $H(p) = -\int p(x) \log p(x)\, dx$ under these constraints gives
  $$p(x) = Z^{-1} \exp\Big( -\sum_k \theta_k\, \phi_k(x) \Big).$$
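A minimal sketch of evaluating this Gibbs form; the potentials `phis` and multipliers `thetas` are illustrative placeholders, and the normalisation $Z$ is left uncomputed, as it usually is in practice:

```python
import numpy as np

def log_gibbs_unnormalized(x, phis, thetas):
    """log p(x) + log Z = -sum_k theta_k * phi_k(x)."""
    return -sum(t * phi(x) for t, phi in zip(thetas, phis))

# Example: constraining the first two moments gives a Gaussian-family model.
phis = [lambda x: x.mean(), lambda x: (x**2).mean()]
x = np.random.randn(64)
print(log_gibbs_unnormalized(x, phis, [0.5, 1.0]))
```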

  18. Ergodic Microcanonical Model. [Figure: microcanonical set $\{x : \Phi x = \mu\}$ in $\mathbb{R}^d$.]

  19. Uniform Distribution on Balls.
  Sphere in $\mathbb{R}^d$: $\Phi x = d^{-1/2}\, \|x\|_2 = \Big( d^{-1} \sum_{k=1}^d |x(k)|^2 \Big)^{1/2} = \mu$.
  Simplex in $\mathbb{R}^d$: $\Phi x = d^{-1}\, \|x\|_1 = d^{-1} \sum_{k=1}^d |x(k)| = \mu$.
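These two microcanonical sets can be sampled directly; a minimal sketch, assuming the simplex case takes $x \ge 0$ (normalising a Gaussian vector is the standard trick for the sphere, normalising exponential variables for the simplex):

```python
import numpy as np

def uniform_sphere(d, mu, rng=None):
    """Uniform sample on {x : d^{-1/2} ||x||_2 = mu}."""
    rng = rng or np.random.default_rng()
    g = rng.standard_normal(d)
    return mu * np.sqrt(d) * g / np.linalg.norm(g)

def uniform_simplex(d, mu, rng=None):
    """Uniform sample on {x >= 0 : d^{-1} ||x||_1 = mu}."""
    rng = rng or np.random.default_rng()
    e = rng.exponential(size=d)
    return mu * d * e / e.sum()
```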

  20. Scattering Representation. Scattering coefficients of order 0, 1 and 2:
  $$\Phi x = \Big\{\, d^{-1} \sum_u x(u),\ \ d^{-1}\, \|x \star \psi_{\lambda_1}\|_1,\ \ d^{-1}\, \big\| |x \star \psi_{\lambda_1}| \star \psi_{\lambda_2} \big\|_1 \,\Big\}$$

  21. Microcanonical Scattering. [Figure: microcanonical scattering set in $\mathbb{R}^d$.]

  22. Scattering Approximations. [Figure: approximation $\tilde X$ of $X$ from the moments $\mu = E(\Phi X)$.]

  23. Ergodic Microcanonical Model. [Figure: microcanonical set in $\mathbb{R}^d$.]

  24. Singular Ergodic Processes

  25. Scattering Ising

  26. Stochastic Geometry: Cox Process

  27. Non-Ergodic Mixture. [Figure: mixture distribution in $\mathbb{R}^d$.]

  28. Non-Ergodic Microcanonical Mixture. [Figure: microcanonical mixture in $\mathbb{R}^d$.]

  29. Scattering Multifractal Processes. Scattering coefficients of order 0, 1 and 2 (as defined on slide 20).

  30. Scattering of Ising at Critical Temperature

  31. Failures of Audio Synthesis (J. Andén and V. Lostanlen). [Figure: time-scattering syntheses compared with originals.]

  32. Time-Frequency Translation Group (J. Andén and V. Lostanlen). Time-frequency wavelet convolutions over $(t, \log\lambda)$: first-order coefficients $|x \star \psi_\lambda| \star \phi_J$, and joint coefficients $\big|\, |x \star \psi_\lambda| \star (\psi_\alpha \otimes \psi_\beta) \,\big| \star \phi_J$ with $\psi_\alpha$ along time $t$ and $\psi_\beta$ along $\log\lambda$.

  33. Joint Time-Frequency Scattering (J. Andén and V. Lostanlen). [Figure: originals vs. time-scattering and joint time-frequency-scattering syntheses.]

  34. Part III: Supervised Learning. A cascade $x_j = \rho L_j x_{j-1}$ mapping $x(u)$ through $x_1(u, k_1), x_2(u, k_2), \dots, x_J(u, k_J)$ to a classification. $L_j$ is a linear combination of convolutions and subsampling:
  $$x_j(u, k_j) = \rho\Big( \sum_k x_{j-1}(\cdot, k) \star h_{k_j, k}(u) \Big)$$
  a sum across channels. What is the role of channel connections?
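A minimal numpy sketch of one such layer; circular FFT convolutions, subsampling omitted, and ReLU chosen for $\rho$, all illustrative choices:

```python
import numpy as np

def layer(x_prev, h):
    """One layer x_j = rho(L_j x_{j-1}) with channel connections.
    x_prev: (K_in, n) input channels; h: (K_out, K_in, n) filters."""
    K_out, K_in, n = h.shape
    x = np.zeros((K_out, n))
    for kj in range(K_out):
        acc = np.zeros(n)
        for k in range(K_in):                 # sum across input channels
            acc += np.real(np.fft.ifft(np.fft.fft(x_prev[k]) *
                                       np.fft.fft(h[kj, k])))
        x[kj] = np.maximum(acc, 0.0)          # rho(u) = max(u, 0)
    return x
```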

  35. Environmental Sound Classification (J. Andén and V. Lostanlen). Supervised $y = f(x)$: $x \to S_J x \to$ linear classifier, no learning in the representation. UrbanSound8k: 10 classes (air conditioner, car horns, children playing, dog barks, drilling, engine at idle, ...), 8k training examples. Class-wise average error:
  MFCC audio descriptors        0.39
  time scattering               0.27
  ConvNet (Piczak, MLSP 2015)   0.26
  time-frequency scattering     0.20

  36. Inverse Scattering Transform (Joan Bruna). Given $S_J x$, we want to compute $\tilde x$ such that
  $$S_J \tilde x = \begin{pmatrix} \tilde x \star \phi_{2^J} \\ |\tilde x \star \psi_{\lambda_1}| \star \phi_{2^J} \\ \vdots \\ |||\tilde x \star \psi_{\lambda_1}| \star \cdots | \star \psi_{\lambda_m}| \star \phi_{2^J} \end{pmatrix}_{\lambda_1, \dots, \lambda_m} = \begin{pmatrix} x \star \phi_{2^J} \\ |x \star \psi_{\lambda_1}| \star \phi_{2^J} \\ \vdots \\ |||x \star \psi_{\lambda_1}| \star \cdots | \star \psi_{\lambda_m}| \star \phi_{2^J} \end{pmatrix}_{\lambda_1, \dots, \lambda_m} = S_J x.$$
  We shall use $m = 2$. If $x(u)$ is a Dirac, a straight edge or a sinusoid, then $\tilde x$ is equal to $x$ up to a translation.

  37. Sparse Shape Reconstruction (Joan Bruna). With a gradient descent algorithm, for original images of $N^2$ pixels:
  $m = 1$, $2^J = N$: reconstruction from $O(\log_2 N)$ scattering coefficients;
  $m = 2$, $2^J = N$: reconstruction from $O(\log_2^2 N)$ scattering coefficients.

  38. Multiscale Scattering Reconstructions. [Figure: original images of $N^2$ pixels and scattering reconstructions at $2^J = 16$ ($1.4\,N^2$ coeff.), $2^J = 32$ ($0.5\,N^2$ coeff.), $2^J = 64$, and $2^J = 128 = N$.]

  39. III: Inverse Problems. [Diagram: $x \to F \to y$.] Best linear method: the least squares estimate (linear interpolation) $\hat y = (\hat\Sigma_x^{\dagger}\, \hat\Sigma_{xy})\, x$.
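A minimal sketch of that least-squares estimator from paired samples; `X` and `Y` are assumed to be centered data matrices (names illustrative):

```python
import numpy as np

def linear_estimator(X, Y):
    """X: (P, d) observations, Y: (P, m) targets, both centered.
    Returns A (d x m) so that y_hat = x @ A."""
    Sigma_x = X.T @ X / len(X)       # empirical covariance of x
    Sigma_xy = X.T @ Y / len(X)      # empirical cross-covariance
    return np.linalg.pinv(Sigma_x) @ Sigma_xy

# Usage: A = linear_estimator(X, Y); y_hat = x_new @ A
```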

  40. Super-Resolution.
  Best linear method: the least squares estimate (linear interpolation) $\hat y = (\hat\Sigma_x^{\dagger}\, \hat\Sigma_{xy})\, x$.
  State-of-the-art methods: dictionary-learning super-resolution; CNN-based: just train a CNN to regress from low-res to high-res. They cleverly optimise a fundamentally unstable metric criterion:
  $$\Theta^* = \arg\min_\Theta \sum_i \| F(x_i, \Theta) - y_i \|^2, \qquad \hat y = F(x, \Theta^*).$$
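A minimal torch sketch of that criterion, for any regression network `F` and any iterable `data` of (low-res, high-res) pairs (both placeholders):

```python
import torch

def train_regressor(F, data, epochs=100, lr=1e-3):
    """Minimise sum_i ||F(x_i) - y_i||^2 over the parameters of F."""
    opt = torch.optim.Adam(F.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in data:                     # (low-res, high-res) pairs
            opt.zero_grad()
            loss = ((F(x) - y) ** 2).sum()    # the metric criterion
            loss.backward()
            opt.step()
    return F
```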

  41. Scattering Super-Resolution. [Diagram: $x \to F \to y$; a linear map from $S_{L,J}\, x$ to $S_{L-\alpha,J}\, x$.]
  $$S_{L,J}\, x = \begin{pmatrix} x \star \phi_{2^J}(u) \\ |x \star \psi_{j_1,k_1}| \star \phi_{2^J}(u) \\ ||x \star \psi_{j_1,k_1}| \star \psi_{j_2,k_2}| \star \phi_{2^J}(u) \end{pmatrix}_{L \le j_1, j_2 \le J}$$
  Linear estimation in the scattering domain. No phase estimation: a potentially worse PSNR, but good image quality because of deformation stability.

  42. Super-Resolution Results (J. Bruna, P. Sprechmann). [Figure: original, linear estimate, state-of-the-art, and scattering reconstructions.]

  43. Super-Resolution Results (J. Bruna, P. Sprechmann). [Figure: original, linear estimate, best state-of-the-art estimate, and scattering estimate.]

  44. Super-Resolution Results (J. Bruna, P. Sprechmann). [Figure: original, linear estimate, best state-of-the-art estimate, and scattering estimate.]

  45. Super-Resolution Results (I. Dokmanic, J. Bruna, M. De Hoop). [Figure: low-resolution inputs, $\ell_1$ regularization, TV regularization, scattering estimates, and originals, for two images.]

  46. Tomography Results (I. Dokmanic, J. Bruna, M. De Hoop). [Figure: low-resolution input, TV regularization, scattering estimate, and original.]

  47. Conclusions.
  • Deep convolutional networks have spectacular high-dimensional and generic approximation capabilities.
  • New stochastic models of images for inverse problems.
  • Outstanding mathematical problem to understand deep nets: how to learn representations for inverse problems?
  Reference: Understanding Deep Convolutional Networks, arXiv 2016.
