Harmonic Analysis of Deep Convolutional Neural Networks Helmut B - PowerPoint PPT Presentation

Vertical translation invariance Theorem (Wiatowski and HB, 2015) Assume that the filters, non-linearities, and poolings satisfy B n ≤ min { 1 , L − 2 n R − 2 ∀ n ∈ N . n } , Let the pooling factors be S n ≥ 1 , n ∈ N . Then, � � � t � ||| Φ n ( T t f ) − Φ n ( f ) ||| = O , S 1 . . . S n for all f ∈ L 2 ( R d ) , t ∈ R d , n ∈ N . ⇒ applies to general filters, non-linearities, and poolings

Philosophy behind invariance results Mallat’s “horizontal” translation invariance [ Mallat, 2012 ]: ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d J →∞ ||| Φ W ( T t f ) − Φ W ( f ) ||| = 0 , lim “Vertical” translation invariance: n →∞ ||| Φ n ( T t f ) − Φ n ( f ) ||| = 0 , ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d lim

Philosophy behind invariance results Mallat’s “horizontal” translation invariance [ Mallat, 2012 ]: ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d J →∞ ||| Φ W ( T t f ) − Φ W ( f ) ||| = 0 , lim - features become invariant in every network layer, but needs J → ∞ “Vertical” translation invariance: n →∞ ||| Φ n ( T t f ) − Φ n ( f ) ||| = 0 , ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d lim - features become more invariant with increasing network depth

Philosophy behind invariance results Mallat’s “horizontal” translation invariance [ Mallat, 2012 ]: ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d J →∞ ||| Φ W ( T t f ) − Φ W ( f ) ||| = 0 , lim - features become invariant in every network layer, but needs J → ∞ - applies to wavelet transform and modulus non-linearity without pooling “Vertical” translation invariance: n →∞ ||| Φ n ( T t f ) − Φ n ( f ) ||| = 0 , ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d lim - features become more invariant with increasing network depth - applies to general filters, general non-linearities, and general poolings

Non-linear deformations Non-linear deformation ( F τ f )( x ) = f ( x − τ ( x )) , where τ : R d → R d For “small” τ :

Non-linear deformations Non-linear deformation ( F τ f )( x ) = f ( x − τ ( x )) , where τ : R d → R d For “large” τ :

Deformation sensitivity for signal classes Consider ( F τ f )( x ) = f ( x − τ ( x )) = f ( x − e − x 2 ) f 1 ( x ) , ( F τ f 1 )( x ) x f 2 ( x ) , ( F τ f 2 )( x ) x For given τ the amount of deformation induced can depend drastically on f ∈ L 2 ( R d )

Philosophy behind deformation stability/sensitivity bounds Mallat’s deformation stability bound [ Mallat, 2012 ]: � � 2 − J � τ � ∞ + J � Dτ � ∞ + � D 2 τ � ∞ ||| Φ W ( F τ f ) − Φ W ( f ) ||| ≤ C � f � W , for all f ∈ H W ⊆ L 2 ( R d ) - The signal class H W and the corresponding norm � · � W depend on the mother wavelet (and hence the network) Our deformation sensitivity bound: ||| Φ( F τ f ) − Φ( f ) ||| ≤ C C � τ � α ∀ f ∈ C ⊆ L 2 ( R d ) ∞ , - The signal class C (band-limited functions, cartoon functions, or Lipschitz functions) is independent of the network

Philosophy behind deformation stability/sensitivity bounds Mallat’s deformation stability bound [ Mallat, 2012 ]: � � 2 − J � τ � ∞ + J � Dτ � ∞ + � D 2 τ � ∞ ||| Φ W ( F τ f ) − Φ W ( f ) ||| ≤ C � f � W , for all f ∈ H W ⊆ L 2 ( R d ) - Signal class description complexity implicit via norm � · � W Our deformation sensitivity bound: ||| Φ( F τ f ) − Φ( f ) ||| ≤ C C � τ � α ∀ f ∈ C ⊆ L 2 ( R d ) ∞ , - Signal class description complexity explicit via C C - L -band-limited functions: C C = O ( L ) - cartoon functions of size K : C C = O ( K 3 / 2 ) - M -Lipschitz functions C C = O ( M )

Philosophy behind deformation stability/sensitivity bounds Mallat’s deformation stability bound [ Mallat, 2012 ]: � � 2 − J � τ � ∞ + J � Dτ � ∞ + � D 2 τ � ∞ ||| Φ W ( F τ f ) − Φ W ( f ) ||| ≤ C � f � W , for all f ∈ H W ⊆ L 2 ( R d ) Our deformation sensitivity bound: ||| Φ( F τ f ) − Φ( f ) ||| ≤ C C � τ � α ∀ f ∈ C ⊆ L 2 ( R d ) ∞ , - Decay rate α > 0 of the deformation error is signal-class- specific (band-limited functions: α = 1 , cartoon functions: α = 1 2 , Lipschitz functions: α = 1 )

Philosophy behind deformation stability/sensitivity bounds Mallat’s deformation stability bound [ Mallat, 2012 ]: � � 2 − J � τ � ∞ + J � Dτ � ∞ + � D 2 τ � ∞ ||| Φ W ( F τ f ) − Φ W ( f ) ||| ≤ C � f � W , for all f ∈ H W ⊆ L 2 ( R d ) - The bound depends explicitly on higher order derivatives of τ Our deformation sensitivity bound: ||| Φ( F τ f ) − Φ( f ) ||| ≤ C C � τ � α ∀ f ∈ C ⊆ L 2 ( R d ) ∞ , - The bound implicitly depends on derivative of τ via the 1 condition � Dτ � ∞ ≤ 2 d

Philosophy behind deformation stability/sensitivity bounds Mallat’s deformation stability bound [ Mallat, 2012 ]: � � 2 − J � τ � ∞ + J � Dτ � ∞ + � D 2 τ � ∞ ||| Φ W ( F τ f ) − Φ W ( f ) ||| ≤ C � f � W , for all f ∈ H W ⊆ L 2 ( R d ) - The bound is coupled to horizontal translation invariance ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d J →∞ ||| Φ W ( T t f ) − Φ W ( f ) ||| = 0 , lim Our deformation sensitivity bound: ||| Φ( F τ f ) − Φ( f ) ||| ≤ C C � τ � α ∀ f ∈ C ⊆ L 2 ( R d ) ∞ , - The bound is decoupled from vertical translation invariance n →∞ ||| Φ n ( T t f ) − Φ n ( f ) ||| = 0 , ∀ f ∈ L 2 ( R d ) , ∀ t ∈ R d lim

CNNs in a nutshell CNNs used in practice employ potentially hundreds of layers and 10 , 000 s of nodes!

CNNs in a nutshell CNNs used in practice employ potentially hundreds of layers and 10 , 000 s of nodes! e.g.: Winner of the ImageNet 2015 challenge [ He et al., 2015 ] - Network depth : 152 layers - average # of nodes per layer: 472 - # of FLOPS for a single forward pass: 11 . 3 billion

CNNs in a nutshell CNNs used in practice employ potentially hundreds of layers and 10 , 000 s of nodes! e.g.: Winner of the ImageNet 2015 challenge [ He et al., 2015 ] - Network depth : 152 layers - average # of nodes per layer: 472 - # of FLOPS for a single forward pass: 11 . 3 billion Such depths (and breadths) pose formidable computational challenges in training and operating the network!

Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers

Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers Guarantee trivial null-space for feature extractor Φ

Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers Guarantee trivial null-space for feature extractor Φ Specify the number of layers needed to have “most” of the input signal energy be contained in the feature vector

Topology reduction Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers Guarantee trivial null-space for feature extractor Φ Specify the number of layers needed to have “most” of the input signal energy be contained in the feature vector For a fixed (possibly small) depth, design CNNs that capture “most” of the input signal energy

Building blocks Basic operations in the n -th network layer g λ ( k ) | · | ↓ S n . . f . g λ ( r ) | · | ↓ S n Filters: Semi-discrete frame Ψ n := { χ n } ∪ { g λ n } λ n ∈ Λ n Non-linearity: Modulus | · | Pooling: Sub-sampling with pooling factor S ≥ 1

Demodulation effect of modulus non-linearity Components of feature vector given by | f ∗ g λ n | ∗ χ n +1 g λ n ( ω ) � χ n +1 ( ω ) � 1 � f ( ω ) · · · · · · ω

Demodulation effect of modulus non-linearity Components of feature vector given by | f ∗ g λ n | ∗ χ n +1 g λ n ( ω ) � χ n +1 ( ω ) � 1 � f ( ω ) · · · · · · ω � 1 f ( ω ) · � g λ n ( ω ) ω

Demodulation effect of modulus non-linearity Components of feature vector given by | f ∗ g λ n | ∗ χ n +1 g λ n ( ω ) � χ n +1 ( ω ) � 1 � f ( ω ) · · · · · · ω � 1 f ( ω ) · � g λ n ( ω ) ω Modulus squared : | f ∗ g λ n ( x ) | 2 R � g λn ( ω ) f · �

Demodulation effect of modulus non-linearity Components of feature vector given by | f ∗ g λ n | ∗ χ n +1 g λ n ( ω ) � χ n +1 ( ω ) � 1 � f ( ω ) · · · · · · ω � 1 f ( ω ) · � g λ n ( ω ) ω Φ( f ) 1 via � � χ n +1 | f ∗ g λ n | ( ω ) ω

Do all non-linearities demodulate? High-pass filtered signal: F ( f ∗ g λ ) ω − 2 R 2 R 2 R

Do all non-linearities demodulate? High-pass filtered signal: F ( f ∗ g λ ) ω − 2 R 2 R 2 R Modulus : Yes ! |F ( | f ∗ g λ | ) | ω − 2 R 2 R ... but (small) tails!

Do all non-linearities demodulate? High-pass filtered signal: F ( f ∗ g λ ) ω − 2 R 2 R 2 R Modulus squared : Yes, and sharply so ! |F ( | f ∗ g λ | 2 ) | ω − 2 R 2 R ... but not Lipschitz-continuous!

Do all non-linearities demodulate? High-pass filtered signal: F ( f ∗ g λ ) ω − 2 R 2 R 2 R Rectified linear unit : No ! |F ( ReLU ( f ∗ g λ )) | ω − 2 R 2 R

First goal: Quantify feature map energy decay W 2 ( f ) || f ∗ g λ ( k ) 1 | ∗ g λ ( l ) 2 | || f ∗ g λ ( p ) 1 | ∗ g λ ( r ) 2 | · ∗ χ 3 · ∗ χ 3 W 1 ( f ) | f ∗ g λ ( k ) 1 | | f ∗ g λ ( p ) 1 | · ∗ χ 2 · ∗ χ 2 f · ∗ χ 1

Assumptions (on the filters) i) Analyticity : For every filter g λ n there exists a (not necessarily canonical) orthant H λ n ⊆ R d such that supp ( � g λ n ) ⊆ H λ n . ii) High-pass : There exists δ > 0 such that � g λ n ( ω ) | 2 = 0 , | � a.e. ω ∈ B δ (0) . λ n ∈ Λ n

Assumptions (on the filters) i) Analyticity : For every filter g λ n there exists a (not necessarily canonical) orthant H λ n ⊆ R d such that supp ( � g λ n ) ⊆ H λ n . ii) High-pass : There exists δ > 0 such that � g λ n ( ω ) | 2 = 0 , | � a.e. ω ∈ B δ (0) . λ n ∈ Λ n ⇒ Comprises various contructions of WH filters, wavelets, ridgelets, ( α )-curvelets, shearlets ω 2 e.g.: analytic band-limited curvelets: ω 1

Input signal classes Sobolev functions of order s ≥ 0 : � � � � � R d (1 + | ω | 2 ) s | � H s ( R d ) = f ∈ L 2 ( R d ) f ( ω ) | 2 d ω < ∞ �

Input signal classes Sobolev functions of order s ≥ 0 : � � � � � R d (1 + | ω | 2 ) s | � H s ( R d ) = f ∈ L 2 ( R d ) f ( ω ) | 2 d ω < ∞ � H s ( R d ) contains a wide range of practically relevant signal classes

Input signal classes Sobolev functions of order s ≥ 0 : � � � � � R d (1 + | ω | 2 ) s | � H s ( R d ) = f ∈ L 2 ( R d ) f ( ω ) | 2 d ω < ∞ � H s ( R d ) contains a wide range of practically relevant signal classes - square-integrable functions L 2 ( R d ) = H 0 ( R d )

Input signal classes Sobolev functions of order s ≥ 0 : � � � � � R d (1 + | ω | 2 ) s | � H s ( R d ) = f ∈ L 2 ( R d ) f ( ω ) | 2 d ω < ∞ � H s ( R d ) contains a wide range of practically relevant signal classes - square-integrable functions L 2 ( R d ) = H 0 ( R d ) - L -band-limited functions L 2 L ( R d ) ⊆ H s ( R d ) , ∀ L > 0 , ∀ s ≥ 0

Input signal classes Sobolev functions of order s ≥ 0 : � � � � � R d (1 + | ω | 2 ) s | � H s ( R d ) = f ∈ L 2 ( R d ) f ( ω ) | 2 d ω < ∞ � H s ( R d ) contains a wide range of practically relevant signal classes - square-integrable functions L 2 ( R d ) = H 0 ( R d ) - L -band-limited functions L 2 L ( R d ) ⊆ H s ( R d ) , ∀ L > 0 , ∀ s ≥ 0 - cartoon functions [ Donoho, 2001 ] C CART ⊆ H s ( R d ) , ∀ s ∈ [0 , 1 2 ) Handwritten digits from MNIST database [ LeCun & Cortes, 1998 ]

Exponential energy decay Theorem Let the filters be wavelets with mother wavelet supp ( � ψ ) ⊆ [ r − 1 , r ] , r > 1 , or Weyl-Heisenberg (WH) filters with prototype function supp ( � g ) ⊆ [ − R, R ] , R > 0 . Then, for every f ∈ H s ( R d ) , there exists β > 0 such that � � a − n (2 s + β ) W n ( f ) = O , 2 s + β +1 where a = r 2 +1 r 2 − 1 in the wavelet case, and a = 1 2 + 1 R in the WH case.

Exponential energy decay Theorem Let the filters be wavelets with mother wavelet supp ( � ψ ) ⊆ [ r − 1 , r ] , r > 1 , or Weyl-Heisenberg (WH) filters with prototype function supp ( � g ) ⊆ [ − R, R ] , R > 0 . Then, for every f ∈ H s ( R d ) , there exists β > 0 such that � � a − n (2 s + β ) W n ( f ) = O , 2 s + β +1 where a = r 2 +1 r 2 − 1 in the wavelet case, and a = 1 2 + 1 R in the WH case. ⇒ decay factor a is explicit and can be tuned via r, R

Exponential energy decay Exponential energy decay: � � a − n (2 s + β ) W n ( f ) = O 2 s + β +1

Exponential energy decay Exponential energy decay: � � a − n (2 s + β ) W n ( f ) = O 2 s + β +1 - β > 0 determines the decay of � f ( ω ) (as | ω | → ∞ ) according to 2 + 1 4 + β | � f ( ω ) | ≤ µ (1 + | ω | 2 ) − ( s 4 ) , ∀ | ω | ≥ L, for some µ > 0 , and L acts as an “effective bandwidth”

Exponential energy decay Exponential energy decay: � � a − n (2 s + β ) W n ( f ) = O 2 s + β +1 - β > 0 determines the decay of � f ( ω ) (as | ω | → ∞ ) according to 2 + 1 4 + β | � f ( ω ) | ≤ µ (1 + | ω | 2 ) − ( s 4 ) , ∀ | ω | ≥ L, for some µ > 0 , and L acts as an “effective bandwidth” - smoother input signals (i.e., s ↑ ) lead to faster energy decay

Exponential energy decay Exponential energy decay: � � a − n (2 s + β ) W n ( f ) = O 2 s + β +1 - β > 0 determines the decay of � f ( ω ) (as | ω | → ∞ ) according to 2 + 1 4 + β | � f ( ω ) | ≤ µ (1 + | ω | 2 ) − ( s 4 ) , ∀ | ω | ≥ L, for some µ > 0 , and L acts as an “effective bandwidth” - smoother input signals (i.e., s ↑ ) lead to faster energy decay - pooling through sub-sampling f �→ S 1 / 2 f ( S · ) leads to decay factor a S

Exponential energy decay Exponential energy decay: � � a − n (2 s + β ) W n ( f ) = O 2 s + β +1 - β > 0 determines the decay of � f ( ω ) (as | ω | → ∞ ) according to 2 + 1 4 + β | � f ( ω ) | ≤ µ (1 + | ω | 2 ) − ( s 4 ) , ∀ | ω | ≥ L, for some µ > 0 , and L acts as an “effective bandwidth” - smoother input signals (i.e., s ↑ ) lead to faster energy decay - pooling through sub-sampling f �→ S 1 / 2 f ( S · ) leads to decay factor a S What about general filters? ⇒ polynomial energy decay!

... our second goal ... trivial null-space for Φ Why trivial null-space? Feature space w : � w, Φ( f ) � > 0 : � w, Φ( f ) � < 0

... our second goal ... trivial null-space for Φ Why trivial null-space? Feature space w : � w, Φ( f ) � > 0 : � w, Φ( f ) � < 0 Φ( f ∗ ) Non -trivial null-space: ∃ f ∗ � = 0 such that Φ( f ∗ ) = 0 ⇒ � w, Φ( f ∗ ) � = 0 for all w ! ⇒ these f ∗ become unclassifiable !

... our second goal ... Trivial null-space for feature extractor: � � � � f ∈ L 2 ( R d ) | Φ( f ) = 0 = 0 Feature extractor Φ( · ) = � ∞ n =0 Φ n ( · ) shall satisfy 2 ≤ ||| Φ( f ) ||| 2 ≤ B � f � 2 A � f � 2 ∀ f ∈ L 2 ( R d ) , 2 , for some A, B > 0 .

“Energy conservation” Theorem For the frame upper { B n } n ∈ N and frame lower bounds { A n } n ∈ N , define B := � ∞ n =1 max { 1 , B n } and A := � ∞ n =1 min { 1 , A n } . If 0 < A ≤ B < ∞ , then 2 ≤ ||| Φ( f ) ||| 2 ≤ B � f � 2 A � f � 2 ∀ f ∈ L 2 ( R d ) . 2 ,

“Energy conservation” Theorem For the frame upper { B n } n ∈ N and frame lower bounds { A n } n ∈ N , define B := � ∞ n =1 max { 1 , B n } and A := � ∞ n =1 min { 1 , A n } . If 0 < A ≤ B < ∞ , then 2 ≤ ||| Φ( f ) ||| 2 ≤ B � f � 2 A � f � 2 ∀ f ∈ L 2 ( R d ) . 2 , - For Parseval frames (i.e., A n = B n = 1 , n ∈ N ), this yields ||| Φ( f ) ||| 2 = � f � 2 2

“Energy conservation” Theorem For the frame upper { B n } n ∈ N and frame lower bounds { A n } n ∈ N , define B := � ∞ n =1 max { 1 , B n } and A := � ∞ n =1 min { 1 , A n } . If 0 < A ≤ B < ∞ , then 2 ≤ ||| Φ( f ) ||| 2 ≤ B � f � 2 A � f � 2 ∀ f ∈ L 2 ( R d ) . 2 , - For Parseval frames (i.e., A n = B n = 1 , n ∈ N ), this yields ||| Φ( f ) ||| 2 = � f � 2 2 - Connection to energy decay: n − 1 � ||| Φ k ( f ) ||| 2 + W n ( f ) � f � 2 2 = � �� k =0 → 0

... and our third goal ... For a given CNN, specify the number of layers needed to capture “most” of the input signal energy

... and our third goal ... For a given CNN, specify the number of layers needed to capture “most” of the input signal energy How many layers n are needed to have at least ((1 − ε ) · 100)% of the input signal energy be contained in the feature vector , i.e., n � ||| Φ k ( f ) ||| 2 ≤ � f � 2 (1 − ε ) � f � 2 ∀ f ∈ L 2 ( R d ) . 2 ≤ 2 , k =0

Number of layers needed Theorem Let the frame bounds satisfy A n = B n = 1 , n ∈ N . Let the input signal f be L -band-limited, and let ε ∈ (0 , 1) . If � � �� L n ≥ log a (1 − √ 1 − ε ) , then n � ||| Φ k ( f ) ||| 2 ≤ � f � 2 (1 − ε ) � f � 2 2 ≤ 2 . k =0

Number of layers needed Theorem Let the frame bounds satisfy A n = B n = 1 , n ∈ N . Let the input signal f be L -band-limited, and let ε ∈ (0 , 1) . If � � �� L n ≥ log a (1 − √ 1 − ε ) , then n � ||| Φ k ( f ) ||| 2 ≤ � f � 2 (1 − ε ) � f � 2 2 ≤ 2 . k =0 ⇒ also guarantees trivial null-space for � n k =0 Φ k ( f )

Number of layers needed Theorem Let the frame bounds satisfy A n = B n = 1 , n ∈ N . Let the input signal f be L -band-limited, and let ε ∈ (0 , 1) . If � � �� L n ≥ log a (1 − √ 1 − ε ) , then n � ||| Φ k ( f ) ||| 2 ≤ � f � 2 (1 − ε ) � f � 2 2 ≤ 2 . k =0 - lower bound depends on - description complexity of input signals (i.e., bandwidth L ) - decay factor (wavelets a = r 2 +1 r 2 − 1 , WH filters a = 1 2 + 1 R )

Number of layers needed Theorem Let the frame bounds satisfy A n = B n = 1 , n ∈ N . Let the input signal f be L -band-limited, and let ε ∈ (0 , 1) . If � � �� L n ≥ log a (1 − √ 1 − ε ) , then n � ||| Φ k ( f ) ||| 2 ≤ � f � 2 (1 − ε ) � f � 2 2 ≤ 2 . k =0 - lower bound depends on - description complexity of input signals (i.e., bandwidth L ) - decay factor (wavelets a = r 2 +1 r 2 − 1 , WH filters a = 1 2 + 1 R ) - similar estimates for Sobolev input signals and for general filters (polynomial decay!)

Number of layers needed Numerical example for bandwidth L = 1 : (1 − ε ) 0.25 0.5 0.75 0.9 0.95 0.99 wavelets ( r = 2 ) 2 3 4 6 8 11 WH filters ( R = 1 ) 2 4 5 8 10 14 general filters 2 3 7 19 39 199

Number of layers needed Numerical example for bandwidth L = 1 : (1 − ε ) 0.25 0.5 0.75 0.9 0.95 0.99 wavelets ( r = 2 ) 2 3 4 6 8 11 WH filters ( R = 1 ) 2 4 5 8 10 14 general filters 2 3 7 19 39 199 Recall: Winner of the ImageNet 2015 challenge [ He et al., 2015 ] - Network depth : 152 layers - average # of nodes per layer: 472 - # of FLOPS for a single forward pass: 11 . 3 billion

... our fourth and last goal ... For a fixed (possibly small) depth N , design scattering networks that capture “most” of the input signal energy

... our fourth and last goal ... For a fixed (possibly small) depth N , design scattering networks that capture “most” of the input signal energy Recall : Let the filters be wavelets with mother wavelet supp ( � ψ ) ⊆ [ r − 1 , r ] , r > 1 , or Weyl-Heisenberg filters with prototype function supp ( � g ) ⊆ [ − R, R ] , R > 0 .

... our fourth and last goal ... For a fixed (possibly small) depth N , design scattering networks that capture “most” of the input signal energy For fixed depth N , want to choose r in the wavelet and R in the WH case so that N � ||| Φ k ( f ) ||| 2 ≤ � f � 2 (1 − ε ) � f � 2 ∀ f ∈ L 2 ( R d ) . 2 ≤ 2 , k =0

Depth-constrained networks Theorem Let the frame bounds satisfy A n = B n = 1 , n ∈ N . Let the input signal f be L -band-limited, and fix ε ∈ (0 , 1) and N ∈ N . If, in the wavelet case, � κ + 1 1 < r ≤ κ − 1 , or, in the WH case, � 1 0 < R ≤ , κ − 1 2 � � 1 N , then L where κ := (1 −√ 1 − ε ) N � ||| Φ k ( f ) ||| 2 ≤ � f � 2 (1 − ε ) � f � 2 2 ≤ 2 . k =0

Depth-width tradeoff Spectral supports of wavelet filters: g 1 � g 2 � g 3 � � ψ 1 ω r r 1 1 r 2 r 3 L

Depth-width tradeoff Spectral supports of wavelet filters: g 1 � g 2 � g 3 � � ψ 1 ω r r 1 1 r 2 r 3 L Smaller depth N ⇒ smaller “bandwidth” r of mother wavelet

Depth-width tradeoff Spectral supports of wavelet filters: g 1 � g 2 � g 3 � � ψ 1 ω r r 1 1 r 2 r 3 L Smaller depth N ⇒ smaller “bandwidth” r of mother wavelet ⇒ larger number of wavelets ( O (log r ( L )) ) to cover the spectral support [ − L, L ] of input signal

Harmonic Analysis of Deep Convolutional Neural Networks Helmut B - PowerPoint PPT Presentation

Harmonic Analysis of Deep Convolutional Neural Networks Helmut B olcskei Department of Information Technology and Electrical Engineering October 2017 joint work with Thomas Wiatowski and Philipp Grohs ImageNet ImageNet ski rock plant

Convolutional Neural Networks Convolutional neural networks One of the major kinds of ANNs in use

Convolutional Neural Networks ---- Off the shelf top notch performances Convolutional Neural

Contents averages averages Contents Contents Harmonic mean (average) Harmonic mean (average)

Convolutional Kuan-Ting Lai 2020/3/31 Neural Network Convolutional Neural Networks (CNN)

Introduction CSCE 970 CSCE 970 Lecture 4: Lecture 4: Convolutional Convolutional Neural

Convolutional Neural Networks 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image Processing

Harmonic Map Let f : T 2 S 3 = SU (2) be a harmonic map. A harmonic map is a critical

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

CS7015 (Deep Learning) : Lecture 13 Visualizing Convolutional Neural Networks, Guided

Convolutional Neural Networks for Sentence Classification Yoon Kim New York University 1 / 34

Convolutional Neural Nets 4-25-16 Reading Quiz Convolutional neural networks are most commonly

Deep Convolutional Neural Nets COMPSCI 371D Machine Learning COMPSCI 371D Machine

Convolutional Neural Networks (Part III) 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image

MICROBOONE Taritree Wongjirad DPF 2017 Tufts/MIT Outline Convolutional neural networks

Neural Networks Neural networks arise from attempts to model Neural Networks human/animal

Neural Network Part 3: Convolutional Neural Networks CS 760@UW-Madison Goals for the lecture

Posterior odds interpretation of a sigmoid Artificial Intelligence: Neural Networks Michael S.

Neural Networks for Machine Learning Lecture 3a Learning the weights of a linear neuron Geoffrey

MMI 2: Mobile Human- Computer Interaction Sensor-Based Mobile Interaction Prof. Dr. Michael

Autonomic Computing Introduction, Motivations, Overview Manish Parashar The Applied Software

Greedy Layerwise Learning Can Scale to ImageNet Edouard Oyallon Eugene Belilovsky, Michael

Lecture 9: Recurrent Neural Networks Princeton University COS 495 Instructor: Yingyu Liang

Neural Grammar Induction Yoon Kim Harvard University (with Chris Dyer, Alexander Rush) 1/69

Neural Importance Sampling Fabrice Rousselle Markus Gross Jan Novk A ffi liation: Work done

Harmonic Analysis of Deep Convolutional Neural Networks Helmut B - PowerPoint PPT Presentation

Harmonic Analysis of Deep Convolutional Neural Networks Helmut B olcskei Department of Information Technology and Electrical Engineering October 2017 joint work with Thomas Wiatowski and Philipp Grohs ImageNet ImageNet ski rock plant

Convolutional Neural Networks Convolutional neural networks One of the major kinds of ANNs in use

Convolutional Neural Networks ---- Off the shelf top notch performances Convolutional Neural

Contents averages averages Contents Contents Harmonic mean (average) Harmonic mean (average)

Convolutional Kuan-Ting Lai 2020/3/31 Neural Network Convolutional Neural Networks (CNN)

Introduction CSCE 970 CSCE 970 Lecture 4: Lecture 4: Convolutional Convolutional Neural

Convolutional Neural Networks 08, 10 &amp; 17 Nov, 2016 J. Ezequiel Soto S. Image Processing

Harmonic Map Let f : T 2 S 3 = SU (2) be a harmonic map. A harmonic map is a critical

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

CS7015 (Deep Learning) : Lecture 13 Visualizing Convolutional Neural Networks, Guided

Convolutional Neural Networks for Sentence Classification Yoon Kim New York University 1 / 34

Convolutional Neural Nets 4-25-16 Reading Quiz Convolutional neural networks are most commonly

Deep Convolutional Neural Nets COMPSCI 371D Machine Learning COMPSCI 371D Machine

Convolutional Neural Networks (Part III) 08, 10 &amp; 17 Nov, 2016 J. Ezequiel Soto S. Image

MICROBOONE Taritree Wongjirad DPF 2017 Tufts/MIT Outline Convolutional neural networks

Neural Networks Neural networks arise from attempts to model Neural Networks human/animal

Neural Network Part 3: Convolutional Neural Networks CS 760@UW-Madison Goals for the lecture

Posterior odds interpretation of a sigmoid Artificial Intelligence: Neural Networks Michael S.

Neural Networks for Machine Learning Lecture 3a Learning the weights of a linear neuron Geoffrey

MMI 2: Mobile Human- Computer Interaction Sensor-Based Mobile Interaction Prof. Dr. Michael

Autonomic Computing Introduction, Motivations, Overview Manish Parashar The Applied Software

Greedy Layerwise Learning Can Scale to ImageNet Edouard Oyallon Eugene Belilovsky, Michael

Lecture 9: Recurrent Neural Networks Princeton University COS 495 Instructor: Yingyu Liang

Neural Grammar Induction Yoon Kim Harvard University (with Chris Dyer, Alexander Rush) 1/69

Neural Importance Sampling Fabrice Rousselle Markus Gross Jan Novk A ffi liation: Work done

Convolutional Neural Networks 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image Processing

Convolutional Neural Networks (Part III) 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image