Nearly-tight VC-dimension bounds for piecewise linear neural networks
Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian
University of British Columbia
COLT '17, July 10, 2017
Neural networks
σ(x) = max{x, 0}  (ReLU)
[Figure: a feedforward neural network with an input layer, hidden layer 1, hidden layer 2, and an output layer.]
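To fix the object of study, here is a minimal NumPy sketch of a fixed-architecture ReLU network, the function class whose VC dimension the talk bounds. All names (relu, forward) and the example architecture are mine, not the authors':

```python
import numpy as np

def relu(z):
    # The piecewise linear activation from the slide: sigma(x) = max{x, 0}
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Evaluate a feedforward ReLU network.

    weights/biases hold one (matrix, vector) pair per layer;
    W (# parameters/edges) is the total number of entries,
    L (# layers) is len(weights).
    """
    a = x
    for W_l, b_l in zip(weights[:-1], biases[:-1]):
        a = relu(W_l @ a + b_l)   # hidden layers: affine map + ReLU
    return weights[-1] @ a + biases[-1]   # affine output layer

# Example: 2 inputs -> two hidden layers of 3 units -> 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=3), rng.normal(size=3), rng.normal(size=1)]
print(forward(np.array([1.0, -2.0]), weights, biases))
```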
Known bounds on VCdim  (W = # parameters/edges, L = # layers)

Upper bounds:
- O(W²) [GJ '95]
- O(WL log W + WL²) [BMM '98]
- O(WL log W) [this work]

Lower bounds (each means there exists a NN with this VCdim):
- Ω(W log W) [M '94]
- Ω(WL) [BMM '98]
- Ω(WL log(W/L)) [this work]

Independently proved by Bartlett '17.
Recently, lots of work on "power of depth" for expressiveness of NNs [T '16, ES '16, Y '16, LS '16, SS '16, CSS '16, LGMRA '17, D '17].
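Why "nearly tight": a short comparison of the two new bounds. The arithmetic below is my own filling-in, not a slide from the talk:

```latex
% Gap between the new upper and lower bounds:
%   upper:  O(W L \log W),    lower:  \Omega(W L \log(W/L)).
\[
  \frac{WL \log W}{WL \log(W/L)} \;=\; \frac{\log W}{\log W - \log L},
\]
% so the bounds match up to a constant factor whenever
% L \le W^{1-\epsilon} for a fixed \epsilon > 0, and the gap
% is at most O(\log W) even when L is close to W.
```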
Lower bound (refinement of [BMM '98])

Shatter the points {(x_i, y_j)}_{j∈[d]}: choose the network f so that f(x_i, y_j) = x_{i,j}, the j-th bit of x_i.

[Figure: a NN block extracts the bits x_{i,1}, x_{i,2}, …, x_{i,j}, …, x_{i,d} from x_i; the rest of the NN selects bit j from x_i.]

⇒ VCdim ≥ Ω(WL log(W/L))
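The shattering hinges on a counting fact: a single d-bit number x_i can encode an arbitrary row of labels, and the network only has to extract bit j. The NN implementation of the extraction is the technical part of the proof and is not reproduced here; the Python sketch below (names and encoding convention are mine) just checks the counting fact:

```python
def bit(x, j, d):
    """Return the j-th bit (1-indexed, most significant first) of the
    d-bit integer x -- the target labeling f(x_i, y_j) = x_{i,j}."""
    return (x >> (d - j)) & 1

# Any desired labeling of the grid {(x_i, y_j) : i in [n], j in [d]}
# is realized by encoding row i's labels as the bits of x_i:
n, d = 3, 4
desired = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
xs = [int("".join(map(str, row)), 2) for row in desired]   # x_i encodes row i
assert all(bit(xs[i], j + 1, d) == desired[i][j]
           for i in range(n) for j in range(d))
```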
Upper bound (refinement of [BMM '98] for ReLU)

Partition the parameter space so that, on each piece, every hidden layer has constant sign:
- partition of size ≤ (DmW)^{W_i}* after layer i (W_i = # parameters in the first i layers),
- so total # of signings ≤ (DmW)^{Σ_i W_i} ≤ (DmW)^{WL},
- which implies m = O(WL log W).

[Figure: the network evaluated on fixed inputs, with outputs ŷ_1, ŷ_2, ŷ_3, ŷ_4, ŷ_5.]

* D > 1 is some constant
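The last implication deserves one more line. A worked version, my own filling-in, assuming the (DmW)^{WL} bound on the number of signings as reconstructed above:

```latex
% If m points are shattered, each of the 2^m labelings must be
% realized by some signing, so
\[
  2^{m} \;\le\; (DmW)^{WL}
  \quad\Longrightarrow\quad
  m \;\le\; WL \log_2(DmW).
\]
% By the standard fact that m \le a \log_2(bm) forces m = O(a \log(ab)),
% applied with a = WL and b = DW, and using L \le W, this gives
\[
  m \;=\; O\!\left(WL \log W\right).
\]
```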