Why deep nets? Is deep better than shallow? When, and why, is depth better?

(Some history: the question goes back to the 1980s; these notes follow the approximation-theory treatment, ca. 2018.)
When is depth better? Start with shallow networks. A kernel machine is one example:

    f(x) = \sum_i c_i K(x, x_i),

e.g. with a Gaussian kernel K(x, x_i) = e^{-\|x - x_i\|^2 / 2\sigma^2}.
More generally, a shallow network with one hidden layer is

    f(x) = \sum_{i=1}^N c_i \sigma(\langle w_i, x \rangle + b_i),

with a nonlinearity such as \sigma(z) = \max(0, z). I can eliminate b by augmenting the components of x with x_{d+1} = 1, so the bias becomes one more weight.
[Diagram: a one-hidden-layer network, inputs feeding hidden units \sigma(\langle w_i, x \rangle).] Summation convention: repeated indices are summed over.
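A minimal sketch of the two shallow models above (the function names and the choice of tanh / Gaussian kernel are illustrative, not from the notes); the bias is eliminated by the x_{d+1} = 1 trick:

```python
import math

def shallow_net(x, weights, coeffs, sigma=math.tanh):
    """One-hidden-layer net: f(x) = sum_i c_i * sigma(<w_i, x>).
    The bias b_i is absorbed by appending a constant 1 to x."""
    x_aug = list(x) + [1.0]  # x_{d+1} = 1 eliminates the explicit bias
    return sum(c * sigma(sum(wj * xj for wj, xj in zip(w, x_aug)))
               for w, c in zip(weights, coeffs))

def kernel_machine(x, centers, coeffs, gamma=1.0):
    """Kernel machine: f(x) = sum_i c_i * K(x, x_i), Gaussian K."""
    def K(u, v):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
    return sum(c * K(x, xi) for xi, c in zip(centers, coeffs))
```

With \sigma the identity and one unit, `shallow_net` reduces to a plain affine function, which is a quick sanity check on the augmented-input trick.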
Are deep nets better than shallow? The answer in the 80s was no (see the universal-approximation results for one hidden layer). A new answer: deep nets can be much better for certain f. The key ideas are in approximation theory.
Example (degree of approximation). Given f \in C(K), K \subset R^d compact, and \varepsilon > 0, find g in the set of networks with

    \sup_{x \in K} |f(x) - g(x)| \le \varepsilon,

where g_N(x) = \sum_{i=1}^N c_i \sigma(\langle w_i, x \rangle + b_i) has N units. The complexity question: how must N grow with \varepsilon and with d?
The same curse appears in related problems: (a) optimization, (b) function approximation, (c) integration. What controls it is smoothness (cf. Barron's theorem).
Counting: there are \binom{d+k}{k} monomials of degree \le k in d variables. A function of 10 variables, stored as a 10-dimensional table with each dimension discretized into 10 values, already has 10^{10} entries; with 100 variables (e.g. 100 pixels) the table has 10^{100} entries.
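The table-counting argument above is easy to make concrete (a small sketch; the helper name is mine):

```python
# Curse of dimensionality by counting: a function of d variables stored as a
# d-dimensional lookup table with n values per dimension needs n**d entries.
def table_entries(d, n=10):
    return n ** d

print(table_entries(10))   # 10 variables  -> 10**10 entries
print(table_entries(100))  # 100 "pixels"  -> 10**100 entries
```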
For accuracy \varepsilon, a shallow network

    \sum_{i=1}^N c_i \sigma(\langle a_i, x \rangle + b_i)

needs on the order of N = O(\varepsilon^{-d/m}) units for functions with m derivatives. The proof goes through polynomials: powers (\langle w, x \rangle)^k generate P_k^d, the polynomials of degree \le k in d variables, with

    \dim P_k^d = \binom{k+d}{k}.
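Both counts above can be checked directly (a sketch; `shallow_units` drops the constant in the O(\varepsilon^{-d/m}) bound, and the function names are mine):

```python
from math import comb, ceil

def dim_P(k, d):
    """Number of monomials of degree <= k in d variables: C(k+d, d)."""
    return comb(k + d, d)

def shallow_units(eps, d, m):
    """N = O(eps**(-d/m)) units for accuracy eps, smoothness m (constant dropped)."""
    return ceil(eps ** (-d / m))

print(dim_P(2, 10))              # 66 monomials of degree <= 2 in 10 variables
print(shallow_units(0.5, 4, 2))  # 4: already exponential growth in d/m
```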
Smoothness is measured by the Sobolev space F = W_m^d \subset L_p(R^d) (compare C^k): functions of d variables with derivatives up to order m in L_p. Take f in the unit ball of W_m^d.
Logic of the argument:

1. Networks approximate univariate polynomials.
2. Univariate polynomials in \langle w, x \rangle represent multivariate polynomials.
3. Multivariate polynomials approximate Sobolev functions.

Thus the theorem: every polynomial p(x) can be represented as a linear combination of units.
Proof. Consider a unit \sigma(ax + b) in one variable x. Difference quotients in the weight,

    \frac{\sigma((a+h)x + b) - \sigma(ax + b)}{h} \to x\,\sigma'(ax + b) \quad (h \to 0),

and k-th differences (which need k+1 terms) give x^k \sigma^{(k)}(ax + b). If \sigma is not a polynomial, for every k there is a b with \sigma^{(k)}(b) \ne 0, so the closure of the networks N_r contains every monomial x^k, hence the linear space of polynomials. Thus N_r is dense in C(K) because of the Weierstrass theorem. \square
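The difference-quotient step can be seen numerically; a sketch with \sigma = tanh and b = 0 (so \sigma'(b) = 1), both illustrative choices of mine:

```python
import math

def sigma(z):
    return math.tanh(z)

# First difference quotient in the weight: [sigma(h*x + b) - sigma(b)] / h.
# As h -> 0 this tends to sigma'(b) * x, so the closure of the networks
# contains the monomial x (higher-order differences give x**2, x**3, ...),
# provided sigma is not a polynomial.
def diff_quotient(x, b=0.0, h=1e-6):
    return (sigma(h * x + b) - sigma(b)) / h

x = 0.7
approx = diff_quotient(x)      # ~ sigma'(0) * x = x, since tanh'(0) = 1
print(abs(approx - x) < 1e-4)  # True
```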
Multivariate: let H_k^d be the homogeneous polynomials of degree k in d variables;

    \dim H_k^d = \binom{d+k-1}{k}.

Thus any homogeneous polynomial of degree k can be represented by a network with \dim H_k^d units of the form (\langle w_i, x \rangle)^k.
We want to show that, for some choice of r and weights w_i, a network with r units represents any polynomial of d variables. How do I get x_1 x_2? Well,

    x_1 x_2 = \frac{(x_1 + x_2)^2 - (x_1 - x_2)^2}{4},

and the powers (\langle w, x \rangle)^k come from \sigma via divided differences, as above. Any polynomial of degree \le k lies in P_k^d, and \dim P_k^d = \sum_{j \le k} \dim H_j^d.
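The cross-term identity is worth a one-line check (sketch; the function name is mine):

```python
# Cross terms from powers of linear forms, used to build multivariate
# polynomials out of univariate polynomials in <w, x>:
#   x1*x2 = ((x1 + x2)**2 - (x1 - x2)**2) / 4
def product_from_squares(x1, x2):
    return ((x1 + x2) ** 2 - (x1 - x2) ** 2) / 4

print(product_from_squares(3.0, 5.0))  # 15.0
```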
Theorem (degree of approximation, shallow). For f \in W_m^d(B^d) and networks N_r with r units,

    \inf_{P \in N_r} \|f - P\|_{L_p} \le C\, r^{-m/d}.

Equivalently, accuracy \varepsilon requires r = O(\varepsilon^{-d/m}) units.
(Note: even a shallow net can represent x_1 x_2 exactly with a few units.)

Shallow and deep nets: the curse of dimensionality
But deep nets, unlike shallow ones, do not have the curse (for compositional functions). Simplest example:

    f(x_1, x_2, x_3, x_4) = h_3(h_1(x_1, x_2), h_2(x_3, x_4)),

a binary tree of functions of 2 variables each.
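A sketch of the compositional structure; the particular h_1, h_2, h_3 below are illustrative choices of mine, not from the notes:

```python
import math

# A compositional function with a binary-tree graph:
#   f(x1, x2, x3, x4) = h3(h1(x1, x2), h2(x3, x4)).
# Each constituent depends on only 2 variables, so a deep net matching this
# graph approximates each node at a cost independent of d.
def h1(a, b): return a * b
def h2(a, b): return a + b
def h3(a, b): return math.sin(a) + b

def f(x1, x2, x3, x4):
    return h3(h1(x1, x2), h2(x3, x4))

print(f(1.0, 2.0, 3.0, 4.0))  # sin(2.0) + 7.0 ~ 7.909
```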
Another example: f(x_1, x_2) = A x_1 x_2 + B x_1 + C x_2. Counting units node by node, a deep net with 3 nodes and 4 units per node uses 3 \times 4 = 12 units in total; a shallow net has to spend all its units in a single layer.
Example: f(x_1, \dots, x_4) = \sin(x_1 + x_2)\,\sigma(x_3 + x_4) has the binary-tree form h_3(h_1(x_1, x_2), h_2(x_3, x_4)).
Theorem (deep). Deep nets with the same graph as f approximate compositional functions of d variables, whose constituent functions are in W_m^2, with O(\varepsilon^{-2/m}) units per node, for a total of

    O\big((d-1)\,\varepsilon^{-2/m}\big) \text{ units}

— the exponent is 2/m, not d/m.
Each h can be approximated with O(\varepsilon^{-2/m}) units. Assume each h is Lipschitz continuous, that is |h(x) - h(x')| \le L|x - x'|. Then, for node approximants p with \|h - p\| \le \varepsilon,

    |h(h_1, h_2) - p(p_1, p_2)|
      \le |h(h_1, h_2) - h(p_1, p_2)| + |h(p_1, p_2) - p(p_1, p_2)|
      \le L(|h_1 - p_1| + |h_2 - p_2|) + \varepsilon = O(\varepsilon),

by the triangle (Minkowski) inequality and the Lipschitz hypothesis. So errors propagate through the tree without blowing up.
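The error-propagation bound can be checked numerically; all the concrete choices below (h, the perturbation sizes, L = 1) are illustrative assumptions of mine:

```python
import math

# Check: |h(h1,h2) - p(p1,p2)| <= L*(|h1-p1| + |h2-p2|) + eps,
# where p approximates h to accuracy eps and h is L-Lipschitz.
L, eps = 1.0, 1e-3
h = lambda a, b: math.sin(a) + b             # Lipschitz with L = 1
p = lambda a, b: h(a, b) + 0.5 * eps         # node approximant, error <= eps
h1v, h2v = 0.3, 1.2                          # exact inner values
p1v, p2v = h1v + 0.8 * eps, h2v - 0.6 * eps  # inner approximants

lhs = abs(h(h1v, h2v) - p(p1v, p2v))
rhs = L * (abs(h1v - p1v) + abs(h2v - p2v)) + eps
print(lhs <= rhs)  # True: the composed error stays O(eps)
```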
More general theorems hold for other graphs G. This theorem may explain why deep nets are successful, and why all the really good ones are CNNs: the hierarchical, local architecture is key; weight sharing helps, but not exponentially.