Deep Networks — Some History and Approximation Theory (2018) — PDF document



SLIDE 1

Why deep nets?
Is deep better than shallow?
When? Why?

SLIDE 2

[handwritten sketch; only fragments legible: "Deep", "Shallow"]

SLIDE 3

Some history: '80s … SNN … 2018

SLIDE 4

Approximation Theory: why is depth better?

SLIDE 5

Shallow networks (one hidden layer).

[network diagram: inputs x, one layer of hidden units, linear output]

Main example: a kernel machine,

  f(x) = Σ_{i=1}^{n} c_i K(x, x_i),

with the interpolation conditions f(x_i) = y_i.
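The kernel-machine form above can be sketched in a few lines of code (not from the slides; the Gaussian kernel and the toy data are my own choices for illustration):

```python
import numpy as np

def K(x, xi, gamma=1.0):
    # Gaussian (RBF) kernel K(x, x_i) = exp(-gamma ||x - x_i||^2)
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(xi)) ** 2, axis=-1))

# Toy data: n centers x_i with target values y_i
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2

# Solve the interpolation conditions f(x_i) = y_i for the coefficients c_i
G = np.array([[K(a, b) for b in X] for a in X])   # Gram matrix
c = np.linalg.solve(G, y)

def f(x):
    # f(x) = sum_i c_i K(x, x_i)
    return sum(ci * K(x, xi) for ci, xi in zip(c, X))
```

Because the Gaussian Gram matrix is positive definite for distinct centers, the linear solve enforces f(x_i) = y_i exactly (up to round-off).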

SLIDE 6

Another example: a one-hidden-layer network,

  f(x) = Σ_i c_i σ(⟨w_i, x⟩),

with e.g. the ReLU σ(z) = max(z, 0).

Usually one writes σ(⟨w, x⟩ + b), but I eliminate the bias b by assuming that one of the components of x is constant, x_d = 1, so that ⟨w_i, x⟩ absorbs b_i.
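The bias-elimination trick is easy to check numerically (a minimal sketch, not from the slides):

```python
import numpy as np

def relu(z):
    # sigma(z) = max(z, 0)
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
x = rng.normal(size=3)
w = rng.normal(size=3)
b = 0.7

with_bias = relu(w @ x + b)

# Append a constant component x_d = 1 and fold b into the weight vector
x_aug = np.append(x, 1.0)
w_aug = np.append(w, b)
without_bias = relu(w_aug @ x_aug)
```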

SLIDE 7

Deep networks: two (or more) hidden layers.

[network diagram: input x, two hidden layers of units, output]

In components: y_i = V_ik σ(W_kj x_j), using the summation convention.
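In code, the two-layer map y_i = V_ik σ(W_kj x_j) is a pair of matrix products (the sizes here are arbitrary, chosen only for illustration):

```python
import numpy as np

def sigma(z):
    # ReLU nonlinearity
    return np.maximum(z, 0.0)

rng = np.random.default_rng(2)
x = rng.normal(size=4)          # input x_j
W = rng.normal(size=(5, 4))     # first-layer weights W_kj
V = rng.normal(size=(2, 5))     # second-layer weights V_ik

# Summation convention: repeated indices are summed over
h = np.einsum("kj,j->k", W, x)              # h_k = W_kj x_j
y = np.einsum("ik,k->i", V, sigma(h))       # y_i = V_ik sigma(h_k)
```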

SLIDE 8

Networks to approximate / represent functions.

Are deep nets better than shallow ones?

The answer in the '80s was "no". We will see a proof of the above, and a new answer: deep nets can be much better for certain f.

SLIDE 9

Key ideas in approximation theory: functions and approximants.

Example: target functions f ∈ C(ℝ^d); approximants g ∈ V_N, e.g. networks g(x) = Σ_{i=1}^{N} c_i σ(⟨w_i, x⟩).

SLIDE 10

Density: for every compact K ⊂ ℝ^d and every ε > 0, there exists g such that

  sup_{x ∈ K} |f(x) − g(x)| < ε,

where g ranges over V_N, the set of networks.
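The density statement can be illustrated empirically: fit a small shallow net to a smooth target on a compact set and measure the sup-norm error on a grid. The random-feature construction below (fixed inner weights, least-squares outer weights) is my own simplification, not the slides' construction:

```python
import numpy as np

f = np.sin                          # target f, restricted to K = [0, 1]
grid = np.linspace(0.0, 1.0, 500)   # a fine grid standing in for K

# Shallow net g(x) = sum_i c_i sigma(w_i x + b_i) with N units:
# inner weights w_i, b_i fixed at random, outer weights c_i fit by
# least squares on the grid.
N = 50
rng = np.random.default_rng(3)
w = rng.normal(scale=3.0, size=N)
b = rng.normal(scale=3.0, size=N)
features = np.tanh(np.outer(grid, w) + b)            # sigma = tanh
c, *_ = np.linalg.lstsq(features, f(grid), rcond=None)

sup_error = np.max(np.abs(features @ c - f(grid)))   # ~ sup_K |f - g|
```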

slide-11
SLIDE 11

Degreeafapproximation

H f e CCR

d

gnefn I f g la

e

dinge

is N

SLIDE 12

Shallow nets: density and degree of approximation.

Consider target functions f ∈ W_m^d, the Sobolev space of functions of d variables with smoothness m; no structure assumed.

Theorem: for networks g(x) = Σ_i c_i σ(⟨w_i, x⟩): ∀f ∈ W_m^d ∃g ∈ V_N s.t. ‖g − f‖_∞ ≤ ε, with N = O(ε^(−d/m)).

SLIDE 13

Curse of dimensionality. In Bellman's terms:

a) optimization cannot be done by exhaustive search;
b) function approximation requires ε^(−d) evaluations for f Lipschitz (order ε^(−d));
c) integration.

SLIDE 14

Blessings:

1) smoothness (Barron's theorem);
2) compositionality.

SLIDE 15

Examples: P_k^d has (d+k choose k) ≈ k^d monomials.

A function of 10 variables corresponds to a 10-D table. If each dimension is discretized with just 10 partitions, I have a table with 10^10 entries. If d = 100 pixels …

slide-16
SLIDE 16

e

s

If d

  • p els

then

10

entries

If f e Wdm

N

O

Ym

For

E

e

l O

d

too

N Off
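The numbers on the last two slides are easy to reproduce (a quick check, not in the original):

```python
from math import comb

# dim P_k^d = C(d + k, k): number of monomials of degree <= k in d variables
d, k = 10, 4
n_monomials = comb(d + k, k)   # 1001 monomials already for d = 10, k = 4

# A 10-variable function tabulated with 10 partitions per dimension:
table_entries = 10 ** 10

# Degree of approximation for f in W_m^d: N = O(eps^(-d/m))
eps, d, m = 0.1, 100, 1
N = eps ** (-d / m)            # ~1e100 units for eps = 0.1, d = 100, m = 1
```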

SLIDE 17

Summary of the proof:

  Σ_i c_i σ(⟨a_i, x⟩ + b_i) → polynomials p(⟨w, x⟩) → f.

P_k^d: polynomials of degree ≤ k in d variables; dim P_k^d = (k+d choose d) ≈ k^d / d!.

Sobolev: for f ∈ W_m^d ⊂ L_p (derivatives up to order m),

  E(f; P_k; L_p) ≤ C k^(−m),

so ε ≈ k^(−m) requires k ≈ ε^(−1/m), and hence N ≈ k^d ≈ ε^(−d/m) units.
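A quick numeric sanity check of the dimension count and the resulting unit count (my own illustration, not from the slides):

```python
from math import comb, factorial

# dim P_k^d = C(k + d, d), which behaves like k^d / d! for large k
d, k = 3, 20
dim_Pkd = comb(k + d, d)                 # 1771
kd_over_dfact = k ** d / factorial(d)    # ~1333: same order of magnitude

# E(f; P_k; L_p) <= C k^(-m) forces k ~ eps^(-1/m), hence N ~ k^d
eps, m = 0.1, 2
k_needed = eps ** (-1 / m)               # ~3.16
N = k_needed ** d                        # ~31.6, i.e. eps^(-d/m)
```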

SLIDE 18

Logic of the proof:

1) Networks approximate univariate polynomials.
2) Univariate polynomials in ⟨w, x⟩ represent multivariate polynomials.
3) Multivariate polynomials approximate Sobolev functions.

Thus the theorem.

SLIDE 19

Any univariate polynomial p(x) can be represented as a linear combination of smooth ReLUs.

Proof idea: for the unit σ(ax + b),

  d/da σ(ax + b) = x σ′(ax + b),

and higher derivatives in a bring down higher powers of x; at a = 0 the k-th derivative gives x^k σ^(k)(b).

Theorem: if σ ∈ C^∞ is not a polynomial, the closure of N_r (the set of networks with r units and nonlinearity σ) contains the linear space of polynomials Π.

slide-20
SLIDE 20

p

a of

a

Second derivative

which

needs

3

terms

gives

x

Thus

Nr

is dense in

C CH

because of Weierstrass

theorem
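The finite-difference construction on these two slides can be verified numerically. Below the smooth ReLU is taken to be the softplus σ(z) = log(1 + e^z), my choice; the slides only require a smooth non-polynomial σ:

```python
import numpy as np

def softplus(z):
    # a smooth "ReLU": sigma(z) = log(1 + exp(z))
    return np.log1p(np.exp(z))

def softplus_dd(z):
    # sigma''(z) = s(z) (1 - s(z)), with s the logistic function
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

x, b, h = 1.7, 0.3, 1e-3

# Three units form a central finite difference approximating the second
# derivative of a -> sigma(a x + b) at a = 0; by the chain rule that
# derivative is x^2 sigma''(b), i.e. a multiple of x^2.
fd = (softplus(h * x + b) - 2.0 * softplus(b) + softplus(-h * x + b)) / h ** 2
target = x ** 2 * softplus_dd(b)
```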

SLIDE 21

FROM 1-D TO d-D:

univariate polynomials p_i(⟨w_i, x⟩) on ℝ^d; homogeneous polynomials H_k^d of d variables of degree k have

  dim H_k^d = (d + k − 1 choose k);

thus polynomials in P_k^d can be represented by a network with r = (k + d choose d) units.

SLIDE 22

We want to show that if P(x) is a polynomial on ℝ^d, then

  P(x) = Σ_{i=1}^{r} p_i(⟨w_i, x⟩)

for some choice of r, w_i and p_i.

No general proof here, but consider the following: assume a network with r units and univariate polynomials p_i … can I synthesize the polynomial P(x) …
slide-23
SLIDE 23

Can

yuthere

Cl of

d variables

I

can

get

xp

E

how

do

I get

x xz

Well

EE

II

aka Reimann

And

hour pot degree him 2Nd

Pnd

a

4

u

n

din

HE

Ith

r

din Pu

tan
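The polarization trick for the cross term is one line (a quick check, not in the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=2)

# Only powers of linear functions of x are directly available;
# (x1 + x2)^2 - x1^2 - x2^2 = 2 x1 x2 recovers the cross term.
cross = ((x1 + x2) ** 2 - x1 ** 2 - x2 ** 2) / 2.0
```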

SLIDE 24

Define, for f in a space B ⊂ L_p,

  E(f; P_k; L_p) = inf_{p ∈ P_k} ‖p − f‖_{L_p}.

Theorem: for f ∈ W_m^d ∩ L_p,

  E(f; P_k; L_p) ≤ C k^(−m).

Proof: classical. Choosing k ≈ ε^(−1/m) gives r = dim P_k^d ≈ k^d units, i.e. N = O(ε^(−d/m)).

SLIDE 25

Remarks:

1) Even without smoothness, a shallow net can represent exactly the polynomials in P_k^d, with r = (k + d choose d) ≈ k^d units.

SLIDE 26

Depth.

For general functions, shallow and deep nets suffer the curse of dimensionality. But for Local Hierarchical Compositional functions, deep nets, unlike shallow ones, do not have the curse.
slide-27
SLIDE 27

LHC

functions

Swnplesteeample

gftp.flkrkisxn

x

he

4

4h

fg

f Lx Xa

f2

3

h

few

unhfiew

Another eeaus.pk

f

x.kz

AX.Xz

iBx.eCxzJ

shallow not require

2

units
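A compositional function of this kind is just a few nested calls; the particular bivariate h's below are made up for illustration:

```python
import numpy as np

# Each constituent function takes only 2 arguments (locality).
def h1(a, b):
    return np.sin(a + b)

def h2(a, b):
    return np.tanh(a * b)

def h3(a, b):
    return a + b ** 2

def f(x1, x2, x3, x4):
    # binary-tree composition: f = h3(h1(x1, x2), h2(x3, x4))
    return h3(h1(x1, x2), h2(x3, x4))
```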

SLIDE 28

… versus a deep net.

Intuition: for a compositional f of d = 4 variables, a shallow net needs ~ε^(−4/m) units, while a deep net needs ~ε^(−2/m) units for each node, for a total of ~3 · ε^(−2/m) units.

Another example:

  y = … sin(x_1 + x_2) … σ(x_3 + x_4) …,

again of the compositional form h_3(h_1(x_1, x_2), h_2(x_3, x_4)).
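The unit counts in this intuition are easy to tabulate (my own arithmetic on the slide's formulas, for one concrete choice of ε and m):

```python
# For a compositional f of d = 4 variables built from 3 bivariate nodes,
# each of smoothness m: a shallow net needs ~ eps^(-d/m) units, a deep
# net with the same graph ~ eps^(-2/m) units per node.
eps, m, d, nodes = 0.1, 2, 4, 3

shallow_units = eps ** (-d / m)          # eps^(-2) = 100
deep_units = nodes * eps ** (-2 / m)     # 3 * eps^(-1) = 30
```

The gap widens exponentially in d: for d = 8 (7 nodes) the shallow count becomes eps^(-4) = 10,000 while the deep count is only 70.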

SLIDE 29

Theorem: deep nets with the same graph as f approximate compositional functions of d variables (each constituent function in W_m²) with O(ε^(−2/m)) units per node, for a total of O((d − 1) ε^(−2/m)) units.

[binary-tree diagram]

Proof: each h can be approximated with O(ε^(−2/m)) units. We assume …

SLIDE 30

… each h is Lipschitz continuous, that is,

  |h(a) − h(b)| ≤ L |a − b|.

By hypothesis, for each node ‖h − p‖ ≤ ε. Then

  ‖h_3(h_1, h_2) − p_3(p_1, p_2)‖
    ≤ ‖h_3(h_1, h_2) − h_3(p_1, p_2)‖ + ‖h_3(p_1, p_2) − p_3(p_1, p_2)‖
    ≤ L (‖h_1 − p_1‖ + ‖h_2 − p_2‖) + ε
    ≤ c ε,

using the Minkowski (triangle) inequality in the first step and the Lipschitz hypothesis in the second.

More general theorems hold for DAGs.
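The error-propagation bound can be checked numerically with simple stand-ins for the h's and their per-node approximants p (all functions below are made up; h3 is chosen so that |h3(a,b) − h3(a',b')| ≤ |a − a'| + |b − b'|, i.e. L = 1):

```python
import numpy as np

L = 1.0                                  # Lipschitz constant of h3 below
def h1(a, b): return np.sin(a + b)
def h2(a, b): return np.cos(a - b)
def h3(a, b): return a + b

# Per-node approximants, off by a known amount (playing the role of eps_i)
e1, e2, e3 = 0.005, 0.003, 0.008
def p1(a, b): return h1(a, b) + e1
def p2(a, b): return h2(a, b) - e2
def p3(a, b): return h3(a, b) + e3

# Sample the input domain and compare composite outputs
rng = np.random.default_rng(5)
A, B, C, D = rng.uniform(-1, 1, size=(4, 1000))

err = np.max(np.abs(h3(h1(A, B), h2(C, D)) - p3(p1(A, B), p2(C, D))))
bound = L * (e1 + e2) + e3               # L(||h1-p1|| + ||h2-p2||) + eps_3
```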

SLIDE 31

This theorem may explain why deep nets are successful, and why all the really good ones are CNNs.

[diagram: hierarchical network over the input]

Locality is key, not weight sharing; weight sharing helps, but not exponentially.