Discrete Geometry meets Machine Learning

SLIDE 1

Amitabh Basu

Johns Hopkins University

Discrete Geometry meets Machine Learning

22nd Combinatorial Optimization Workshop at Aussois, January 11, 2018. Joint work with Anirbit Mukherjee, Raman Arora, and Poorya Mianjy.

SLIDE 2

Two Problems in Discrete Geometry

Problem 1: Given two polytopes P and Q, do there exist simplices A_1, …, A_p and B_1, …, B_q such that (as Minkowski sums)

  P + A_1 + … + A_p = Q + B_1 + … + B_q ?

SLIDE 3

Two Problems in Discrete Geometry

Problem 1: Given two polytopes P and Q, do there exist simplices A_1, …, A_p and B_1, …, B_q such that P + A_1 + … + A_p = Q + B_1 + … + B_q ?

Problem 2: For a natural number k, define a k-zonotope as the Minkowski sum of a finite set of polytopes, each of which is the convex hull of k points [so a 2-zonotope is a regular zonotope]. Given two 2^n-zonotopes P and Q, do there exist two 2^(n+1)-zonotopes A and B such that conv(P ∪ Q) + A = B ?

SLIDES 4–12

What is a Deep Neural Network (DNN)?

  • Directed acyclic graph (the network architecture)
  • Weights on every edge and every vertex
  • An "activation function" f: R → R. Examples: f(x) = max{0, x} — Rectified Linear Unit (ReLU); f(x) = e^x/(1 + e^x) — sigmoid
  • Sources = inputs, sinks = outputs

[Figure: an example network with inputs x_1, x_2, x_3, outputs y_1, y_2, and edge/vertex weights such as 2, 1.65, −6.8, 3, −1, 0.53, 2.45]

A vertex with vertex weight b, whose incoming edges carry values u_1, u_2, …, u_k and weights a_1, a_2, …, a_k, outputs

  f(a_1 u_1 + a_2 u_2 + … + a_k u_k + b),

which for the ReLU activation is

  max{0, a_1 u_1 + a_2 u_2 + … + a_k u_k + b}.
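
To make the vertex computation concrete, here is a minimal NumPy sketch (added here, not from the talk) of a fully connected ReLU network, a special case of the DAG picture; it assumes the common convention that output vertices are affine, with no activation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, layers):
    # layers: list of (W, b); W holds the edge weights into a layer,
    # b holds the vertex weights (biases). Each hidden vertex computes
    # max{0, a_1 u_1 + ... + a_k u_k + b}.
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return W @ x + b  # output vertices: affine, no activation

# Toy network: 3 inputs, one hidden layer with 2 vertices, 2 outputs.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(2, 3)), rng.normal(size=2)),
          (rng.normal(size=(2, 2)), rng.normal(size=2))]
print(forward(np.array([1.0, -2.0, 0.5]), layers))
```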

SLIDE 13

Problems of interest for DNNs

  • Expressiveness: What family of functions can one represent using DNNs?
  • Efficiency: How many layers (depth) and vertices (size) are needed to represent the functions in the family?
  • Training the network: Given the architecture and data points (x, y), find weights for the "best fit" function.
  • Generalization error: Rademacher complexity, VC dimension

SLIDE 14

Problems of interest for DNNs

  • Expressiveness: What family of functions can one represent using DNNs?
  • Training the network: Given the architecture and data points (x, y), find weights for the "best fit" function.
  • Generalization error: Rademacher complexity, VC dimension
  • Efficiency: How many layers (depth) and vertices (size) are needed to represent the functions in the family?

SLIDES 15–25

Calculus of DNN functions

[DNN(k, s) denotes functions implementable by a network with k hidden layers and size s; ReLU-DNN(k, s) is the same class with ReLU activations.]

  • f in DNN(k,s), c in R => cf in DNN(k,s)
  • f1 in DNN(k1,s1), f2 in DNN(k2,s2) => f1 + f2 in DNN(max{k1,k2}, s1+s2) [run the two networks in parallel on the same input x and add the outputs]
  • f1 in DNN(k1,s1), f2 in DNN(k2,s2) => f1 ∘ f2 in DNN(k1+k2, s1+s2) [feed the output of the network for f2 into the network for f1]
  • f1 in ReLU-DNN(k1,s1), f2 in ReLU-DNN(k2,s2) => max{f1, f2} in ReLU-DNN(max{k1,k2}+1, s1+s2+4)
  • Affine functions can be implemented in ReLU-DNN(1, 2n)

For the max rule, write F: R^n → R^2, F(x) = (f1(x), f2(x)) and G: R^2 → R, G(z1, z2) = max{z1, z2}, so that max{f1, f2} = G ∘ F. The gate G costs one extra hidden layer and 4 extra ReLU gates, via the identity

  max{z1, z2} = (z1 + z2)/2 + |z1 − z2|/2.

[Figure: a one-hidden-layer ReLU network with inputs x1, x2 computing (x1 + x2)/2 + |x1 − x2|/2, with hidden gates fed by ±x1 ± x2 and output weights 1/2, 1/2, 1/2, −1/2]
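
As a sanity check on the max rule, here is a minimal NumPy sketch (added here) of the 4-gate, one-hidden-layer ReLU implementation of max{z1, z2}, using the identity above together with |a| = max{0, a} + max{0, −a} and a = max{0, a} − max{0, −a}:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def max_gate(z1, z2):
    # One hidden ReLU layer with 4 gates implementing
    # max{z1, z2} = (z1 + z2)/2 + |z1 - z2|/2.
    h = np.array([relu(z1 - z2),    # these two gates sum to |z1 - z2|
                  relu(z2 - z1),
                  relu(z1 + z2),    # these two gates differ by z1 + z2
                  relu(-z1 - z2)])
    w = np.array([0.5, 0.5, 0.5, -0.5])  # output edge weights
    return w @ h

rng = np.random.default_rng(0)
for z1, z2 in rng.normal(size=(5, 2)):
    assert np.isclose(max_gate(z1, z2), max(z1, z2))
print("4-gate ReLU max matches max{z1, z2}")
```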

SLIDE 26

Problems of interest for DNNs

  • Expressiveness: What family of functions can one represent using DNNs?
  • Training the network: Given the architecture and data points (x, y), find weights for the "best fit" function.
  • Generalization error: Rademacher complexity, VC dimension
  • Efficiency: How many layers (depth) and vertices (size) are needed to represent the functions in the family?

SLIDES 27–31

Expressiveness of ReLU DNNs

Theorem (Arora, Basu, Mianjy, Mukherjee 2016): Any ReLU DNN with n inputs implements a continuous piecewise affine function on R^n. Conversely, any continuous piecewise affine function on R^n can be implemented by some ReLU DNN; moreover, at most log(n+1) hidden layers are needed.

Proof: A result from tropical geometry [Ovchinnikov 2002] says that any continuous piecewise affine function can be written as

  max_{i=1,…,k} min_{j in S_i} { l_j },

where the l_j are affine functions and the S_i are index sets. (For example, the hat function on R equals max{ min{x, 1−x}, 0 }.)

SLIDES 32–37

Expressiveness of ReLU DNNs

Proof (Take 2): Any continuous PWL function can be written as the difference of two convex PWL functions:

  max{l^1_1, l^1_2, …, l^1_m} − max{l^2_1, l^2_2, …, l^2_s}.

(For example, the hat function max{min{x, 1−x}, 0} equals max{1, x, 1−x} − max{x, 1−x}.)

Want to show: this can be rewritten as

  max{a^1_1, a^1_2, …, a^1_{n+1}} + … + max{a^q_1, a^q_2, …, a^q_{n+1}} − max{c^1_1, c^1_2, …, c^1_{n+1}} − … − max{c^p_1, c^p_2, …, c^p_{n+1}},

i.e., as a sum and difference of maxima of only n+1 affine functions each.

Can assume without loss of generality that the l^i_j are linear. Then

  max{l^1_1, l^1_2, …, l^1_m} = max{<a^1_1, x>, <a^1_2, x>, …, <a^1_m, x>} = support function of conv({a^1_1, a^1_2, …, a^1_m}).

Equivalent formulation [h_K denotes the support function of K]:

  h_P − h_Q = h_{B_1} + … + h_{B_q} − h_{A_1} − … − h_{A_p}

  iff h_P + h_{A_1} + … + h_{A_p} = h_Q + h_{B_1} + … + h_{B_q}

  iff P + A_1 + … + A_p = Q + B_1 + … + B_q.

Here the A_i and B_j range over simplices (convex hulls of n+1 points), so this is exactly Problem 1.
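
This chain of equivalences rests on two standard facts about support functions, recorded here for completeness: support functions add under Minkowski sums, and a compact convex body is determined by its support function. In LaTeX:

```latex
% Standard support-function facts:
%   h_{P+Q}(u) = h_P(u) + h_Q(u)  for all u   (Minkowski sum)
%   h_P = h_Q  \iff  P = Q                     (h determines the body)
\[
  h_P + h_{A_1} + \cdots + h_{A_p} = h_Q + h_{B_1} + \cdots + h_{B_q}
  \;\iff\; h_{P + A_1 + \cdots + A_p} = h_{Q + B_1 + \cdots + B_q}
  \;\iff\; P + A_1 + \cdots + A_p = Q + B_1 + \cdots + B_q .
\]
```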

SLIDE 38

Expressiveness of ReLU DNNs

Proof (Take 2): A result from the circuits literature [Wang and Sun 2006] says that any continuous piecewise affine function can be written as

  c_1 max{l^1_1, l^1_2, …, l^1_{n+1}} + … + c_k max{l^k_1, l^k_2, …, l^k_{n+1}}

for affine functions l^i_j and reals c_i. Since each term is a maximum of only n+1 affine functions, repeatedly applying the max rule from the calculus slides yields the log(n+1) bound on hidden layers in the theorem.

SLIDES 39–40

Expressiveness of ReLU DNNs

The theorem yields the hierarchy

  ReLU-DNN(1, *) ⊆ ReLU-DNN(2, *) ⊆ ReLU-DNN(3, *) ⊆ … ⊆ ReLU-DNN(log(n+1), *),

where the last class already contains all continuous piecewise affine functions on R^n.

Open Question: which of these inclusions are strict?

SLIDES 41–43

Expressiveness of ReLU DNNs

Open Question: Is the hierarchy ReLU(1, *) ⊆ ReLU(2, *) ⊆ ReLU(3, *) ⊆ … strict?

Proof Strategy/Plan:

Definition: Let n, d be natural numbers. The set of functions on R^d that can be written as linear combinations of n-max functions (maxima of n affine functions) will be denoted by (d,n)-HH.

Claim 1: Let k, d be natural numbers such that 2^k ≤ d. Any function in ReLU(k, *) on R^d is a linear combination of 2^k-max functions.

Claim 2: Let n, d be natural numbers such that n ≤ d+1. Then (d,n)-HH ⊊ (d,n+1)-HH.

SLIDE 44

Expressiveness of ReLU DNNs

Claim 1: Any function in ReLU(k, *) is a linear combination of 2^k-max functions.

This is equivalent to Problem 2: For a natural number k, define a k-zonotope as the Minkowski sum of a finite set of polytopes, each of which is the convex hull of k points [so a 2-zonotope is a regular zonotope]. Given two 2^n-zonotopes P and Q, do there exist two 2^(n+1)-zonotopes A and B such that conv(P ∪ Q) + A = B ?
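
A quick numerical illustration (added here) of the support-function identities behind these polytope reformulations: for a finite set S, h_{conv(S)}(u) = max_{a in S} <a, u>; support functions add under Minkowski sum; and taking the convex hull of a union takes the max of support functions.

```python
import numpy as np
from itertools import product

def support(S, u):
    # Support function of conv(S) for a finite point set S.
    return max(np.dot(a, u) for a in S)

def minkowski(S, T):
    # Generating points of the Minkowski sum conv(S) + conv(T).
    return [np.asarray(a) + np.asarray(b) for a, b in product(S, T)]

P = [np.array(p) for p in [(0, 0), (1, 0), (0, 1)]]  # a 2-simplex
Q = [np.array(q) for q in [(0, 0), (2, 1)]]          # a segment

rng = np.random.default_rng(1)
for _ in range(5):
    u = rng.normal(size=2)
    # h_{P+Q} = h_P + h_Q
    assert np.isclose(support(minkowski(P, Q), u),
                      support(P, u) + support(Q, u))
    # h_{conv(P u Q)} = max{h_P, h_Q}  (P + Q concatenates the point lists)
    assert np.isclose(support(P + Q, u),
                      max(support(P, u), support(Q, u)))
print("support-function identities verified on random directions")
```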

SLIDE 45

Problems of interest for DNNs

  • Expressiveness: What family of functions can one represent using DNNs?
  • Training the network: Given the architecture and data points (x, y), find weights for the "best fit" function.
  • Generalization error: Rademacher complexity, VC dimension
  • Efficiency: How many layers (depth) and vertices (size) are needed to represent the functions in the family?

SLIDES 46–48

Depth vs. size tradeoffs for ReLU DNNs

Theorem (Arora, Basu, Mianjy, Mukherjee 2016): For every natural number N, there exists a family of R → R functions such that for any function f in this family:

  1. f is in ReLU-DNN(N^2, N^3).
  2. f is NOT in ReLU-DNN(N, (1/2)N^N − 1).

Moreover, this family is in one-to-one correspondence with the N-dimensional torus.

Remarks: more general versions; approximation versions; an n ≥ 2 version using zonotopal norms.

Fact: Any R → R PWL function with p pieces is in ReLU-DNN(1, p+1).

[Figures: plots of piecewise linear functions on [0, 1.2] with increasing numbers of pieces]
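
The Fact has a direct construction; here is a minimal sketch (added here, under the simplifying assumption 0 < b_1, so that f(0) = f0) building a one-hidden-layer net with p+1 ReLU gates for a continuous PWL function on R with p pieces:

```python
import numpy as np

def pwl_as_relu_net(breakpoints, slopes, f0):
    # f has p pieces: slopes s_0, ..., s_{p-1} separated by breakpoints
    # b_1 < ... < b_{p-1}, with f(0) = f0 and 0 < b_1.
    # Gates: relu(x) and relu(-x) give the leading linear part via
    # x = relu(x) - relu(-x); one gate relu(x - b_i) per breakpoint
    # adds the slope change there. Total: 2 + (p-1) = p+1 gates.
    b, s = np.asarray(breakpoints, float), np.asarray(slopes, float)
    def f(x):
        out = f0 + s[0] * (np.maximum(0, x) - np.maximum(0, -x))
        for bi, ds in zip(b, np.diff(s)):
            out = out + ds * np.maximum(0, x - bi)
        return out
    return f

# Hat-like example: slopes 0, 1, -1, 0 with breakpoints 1, 2, 3.
f = pwl_as_relu_net([1, 2, 3], [0, 1, -1, 0], f0=0.0)
print([f(x) for x in (0.0, 1.0, 2.0, 3.0, 4.0)])  # [0, 0, 1, 0, 0]
```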

SLIDES 49–53

Depth vs. size tradeoffs for ReLU DNNs

(Recall the composition rule from the calculus of DNN functions: f1 ∘ f2 in DNN(k1+k2, s1+s2). Composing shallow sawtooth functions multiplies their numbers of pieces; this is the mechanism behind the family in the theorem: deep but small networks with very many pieces.)

Fact: Any R → R function in ReLU(k, w) has at most O(w^k) pieces. This piece-counting bound gives the lower bound in the theorem: a shallow network can match a function with that many pieces only if its size is enormous.

[Figures: plots of piecewise linear functions with increasing numbers of pieces]
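
To see the composition mechanism concretely, here is a minimal sketch of the standard construction along these lines (added here; not necessarily the exact family from the talk) composing a 2-gate ReLU "tent" map with itself, so that depth k yields 2^k affine pieces from only 2k gates:

```python
import numpy as np

def tent(x):
    # tent(x) = max{0, 2x} - max{0, 4x - 2}: two ReLU gates,
    # mapping [0, 1] onto [0, 1] with 2 affine pieces.
    return np.maximum(0, 2 * x) - np.maximum(0, 4 * x - 2)

def sawtooth(x, k):
    # k-fold composition: a depth-k, width-2 ReLU network whose
    # graph on [0, 1] is a sawtooth with 2^k affine pieces.
    for _ in range(k):
        x = tent(x)
    return x

xs = np.linspace(0, 1, 9)
print(sawtooth(xs, 3))  # oscillates: 2^3 = 8 pieces on [0, 1]
```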

SLIDE 54

Depth vs. size tradeoffs for ReLU DNNs

Open Question: Finer gaps, and the case n ≥ 2. A recent result of Eldan and Shamir shows an exponential-in-n gap between 1 and 2 hidden layers. Extend to k vs. k+1 layers? k = O(1) vs. k = log(n)?

Remarks: more general versions; approximation versions; an n ≥ 2 version using zonotopal norms.

SLIDE 55

Depth vs. size tradeoffs for ReLU DNNs — restricting inputs to the Boolean hypercube (Mukherjee, Basu 2017):

  1. Two hidden layers always suffice: any function on the Boolean hypercube is a linear combination of the vertex-indicator functions, and each vertex-indicator function can be implemented by a single ReLU gate.
  2. Exponential lower bounds on ReLU DNNs of O(n^c) depth implementing certain Boolean functions (for c < 1/8). This also implies some new Boolean circuit complexity results with LTF gates.

Discrete geometry techniques: the method of sign-rank and random restrictions.
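
One standard way to realize a vertex indicator with a single ReLU gate, as in item 1 (a sketch added here, not taken verbatim from the talk): for a vertex v of {0,1}^n with |v| ones, the gate max{0, <2v − 1, x> − |v| + 1} equals 1 at x = v and 0 at every other hypercube vertex.

```python
import numpy as np
from itertools import product

def vertex_indicator(v):
    # Single ReLU gate max{0, <2v - 1, x> - |v| + 1}: on the hypercube,
    # <2v - 1, x> equals |v| at x = v and is <= |v| - 1 elsewhere.
    v = np.asarray(v, dtype=float)
    a, b = 2 * v - 1, 1 - v.sum()
    return lambda x: np.maximum(0.0, a @ np.asarray(x, dtype=float) + b)

# Check on the 3-cube: the gate for v fires exactly at v.
v = (1, 0, 1)
g = vertex_indicator(v)
for x in product([0, 1], repeat=3):
    assert g(x) == (1.0 if x == v else 0.0)
print("single-ReLU vertex indicator verified on {0,1}^3")
```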

SLIDE 56

Problems of interest for DNNs

  • Expressiveness: What family of functions can one represent using DNNs?
  • Training the network: Given the architecture and data points (x, y), find weights for the "best fit" function.
  • Generalization error: Rademacher complexity, VC dimension
  • Efficiency: How many layers (depth) and vertices (size) are needed to represent the functions in the family?

SLIDE 57

Training Algorithm for ReLU-DNN(1,w)

Theorem (Arora, Basu, Mianjy, Mukherjee 2016): Let n, w be natural numbers and (x_1, y_1), …, (x_D, y_D) a set of D data points in R^n × R. There exists an algorithm that solves the following training problem to global optimality:

  min{ |F(x_1) − y_1| + … + |F(x_D) − y_D| : F in ReLU-DNN(1,w) }.

The running time of the algorithm is 2^w · D^{nw} · poly(D, n, w).

Remark: more general convex loss functions can be handled.

SLIDES 58–60

Training Algorithm for ReLU-DNN(1,w)

Characterization of ReLU(1,w) functions:

  max{0, <p_1, x> + q_1} + … + max{0, <p_k, x> + q_k} − max{0, <n_1, x> + h_1} − … − max{0, <n_s, x> + h_s}

Equivalently: there is a hyperplane arrangement such that the function is affine in each cell of the arrangement, and whenever we "cross" a hyperplane of the arrangement, the value changes by the same linear function (one fixed linear function per hyperplane).

[Figure: a width-3 ReLU network with inputs x_1, x_2, gate weights a_j, biases b_j, and output weights c_j, computing y = c_1 max{0, <a_1, x> + b_1} + c_2 max{0, <a_2, x> + b_2} + c_3 max{0, <a_3, x> + b_3}]
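
The theorem's algorithm exploits this characterization: fix the sign of each gate's output weight (its magnitude can be absorbed into the gate) and the on/off status of each gate on each data point; the remaining fit is a linear program. The hedged brute-force sketch below (added here; all names illustrative) enumerates 2^w · 2^(Dw) cases rather than the D^(nw) geometrically realizable activation patterns of the actual algorithm, so it is only meant for toy sizes:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def train_relu_1w(X, y, w):
    # Globally minimize sum_d |F(x_d) - y_d| over F in ReLU-DNN(1, w),
    # where F(x) = sum_i sign_i * max{0, <a_i, x> + b_i}.
    D, n = X.shape
    m = w * (n + 1)                    # variables: (a_i, b_i) per gate
    best = (np.inf, None)
    def gate_row(i, x):                # coefficient row of <a_i, x> + b_i
        r = np.zeros(m + D)
        r[i*(n+1):i*(n+1)+n], r[i*(n+1)+n] = x, 1.0
        return r
    for signs in itertools.product([1, -1], repeat=w):
        for pat in itertools.product([0, 1], repeat=D * w):
            P = np.array(pat).reshape(D, w)   # gate i active on point d?
            A_ub, b_ub = [], []
            for d in range(D):
                # With the pattern fixed, F(x_d) is LINEAR in (a, b):
                f = sum(signs[i] * P[d, i] * gate_row(i, X[d]) for i in range(w))
                t = np.zeros(m + D); t[m + d] = -1.0
                A_ub += [f + t, -f + t]; b_ub += [y[d], -y[d]]  # |F - y_d| <= t_d
                for i in range(w):     # pattern consistency constraints
                    g = gate_row(i, X[d])
                    A_ub.append(-g if P[d, i] else g); b_ub.append(0.0)
            res = linprog(np.r_[np.zeros(m), np.ones(D)],   # minimize sum t_d
                          A_ub=np.array(A_ub), b_ub=b_ub,
                          bounds=[(None, None)] * m + [(0, None)] * D)
            if res.success and res.fun < best[0]:
                best = (res.fun, (signs, res.x[:m]))
    return best   # (optimal l1 training loss, parameters)
```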

SLIDES 61–64

Training Algorithm for ReLU-DNN(1,w)

[Figures]

SLIDES 65–66

Training Algorithm for ReLU-DNN(1,w)

Open Questions:

  1. Is the exponential dependence on the size w necessary?
  2. Training with 2 or more hidden layers.

SLIDE 67

Thank you! Questions/Comments/Answers?