Multilayer Networks
Léon Bottou, COS 424, 3/11/2010
Agenda
Goals
– Classification, clustering, regression, other.
Representation
– Parametric vs. kernels vs. nonparametric.
– Probabilistic vs. nonprobabilistic.
– Linear vs. nonlinear.
– Deep vs. shallow.
Capacity Control
– Explicit: architecture, feature selection.
– Explicit: regularization, priors.
– Implicit: approximate optimization.
– Implicit: Bayesian averaging, ensembles.
Operational Considerations
– Loss functions.
– Budget constraints.
– Online vs. offline.
Computational Considerations
– Exact algorithms for small datasets.
– Stochastic algorithms for big datasets.
– Parallel algorithms.
Summary
- 1. Brains and machines.
- 2. Multilayer networks.
- 3. Modular back-propagation.
- 4. Examples.
- 5. Tricks.
Cybernetics
– Mature communication technologies: telegraph, telephone, radio, . . .
– Nascent computing technologies: Eniac (1946).
– Norbert Wiener (1948): Cybernetics, or Control and Communication in the Animal and the Machine.
– Redefining the man–machine boundary.
What should a computer be?
A universal machine to process information.
– Which structure? What building blocks?
– Which model to emulate: the biological computer or the mathematical computer?
Mathematical logic offers a lot more guidance:
→ Turing machines.
→ Von Neumann architecture.
→ Software and hardware.
→ Today's computer science.
An engineering perspective on the brain
The brain as a computer
– Compact.
– Energy efficient (20 watts).
– Amazingly good at perception and informal reasoning.
Bill of materials
– ≈ 90%: support, energy, cooling.
– ≈ 10%: signalling wires.
A lot of wires in a small box
– Severe wiring constraints force a very specific architecture.
– Local connections (98%) vs. long-distance connections (2%).
– Layered structure (at least in the visual system).
– This is not a universal machine!
– But this machine defines what we believe is interesting!
Computing with artificial neurons?
[Figure: perceptron schematic: a retina and an associative area feed features x into a threshold element computing sign(w′x).]
McCulloch and Pitts (1943)
– Neurons as linear threshold units.
Perceptron (1957), Adaline (1961)
– Training linear threshold units.
– A viable computing primitive?
⇐ People really tried things!
– Madaline, NeoCognitron.
– But how to train them?
Computing with artificial neurons?
Circuits of linear threshold units?
– You can do complicated things that actually work. . .
– But how to train them?
Fukushima's NeoCognitron (1980)
– Leveraging symmetries and invariances.
Minsky and Papert “Perceptrons” (1969)
Circuits of logic gates
– Linear threshold unit ≈ logic gate.
– Computers ≈ lots of logic gates.
– Which functions require what kind of circuit?
Counter-examples
– Easily solvable on a general purpose computer.
– But they demand deep circuits to solve effectively.
– The perceptron can train a single logic gate!
– Training deep circuits seems hopeless.
In the background
– Universal computers need a universal representation of knowledge.
– Mathematical logic offers first order logic.
– First order logic can represent a lot more than perceptrons.
– This is absolutely correct.
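The slide does not name its counter-examples, but XOR is the classic one: no single linear threshold unit sign(w·x + b) computes it. The brute-force Python sketch below is entirely illustrative; scanning a coarse weight grid demonstrates, rather than formally proves, that one unit is not enough.

```python
import itertools
import numpy as np

# XOR inputs with +/-1 targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
target = np.array([-1, 1, 1, -1])

def realizable_by_one_unit(X, target, grid=np.linspace(-2, 2, 17)):
    """Does any single threshold unit sign(w.x + b), with weights
    taken from a coarse grid, reproduce the target outputs?"""
    for w1, w2, b in itertools.product(grid, repeat=3):
        out = np.where(X @ np.array([w1, w2]) + b >= 0, 1, -1)
        if np.array_equal(out, target):
            return True
    return False

print(realizable_by_one_unit(X, target))  # False: XOR needs a deeper circuit
```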
Choose your Evil
Training first order logic or training deep circuits of logic gates
– Symbolic domains, discrete space.
– Combinatorial explosion.
– Non-polynomial.
Continuous approximations (see the sketch after this list)
– Replace the threshold by a sigmoid function.
– Continuous and differentiable.
– Usually nonconvex.
Circuits of linear units → Multilayer networks (1985)
First order logic → Markov Logic Networks (2010)
Human logic → ?
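A minimal sketch of this continuous relaxation (Python; the slides do not fix a particular sigmoid, tanh is one common choice): the hard threshold yields a flat, uninformative gradient, while the sigmoid is differentiable everywhere.

```python
import numpy as np

def threshold(a):
    """Hard linear threshold unit: piecewise constant, so its
    gradient is zero almost everywhere and useless for training."""
    return np.where(a >= 0.0, 1.0, -1.0)

def sigmoid(a):
    """Smooth surrogate: tanh is continuous and differentiable."""
    return np.tanh(a)

def sigmoid_grad(a):
    """tanh'(a) = 1 - tanh(a)^2: nonzero near the decision boundary,
    which is exactly what gradient-based training exploits."""
    return 1.0 - np.tanh(a) ** 2

a = np.linspace(-3.0, 3.0, 7)
print(threshold(a))     # a step: -1 ... 1
print(sigmoid(a))       # a smooth version of the same step
print(sigmoid_grad(a))  # informative gradients replace the flat ones
```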
Multilayer networks, 1980s style
“ANN accurately predicts the effectiveness of the Micro-Compact Heat Exchanger and compares well with those obtained from the finite element simulation. [...] computational effort has been minimized and simulation time has been drastically reduced.”
Multilayer networks, modularized
The generic brick
∂L/∂w = (∂L/∂y) × (∂y/∂w)
∂L/∂x = (∂L/∂y) × (∂y/∂x)
Forward pass in a two-layer network
– Present example x, compute the output f(x), compute the loss L(x, y, w).
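As an illustration, here is a minimal Python sketch of such a generic brick (the class name and interface are mine, not from the slides): forward computes y = Wx and remembers x; backward receives ∂L/∂y, accumulates ∂L/∂w, and returns ∂L/∂x, exactly the two chain-rule formulas above.

```python
import numpy as np

class LinearBrick:
    """Generic trainable brick computing y = W x."""

    def __init__(self, n_in, n_out, rng):
        self.W = 0.1 * rng.standard_normal((n_out, n_in))

    def forward(self, x):
        self.x = x                # remember the input for the backward pass
        return self.W @ x

    def backward(self, dLdy):
        self.dLdW = np.outer(dLdy, self.x)  # dL/dw = (dL/dy)(dy/dw)
        return self.W.T @ dLdy              # dL/dx = (dL/dy)(dy/dx)

brick = LinearBrick(2, 3, np.random.default_rng(0))
y = brick.forward(np.array([1.0, -1.0]))
dLdx = brick.backward(np.ones(3))   # pretend dL/dy = 1 everywhere
```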
Back-propagation algorithm
Backward pass in the two-layer network
– Set ∂L/∂L = 1, then compute the gradients ∂L/∂y and ∂L/∂w for all boxes.
Update weights
– For instance with a stochastic gradient update:
  w ← w − γ_t (∂L/∂w)(x, y, w)
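Putting both passes together, here is a self-contained sketch (Python; the architecture, data, and learning rate are illustrative choices, not from the slides) that trains a tiny two-layer network on a single example with stochastic gradient updates:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((3, 2))   # first-layer weights
W2 = 0.1 * rng.standard_normal((1, 3))   # second-layer weights
x, y = np.array([1.0, -1.0]), np.array([1.0])
gamma = 0.1                              # learning rate gamma_t (constant here)

for t in range(200):
    # Forward pass: present x, compute f(x), compute the loss L(x, y, w).
    h = np.tanh(W1 @ x)                  # linear brick + sigmoid (tanh) brick
    f = W2 @ h                           # second linear brick
    L = float(np.sum((f - y) ** 2))      # MSE brick

    # Backward pass: seed dL/dL = 1, apply each brick's backward rule.
    dLdf = 2.0 * (f - y)                 # through the MSE brick
    dLdW2 = np.outer(dLdf, h)            # dL/dw = (dL/dy)(dy/dw)
    dLdh = W2.T @ dLdf                   # dL/dx = (dL/dy)(dy/dx)
    dLda = dLdh * (1.0 - h ** 2)         # through the tanh brick
    dLdW1 = np.outer(dLda, x)

    # Stochastic gradient update: w <- w - gamma_t dL/dw.
    W1 -= gamma * dLdW1
    W2 -= gamma * dLdW2

print(L)   # loss after training: should be close to zero
```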
Modules
Build representations with any piece you need. (Notation: x̌ = ∂L/∂x is the gradient back-propagated to x; Ľ = ∂L/∂L = 1 at the top.)

Module            Forward                  Backward
Linear            y = Wx                   x̌ = W⊤ y̌ ;  W̌ = y̌ x⊤
Euclidean         y_k = (x − w_k)²         x̌ = Σ_k 2(x − w_k) y̌_k ;  w̌_k = 2(w_k − x) y̌_k
Sigmoid           y_i = σ(x_i)             x̌_i = σ′(x_i) y̌_i
MSE loss          L = (x − y)²             x̌ = 2(x − y) Ľ
Perceptron loss   L = max{0, −yx}          x̌ = −y 𝟙(yx ≤ 0) Ľ
Log loss          L = log(1 + e^(−yx))     x̌ = −y (1 + e^(yx))⁻¹ Ľ
· · ·
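To make the table concrete, here is a hedged Python sketch of two of the loss bricks, with a finite-difference check of the backward rule (the function names are mine, not from the slides):

```python
import numpy as np

def logloss_forward(x, y):
    # L = log(1 + exp(-y x)) with label y in {-1, +1}
    return np.log1p(np.exp(-y * x))

def logloss_backward(x, y, Lcheck=1.0):
    # table entry: x_check = -y (1 + exp(y x))^(-1) L_check
    return -y / (1.0 + np.exp(y * x)) * Lcheck

def perceptron_backward(x, y, Lcheck=1.0):
    # table entry: x_check = -y 1[yx <= 0] L_check
    return -y * float(y * x <= 0) * Lcheck

# Finite-difference check of the log-loss backward rule.
x, y, eps = 0.3, 1.0, 1e-6
numeric = (logloss_forward(x + eps, y) - logloss_forward(x - eps, y)) / (2 * eps)
print(numeric, logloss_backward(x, y))   # the two values should agree
```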
Combine modules
Composite modules
Convolutional module
– Many linear modules with shared parameters.
– Remember the NeoCognitron?
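A minimal sketch of that weight sharing (Python, 1-D for brevity; the names are illustrative): every output position applies the same weights, so the gradient of the shared weights accumulates the chain-rule contribution from every position.

```python
import numpy as np

def conv_forward(w, x):
    """Convolutional module: every output reuses the same weights w,
    i.e. many linear modules with shared parameters."""
    k = len(w)
    return np.array([w @ x[i:i + k] for i in range(len(x) - k + 1)])

def conv_backward_w(w, x, ycheck):
    """Weight sharing: the gradient of the shared w sums the
    contributions from every position where w was applied."""
    k = len(w)
    return sum(ycheck[i] * x[i:i + k] for i in range(len(ycheck)))

w = np.array([0.5, -1.0, 0.25])
x = np.arange(8.0)
y = conv_forward(w, x)                       # 6 outputs from one 3-tap filter
print(conv_backward_w(w, x, np.ones_like(y)))
```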
CNNs for signal processing
Time-Delay Neural Networks
– 1990: speaker-independent phoneme recognition.
– 1991: speaker-independent word recognition.
– 1992: continuous speech recognition.
CNNs for image analysis
2D Convolutional Neural Networks
– 1989: isolated handwritten digit recognition.
– 1991: face recognition, sonar image analysis.
– 1993: vehicle recognition.
– 1994: zip code recognition.
– 1996: check reading.
[Figure: LeNet-5 style architecture. INPUT 32×32 → C1: 6 feature maps 28×28 (convolutions) → S2: 6 maps 14×14 (subsampling) → C3: 16 maps 10×10 (convolutions) → S4: 16 maps 5×5 (subsampling) → C5: 120 units → F6: 84 units (full connections) → OUTPUT: 10 (Gaussian connections).]
CNNs for character recognition
[Figure: layer-by-layer activations (C1, S2, C3, S4, C5, F6, output) of the network on sample digits.]
CNNs for face recognition
Note: same code as the digit recognizer.
Combining CNNs and HMM
[Figure: globally trained SDNN/HMM system. Training: the SDNN output is composed with a character-model transducer into an interpretation graph; a path selector constrained by the desired sequence and two forward scorers yield the training criterion. Recognition: input → SDNN output → compose → Viterbi → answer.]
Combining CNNs and HMM
[Figure: the trained SDNN sweeping over handwritten digit strings, showing the input, the F6 activations, the SDNN output, and the recognized answer.]
Combining CNNs and FSTs
[Figure: check reading system. Field location, segmentation, and recognition transformers successively build the field graph, segmentation graph, and interpretation graph; composing with a grammar gives the recognition graph, and a Viterbi transformer extracts the best amount (e.g. reading “$ *** 3.45” on a sample check).]
Check reading involves
– locating the fields,
– segmenting the characters,
– recognizing the characters,
– making sense of the string.
Global training
– Integrate all these modules into a single trainable system.
Deployment
– Deployed in 1996–1997 and still in use in 2007.
– Processing ≈ 15% of the US checks.
Optimisation for multilayer networks
The simplest multilayer network
– Two weights w1, w2.
– One training example: {(1, 1)}.
Optimisation for multilayer networks
Landscape (probed in the sketch below)
– Ravine along w1 w2 = 1.
– Massive saddle point near the origin.
– Mountains in the quadrants w1 w2 < 0.
– Plateaux in the distance.
Tricks of the trade
– How to initialize the weights?
– How to avoid the great saddle point?
– etc.
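The slides do not spell the network out, so the probe below assumes one natural reading, f(x) = tanh(w2 · tanh(w1 · x)) with squared loss on the single example (1, 1); it is a hedged sketch, not the slides' exact function, but under that assumption the probes match the features listed above.

```python
import numpy as np

def loss(w1, w2, x=1.0, y=1.0):
    # Squared loss of the assumed two-weight network on the example (1, 1).
    return (y - np.tanh(w2 * np.tanh(w1 * x))) ** 2

print(loss(0.5, 2.0))    # near the ravine w1*w2 = 1: low loss
print(loss(1e-3, 1e-3))  # near the origin: output ~ 0, loss ~ 1 (saddle region)
print(loss(2.0, -2.0))   # quadrant w1*w2 < 0: output ~ -1, loss ~ 4 (mountains)
print(loss(30.0, -0.5), loss(60.0, -0.5))  # identical: tanh(w1) saturated, a flat plateau
```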
Capacity control through optimization
Idea
– Initialize the weights with quite small values (but not too small!).
– Small weights exercise only the linear part of the sigmoid, so the whole network initially implements a nearly linear function.
– As learning progresses the weights increase and the function slowly becomes more and more nonlinear.
Early stopping (sketched below)
– Monitor both the training and validation errors during training.
– The training error illustrates the optimisation process.
– Stop training when the validation error stops improving.
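A generic sketch of the early-stopping recipe (Python; the callback interface and the patience parameter are my own illustrative choices):

```python
def train_with_early_stopping(step, train_error, val_error,
                              max_epochs=500, patience=20):
    """step() runs one training epoch; train_error() and val_error()
    evaluate the current model.  Training stops once the validation
    error has not improved for `patience` consecutive epochs."""
    best_val, best_epoch, since_best = float("inf"), 0, 0
    for epoch in range(max_epochs):
        step()
        tr, va = train_error(), val_error()
        print(epoch, tr, va)   # training error shows optimisation progress
        if va < best_val:
            best_val, best_epoch, since_best = va, epoch, 0
            # in practice, also snapshot the weights here
        else:
            since_best += 1
        if since_best >= patience:   # validation error stopped improving
            break
    return best_epoch, best_val
```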