Machine learning from a complexity point of view
Artemy Kolchinsky SFI CSSS 2019
1
PART I: Overview: What is machine learning? What are neural networks? The rise of deep learning
PART II: Deep nets deep dive: Why do deep nets work so well? Learning in deep nets, in the brain, and in evolution. Caveats of deep learning
2
3
Artificial Intelligence vs. Machine Learning
Artificial intelligence: General science of creating intelligent automated systems Chess playing, robot control, automating industrial processes, etc. Machine learning (ML): Subset of AI, aims to develop algorithms that can learn from data Strongly influenced by statistics
4
Example of ML problem
Given data, build a model of how personal annual income depends on other observed variables
Example that’s not ML
“Traffic collision avoidance system” (TCAS):
if distance(plane1, plane2) <= 1.0:
    sound_alarm()
    if altitude(plane1) >= altitude(plane2):
        alert(plane1, GO_UP)
    else:
        alert(plane2, GO_UP)
…
5
Supervised Learning: learn an input → output mapping (“right answer” provided), e.g. image → “Dog”, “Tengo hambre” → “I’m hungry”
Reinforcement Learning: learn a control strategy based on +/- reward at end of run, e.g. a motor program taking an initial state ⊙ to a target state ⊙
Generative Modelling: generate high-resolution audio, photo, text, etc. de novo
Unsupervised Learning: find meaningful patterns in data (“right answer” usually unknown), e.g. identify clusters, dimensionality reduction
6
Supervised Learning
Statistical model: a parameterized set of input-output maps { Output = fθ(Input) }θ
A training algorithm uses a training data set (e.g. images labeled → “Cat”, → “Dog”, → “Cat”, → “Cat”, …) and chooses optimal parameter values θ*, yielding a “trained model” fθ*
Given a new input x, the trained model makes predictions fθ*(x), e.g. → “Dog” (a code sketch follows after this slide)
Example models/algorithms: logistic regression, support vector machines (SVMs), random forests, neural networks, “deep learning” (deep neural networks), etc.
Each algorithm has strengths and weaknesses. There is no “universally” best one for all domains / situations.
7
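To make the pipeline concrete, here is a minimal sketch (not from the slides) assuming Python with numpy and scikit-learn, and hypothetical 2-D feature vectors standing in for images:

# Minimal supervised-learning sketch: hypothetical toy data, scikit-learn assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data set: each row is an input vector, each entry of y its label.
X_train = np.array([[0.27, 0.54], [0.84, 0.53], [0.87, 0.32], [0.64, 0.87]])
y_train = np.array(["cat", "dog", "cat", "cat"])

# Training algorithm: chooses optimal parameter values theta* for the model f_theta.
model = LogisticRegression().fit(X_train, y_train)

# "Trained model" f_theta*: predict the label of a new input x.
x_new = np.array([[0.80, 0.50]])
print(model.predict(x_new))      # prints the predicted label for x_new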
An image can be represented digitally as a list of numbers specifying RGB color intensities at each pixel (a “vector”), e.g.:
= <0.271,0.543,0.198,0.362,…>  = <0.842,0.527,0.924,0.421,…>  = <0.873,0.321,0.187,0.011,…>  = <0.641,0.874,0.983,0.232,…>
Each vector indicates a point in a high-dimensional “data space” (# dimensions = 3 × # of pixels). For conceptual simplicity, consider these as coordinates in an abstract 2-D space (Cat vs. Dog).
A geometric view of supervised learning
8
Training dataset: labeled points (Cat, Dog) in data space
Choose parameters via θ* = argminθ Error(θ, TrainData)
The training algorithm selects parameters (i.e., twists “knobs”) to find the best separating surface in “data space”; this corresponds to finding the minimum θ* of the “loss surface” over the parameters (θ1, θ2, …)
The separating surface splits “data space” into “dog” and cat regions
A geometric view of supervised learning
10
“Training error”: errors on the training dataset; training adjusts parameters to minimize such errors
“Testing error”: errors made on new data provided after training
A geometric view of supervised learning
11
A geometric view of supervised learning
“Underfitting”: too few parameters, doesn’t fit the data well
“Overfitting”: too many parameters, won’t generalize on new data (i.e., “memorized” training data, rather than learnt “the pattern”)
In between: a good model (a worked example follows after this slide)
12
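A quick way to see this trade-off (not from the slides): fit polynomials of different degrees to noisy 1-D data (Python/numpy assumed) and compare training and testing error:

# Under/overfitting sketch with polynomial models of different complexity (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=10)   # noisy samples of a "true" curve
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in [1, 3, 9]:   # too few parameters, about right, too many parameters
    theta = np.polyfit(x_train, y_train, degree)                     # fit polynomial of given degree
    train_err = np.mean((np.polyval(theta, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(theta, x_test) - y_test) ** 2)
    print(f"degree={degree}: train error={train_err:.3f}, test error={test_err:.3f}")
# Typically: degree 1 underfits (high train and test error), degree 9 overfits
# (near-zero train error, high test error), degree 3 generalizes best.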
A geometric view of supervised learning
“Underfitting” “Overfitting”
On a new image of a dog, the underfit and overfit models both predict “cat” (✗), while the good model predicts “dog” (✓)
13
“Generalization performance”: the ability of a learning algorithm to do well on new data
[Figure: training error and testing error vs. # of parameters; training error keeps falling while testing error turns back up]
How to select the optimal number of parameters?
Validation: split the data into two chunks; train on one and validate on the other
Regularization: penalizing models that are “too flexible”, e.g.:
θ* = argminθ TrainError(θ) + λ∥θ∥2
(a code sketch of both ideas follows after this slide)
14
CAVEAT: in Part II, we’ll see that recent research is putting much of the “common wisdom” about the above trade-off curve into question!
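A possible sketch of both ideas, a train/validation split plus the λ∥θ∥2 penalty, on made-up data (Python with numpy and scikit-learn assumed; ridge regression stands in for the generic penalized objective):

# Choosing model flexibility on held-out data (hypothetical data; scikit-learn assumed).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=200)   # only 2 of the 50 features matter

# Split the data into two chunks: train on one, validate on the other.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# Penalized objective: TrainError(theta) + lambda * ||theta||^2  (ridge regression).
for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_tr, y_tr)
    val_err = np.mean((model.predict(X_val) - y_val) ** 2)
    print(f"lambda={lam}: validation error={val_err:.3f}")
# Keep the lambda with the lowest validation error.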
Supervised learning summary
Supervised learning uses training data to learn an input → output mapping
Many supervised learning algorithms exist, each with different strengths
The goal is low testing error on new, unseen data
Testing error is high when the model is too simple and underfits, or when the model is too complex and overfits
15
16
1940s: Donald Hebb
Proposed that networks of simple interconnected units (aka “nodes” or “neurons”) using simple rules can learn to perform very complicated tasks
The simplest rule: if two units are active at the same time, strengthen the connection between them (“Hebbian learning”)
Inspired by biological neurons
17
Late 1950s: Perceptron
A computational model of learning by psychologist Frank Rosenblatt
The first neural network, along with a learning rule to minimize training error
Demonstrated that it could recognize simple patterns
18
Inputs: x1 and x2. Connection “weights” w1 and w2: the parameters θ
Weighted sum: ∑i wixi
“Threshold nonlinearity”: output y (either 0 or 1) is 0 if ∑i wixi < b, and 1 if ∑i wixi ≥ b
Learning involves following a simple rule for changing the weights, so as to minimize training error
Late 1950s: Perceptron
19
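A minimal perceptron in Python (numpy assumed, with a made-up linearly separable task, logical OR); the update shown is the standard perceptron learning rule, one concrete version of the “simple rule for changing the weights” mentioned above:

# Perceptron sketch: weighted sum, threshold nonlinearity, and the perceptron learning rule.
import numpy as np

def predict(w, b, x):
    return 1 if np.dot(w, x) >= b else 0          # y = 1 if sum_i w_i x_i >= b, else 0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])    # made-up, linearly separable data (logical OR)
y = np.array([0, 1, 1, 1])

w, b = np.zeros(2), 0.0
for epoch in range(10):
    for xi, yi in zip(X, y):
        err = yi - predict(w, b, xi)              # +1, 0, or -1
        w = w + 0.1 * err * xi                    # nudge weights toward reducing training error
        b = b - 0.1 * err                         # raising b is equivalent to lowering the weighted sum
print(w, b, [predict(w, b, xi) for xi in X])      # should classify all four points correctly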
Late 1950s: Perceptron
[Diagram: inputs x1 and x2, weights w1 and w2, a Σ + threshold unit, output y (0 or 1)]
Has almost all the ingredients of a modern neural network
The perceptron separating surface is a line
20
1969: Minsky & Papert, Perceptrons
Two AI pioneers analyzed mathematics of learning with perceptrons Showed that a single-layer perceptron could never be taught to recognize some simple patterns Killed neural network research for 20 years
21
1969: Minsky & Papert, Perceptrons
The perceptron separating surface is a line
Non-linearly separable problem: data that no single line can separate
22
1969: Minsky & Papert, Perceptrons
Linearly separable problem: the perceptron can learn this
Non-linearly separable problem: the perceptron cannot learn this
23
1986: Modern neural nets (Nature, 1986)
Three crucial ingredients:
1. More layers
2. Differentiable activations and error functions
3. New training algorithm (“backpropagation”)
24
More layers
[Diagram: inputs x1 and x2 feed hidden Σ+ units, which feed an output Σ+ unit]
“Intersection Nonlinearity”: 0 if Σi xi < 2, 1 if Σi xi ≥ 2
Can solve non-linearly separable problems! (a hand-built example follows after this slide)
25
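To make this concrete, here is a hand-built two-layer threshold network in Python; XOR is assumed as the example non-linearly separable problem (the slide does not name one), and the weights/thresholds are set by hand rather than learned:

# Two-layer threshold network solving a non-linearly separable problem (XOR).
def step(z, b):
    return 1 if z >= b else 0              # threshold nonlinearity with threshold b

def two_layer_net(x1, x2):
    h_or = step(x1 + x2, 1)                # hidden unit 1: fires if x1 OR x2
    h_and = step(x1 + x2, 2)               # hidden unit 2: "intersection nonlinearity" (x1 AND x2)
    return step(h_or - h_and, 1)           # output: OR but not AND, i.e. XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), two_layer_net(x1, x2))   # prints 0, 1, 1, 0: not linearly separable, yet solved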
Differentiability
Learning by gradient descent: θt+1 = θt − α∇L(θ)
Differentiable error, e.g.: L(θ) = ∑x,y∈Dataset (fθ(x) − y)²
Threshold nonlinearity replaced by a differentiable activation function, e.g. the sigmoid ϕ(x) = 1 / (1 + e−x), so that each unit computes xi = ϕ(∑j wjixj)
(a gradient-descent sketch follows after this slide)
26
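A small gradient-descent sketch tying these pieces together, assuming Python with numpy, a single sigmoid unit with a bias input, and made-up OR-like data:

# Gradient descent for a single sigmoid unit f_theta(x) = phi(w . x),
# minimizing the squared error L(theta) = sum over (x, y) of (f_theta(x) - y)^2.
import numpy as np

def phi(z):                                    # differentiable activation (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])   # last column = bias input
y = np.array([0., 1., 1., 1.])                                           # OR-like targets

w = np.zeros(3)
alpha = 0.5                                    # learning rate
for t in range(5000):
    p = phi(X @ w)                             # predictions f_theta(x)
    grad = (2 * (p - y) * p * (1 - p)) @ X     # dL/dw via the chain rule
    w = w - alpha * grad                       # theta_{t+1} = theta_t - alpha * grad L(theta)
print(w, phi(X @ w).round(2))                  # predictions move toward [0, 1, 1, 1]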
1986: Backpropagation
Learning by gradient descent: θt+1 = θt − α∇L(θ). The gradient ∇L(θ) can be hard to compute!
The backpropagation trick (chain rule): ∂L/∂x(i) = ∂L/∂x(i+1) · ∂x(i+1)/∂x(i)
(error gradient in layer i = error gradient in layer i+1 × partial of layer i+1 w.r.t. layer i)
For prediction, activity flows forward layer-by-layer, from inputs to outputs. For learning, error gradients flow backwards layer-by-layer, from outputs to inputs. (a worked numpy example follows)
27
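A sketch of backpropagation for a tiny two-layer sigmoid network, assuming Python/numpy and made-up XOR targets; the forward pass goes input → hidden → output, and the error gradient is then propagated back layer by layer using exactly the chain-rule step above:

# Backpropagation sketch for a tiny two-layer sigmoid network.
import numpy as np

def phi(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])   # last column = bias
Y = np.array([[0.], [1.], [1.], [0.]])                                   # XOR targets (toy example)

W1 = rng.normal(size=(3, 4))                   # input -> hidden weights
W2 = rng.normal(size=(5, 1))                   # hidden (+ bias) -> output weights
alpha = 1.0

for t in range(10000):
    # Forward pass: activity flows from inputs to outputs
    h = phi(X @ W1)                            # hidden-layer activity
    hb = np.hstack([h, np.ones((4, 1))])       # append a bias unit
    out = phi(hb @ W2)                         # output-layer activity
    # Backward pass: error gradients flow from outputs back to inputs
    d_out = 2 * (out - Y) * out * (1 - out)    # error gradient at the output layer
    d_h = (d_out @ W2.T)[:, :4] * h * (1 - h)  # error gradient at the hidden layer (chain rule)
    # Gradient-descent updates
    W2 -= alpha * hb.T @ d_out
    W1 -= alpha * X.T @ d_h
print(out.round(2))                            # typically approaches [0, 1, 1, 0]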
1989: Universal Approximation Theorem
Any continuous function f : ℝn → ℝ can be computed by a neural network with one hidden layer, up to any desired accuracy ε > 0 (Cybenko, 1989; Hornik, 1991)
Caveat 1: The number of hidden units may need to be exponentially large
Caveat 2: We can represent any function. But that doesn’t guarantee that we can learn any function (even given infinite data!)
28
1990s - 2010s
Neural nets attract attention from cognitive scientists and psychologists
However, their performance was not competitive for most applications
A neural network “winter” lasted for two decades
29
Neural networks summary
Neural networks are built from interconnected “neurons”, with nonlinear transformations
They reached essentially their modern form by the mid-1980s
30
31
2012: Deep net wins ImageNet (a major ML competition)
The “deep neural network” did so much better that it was immediately recognized as a breakthrough moment in AI
Deep neural net: 15% error Next best (w/ hand-coded features): 25% error
Krizhevsky, Sutskever, Hinton 2012
32
Traditional neural network GoogLeNet (image recognition)
33
Deep learning now dominates most areas of ML
Voice recognition Board games Image processing Video games Translation Medical diagnosis
34
Deep learning now dominates most areas of ML
LETTERS
PUBLISHED ONLINE: 13 FEBRUARY 2017 | DOI: 10.1038/NPHYS4035
Machine learning phases of matter
Juan Carrasquilla1* and Roger G. Melko1,2
“…we show that modern machine learning architectures, such as fully connected and convolutional neural networks, can identify phases and phase transitions in a variety of condensed-matter Hamiltonians … neural networks can be trained to detect multiple types of order parameter, as well as highly non-trivial states with no conventional order, directly from raw state configurations sampled with Monte Carlo…”
35
Why do deep networks do so well?
Mystery 1: On the surface, deep networks are only marginally different from previous neural network approaches. Why do they do qualitatively better?
Mystery 2: Neural networks are not supposed to work well in highly-structured domains, like language translation and rule-driven board games
Mystery 3: Deep nets tend to have millions/billions of parameters, which suggests they should overfit horribly
36
Adversarial examples
An image correctly classified as “Panda” (77.7% confidence) is, after a tiny adversarial perturbation, classified as “Schoolbus” (99.3% confidence)
37
Generative adversarial networks (GANs)
Discriminator network tunes parameters to distinguish training images from fake images made by generator net Generator network tunes parameters to fool discriminator network (increase its error)
Training dataset of unlabelled images
https://skymind.ai/images/wiki/GANs.png
Example of deep nets for unsupervised generative modeling
38
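A minimal GAN training loop, assuming PyTorch and toy 1-D “data” (samples from a Gaussian) instead of images so the sketch stays short and runnable; real GANs use convolutional networks and large image datasets:

# Minimal GAN sketch: discriminator vs. generator on a toy 1-D distribution (PyTorch assumed).
import torch
import torch.nn as nn

real_data = lambda n: torch.randn(n, 1) * 0.5 + 2.0        # "training dataset" distribution
noise = lambda n: torch.randn(n, 8)                        # generator input noise

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())   # discriminator
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # 1) Discriminator: distinguish real samples (label 1) from generated fakes (label 0).
    real, fake = real_data(64), G(noise(64)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # 2) Generator: tune parameters to fool the discriminator (make D output 1 on fakes).
    fake = G(noise(64))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

samples = G(noise(1000))
print(samples.mean().item(), samples.std().item())   # should drift toward the real mean (~2.0) and std (~0.5)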
GANs: Auto-generated faces
Karras et al 2019
39
GANs: Auto-generated anime characters
Jin et al 2017
40
GANs: Auto-generated ML papers
41
Huang 2018
42
youtube.com/watch?v=bIVU8UuHPKI
GANs: Style transfer
Deep learning summary
Deep nets are neural networks with more layers and more structure, trained on much larger datasets with much more computation
43
44
Learning in deep nets, in the brain, and in evolution
45
46
Why do deep nets work so well?
Mystery 1: On the surface, deep nets are only marginally different from previous neural net approaches. Why do they do so much better?
Mystery 2: Neural networks are not supposed to work well in highly-structured domains, like language translation and rule-driven board games
Mystery 3: Deep nets have millions/billions of parameters, which suggests they should overfit horribly
47
Reason 1: Huge training datasets and computational power (GPUs)
Unlike other algorithms, deep nets seem to “keep getting better” with more and more training data (but when the dataset is small, they often do worse than others!)
[Figure: test error vs. amount of data; deep nets keep improving while traditional algorithms plateau]
48
Why do deep nets work so well?
Reason 2: Noisy training regimes
Stochastic gradient descent (SGD): gradient computed on random subsets of training data
Dropout: 50% of neurons randomly disabled during training
Random noise during training improves performance during testing
Regularization: controlling overfitting in a data-dependent manner, preventing the algorithm from being “too flexible”
θ* = argminθ Error(θ, TrainingData) + λ||θ||2
θ* = argminθ Error(θ, TrainingData) + Noise
Learning by SGD: θt+1 = θt − α[∇L(θ) + Noise]
(a minibatch SGD sketch follows after this slide)
49
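A minibatch SGD sketch on made-up linear-regression data (Python/numpy assumed), showing how computing the gradient on random subsets of the training data injects noise into the updates:

# Stochastic gradient descent: gradient estimated on a random minibatch at each step.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_theta = rng.normal(size=10)
y = X @ true_theta + 0.1 * rng.normal(size=1000)

theta, alpha, batch = np.zeros(10), 0.05, 32
for t in range(2000):
    idx = rng.choice(len(X), size=batch, replace=False)       # random subset of training data
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ theta - yb) / batch               # noisy estimate of the full gradient
    theta -= alpha * grad                                     # theta_{t+1} = theta_t - alpha * (noisy gradient)
print(np.linalg.norm(theta - true_theta))                     # should be small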
Why do deep nets work so well?
Reason 3: Novel architectures: deeper and more structured
[Figure: a traditional neural network (image: Michael Nielsen)]
50
Why do deep nets work so well?
Reason 3: Novel architectures: deeper and more structured
[Figure: traditional connectivity pattern vs. connectivity pattern of the ImageNet 2012 winner]
“Convolutional layers”, which have highly structured, repeating weight patterns (reminiscent of receptive fields in our visual system)
Inductive bias
51
Why do deep nets work so well?
Inductive bias: implicit or explicit assumptions built into a learning algorithm
No Free Lunch Theorem (slightly simplified) (Wolpert, 1996): Let Ω be the set of all possible functions mapping inputs to outputs. On average across Ω, no supervised learning algorithm can do better than random guessing.
In practice, deep networks do much better than guessing. This is because real-world functions come from a tiny subset of Ω, which aligns with the inductive bias of deep nets
Example assumptions
“Output is a linear function of input” “Output is a smooth function of input” “Input-output mapping has a low complexity” ….
52
https://devblogs.nvidia.com/accelerate-machine-learning-cudnn-deep-neural-network-library/
Inductive bias of deep nets: more layers may reflect a greater degree of hierarchy
53
Why does inductive bias matter?
[Figure: within the space of all functions, a red region of functions expressible by a given net architecture; learning follows a trajectory from a random initial NN to a final “good” function f found by the algorithm]
The bigger the red region (the “haystack”), the more training data is needed by the learning algorithm to find f (the “needle”)
All-to-all vs. structured connectivity: the structured architecture carves out a smaller region
54
Why is the needle in the haystack?
J Stat Phys DOI 10.1007/s10955-017-1836-5
Why Does Deep and Cheap Learning Work So Well?
Henry W. Lin1 · Max Tegmark2 · David Rolnick3
“… We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one…”
55
The inductive bias of convolutional deep nets
[Figure: an optimization trajectory from the typical image output by a randomly initialized deep net toward a corrupted “target” image; along the way, the net outputs “fixed”/natural-looking versions of the target image]
Ulyanov et al 2017
θ* = argminθ Error(θ, CorruptedImage) s.t. Dist(θ, θinit) < c
56
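A rough sketch of this kind of optimization, assuming PyTorch, a random tensor standing in for the corrupted image, and a much smaller conv net than the encoder-decoder used by Ulyanov et al.; the aim is only to show the structure of the procedure (fit a randomly initialized conv net to the corrupted target, with early stopping playing roughly the role of the Dist(θ, θinit) < c constraint):

# "Deep image prior"-style sketch (toy stand-ins for the image and the network; PyTorch assumed).
import torch
import torch.nn as nn

H = W = 64
corrupted = torch.rand(1, 3, H, W)              # stand-in for a noisy/corrupted target image
mask = (torch.rand(1, 1, H, W) > 0.5).float()   # e.g. inpainting: half the pixels are observed

net = nn.Sequential(                            # small conv net (the real one is a deep encoder-decoder)
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
z = torch.randn(1, 32, H, W)                    # fixed random input code
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(500):                         # early stopping: don't fit the corruption itself
    out = net(z)
    loss = ((out - corrupted) ** 2 * mask).mean()   # error measured only on observed pixels
    opt.zero_grad(); loss.backward(); opt.step()

restored = net(z).detach()                      # the net's output fills in the missing pixels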
Ulyanov, Vedaldi, Lempitsky, 2017
The inductive bias of convolutional deep nets
57
Ulyanov et al 2017
Why do deep nets work so well?
Reason 4: High-dimensional optimization is weird
58
Multi-layer neural nets have non-convex error surfaces, with local minima, global minima, and saddle points
In high dimensions, most critical points are saddle points
https://en.wikipedia.org/wiki/Saddle_point
https://www.matroid.com/blog/post/the-hard-thing-about-deep-learning
Why do deep nets work so well?
Reason 4: High-dimensional optimization is weird
Modern deep nets can have 10⁸-10¹⁰ parameters
59
In high dimensions, most critical points are actually saddle points
Local minima are not a problem for deep nets: they are rare, and close to global minima in terms of error
(Dauphin 2014, Kawaguchi 2016, Du 2019)
Why do deep nets work so well?
Reason 4: High-dimensional optimization is weird
60
“Mode Connectivity”
Not only are all local minima also global minima, but all minima are connected by simple, low-error paths
(Garipov 2018, Draxler 2019, Kuditipudi 2019)
Why don’t deep nets overfit?
61
Why don’t deep nets overfit?
Modern deep nets can have 10⁸-10¹⁰ parameters, so they should overfit horribly
[Figure: training and testing error vs. # of parameters, with a “classical regime” and a “non-classical regime”]
62
Why don’t deep nets overfit?
Explanation 1: the effective # of parameters is low
Most parameter directions do not matter much, so the intrinsic dimension is low (Li 2018)
Many layers can be reset to random weights without hurting performance (Zhang 2019)
Trained networks can be highly compressible (Arora 2018)
Explanation 2: the “Lottery ticket hypothesis” (Frankle 2019)
https://lilianweng.github.io/lil-log/2019/03/14/are-deep-neural-networks-dramatically-overfitted.html 63
Why don’t deep nets overfit?
Explanation #3: Overparameterization makes good solutions easy to reach
[Figure: the set of all functions that fit the training data perfectly, shown in a low-dimensional vs. a high-dimensional parameter space; in the high-dimensional space the training dynamics reach this set easily]
64
Zhang 2017, Belkin 2018
Why don’t deep nets overfit?
Modern deep nets can have 10⁸-10¹⁰ parameters, so they should overfit horribly
[Figure: training and testing error vs. # of parameters, with a “classical regime” and a “non-classical regime”]
65
Zhang 2017, Belkin 2018
Summary: Why do deep nets work so well?
66
67
DEEP NETWORKS: structured connectivity; learning signals flow backward
BRAINS: many other cell types; complex connectivity; signals propagate both ways
Are deep nets like the brain?
68
Receptive field of early neural net layers resemble V1 receptive fields in the brain
We don’t know if higher-level representations are similar
Are deep nets like the brain? Internal representations
69 http://vision03.csail.mit.edu
Macaque V1
(Zylberberg, DeWeese 2013)
Deep neural net
70
Are deep nets like the brain? Backpropagation
Learning by gradient descent: θt+1 = θt − α∇L(θ). The gradient can be hard to compute!
The backpropagation trick: ∂L/∂x(i) = ∂L/∂x(i+1) · ∂x(i+1)/∂x(i) = ∂L/∂x(i+1) · ϕ′(x(i+1)) · W(i)
(error gradient in layer i = error gradient in layer i+1 × partial of layer i+1 w.r.t. layer i)
This weight matrix can be replaced by a fixed random matrix, and learning still works!
71
Is deep learning like evolution?
EVOLUTION WITH NATURAL SELECTION
Populations climb a fitness landscape (similar to SGD on a loss surface)
https://msu.edu/~ostman/landscapes.html
[Screenshot of review article: “Evolution and speciation”, Sergey Gavrilets, on “holey adaptive landscapes”]
“Properties of multidimensional adaptive landscapes are very different from those … low-dimensional case might be a trivial problem in a multidimensional context and vice versa.” “…Many local maxima may become saddle points in the higher dimensional space, such that gradient ascent can continue unimpeded”
TREE, 1997 Biosystems 2001
Deep nets, brain, evolution: a recipe for learning
Gradient descent (or something like it)
Regularization by randomness
High-dimensional search spaces
Lots of trials/computation
Powerful results
72
73
Despite successes, much remains to be done
1. Natural language understanding (e.g.: summarizing a long story)
2. Causal reasoning
3. Common sense reasoning
4. Learning from small data (a.k.a. “zero-shot” or “single-shot” learning)
5. Learning with less computation
6. Open-ended domains (e.g.: autonomous cars)
7. Motor control / embodiment
8. Transfer learning
9. Can be very brittle
74
Is a deep learning winter coming?
common sense / one-shot learning, etc.”
Deep nets work much better than expected. In the past 7 years, they solved many very hard problems (image recognition, voice recognition, Go, generative modeling, etc.)
They have many real-world applications, from Siri to science to surveillance
Most weaknesses are acknowledged, and actively being researched in ML
75
RESOURCES
Foundations: Classic (2006), PDF: tinyurl.com/y6b8z5qv
Deep learning: PDF: tinyurl.com/y6khzl9e, Online: www.deeplearningbook.org
State of the art: Arxiv Sanity Preserver, arxiv-sanity.com (top recent machine learning papers from arXiv); distill.pub
Some good blogs: lilianweng.github.io/lil-log, ai.googleblog.com, machinelearningmastery.com/blog, deepmind.com/blog
76
Artemy Kolchinsky artemy@santafe.edu
77