Course setup

• 9 EC course
• Examination based on computer exercises
• Weekly exercises discussed in tutorial class
• All course materials (slides, exercises) and schedule via http://www.snn.ru.nl/~bertk/machinelearning/

Bert Kappen, ML

Handout Perceptrons

The Perceptron

Relevant in the history of pattern recognition and neural networks.
• Perceptron learning rule + convergence, Rosenblatt (1962)
• Perceptron critique (Minsky and Papert, 1969) → "Dark ages of neural networks"
• Revival in the 80's: backpropagation and Hopfield model. Statistical physics entered.
• 1995: Bayesian methods take over. Start of modern machine learning. NN out of fashion.
• 2006: Deep learning, big data.

The Perceptron

$$y(x) = \mathrm{sign}(w^T \phi(x))$$

where

$$\mathrm{sign}(a) = \begin{cases} +1, & a \ge 0 \\ -1, & a < 0 \end{cases}$$

and $\phi(x)$ is a feature vector (e.g. a hard-wired neural network).

The Perceptron

Ignore $\phi$, i.e. consider inputs $x^\mu$ and outputs $t^\mu = \pm 1$. Define $w^T x = \sum_{j=1}^n w_j x_j + w_0$. Then the learning condition becomes

$$\mathrm{sign}(w^T x^\mu) = t^\mu, \qquad \mu = 1, \ldots, P$$

We have $\mathrm{sign}(w^T x^\mu t^\mu) = 1$, or

$$w^T z^\mu > 0 \quad \text{with} \quad z^\mu_j = x^\mu_j t^\mu.$$

Linear separation

Classification depends on the sign of $w^T x$. Thus, the decision boundary is a hyperplane:

$$0 = w^T x = \sum_{j=1}^n w_j x_j + w_0$$

The perceptron can solve linearly separable problems. The AND problem is linearly separable. The XOR problem and linearly dependent inputs are not linearly separable.

Perceptron learning rule

Learning is successful when

$$w^T z^\mu > 0 \quad \text{for all patterns } \mu.$$

The learning rule is 'Hebbian':

$$w_j^{\mathrm{new}} = w_j^{\mathrm{old}} + \Delta w_j$$

$$\Delta w_j = \eta\, \Theta(-w^T z^\mu)\, x^\mu_j t^\mu = \eta\, \Theta(-w^T z^\mu)\, z^\mu_j$$

where $\Theta$ is the step function and $\eta$ is the learning rate.
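The learning rule above can be sketched in plain Python as follows. This is a minimal sketch, not the course's reference code: the names `train_perceptron` and `predict`, the zero initial weights, the ±1 encoding of the AND inputs, and the choice to treat $w^T z^\mu = 0$ as an error (so that the all-zero start gets updated) are assumptions made for illustration.

```python
def train_perceptron(patterns, targets, eta=1.0, max_epochs=1000):
    """Perceptron rule: w += eta * z whenever w.z <= 0, with z = t * (1, x).

    Returns the weights (bias weight w_0 first) and the number of updates.
    Raises RuntimeError if no error-free pass occurs within max_epochs.
    """
    n = len(patterns[0])
    w = [0.0] * (n + 1)  # assumption: start from w = 0
    updates = 0
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(patterns, targets):
            # z^mu = t^mu * x^mu, with a constant 1 prepended for the bias w_0
            z = [t * v for v in [1.0] + list(x)]
            if sum(wj * zj for wj, zj in zip(w, z)) <= 0:
                w = [wj + eta * zj for wj, zj in zip(w, z)]
                updates += 1
                changed = True
        if not changed:
            return w, updates
    raise RuntimeError("no separating weights found within max_epochs")


def predict(w, x):
    """y(x) = sign(w^T x), with sign(a) = +1 for a >= 0 and -1 otherwise."""
    a = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if a >= 0 else -1


# AND problem with inputs and targets encoded as +/-1 (illustrative encoding).
and_inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
and_targets = [-1, -1, -1, 1]
and_weights, and_updates = train_perceptron(and_inputs, and_targets)
```

Calling `train_perceptron` with the XOR targets `[-1, 1, 1, -1]` on the same inputs never completes an error-free pass, so the sketch raises `RuntimeError` once `max_epochs` is reached.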

Depending on the data, there may be many or few solutions to the learning problem (or none at all).

The quality of the solution is determined by the worst pattern. Since the solution does not depend on the size of $w$:

$$D(w) = \frac{1}{|w|} \min_\mu w^T z^\mu$$

Acceptable solutions have $D(w) > 0$.

The best solution is given by $D_{\max} = \max_w D(w)$.
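As an illustration of the quality measure $D(w)$, the sketch below evaluates it for a few hand-picked weight vectors on the AND problem. The function name `margin`, the ±1 encoding, and the candidate weight vectors are assumptions chosen for the example; the bias is handled by prepending a constant 1 to each input.

```python
from math import sqrt


def margin(w, zs):
    """D(w) = min_mu (w . z^mu) / |w| for a non-zero weight vector w."""
    norm = sqrt(sum(wj * wj for wj in w))
    return min(sum(wj * zj for wj, zj in zip(w, z)) for z in zs) / norm


# AND problem with +/-1 inputs: z = t * (1, x1, x2), bias component first.
and_zs = [(-1, 1, 1), (-1, 1, -1), (-1, -1, 1), (1, 1, 1)]

d_good = margin((-1, 1, 1), and_zs)    # separates all patterns, D > 0
d_worse = margin((-3, 2, 2), and_zs)   # also separates, but with smaller D
d_bad = margin((1, 1, 1), and_zs)      # misclassifies a pattern, D < 0
```

Because $D(w)$ divides by $|w|$, rescaling a weight vector leaves its value unchanged, which is why the comparison between candidates is meaningful.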

$D_{\max} > 0$ iff the problem is linearly separable.

Convergence of Perceptron rule

Assume that the problem is linearly separable, so that there is a solution $w^*$ with $D(w^*) > 0$.

At each iteration, $w$ is updated only if $w \cdot z^\mu < 0$. Let $M^\mu$ denote the number of times pattern $\mu$ has been used to update $w$. Thus, starting from $w = 0$,

$$w = \eta \sum_\mu M^\mu z^\mu$$

Consider the quantity

$$-1 \le \frac{w \cdot w^*}{|w||w^*|} \le 1$$

We will show that

$$\frac{w \cdot w^*}{|w||w^*|} \ge O(\sqrt{M}),$$

with $M = \sum_\mu M^\mu$ the total number of iterations. Therefore, $M$ cannot grow indefinitely. Thus, the perceptron learning rule converges in a finite number of steps when the problem is linearly separable.

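Proof:

$$w \cdot w^* = \eta \sum_\mu M^\mu z^\mu \cdot w^* \ge \eta M \min_\mu z^\mu \cdot w^* = \eta M D(w^*) |w^*|$$

$$\Delta |w|^2 = |w + \eta z^\mu|^2 - |w|^2 = 2 \eta\, w \cdot z^\mu + \eta^2 |z^\mu|^2 \le \eta^2 |z^\mu|^2 = \eta^2 N$$

The inequality holds because an update is made only when $w \cdot z^\mu < 0$; the last equality uses $|z^\mu|^2 = N$, which holds, for example, for binary ($\pm 1$) inputs. Summing over the $M$ updates gives

$$|w| \le \eta \sqrt{N M}$$

Thus,

$$1 \ge \frac{w \cdot w^*}{|w||w^*|} \ge \frac{\sqrt{M}\, D(w^*)}{\sqrt{N}}$$

Number of weight updates:

$$M \le \frac{N}{D^2(w^*)}$$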
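To make the bound concrete, here is a small worked example. It is an illustration under stated assumptions, not part of the handout: the AND problem with inputs encoded as $\pm 1$, a constant component 1 for the bias (so each $z^\mu$ has three components and $|z^\mu|^2 = N = 3$), and the hand-picked solution $w^* = (-1, 1, 1)$.

```latex
% Patterns z^\mu = t^\mu (1, x^\mu_1, x^\mu_2) for AND with \pm 1 encoding:
z^1 = (-1, 1, 1), \quad z^2 = (-1, 1, -1), \quad z^3 = (-1, -1, 1), \quad z^4 = (1, 1, 1)

% Overlaps with w^* = (-1, 1, 1):
w^* \cdot z^1 = 3, \qquad w^* \cdot z^2 = w^* \cdot z^3 = w^* \cdot z^4 = 1

% Quality of w^*:
D(w^*) = \frac{1}{|w^*|} \min_\mu w^* \cdot z^\mu = \frac{1}{\sqrt{3}}

% Bound on the number of weight updates:
M \le \frac{N}{D^2(w^*)} = \frac{3}{1/3} = 9
```

Under these assumptions the perceptron rule needs at most 9 updates on this problem, independent of the learning rate $\eta$.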

Capacity of the Perceptron

Consider $P$ patterns in $N$ dimensions in general position:
- no subset of size less than $N$ is linearly dependent.
- general position is necessary for linear separability

Question: What is the probability that a problem of $P$ samples in $N$ dimensions is linearly separable?

Define $C(P, N)$ as the number of linearly separable colorings on $P$ points in $N$ dimensions, with separability plane through the origin. Then (Cover 1966):

$$C(P, N) = 2 \sum_{i=0}^{N-1} \binom{P-1}{i}$$

When $P$ is small ($P \le N$), then

$$C(P, N) = 2 \sum_{i=0}^{P-1} \binom{P-1}{i} = 2 (1 + 1)^{P-1} = 2^P$$

When $P = 2N$, then 50 % is linearly separable:

$$C(P, N) = 2 \sum_{i=0}^{N-1} \binom{2N-1}{i} = \sum_{i=0}^{2N-1} \binom{2N-1}{i} = 2^{2N-1} = 2^{P-1}$$
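The counting formula is easy to evaluate numerically. The sketch below uses `math.comb` from the Python standard library (Python 3.8 or later); the function names `cover_count` and `separable_fraction` are hypothetical, and dividing by $2^P$ (the total number of colorings) to get a fraction is an assumption made for illustration.

```python
from math import comb


def cover_count(P, N):
    """C(P, N) = 2 * sum_{i=0}^{N-1} binom(P-1, i)."""
    # comb(n, k) returns 0 when k > n, so the P <= N case needs no special handling.
    return 2 * sum(comb(P - 1, i) for i in range(N))


def separable_fraction(P, N):
    """Fraction of the 2^P colorings of P points that are linearly separable."""
    return cover_count(P, N) / 2 ** P


fraction_small = separable_fraction(3, 5)    # P <= N: every coloring is separable
fraction_half = separable_fraction(10, 5)    # P = 2N: half of the colorings
fraction_large = separable_fraction(40, 5)   # P >> N: almost none
```

The three evaluations correspond to the regimes $P \le N$, $P = 2N$ and $P \gg N$.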

Proof by induction. Add one point $X$. The set $C(P, N)$ consists of
- colorings with separator through $X$ (A)
- rest (B)

Thus,

$$C(P+1, N) = 2A + B = C(P, N) + A = C(P, N) + C(P, N-1)$$

Yields

$$C(P, N) = 2 \sum_{i=0}^{N-1} \binom{P-1}{i}$$
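The recursion can be checked against the closed form numerically. In the sketch below the function names are hypothetical, and the base cases $C(1, N) = 2$ for $N \ge 1$ and $C(P, 0) = 0$ are assumptions that are not stated on the slide; they are chosen to agree with the closed-form sum.

```python
from functools import lru_cache
from math import comb


def cover_closed_form(P, N):
    """C(P, N) = 2 * sum_{i=0}^{N-1} binom(P-1, i)."""
    return 2 * sum(comb(P - 1, i) for i in range(N))


@lru_cache(maxsize=None)
def cover_recursive(P, N):
    """C(P+1, N) = C(P, N) + C(P, N-1), with assumed base cases."""
    if N == 0:
        return 0  # assumed base case: no separating plane in 0 dimensions
    if P == 1:
        return 2  # assumed base case: a single point can take either color
    return cover_recursive(P - 1, N) + cover_recursive(P - 1, N - 1)


# Compare both versions on a small grid of (P, N) values.
recursion_matches = all(
    cover_recursive(P, N) == cover_closed_form(P, N)
    for P in range(1, 13)
    for N in range(0, 9)
)
```

The agreement follows from Pascal's rule $\binom{P}{i} = \binom{P-1}{i} + \binom{P-1}{i-1}$ applied term by term to the sum.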

5.2 Network training

Regression: $t_n$ continuous valued, $h_2(x) = x$, and one usually minimizes the squared error (one output)

$$E(w) = \frac{1}{2} \sum_{n=1}^N \left( y(x_n, w) - t_n \right)^2 = -\log \prod_{n=1}^N \mathcal{N}(t_n \mid y(x_n, w), \beta^{-1}) + \ldots$$

Classification: $t_n = 0, 1$, $h_2(x) = \sigma(x)$, and $y(x_n, w)$ is the probability to belong to class 1.

$$E(w) = -\sum_{n=1}^N \left\{ t_n \log y(x_n, w) + (1 - t_n) \log(1 - y(x_n, w)) \right\} = -\log \prod_{n=1}^N y(x_n, w)^{t_n} (1 - y(x_n, w))^{1 - t_n}$$
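The two error functions can be written down directly once the network outputs are given. The sketch below is illustrative only: the function names are hypothetical, the outputs `ys` are passed in as plain lists rather than computed by a network, and the classification outputs are assumed to lie strictly between 0 and 1.

```python
from math import log


def squared_error(ys, ts):
    """E = 1/2 * sum_n (y_n - t_n)^2 for regression outputs ys and targets ts."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(ys, ts))


def binary_cross_entropy(ys, ts):
    """E = -sum_n [t_n log y_n + (1 - t_n) log(1 - y_n)], assuming 0 < y_n < 1."""
    return -sum(t * log(y) + (1 - t) * log(1 - y) for y, t in zip(ys, ts))


def negative_log_likelihood(ys, ts):
    """Same error written as -log prod_n y_n^t_n (1 - y_n)^(1 - t_n)."""
    likelihood = 1.0
    for y, t in zip(ys, ts):
        likelihood *= y ** t * (1 - y) ** (1 - t)
    return -log(likelihood)


example_outputs = [0.9, 0.2, 0.7]   # hypothetical network outputs
example_targets = [1, 0, 1]
```

For long data sets the sum form is preferable in practice, since the product of many probabilities underflows quickly.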

5.2 Network training

More than two classes: consider a network with $K$ outputs. $t_{nk} = 1$ if $x_n$ belongs to class $k$ and zero otherwise. $y_k(x_n, w)$ is the network output.

$$E(w) = -\sum_{n=1}^N \sum_{k=1}^K t_{nk} \log p_k(x_n, w)$$

$$p_k(x, w) = \frac{\exp(y_k(x, w))}{\sum_{k'=1}^K \exp(y_{k'}(x, w))}$$
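A minimal sketch of the multi-class error is given below. The function names are hypothetical and the network outputs are passed in as lists of numbers. Subtracting the largest output before exponentiating is a standard numerical safeguard that is not on the slide; it leaves $p_k$ unchanged because the common factor cancels in the ratio.

```python
from math import exp, log


def softmax(ys):
    """p_k = exp(y_k) / sum_k' exp(y_k'), computed with the max subtracted."""
    m = max(ys)
    exps = [exp(y - m) for y in ys]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(outputs, targets):
    """E = -sum_n sum_k t_nk log p_k(x_n, w) for one-hot target rows."""
    error = 0.0
    for ys, ts in zip(outputs, targets):
        for p, t in zip(softmax(ys), ts):
            if t:  # skip zero targets so log is only taken where it contributes
                error -= t * log(p)
    return error


uniform_probs = softmax([0.0, 0.0, 0.0])                  # equal outputs
uniform_error = cross_entropy([[0.0, 0.0, 0.0]], [[1, 0, 0]])
```

With three equal outputs each class gets probability 1/3, so the error for a single pattern is $\log 3$.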

5.2 Parameter optimization

[Figure: error surface $E(w)$ over the weights $w_1$, $w_2$, with points $w_A$, $w_B$, $w_C$ and the gradient $\nabla E$]

Where $E$ is minimal, $\nabla E(w) = 0$, but not vice versa! As a consequence, gradient based methods find a local minimum, not necessarily the global minimum.

5.2 Gradient descent optimization

The simplest procedure to optimize $E$ is to start with a random $w$ and iterate

$$w^{\tau+1} = w^\tau - \eta \nabla E(w^\tau)$$

This is called batch learning, where all training data are included in the computation of $\nabla E$.

Does this algorithm converge? Yes, if $\eta$ is "sufficiently small" and $E$ is bounded from below.

Proof: Denote $\Delta w = -\eta \nabla E$. Then

$$E(w + \Delta w) \approx E(w) + (\Delta w)^T \nabla E = E(w) - \eta \sum_i \left( \frac{\partial E}{\partial w_i} \right)^2 \le E(w)$$

In each gradient descent step the value of $E$ is lowered. Since $E$ is bounded from below, the procedure must converge asymptotically.
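The iteration can be sketched in a few lines. The function `gradient_descent`, the example error function `E_example` and its gradient are all hypothetical, chosen only so that the minimum at $w = (1, -3)$ is known in advance; the sketch also starts from a fixed point rather than a random one to keep the result reproducible.

```python
def gradient_descent(E, grad, w0, eta=0.1, steps=100):
    """Iterate w <- w - eta * grad(w); return final w and the E value at each iterate."""
    w = list(w0)
    energies = [E(w)]
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        energies.append(E(w))
    return w, energies


def E_example(w):
    """Hypothetical error function with its minimum at w = (1, -3)."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 3.0) ** 2


def grad_example(w):
    """Gradient of E_example."""
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 3.0)]


w_final, energy_trace = gradient_descent(E_example, grad_example, [0.0, 0.0])
```

With $\eta = 0.1$ the recorded values of $E$ never increase from one step to the next, in line with the argument above; a much larger $\eta$ would break that guarantee.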

Handouts Ch. Perceptrons

Convergence of gradient descent in a quadratic well

$$E(w) = \frac{1}{2} \sum_i \lambda_i w_i^2$$

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = -\eta \lambda_i w_i$$

$$w_i^{\mathrm{new}} = w_i^{\mathrm{old}} + \Delta w_i = (1 - \eta \lambda_i) w_i$$

Convergence when $|1 - \eta \lambda_i| < 1$. Oscillations when $1 - \eta \lambda_i < 0$.

The optimal learning parameter depends on the curvature of each dimension.
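The effect of the curvatures $\lambda_i$ on convergence can be seen by iterating the update directly. In this sketch the function name `quadratic_well_descent` and the values $\lambda = (1, 10)$, $\eta = 0.15$ and $\eta = 0.25$ are assumptions picked to show one converging and one diverging case.

```python
def quadratic_well_descent(lambdas, w0, eta, steps):
    """Apply w_i <- (1 - eta * lambda_i) * w_i for the given number of steps."""
    w = list(w0)
    for _ in range(steps):
        w = [(1.0 - eta * lam) * wi for lam, wi in zip(lambdas, w)]
    return w


curvatures = [1.0, 10.0]
start = [1.0, 1.0]

# eta = 0.15: factors are about 0.85 and -0.5, both smaller than 1 in magnitude.
w_converged = quadratic_well_descent(curvatures, start, eta=0.15, steps=50)

# eta = 0.25: factors are 0.75 and -1.5, so the steep direction diverges.
w_diverged = quadratic_well_descent(curvatures, start, eta=0.25, steps=50)

# One step with eta = 0.15 flips the sign of the steep component (oscillation).
w_one_step = quadratic_well_descent(curvatures, start, eta=0.15, steps=1)
```

The condition $|1 - \eta \lambda_i| < 1$ for all $i$ is equivalent to $0 < \eta < 2 / \lambda_{\max}$, so the steepest direction limits the step size while the flattest direction sets the speed.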

Learning with momentum

One solution is adding a momentum term:

$$\Delta w_{t+1} = -\eta \nabla E(w_t) + \alpha \Delta w_t$$

$$= -\eta \nabla E(w_t) + \alpha \left( -\eta \nabla E(w_{t-1}) + \alpha \left( -\eta \nabla E(w_{t-2}) + \ldots \right) \right)$$

$$= -\eta \sum_{k=0}^t \alpha^k \nabla E(w_{t-k})$$

Consider two extremes:

No oscillations (all derivatives are equal):

$$\Delta w_{t+1} \approx -\eta \nabla E \sum_{k=0}^t \alpha^k = -\frac{\eta}{1 - \alpha} \frac{\partial E}{\partial w}$$

results in acceleration.

Oscillations (all derivatives are equal in magnitude but alternate in sign):

$$\Delta w_{t+1} \approx -\eta \nabla E \sum_{k=0}^t (-\alpha)^k = -\frac{\eta}{1 + \alpha} \frac{\partial E}{\partial w}$$

results in deceleration.
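Both extremes of the momentum rule can be reproduced by feeding a fixed sequence of one-dimensional gradients into the update. The function name `momentum_updates`, the initial update $\Delta w_0 = 0$, and the values $\eta = 0.1$, $\alpha = 0.9$ are assumptions made for this sketch.

```python
def momentum_updates(grads, eta, alpha):
    """Return the updates dw_{t+1} = -eta * g_t + alpha * dw_t, starting from dw_0 = 0."""
    dw = 0.0
    updates = []
    for g in grads:
        dw = -eta * g + alpha * dw
        updates.append(dw)
    return updates


eta, alpha = 0.1, 0.9

# Extreme 1: the gradient is the same at every step.
constant = momentum_updates([1.0] * 200, eta, alpha)

# Extreme 2: the gradient alternates in sign, g_t = (-1)^t.
alternating = momentum_updates([(-1.0) ** t for t in range(200)], eta, alpha)

effective_rate_constant = abs(constant[-1])        # approaches eta / (1 - alpha)
effective_rate_alternating = abs(alternating[-1])  # approaches eta / (1 + alpha)
```

With these values the effective step size grows by a factor of about 10 in the constant case and shrinks by a factor of about 1.9 in the alternating case, relative to plain gradient descent with the same $\eta$.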

Newton's method

One can also use Hessian information for optimization. As an example, consider a quadratic approximation to $E$ around $w_0$:

$$E(w) = E(w_0) + b^T (w - w_0) + \frac{1}{2} (w - w_0)^T H (w - w_0)$$

$$b_i = \frac{\partial E(w_0)}{\partial w_i}, \qquad H_{ij} = \frac{\partial^2 E(w_0)}{\partial w_i \partial w_j}$$

$$\nabla E(w) = b + H (w - w_0)$$

We can solve $\nabla E(w) = 0$ and obtain

$$w = w_0 - H^{-1} \nabla E(w_0)$$

This is called Newton's method. The quadratic approximation is exact when $E$ is quadratic, so convergence is in one step.

Quasi-Newton: Consider only the diagonal of $H$.
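The one-step convergence on a quadratic error can be checked with a small two-dimensional example. Everything specific here is an assumption for illustration: the helper `solve_2x2`, the function `newton_step`, and the error $E(w) = \frac{1}{2} w^T A w - c^T w$ with a hand-picked matrix $A$ and vector $c$, whose minimum is at $w = (0.2, 0.4)$.

```python
def solve_2x2(H, g):
    """Solve H s = g for a 2x2 matrix H using the explicit inverse."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("Hessian is singular")
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def newton_step(w0, grad, hess):
    """One Newton update: w = w0 - H^{-1} grad E(w0)."""
    step = solve_2x2(hess(w0), grad(w0))
    return [wi - si for wi, si in zip(w0, step)]


# Hypothetical quadratic error E(w) = 1/2 w^T A w - c^T w.
A = [[3.0, 1.0], [1.0, 2.0]]
c = [1.0, 1.0]


def grad_quadratic(w):
    """Gradient A w - c."""
    return [A[0][0] * w[0] + A[0][1] * w[1] - c[0],
            A[1][0] * w[0] + A[1][1] * w[1] - c[1]]


def hess_quadratic(w):
    """Hessian of a quadratic is the constant matrix A."""
    return A


w_newton = newton_step([5.0, -7.0], grad_quadratic, hess_quadratic)
```

A single call lands on the minimum from an arbitrary starting point; for a non-quadratic $E$ the step would have to be repeated.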

Line search

Another solution is line optimisation:

$$w_1 = w_0 + \lambda_0 d_0, \qquad d_0 = -\nabla E(w_0)$$

$\lambda_0 > 0$ is found by a one-dimensional optimisation. With $d_1 = -\nabla E(w_1)$,

$$0 = \frac{\partial}{\partial \lambda_0} E(w_0 + \lambda_0 d_0) = d_0 \cdot \nabla E(w_1) = -d_0 \cdot d_1$$

Therefore, subsequent search directions are orthogonal.
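The orthogonality of successive directions can be verified on a quadratic error, where the one-dimensional minimisation has the closed form $\lambda_0 = d_0 \cdot d_0 / (d_0^T H d_0)$. This closed form, the example matrix `H`, the vector `c` and all function names are assumptions made for this sketch; for a general $E$ the step size would be found by a numerical one-dimensional search.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [dot(row, v) for row in M]


# Hypothetical quadratic error E(w) = 1/2 w^T H w - c^T w.
H = [[3.0, 1.0], [1.0, 2.0]]
c = [1.0, 1.0]


def grad(w):
    """Gradient H w - c."""
    return [hi - ci for hi, ci in zip(matvec(H, w), c)]


def line_search_step(w):
    """Move along d = -grad E(w) to the exact minimum on that line."""
    d = [-g for g in grad(w)]
    lam = dot(d, d) / dot(d, matvec(H, d))  # exact for a quadratic; needs d != 0
    return [wi + lam * di for wi, di in zip(w, d)], d


w1, d0 = line_search_step([5.0, -7.0])
d1 = [-g for g in grad(w1)]
overlap = dot(d0, d1)  # zero up to rounding error
```

Repeating `line_search_step` produces a zigzag path of mutually orthogonal steps, which is the behaviour that motivates looking for better search directions.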

Conjugate gradient descent

We choose as new direction a combination of the gradient and the old direction:

$$d'_1 = -\nabla E(w_1) + \beta d_0$$

Line optimisation $w_2 = w_1 + \lambda_1 d'_1$ yields $\lambda_1 > 0$ such that $d'_1 \cdot \nabla E(w_2) = 0$.

The direction $d'_1$ is found by demanding that $\nabla E(w_2) \approx 0$ also in the 'old' direction $d_0$:

$$0 = d_0 \cdot \nabla E(w_2) \approx d_0 \cdot \left( \nabla E(w_1) + \lambda_1 H(w_1) d'_1 \right)$$

Since $d_0 \cdot \nabla E(w_1) = 0$ from the previous line optimisation, this gives

$$d_0^T H(w_1) d'_1 = 0$$

The subsequent search directions $d_0$, $d'_1$ are said to be conjugate.
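For a quadratic error the conjugacy condition fixes $\beta$ explicitly: substituting $d'_1 = -\nabla E(w_1) + \beta d_0$ into $d_0^T H d'_1 = 0$ gives $\beta = d_0^T H \nabla E(w_1) / (d_0^T H d_0)$. The sketch below applies this on a two-dimensional example. The matrix `H`, the vector `c`, the function names and the closed-form line minimisation are assumptions for illustration, not the handout's implementation.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [dot(row, v) for row in M]


# Hypothetical quadratic error E(w) = 1/2 w^T H w - c^T w, minimum at (0.2, 0.4).
H = [[3.0, 1.0], [1.0, 2.0]]
c = [1.0, 1.0]


def grad(w):
    """Gradient H w - c."""
    return [hi - ci for hi, ci in zip(matvec(H, w), c)]


def exact_step(w, d):
    """Minimise E along w + lam * d; exact for a quadratic error."""
    lam = -dot(d, grad(w)) / dot(d, matvec(H, d))
    return [wi + lam * di for wi, di in zip(w, d)]


def conjugate_gradient_two_steps(w0):
    """Two line optimisations: first along -grad, then along the conjugate direction."""
    d0 = [-g for g in grad(w0)]
    w1 = exact_step(w0, d0)
    g1 = grad(w1)
    beta = dot(d0, matvec(H, g1)) / dot(d0, matvec(H, d0))
    d1 = [-g + beta * d for g, d in zip(g1, d0)]
    w2 = exact_step(w1, d1)
    return w2, d0, d1


w2, d0, d1 = conjugate_gradient_two_steps([5.0, -7.0])
conjugacy = dot(d0, matvec(H, d1))  # zero up to rounding error
```

In two dimensions the second conjugate step reaches the minimum of the quadratic, whereas plain line search along the gradient would keep zigzagging.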
