Pattern Recognition: Two Main Challenges, Representation and Matching

Chapter 6: Multilayer Neural Networks (Sections 6.1-6.3) • Introduction • Feedforward Operation and Classification • Backpropagation Algorithm

Pattern Recognition Two main challenges • Representation • Matching Jain CSE 802, Spring 2013

Representation and Matching

How Good is Face Representation? Courtesy: Pete Langenfeld, MSP Driver License Information Gallery: 34 million (30M DMV photos, 4M mugshots) 2009 driver license photo

How Good is Face Representation? Smile makes a difference! [Figure: top-10 retrievals, ranks 1 to 10] Gallery: 34 million (30M DMV photos, 4M mugshots) Courtesy: Pete Langenfeld, MSP

State of the Art in Face Recognition (FR): Verification • Benchmarks: FRGC v2.0 (2006), MBGC (2010), LFW (2007), IJB-A (2015) • LFW Standard Protocol: 99.77% accuracy; 3,000 genuine & 3,000 imposter pairs; 10-fold CV • LFW BLUFR Protocol: 88% TAR @ 0.1% FAR; 156,915 genuine, ~46M imposter pairs; 10-fold CV • Source: D. Wang, C. Otto and A. K. Jain, "Face Search at Scale: 80 Million Gallery", arXiv, July 28, 2015

Neural Networks • Massive parallelism is essential for complex recognition tasks (speech & image recognition) • Humans take only ~200 ms for most cognitive tasks; this suggests parallel computation in the human brain • Biological networks achieve excellent recognition performance via dense interconnection of simple computational elements (neurons) • Number of neurons $\approx 10^{10}$ to $10^{12}$ • Number of interconnections per neuron $\approx 10^{3}$ to $10^{4}$ • Total number of interconnections $\approx 10^{14}$ • Damage to a few neurons or synapses (links) does not appear to impair performance (robustness)

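Neuron • Nodes are nonlinear, typically analog • Inputs $x_1, x_2, \ldots, x_d$ with weights $w_1, \ldots, w_d$ produce the output $Y = f\left(\sum_{i=1}^{d} w_i x_i - \theta\right)$, where $\theta$ is the internal threshold or offset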
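As a rough sketch of the node on this slide, the following assumes the output is a nonlinearity applied to the weighted sum of the inputs minus the internal threshold. The function name `neuron_output`, the parameter name `theta` and the logistic nonlinearity are illustrative assumptions; the slide does not specify them.

```python
import math


def logistic(s):
    """Assumed smooth nonlinearity; the slide only says nodes are nonlinear."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(x, w, theta, f=logistic):
    """Return f(sum_i w_i * x_i - theta) for one node."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return f(s)


if __name__ == "__main__":
    # Weighted sum equals the threshold, so the logistic output is 0.5.
    print(neuron_output([1.0, 1.0], [0.5, 0.5], theta=1.0))
```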

Neural Networks • Feed-forward networks with one or more (hidden) layers between input & output nodes • How many nodes & hidden layers? • Network training? [Figure: network with d inputs, a first hidden layer of $N_{H1}$ units, a second hidden layer of $N_{H2}$ units, and c outputs]

Form of the Discriminant Function • Linear: Hyperplane decision boundaries • Non-Linear: Arbitrary decision boundaries • Adopt a model and then use the resulting decision boundary • Specify the desired decision boundary

Linear Discriminant Function • For a 2-class problem, a discriminant function that is a linear combination of the input features can be written as $g(\mathbf{x}) = \mathbf{w}^t \mathbf{x} + w_0$, where $\mathbf{w}$ is the weight vector and $w_0$ is the bias or threshold weight • The sign of the function value gives the class label
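A minimal sketch of the two-class linear rule, assuming the usual form g(x) = w·x + w0 with class 1 for a positive value and class 2 otherwise. The function names and the example weights are hypothetical, and assigning g(x) = 0 to class 2 is an arbitrary choice made only for this sketch.

```python
def linear_discriminant(x, w, w0):
    """g(x) = w^t x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def classify_two_class(x, w, w0):
    """Class 1 if g(x) > 0, else class 2 (ties go to class 2 by assumption)."""
    return 1 if linear_discriminant(x, w, w0) > 0 else 2


if __name__ == "__main__":
    w, w0 = [1.0, -1.0], 0.5  # illustrative weights, not from the slides
    for x in ([2.0, 1.0], [0.0, 2.0]):
        print(x, linear_discriminant(x, w, w0), classify_two_class(x, w, w0))
```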

Quadratic Discriminant Function • Obtained by adding pair-wise products of features: $g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w_{ij} x_i x_j$ • Linear part: $(d+1)$ parameters; quadratic part: $d(d+1)/2$ additional parameters (since $w_{ij} = w_{ji}$) • $g(\mathbf{x})$ positive implies class 1; $g(\mathbf{x})$ negative implies class 2 • $g(\mathbf{x}) = 0$ represents a hyperquadric, as opposed to a hyperplane in the linear discriminant case • Adding more terms such as $w_{ijk} x_i x_j x_k$ results in polynomial discriminant functions
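The parameter count on this slide can be checked with a short sketch. It assumes the quadratic weights are stored once per unordered pair (only entries with j >= i are read), which is one way to realise the d(d+1)/2 count; the function names and the circle example are illustrative assumptions.

```python
def quadratic_discriminant(x, w0, w, W):
    """g(x) = w0 + sum_i w_i x_i + sum_{i<=j} W[i][j] x_i x_j.

    Only the upper triangle of W (j >= i) is used, so each pair-wise
    product has exactly one weight.
    """
    d = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(d))
    quadratic = sum(W[i][j] * x[i] * x[j] for i in range(d) for j in range(i, d))
    return linear + quadratic


def num_parameters(d):
    """(d + 1) linear parameters plus d(d + 1)/2 quadratic ones."""
    return (d + 1) + d * (d + 1) // 2


if __name__ == "__main__":
    # Unit circle: g(x) = x1^2 + x2^2 - 1, negative inside, positive outside.
    w0, w, W = -1.0, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    print(quadratic_discriminant([0.0, 0.0], w0, w, W))
    print(quadratic_discriminant([2.0, 0.0], w0, w, W))
    print(num_parameters(2))
```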

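Generalized Discriminant Function • A generalized linear discriminant function, where $\mathbf{y} = f(\mathbf{x})$, can be written as $g(\mathbf{x}) = \sum_{i=1}^{\hat{d}} a_i y_i(\mathbf{x})$, where $\hat{d}$ is the dimensionality of the augmented feature space and the $a_i$ are the weights in the augmented feature space. Note that the function is linear in $\mathbf{a}$ • Setting the $y_i(\mathbf{x})$ to be monomials results in polynomial discriminant functions • Equivalently, $g(\mathbf{x}) = \mathbf{a}^t \mathbf{y}$ with $\mathbf{a} = [a_1, a_2, \ldots, a_{\hat{d}}]^t$ and $\mathbf{y} = [y_1(\mathbf{x}), y_2(\mathbf{x}), \ldots, y_{\hat{d}}(\mathbf{x})]^t$, where $\mathbf{y}$ is also called the augmented feature vector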
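To illustrate how a discriminant that is linear in a can still be nonlinear in x, here is a small sketch for a scalar input, assuming the monomial mapping y(x) = [1, x, x^2]. The names `monomial_features` and `generalized_discriminant` and the weights are hypothetical.

```python
def monomial_features(x):
    """Map a scalar x to the augmented feature vector y = [1, x, x^2]."""
    return [1.0, x, x * x]


def generalized_discriminant(x, a):
    """g(x) = a^t y(x); linear in a, quadratic in x for this mapping."""
    return sum(ai * yi for ai, yi in zip(a, monomial_features(x)))


if __name__ == "__main__":
    a = [-1.0, 0.0, 1.0]  # g(x) = x^2 - 1
    for x in (-2.0, 0.0, 2.0):
        print(x, generalized_discriminant(x, a))
```

With these weights the positive region is |x| > 1, which is not a single interval on the x axis even though the boundary is a hyperplane in the augmented space.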

Perceptron • The perceptron is a linear classifier; it makes predictions based on a linear predictor function that combines a set of weights with the feature vector • The perceptron algorithm was invented by Rosenblatt in the late 1950s; its first implementation, in custom hardware, was one of the first artificial neural networks to be produced • The algorithm allows for online learning; it processes training samples one at a time

Two-category Linearly Separable Case • Let $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ be n training samples in augmented feature space, which are linearly separable • We need to find a weight vector $\mathbf{a}$ such that • $\mathbf{a}^t \mathbf{y} > 0$ for examples from the positive class • $\mathbf{a}^t \mathbf{y} < 0$ for examples from the negative class • "Normalizing" the input examples by multiplying them with their class label (replace all samples from class 2 by their negatives), find a weight vector $\mathbf{a}$ such that • $\mathbf{a}^t \mathbf{y} > 0$ for all the examples (here $\mathbf{y}$ is multiplied with the class label) • The resulting weight vector is called a separating vector or a solution vector
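The normalization step can be sketched as follows, assuming class labels are coded as +1 and -1 and that augmentation prepends a constant 1 to each sample. The helper names (`augment`, `normalize`, `is_solution_vector`) and the toy data are illustrative assumptions.

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def augment(x):
    """Prepend the constant 1 so the bias becomes part of the weight vector."""
    return [1.0] + list(x)


def normalize(samples, labels):
    """Multiply each augmented sample by its label (+1 or -1)."""
    return [[label * v for v in augment(x)] for x, label in zip(samples, labels)]


def is_solution_vector(a, ys):
    """True if a^t y > 0 for every normalized sample."""
    return all(dot(a, y) > 0 for y in ys)


if __name__ == "__main__":
    ys = normalize([[2.0], [-1.0]], [+1, -1])
    print(ys)
    print(is_solution_vector([0.0, 1.0], ys), is_solution_vector([0.0, -1.0], ys))
```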

The Perceptron Criterion Function • Goal: Find a weight vector $\mathbf{a}$ such that $\mathbf{a}^t \mathbf{y} > 0$ for all the samples (assuming it exists) • Mathematically, this can be expressed as finding a weight vector $\mathbf{a}$ that minimizes the number of samples misclassified • That function is piecewise constant (discontinuous, and hence non-differentiable) and is difficult to optimize • Perceptron Criterion Function: find $\mathbf{a}$ that minimizes $J_p(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}} (-\mathbf{a}^t \mathbf{y})$, where $\mathcal{Y}$ is the set of samples misclassified by $\mathbf{a}$ • The criterion is proportional to the sum of distances from the misclassified samples to the decision boundary • Now the minimization is mathematically tractable, and hence it is a better criterion function than the number of misclassifications
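A small sketch contrasting the two criteria, assuming normalized augmented samples and the standard form of the perceptron criterion (the sum of -a·y over misclassified samples). Treating a·y = 0 as misclassified is a convention assumed here; the function names and data are hypothetical.

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def num_misclassified(a, ys):
    """Piecewise-constant criterion: count samples with a^t y <= 0."""
    return sum(1 for y in ys if dot(a, y) <= 0)


def perceptron_criterion(a, ys):
    """J_p(a): sum of -a^t y over the misclassified samples."""
    return sum(-dot(a, y) for y in ys if dot(a, y) <= 0)


def perceptron_gradient(a, ys):
    """Gradient of J_p with respect to a: sum of -y over misclassified samples."""
    grad = [0.0] * len(a)
    for y in ys:
        if dot(a, y) <= 0:
            grad = [g - yi for g, yi in zip(grad, y)]
    return grad


if __name__ == "__main__":
    ys = [[1.0, 2.0], [-1.0, 1.0]]
    for a in ([1.0, 0.0], [0.0, 1.0]):
        print(a, num_misclassified(a, ys), perceptron_criterion(a, ys))
```

Unlike the misclassification count, J_p changes continuously as a moves, which is what makes gradient-based minimization possible.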

Fixed-increment Single-sample Perceptron • Also called perceptron learning in an online setting • For large datasets, this is more efficient than batch mode • Notation: n = number of training samples; a = weight vector; k = iteration number (Chapter 5, page 230)
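The algorithm box itself did not survive extraction, so the following is only a sketch of the commonly stated fixed-increment rule: cycle through the normalized augmented samples and, whenever a sample y is misclassified (a·y <= 0), add it to the weight vector. The zero initialization, the `max_epochs` safeguard and the toy data are assumptions made for this example.

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def fixed_increment_perceptron(ys, max_epochs=1000):
    """Single-sample perceptron on normalized augmented samples.

    Returns (a, converged). The epoch cap is a safeguard added for this
    sketch; it is not part of the rule itself.
    """
    a = [0.0] * len(ys[0])
    for _ in range(max_epochs):
        errors = 0
        for y in ys:
            if dot(a, y) <= 0:                      # y is misclassified
                a = [ai + yi for ai, yi in zip(a, y)]  # fixed-increment update
                errors += 1
        if errors == 0:                             # a full pass with no mistakes
            return a, True
    return a, False


if __name__ == "__main__":
    # Class 1: (2, 2), (3, 1); class 2: (0, 0), (-1, 1), negated after augmentation.
    ys = [[1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [-1.0, 0.0, 0.0], [-1.0, 1.0, -1.0]]
    a, converged = fixed_increment_perceptron(ys)
    print(a, converged)
```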

Perceptron Convergence Theorem • If the training samples are linearly separable, then the sequence of weight vectors given by the fixed-increment single-sample Perceptron will terminate at a solution vector • What happens if the patterns are not linearly separable?
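One way to explore the closing question is the XOR problem with inputs in {-1, +1}. In normalized augmented form its four samples sum to the zero vector, so no weight vector a can satisfy a·y > 0 for all of them (summing the four inequalities would give 0 > 0). The sketch below checks this and shows that a single-sample perceptron loop never completes an error-free pass; the epoch cap and all names are assumptions made for illustration.

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


# XOR, augmented as [1, x1, x2]; samples of the negative class are negated.
XOR_SAMPLES = [
    [1, 1, -1],    # (+1, -1) -> positive class
    [1, -1, 1],    # (-1, +1) -> positive class
    [-1, -1, -1],  # (+1, +1) -> negative class, negated
    [-1, 1, 1],    # (-1, -1) -> negative class, negated
]


def column_sums(ys):
    """Component-wise sum of all samples."""
    return [sum(col) for col in zip(*ys)]


def run_perceptron(ys, max_epochs):
    """Fixed-increment updates with an epoch cap; returns (a, converged)."""
    a = [0] * len(ys[0])
    for _ in range(max_epochs):
        errors = 0
        for y in ys:
            if dot(a, y) <= 0:
                a = [ai + yi for ai, yi in zip(a, y)]
                errors += 1
        if errors == 0:
            return a, True
    return a, False


if __name__ == "__main__":
    print(column_sums(XOR_SAMPLES))
    print(run_perceptron(XOR_SAMPLES, max_epochs=100)[1])
```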

Multilayer Perceptron • Can we learn the nonlinearity at the same time as the linear discriminant? • This is the goal of multilayer neural networks or multilayer Perceptrons (Pattern Classification, Chapter 6)

Feedforward Operation and Classification • A three-layer neural network consists of an input layer, a hidden layer and an output layer interconnected by modifiable (learned) weights represented by links between layers • A multilayer neural network implements linear discriminants, but in a space where the inputs have been mapped nonlinearly • Figure 6.1 shows a simple three-layer network

No training is involved here, since we are implementing a known input/output mapping

• A single "bias unit" is connected to each unit in addition to the input units • Net activation: $net_j = \sum_{i=1}^{d} x_i w_{ji} + w_{j0} = \sum_{i=0}^{d} x_i w_{ji} \equiv \mathbf{w}_j^t \mathbf{x}$, where the subscript i indexes units in the input layer and j units in the hidden layer; $w_{ji}$ denotes the input-to-hidden layer weights at the hidden unit j. (In neurobiology, such weights or connections are called "synapses") • Each hidden unit emits an output that is a nonlinear function of its activation, that is: $y_j = f(net_j)$

• Figure 6.1 shows a simple threshold function: $f(net) = \mathrm{sgn}(net) \equiv \begin{cases} 1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}$ • The function f(.) is also called the activation function or "nonlinearity" of a unit. There are more general activation functions with desirable properties • Each output unit similarly computes its net activation based on the hidden unit signals as: $net_k = \sum_{j=1}^{n_H} y_j w_{kj} + w_{k0} = \sum_{j=0}^{n_H} y_j w_{kj} = \mathbf{w}_k^t \mathbf{y}$, where the subscript k indexes units in the output layer and $n_H$ denotes the number of hidden units

• The output units are referred to as $z_k$. An output unit computes the nonlinear function of its net input, emitting $z_k = f(net_k)$ • In the case of c outputs (classes), we can view the network as computing c discriminant functions $z_k = g_k(\mathbf{x})$; the input $\mathbf{x}$ is classified according to the largest discriminant function $g_k(\mathbf{x})$, $k = 1, \ldots, c$ • The three-layer network with the weights listed in Fig. 6.1 solves the XOR problem
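The feedforward computation described on these slides (net activation, nonlinearity, then classification by the largest output) can be sketched as below. Storing each unit's weights as a list with the bias weight first, using tanh in the example, returning a 0-based class index, and all function names and weights are assumptions made for illustration.

```python
import math


def sgn(net):
    """Threshold nonlinearity: +1 if net >= 0, else -1."""
    return 1 if net >= 0 else -1


def layer_outputs(inputs, weights, f):
    """For each unit with weights [w_0, w_1, ..., w_n], emit f(w_0 + sum_i w_i * input_i)."""
    outputs = []
    for w in weights:
        net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs))
        outputs.append(f(net))
    return outputs


def forward(x, hidden_weights, output_weights, f=sgn):
    """Three-layer feedforward pass: returns hidden outputs y and network outputs z."""
    y = layer_outputs(x, hidden_weights, f)
    z = layer_outputs(y, output_weights, f)
    return y, z


def classify(x, hidden_weights, output_weights, f=sgn):
    """0-based index of the largest discriminant z_k (first index wins ties)."""
    _, z = forward(x, hidden_weights, output_weights, f)
    return max(range(len(z)), key=lambda k: z[k])


if __name__ == "__main__":
    hidden = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # illustrative weights
    output = [[0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]
    print(classify([2.0, 0.5], hidden, output, f=math.tanh))
    print(classify([0.0, 1.0], hidden, output, f=math.tanh))
```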

• The hidden unit $y_1$ computes the boundary $x_1 + x_2 + 0.5 = 0$: $y_1 = +1$ if $x_1 + x_2 + 0.5 \ge 0$, and $y_1 = -1$ otherwise • The hidden unit $y_2$ computes the boundary $x_1 + x_2 - 1.5 = 0$: $y_2 = +1$ if $x_1 + x_2 - 1.5 \ge 0$, and $y_2 = -1$ otherwise • The output unit emits $z_1 = +1$ if and only if $y_1 = +1$ and $y_2 = -1$ • Using the terminology of computer logic, the units behave like gates: the first hidden unit is an OR gate, the second hidden unit is an AND gate, and the output unit implements $z_1 = y_1$ AND NOT $y_2$ = ($x_1$ OR $x_2$) AND NOT ($x_1$ AND $x_2$) = $x_1$ XOR $x_2$, which provides the nonlinear decision of Fig. 6.1
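The gate interpretation can be checked directly for inputs in {-1, +1}. The hidden-unit boundaries below are the ones stated on the slide; the output-unit weights (1, -1 and bias -1) are an assumption chosen only so that the output realises "y1 AND NOT y2", and are not claimed to match the values in Fig. 6.1.

```python
def sgn(net):
    """Threshold nonlinearity: +1 if net >= 0, else -1."""
    return 1 if net >= 0 else -1


def xor_network(x1, x2):
    """Three-layer threshold network for XOR on inputs in {-1, +1}."""
    y1 = sgn(x1 + x2 + 0.5)   # OR: +1 unless both inputs are -1
    y2 = sgn(x1 + x2 - 1.5)   # AND: +1 only if both inputs are +1
    z = sgn(y1 - y2 - 1)      # assumed output weights: y1 AND NOT y2
    return y1, y2, z


if __name__ == "__main__":
    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print((x1, x2), xor_network(x1, x2))
```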
