SLIDE 1

LECTURE ON PATTERN RECOGNITION
EE 225D: Pattern Classification, Lecture 8
University of California, Berkeley
College of Engineering, Department of Electrical Engineering and Computer Sciences
Professors: N. Morgan / B. Gold
Spring 1999

SLIDE 2

Speech Pattern Recognition

  • Soft pattern classification plus temporal sequence integration
  • Supervised pattern classification: class labels are used in training
  • Unsupervised pattern classification: class labels are not available or not used (the sketch below contrasts the two)
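The supervised/unsupervised distinction can be made concrete with a toy sketch (not from the lecture; the 1-D data, the class-mean estimator, and the k-means clusterer are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "features" drawn from two classes (e.g., two phone categories).
x1 = rng.normal(-2.0, 1.0, 50)   # class 1 samples
x2 = rng.normal(+2.0, 1.0, 50)   # class 2 samples
x = np.concatenate([x1, x2])
labels = np.array([0] * 50 + [1] * 50)

# Supervised: class labels are used -- estimate one mean per labeled class.
sup_means = np.array([x[labels == k].mean() for k in (0, 1)])

# Unsupervised: labels are not used -- cluster the pooled data (k-means, k=2).
centers = np.array([x.min(), x.max()])        # crude initialization
for _ in range(20):
    assign = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([x[assign == k].mean() for k in (0, 1)])

print("supervised class means:", sup_means)
print("unsupervised cluster centers:", centers)
```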

SLIDE 3

[Figure: pattern → Feature Extraction → feature vector (x1, x2, ..., xd) → Classification → class ωk, 1 ≤ k ≤ K]

SLIDE 4

  • Training: learning the parameters of the classifier
  • Testing: classify an independent test set, compare with the labels, and score (see the sketch below)
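A minimal train-then-test loop in this spirit (a sketch, not the course's code; the synthetic 2-D features and the nearest-class-mean classifier are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n_per_class):
    """Synthetic 2-D feature vectors for two classes."""
    a = rng.normal([0.0, 0.0], 1.0, size=(n_per_class, 2))
    b = rng.normal([3.0, 3.0], 1.0, size=(n_per_class, 2))
    x = np.vstack([a, b])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return x, y

# Training: learn the classifier parameters (here, one mean per class).
x_train, y_train = make_data(100)
means = np.array([x_train[y_train == k].mean(axis=0) for k in (0, 1)])

# Testing: classify an INDEPENDENT test set, compare with labels, and score.
x_test, y_test = make_data(100)
dists = np.linalg.norm(x_test[:, None, :] - means[None, :, :], axis=2)
y_hat = dists.argmin(axis=1)
print("test accuracy: %.3f" % (y_hat == y_test).mean())
```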

SLIDE 5

SLIDE 6

SLIDE 7

Feature Extraction Criteria

  • Class discrimination
  • Generalization
  • Parsimony (efficiency)
SLIDE 8

[Figure: energy contour E(t) versus t for a plosive + vowel sequence, plotted for two different gains]

SLIDE 9

The time derivative of the log energy is invariant to a constant gain $C$:

$$\frac{\partial}{\partial t}\log\bigl(C\,E(t)\bigr) = \frac{\partial}{\partial t}\bigl(\log C + \log E(t)\bigr) = \frac{\partial}{\partial t}\log E(t)$$
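A quick numerical check of this gain invariance (a sketch: the energy contour is synthetic, and the time derivative is approximated by a first difference of the log energy):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)
# Synthetic energy contour: two bumps over a small noise floor.
E = 1e-3 + np.exp(-40 * (t - 0.3) ** 2) + 0.5 * np.exp(-20 * (t - 0.7) ** 2)

C = 10.0  # an arbitrary constant gain
dlogE  = np.diff(np.log(E))        # ~ d/dt log E(t), up to the step size
dlogCE = np.diff(np.log(C * E))    # ~ d/dt log(C E(t))

# The two derivative contours agree, so the feature is gain-invariant.
print(np.allclose(dlogE, dlogCE))  # True
```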

SLIDE 10

Feature Vector Size

  • The best representations for discrimination on the training set are large (highly dimensioned)
  • The best representations for generalization to the test set are (typically) succinct

SLIDE 11

Dimensionality Reduction

  • Principal components (i.e., SVD, KL transform, eigenanalysis, ...), as sketched below
  • Linear Discriminant Analysis (LDA)
  • Application-specific knowledge
  • Feature Selection via PR Evaluation
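A minimal principal-components sketch via the SVD (the synthetic data and the choice of two retained components are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic feature vectors: 500 samples in 10 dimensions, with correlations.
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))

# Principal components via SVD of the mean-centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep the top-2 directions of maximum variance and project onto them.
k = 2
X_reduced = Xc @ Vt[:k].T
print(X_reduced.shape)            # (500, 2)
print(s**2 / (s**2).sum())        # fraction of variance per component
```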
SLIDE 12

[Figure: scatter of samples in the (f1, f2) feature plane]

SLIDE 13

SLIDE 14

PR Methods

  • Minimum Distance
  • Discriminant Functions
  • Linear Discriminant
  • Nonlinear Discriminant (e.g., quadratic, neural networks)

  • Statistical Discriminant Functions
SLIDE 15

Minimum Distance

  • Vector or matrix representing each element
  • Define a distance function
  • Choose the class of the stored element closest to the new input
  • The choice of distance is equivalent to implicit statistical assumptions
  • For speech, temporal variability complicates this
SLIDE 16

$z_i$ = template vector (prototype), $x$ = input vector. Choose $i$ to minimize the distance:

$$\arg\min_i\,(x - z_i)^T(x - z_i) = \arg\min_i\,\bigl(x^T x + z_i^T z_i - 2\,x^T z_i\bigr) = \arg\max_i\,\Bigl(x^T z_i - \tfrac{1}{2}\,z_i^T z_i\Bigr)$$

since $x^T x$ does not depend on $i$. If $z_i^T z_i = 1$ for all $i$:

$$\Rightarrow\ \arg\max_i\,\bigl(x^T z_i\bigr)$$
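A sketch of this template match with made-up prototypes, confirming that for unit-norm templates the distance argmin coincides with the dot-product argmax:

```python
import numpy as np

rng = np.random.default_rng(3)

# Template vectors z_i (prototypes), one per class, scaled to unit length.
Z = rng.normal(size=(5, 8))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)    # z_i^T z_i = 1 for all i

x = rng.normal(size=8)                           # input vector

i_dist = np.linalg.norm(x - Z, axis=1).argmin()  # argmin_i ||x - z_i||
i_dot  = (Z @ x).argmax()                        # argmax_i x^T z_i
print(i_dist == i_dot)                           # True
```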

SLIDE 17

Problems with Min Distance

  • Proper scaling of the dimensions (size, discrimination)
  • For high dimensionality, the space is sparsely sampled
SLIDE 18

Decision Rule for Min Distance

  • Nearest Neighbor (NN): in the limit of infinite samples, at most twice the error of the optimum classifier
  • k-Nearest Neighbor (kNN), as sketched below
  • Lots of storage for large problems; potentially large searches
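A small kNN sketch (the stored set, k = 3, and the query point are illustrative). Note that the whole stored set must be kept and searched per query, which is the storage and search cost the last bullet points to:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stored (training) examples with labels -- kNN keeps all of them.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def knn_classify(query, k=3):
    """Label the query by majority vote among its k nearest stored examples."""
    d = np.linalg.norm(X - query, axis=1)      # search the whole stored set
    nearest = np.argsort(d)[:k]
    return np.bincount(y[nearest]).argmax()

print(knn_classify(np.array([2.5, 2.5])))      # likely class 1
```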

SLIDE 19

Some Opinions

  • Better to throw away bad data than to reduce its weight
  • Dimensionality reduction based on variance is often a bad choice for supervised pattern recognition

SLIDE 20

Discriminant Analysis

  • Discriminant functions: maximum for the correct class, minimum for the others
  • Decision surface between classes
  • A linear decision surface is a line in 2 dimensions and a plane in 3; in general, a hyperplane
  • For 2 classes, the surface is at $\omega^T x + \omega_0 = 0$
  • In the 2-class quadratic case, the surface is at $x^T W x + \omega^T x + \omega_0 = 0$ (both cases are sketched below)
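A sketch of both 2-class rules with made-up weights; the sign of the discriminant says which side of the decision surface x falls on:

```python
import numpy as np

w  = np.array([1.0, -2.0])       # omega
w0 = 0.5                         # omega_0
W  = np.array([[1.0, 0.2],
               [0.2, 0.5]])      # quadratic term

def g_linear(x):
    """Linear discriminant: surface at w^T x + w0 = 0."""
    return w @ x + w0

def g_quadratic(x):
    """Quadratic discriminant: surface at x^T W x + w^T x + w0 = 0."""
    return x @ W @ x + w @ x + w0

x = np.array([0.3, 1.1])
print("class 1" if g_linear(x) > 0 else "class 2")
print("class 1" if g_quadratic(x) > 0 else "class 2")
```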

SLIDE 21

SLIDE 22

Training Discriminant Functions

  • Minimum distance
  • Fisher linear discriminant (sketched below)
  • Gradient learning
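A minimal Fisher-linear-discriminant sketch on synthetic 2-class data: the projection direction is $w = S_W^{-1}(m_1 - m_0)$, and the midpoint threshold used here is one simple choice, not the only one:

```python
import numpy as np

rng = np.random.default_rng(5)
X0 = rng.normal([0, 0], 1.0, (100, 2))   # class 0 samples
X1 = rng.normal([2, 1], 1.0, (100, 2))   # class 1 samples

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class scatter matrix S_W = sum of the per-class scatters.
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# Fisher direction and a midpoint threshold on the 1-D projection.
w = np.linalg.solve(S_W, m1 - m0)
thresh = w @ (m0 + m1) / 2.0

X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
y_hat = (X @ w > thresh).astype(int)
print("training accuracy: %.3f" % (y_hat == y).mean())
```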
SLIDE 23

Generalized Discriminators - ANNs

  • McCulloch-Pitts neural model
  • Rosenblatt Perceptron
  • Multilayer Systems
SLIDE 24

The Perceptron

McCulloch-Pitts Neuron - Rosenblatt Perceptron
[Figure: inputs x1, x2, ..., xd weighted by w1, w2, ..., wd, summed with a bias, and thresholded to produce the output y0]

SLIDE 25

Perceptron Convergence

If the classes are linearly separable, the following rule will converge in a finite number of steps. For each pattern $x$ at time step $k$:

$$\omega(k+1) = \begin{cases} \omega(k) + c\,x(k) & \text{if } x(k) \in \text{class 1 and } \omega^T(k)\,x(k) \le 0 \\ \omega(k) - c\,x(k) & \text{if } x(k) \in \text{class 2 and } \omega^T(k)\,x(k) \ge 0 \\ \omega(k) & \text{otherwise} \end{cases}$$
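The rule above written out directly (a sketch: the separable toy data, the step size c, and the epoch cap are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Linearly separable toy data; append a constant 1 so the bias lives in omega.
X1 = rng.normal([2, 2], 0.5, (50, 2))    # class 1
X2 = rng.normal([-2, -2], 0.5, (50, 2))  # class 2
X = np.hstack([np.vstack([X1, X2]), np.ones((100, 1))])
cls = np.array([1] * 50 + [2] * 50)

w = np.zeros(3)
c = 1.0
for epoch in range(100):
    changed = False
    for x, label in zip(X, cls):
        s = w @ x
        if label == 1 and s <= 0:     # class-1 pattern on the wrong side
            w = w + c * x
            changed = True
        elif label == 2 and s >= 0:   # class-2 pattern on the wrong side
            w = w - c * x
            changed = True
    if not changed:                   # converged: every pattern is correct
        break
print("converged after epoch", epoch, "weights:", w)
```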

SLIDE 26

Multilayer Perceptrons

  • Heterogeneous, "hard" nonlinearity (DAID, 1961)
  • Homogeneous, "soft" nonlinearity ("modern" MLP)

[Figure: feature subsets, Gaussian classes, and perceptron input/output]
SLIDE 27

SLIDE 28

$$f(y) = \frac{1}{1 + e^{-y}} \quad \text{(sigmoid)}, \qquad 0 < f(y) < 1$$
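The same nonlinearity in code, checking the 0 < f(y) < 1 bound:

```python
import numpy as np

def sigmoid(y):
    """Soft MLP nonlinearity: f(y) = 1 / (1 + exp(-y)), bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

y = np.linspace(-6, 6, 7)
f = sigmoid(y)
print(f)                                  # smooth ramp from near 0 to near 1
print(((f > 0) & (f < 1)).all())          # True: 0 < f(y) < 1
```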

SLIDE 29

SLIDE 30

Some PR Issues

  • Testing on the training set
  • Training on the test set
  • Number of parameters vs. number of training examples: overfitting and overtraining (see the sketch below)
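A sketch of the parameters-versus-examples tradeoff using polynomial fits (the degrees, noise level, and set sizes are made up): as the parameter count approaches the number of training examples, training error keeps falling while error on an independent test set typically grows:

```python
import numpy as np

rng = np.random.default_rng(7)

def sample(n):
    """Noisy 1-D regression data with a smooth underlying target."""
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, n)

x_tr, y_tr = sample(15)      # small training set
x_te, y_te = sample(200)     # independent test set

for degree in (1, 3, 9, 12):         # degree 12 ~ almost one parameter per example
    coef = np.polyfit(x_tr, y_tr, degree)
    err_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    err_te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    print("degree %2d  train MSE %.4f  test MSE %.4f" % (degree, err_tr, err_te))
```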