SLIDE 1

Machine Learning

sanparith.marukatat@nectec.or.th

SLIDE 2

Today

  • Example of intelligent system: OCR
  • k-Nearest Neighbor Classifier
  • Generative model
  • Maximum likelihood
  • Naïve Bayes model
  • Gaussian model
SLIDE 3

What is OCR?

  • Optical Character Recognition

– Input: scanned images, photos, video frames
– Output: text file

  • Alternative input

– Electronic stylus, e.g. on a PDA
– Online handwriting recognition

SLIDE 4

OCR process (1)

  • Preprocessing

– Image enhancement, denoising, deskewing, ...
– Binarization

  • Layout analysis and character segmentation
  • Character recognition
  • Spell correction
SLIDE 5

[Figure: example of non-text content that should be removed]

SLIDE 6

[Figure: page layout with a picture and two text columns]

SLIDE 7

[Figure: a single segmented text line]

SLIDE 8

[Figure: a single segmented character]

SLIDE 9

OCR process (2)

  • Preprocessing uses image processing techniques
  • Layout analysis uses rules + some statistics
  • Character recognition

– Classifier (trained from a training corpus)
– Look-up table: class no. -> ASCII or Unicode code (given by the system designer)

  • Spell correction uses a dictionary + some statistics + some NLP techniques

SLIDE 10

How to build the recognition module? (1)

  • Technical choice

– All characters separated: Neural Network, SVM
– A few touching characters: classes for touching characters
– Some broken characters (e.g. อำ): classes for sub-characters
– Rule-based segmentation
– Many touching characters (e.g. Arabic, Urdu): 2D-HMM

SLIDE 11

How to build the recognition module? (2)

  • Normalize character images (reduce variation, get a fixed size)

  • Select features

– High-level features: e.g. head, tail

  • Contain more information
  • Cannot be reliably detected

– Low-level features: pixel colors, edges

  • Single feature is not meaningful
  • Can be easily detected
  • Can be improved: PCA, LDA, NLDA, ...
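As a concrete illustration of the low-level route, here is a minimal sketch (not from the slides): flatten fixed-size character images into pixel vectors, then improve them with PCA via scikit-learn. The image size, the random stand-in data, and the component count are all arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in corpus: 100 normalized 16x16 grayscale "images".
images = [np.random.rand(16, 16) for _ in range(100)]

# Low-level features: one vector of pixel values per image.
X = np.stack([img.reshape(-1) for img in images])   # shape (100, 256)

# Improve the raw pixel features with PCA, as the slide suggests.
pca = PCA(n_components=40)          # 40 components is an arbitrary choice
X_reduced = pca.fit_transform(X)    # shape (100, 40)
```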
SLIDE 12

How to build the recognition module? (3)

  • Design class
  • Build a feature extractor, e.g. a vector of pixel colors
  • Construct a training corpus

– 1 example = 1 vector and 1 class
– Very large number of examples
– Cover all conditions: dpi, fonts, font sizes, styles (e.g. slant, bold), writing styles, pen styles, with and without noise

SLIDE 13

How to build the recognition module? (4)

  • Building corpus

– Handwritten

  • Collect samples
  • Segment from forms or manual segmentation

– Printed

  • Print different fonts, font sizes, ...
  • Scan, scan of copy, ...
  • Time consuming
SLIDE 14

How to build the recognition module? (5)

  • Select a classifier

– Select tools

  • SNNS or fann for Neural Network
  • libsvm or svmlight for SVM
  • weka

– Format of the training corpus
– Parameters and their values
– How to use it in your code

SLIDE 15

What is Neural Network?

  • Biologically inspired multi-class classifier
  • Set of nodes with oriented connections

[Figures: Multi-Layer Perceptron, Diabolo network, Recurrent network]

SLIDE 16

Using neural network

  • Try MLP with 1 hidden layer first
  • 1 parameter = number of hidden nodes
  • Training with gradient descent
  • 1 training parameter = learning rate
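A minimal sketch of this recipe, using scikit-learn's MLPClassifier instead of the SNNS/fann tools mentioned earlier, and the small built-in digits dataset as a stand-in for a character corpus; the node count and learning rate are arbitrary values:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small built-in digits dataset standing in for a character corpus.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer; the two knobs from this slide are the number of
# hidden nodes and the learning rate of gradient descent.
mlp = MLPClassifier(hidden_layer_sizes=(64,),   # 64 hidden nodes (arbitrary)
                    solver='sgd',               # gradient-descent training
                    learning_rate_init=0.01,    # learning rate (arbitrary)
                    max_iter=500)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```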
SLIDE 17

What is SVM?

  • Linear classifier using the kernel trick, trained to trade off between error and generalization

– Linear classifier

  • Output is a linear combination of the input features
  • y = sign(w^T x)
  • Use multiple linear classifiers for the multi-class problem

– Kernel trick

  • Replace all dot products with a kernel function
  • K(x1,x2) = <g(x1), g(x2)> for some (possibly unknown) mapping g

SLIDE 18

Using SVM

  • Kernel function

– Radial Basis Function (RBF): K(x,y) = exp(-γ||x−y||^2)
– Polynomial: K(x,y) = (<x,y> + 1)^d

  • Tradeoff parameter C

– Small C: generalization is more important than error
– Large C: error is more important
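For reference, a minimal numpy sketch of the two kernels above; the gamma and d values used here are arbitrary:

```python
import numpy as np

# The two kernels from this slide, written out in numpy.
def rbf_kernel(x, y, gamma=0.5):
    # K(x,y) = exp(-gamma * ||x - y||^2)
    diff = x - y
    return np.exp(-gamma * np.dot(diff, diff))

def poly_kernel(x, y, d=3):
    # K(x,y) = (<x,y> + 1)^d
    return (np.dot(x, y) + 1.0) ** d

x, y = np.array([1.0, 2.0]), np.array([1.5, 1.0])
print(rbf_kernel(x, y), poly_kernel(x, y))
```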

SLIDE 19

Exercise

  • MNIST dataset with libsvm using the RBF kernel

  • SVM's parameters

– gamma = inverse of the area of influence around each example; try 0.005
– C = trade-off parameter between error on the training set and generalization; try 1000
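A possible starting point for this exercise, assuming scikit-learn (whose SVC wraps libsvm) and network access for the OpenML download; the subsample size is an arbitrary choice to keep training time short:

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# MNIST via OpenML; scikit-learn's SVC wraps libsvm internally.
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X / 255.0   # scale pixels to [0, 1]

# Subsample: training an SVM on all 70k examples is slow.
X_train, X_test, y_train, y_test = train_test_split(
    X[:5000], y[:5000], test_size=0.2, random_state=0)

# RBF kernel with the values suggested in the exercise; they may need
# retuning depending on how the pixels are scaled.
clf = SVC(kernel='rbf', gamma=0.005, C=1000)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```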


SLIDE 22

How to build the recognition module? (6)

  • Multi-pass classifier

– Rough classification: upper vowel, mid-level character, lower vowel
– Fine classification: กถภ, ปฝฟ, ...
– Finer classification

SLIDE 23

1-Nearest Neighbor classifier

  • Prototype-based classifier, template-based classifier
  • Distance function
  • Useful when

– We have a very limited number of training examples and cannot train another classifier
– We have a large number of training examples and just want a baseline

  • What is the performance of this model?

– As n → ∞, the 1-NN error is less than twice the Bayes error

SLIDE 24

Bayes error

  • If we do know P(Class1|x),...,P(ClassM|x), then the Bayes classification rule f(x) = arg max_{i=1,...,M} P(Classi|x) produces the minimum possible expected error, called the Bayes error

[Figure: posteriors P(Y=1|x) and P(Y=2|x) plotted against x, with the overlap region marked as the expected error]

SLIDE 25

k-NN and Bayes classification rule

  • P(x) ≈ k / (NV)
  • N number of examples in training set
  • V volume around x
  • k number of points in V
  • P(x|y) ≈ ky / (NyV)
  • ky number of examples of class y in V
  • Ny number of examples of class y in training set
  • P(y) ≈ Ny/N
  • P(y|x) ≈ ky / k (why??)
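To answer the slide's "why??": substitute the three estimates above into Bayes' rule; the V and N factors cancel:

```latex
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}
\approx \frac{\frac{k_y}{N_y V} \cdot \frac{N_y}{N}}{\frac{k}{N V}}
= \frac{k_y}{k}
```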
SLIDE 26

k-NN algorithm

  • Put the input object into the same class as most of its k nearest neighbors

  • Implementation

– Compute the distance between the input and each training example
– Sort in increasing order
– Count the number of instances from each class amongst the k nearest neighbors

  • There is no k which is always optimal
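The three implementation steps translate almost line for line into code; a minimal numpy sketch over a made-up 2D training set:

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    # 1. Compute the distance between the input and every training example.
    dists = np.linalg.norm(X_train - x, axis=1)
    # 2. Sort in increasing order; keep the indices of the k nearest.
    nearest = np.argsort(dists)[:k]
    # 3. Count class occurrences among the k nearest neighbors and vote.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2D training set with two classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.2, 0.1]), X_train, y_train, k=3))  # -> 0
```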
SLIDE 27

Some useful distances

  • Norm-p distance: ||x − y||_p = (Σ_{i=1}^{n} |x_i − y_i|^p)^{1/p}
  • Mahalanobis distance: dist(x,y) = (x − y)^T Σ^{-1} (x − y)
  • Kernel-based distance: dist^2(x,y) = K(x,x) + K(y,y) − 2K(x,y)

– WHY?

SLIDE 28

Generative approach (1)

  • Solving classification problem = build P(class|input)
  • P(classi|input) = P(input|classi) P(classi) / P(input)
  • P(input) = Σi P(input|classi) P(classi)
  • P(classi) = percentage of examples from class i in the training set

  • Solving classification problem = build P(input|classi)
  • P(input|classi) = likelihood of classi
  • P(classi|input) = posterior probability of classi
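A tiny numeric illustration of Bayes' rule above, with made-up likelihood and prior values for three classes:

```python
import numpy as np

# Made-up numbers: likelihoods P(input|class_i) for three classes, and
# priors P(class_i) taken as class proportions in the training set.
likelihoods = np.array([0.02, 0.10, 0.01])   # P(input | class_i)
priors      = np.array([0.50, 0.30, 0.20])   # P(class_i)

# P(input) = sum_i P(input|class_i) P(class_i)
evidence = np.sum(likelihoods * priors)

# P(class_i|input) = P(input|class_i) P(class_i) / P(input)
posteriors = likelihoods * priors / evidence
print(posteriors, posteriors.argmax())       # class index 1 wins here
```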
SLIDE 29

Generative approach (2)

  • To build P(input|classi) we usually make an assumption about

– How the data from class i is distributed, e.g. Binomial, Gaussian, Mixture of Gaussians
– How the data is created, e.g. HMM

  • Example: Document classification

– How is each input, i.e. a document, represented?
– What is the likelihood model for these data?

SLIDE 30

Document classification (1)

  • Spam/not-spam
  • Document = set of words
  • Preprocessing

– word segmentation
– remove stop-words
– stemming
– word selection

SLIDE 31

Document classification (2)

  • Naïve Bayes assumption: all words are independent

  • P(w1,...,wn|Spam) = Πi P(wi|Spam)
  • Same hypothesis for all classes
  • How to compute P(wi|Spam), why?
  • What is the process of building a Naïve Bayes model for spam classification?
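One possible answer, as a minimal sketch: estimate P(wi|Spam) from word counts (here with add-one smoothing, an extra assumption not on the slide, so unseen words do not zero out the product) and compare log-posteriors. The toy corpus is invented.

```python
import math
from collections import Counter

# Toy corpus (invented): each document is a list of preprocessed words.
spam_docs = [["viagra", "win", "money"], ["win", "money", "now"]]
ham_docs  = [["meeting", "monday", "report"], ["report", "money", "project"]]

def word_counts(docs):
    return Counter(w for d in docs for w in d)

spam_counts, ham_counts = word_counts(spam_docs), word_counts(ham_docs)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(words, counts, total):
    # Naive Bayes: P(w1,...,wn|class) = prod_i P(wi|class); in log space.
    # P(wi|class) is estimated from word counts, with add-one smoothing.
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in words)

def classify(words, prior_spam=0.5):
    s = math.log(prior_spam) + log_likelihood(words, spam_counts, spam_total)
    h = math.log(1 - prior_spam) + log_likelihood(words, ham_counts, ham_total)
    return "spam" if s > h else "not-spam"

print(classify(["win", "viagra"]))   # -> spam
```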

SLIDE 32

Maximum Likelihood (1)

  • x1,...,xN are i.i.d. according to P(x|θ), where θ is the parameter of this model

  • Q: What is the proper value for θ?
  • A: The value which gives maximum P(x1,...,xN|θ)
  • Q: We know P(x|θ), how to compute P(x1,...,xN|θ)?
  • A: P(x1,...,xN|θ) = Πi P(xi|θ), Why?
  • Q: How to find the maximum value?
  • Q: How to get rid of the product?
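For the last two questions: taking the logarithm turns the product into a sum without changing where the maximum is, and for a Bernoulli/Binomial model with m successes in N trials the maximum has a closed form:

```latex
\log P(x_1,\dots,x_N \mid \theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta)

\log L(\theta) = m \log\theta + (N-m)\log(1-\theta),
\qquad
\frac{d \log L}{d\theta} = \frac{m}{\theta} - \frac{N-m}{1-\theta} = 0
\;\Longrightarrow\;
\hat{\theta}_{\mathrm{ML}} = \frac{m}{N}
```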
SLIDE 33

Maximum Likelihood (2)

  • Exercise: Binomial distribution

– Q: What does this mean? What is P(w|Spam)?
– word “viagra”
– {T, F, F, T, T, T, F, T, F, T}
– Find the proper parameter for P(w|Spam)
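Applying the maximum-likelihood formula above, with m = 6 occurrences of T out of N = 10 observations:

```latex
\hat{\theta}_{\mathrm{ML}} = \frac{6}{10} = 0.6
```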

SLIDE 34

Maximum Likelihood (3)

  • Exercise: coin-toss

– {H, T, H, H, T, H, H, H, T, H}
– Q: What is the parameter of the Binomial distribution that fits this data?
– Q: What is the conclusion?
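Likewise, 7 heads out of 10 tosses give the estimate below; the intended conclusion is presumably that with few samples ML simply reproduces the observed frequency, even for a coin we believe to be fair, which motivates the prior introduced on the next slide.

```latex
\hat{\theta}_{\mathrm{ML}} = \frac{7}{10} = 0.7
```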

SLIDE 35

Maximum A Posteriori

  • Sometimes, we have prior knowledge about the model, i.e. we have P(θ)

  • We search for maximum P(θ|x1,...,xN) instead
  • Q: How to compute P(θ|x1,...,xN) from P(θ) and P(x|θ)?
  • Exercise: coin-toss problem where θ is distributed as a Gaussian with mean 5/10 and standard deviation 1/10

– Q: What is the Gaussian model?
– Q: What is the proper value for θ?
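By Bayes' rule (the denominator does not depend on θ):

```latex
P(\theta \mid x_1,\dots,x_N) \propto P(\theta) \prod_{i=1}^{N} P(x_i \mid \theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}
\Big[ \log P(\theta) + \sum_{i=1}^{N} \log P(x_i \mid \theta) \Big]
```

For the coin-toss exercise, the Gaussian prior adds a penalty −(θ − 0.5)²/(2 · 0.1²) to the log-likelihood, pulling the estimate from the ML value 0.7 back toward 0.5; the resulting stationarity condition is a cubic equation, typically solved numerically.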

SLIDE 36

ML or MAP?

  • ML is good when we have a large enough dataset
  • MAP is preferred when we have little data
  • The prior can be estimated from data too