Lecture 8
Agenda:
- String matching
- How to evaluate a pattern recognition system
String Matching (note 1)
Definitions:
- Pattern: x = "movi". Text: "zlatanibrahimovic"
- Shift: s = offset from the start of the text to the start position of x
- Valid shift: s = offset to a complete match
Applications: find a word in a text, count words, etc.
Naive string matching: brute force
- OK, but slow for large texts
Alternative: Boyer-Moore string matching
- Faster because the shift advances by s = s + k, where k > 1
- k = 1 for the naive algorithm
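The naive approach can be sketched as follows. This is a minimal illustration, not code from the lecture: the function name `naive_valid_shifts` is made up here, and shifts are counted as 0-based offsets, which is an assumption about the slide's convention.

```python
def naive_valid_shifts(x: str, text: str) -> list[int]:
    """Return every valid shift s (0-based offset) at which x matches text."""
    shifts = []
    # Try every possible shift; the naive algorithm always advances by k = 1.
    for s in range(len(text) - len(x) + 1):
        # A shift is valid only if the complete pattern matches at offset s.
        if text[s:s + len(x)] == x:
            shifts.append(s)
    return shifts


print(naive_valid_shifts("movi", "zlatanibrahimovic"))  # [12]
```

Every shift is tested, so the cost grows with the text length times the pattern length, which is why larger steps (k > 1) pay off on long texts.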
Algorithm (on the blackboard):
- Good suffix: the elements (counted from the right) that match
- Bad character: the first (from the right) element that does not match
- Calculate the shift suggested by each and apply the larger of the two.
F(x): last-occurrence function (bad character)
- Look-up table containing each letter in the alphabet together with the position of its last (right-most) occurrence in x
- Example: x = "bror". F(x): o = 3, r = 2, b = 1, the rest = 0
- Example: x = "estimates"
- NB: the right-most element is ignored since this ...
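A small sketch of how such a look-up table could be built, following the "bror" example: positions are 1-based and the right-most element of x is skipped. The function name `last_occurrence` and the choice of lowercase ASCII as the alphabet are assumptions made for this illustration.

```python
import string


def last_occurrence(x: str, alphabet: str = string.ascii_lowercase) -> dict[str, int]:
    """Build F(x): for each letter, the 1-based position of its last
    occurrence in x, ignoring the right-most element; 0 if it never occurs."""
    table = {letter: 0 for letter in alphabet}
    # Scan left to right over all but the last element; later
    # occurrences overwrite earlier ones, so the right-most one wins.
    for position, letter in enumerate(x[:-1], start=1):
        table[letter] = position
    return table


F = last_occurrence("bror")
print(F["o"], F["r"], F["b"], F["z"])  # 3 2 1 0
```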
G(x): good-suffix function
- Look-up table containing the second right-most position of ...
- Example: x = "bror". G(x): r = 2, the rest = 0
- Example: x = "estimates"
We know what to do for features…
- x = "hej", y = "her", z = "haj"
- Dist(x,y)? Is Dist(x,y) > Dist(x,z)?
- Applications: spell-checking, speech recognition, ...
Hamming distance: requires |x| = |z|
- Measures the number of positions where the two strings differ
- Dist(x,y) = 1, Dist(y,z) = 2, Dist(x,y) = Dist(x,z)
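The distances quoted above can be checked with a few lines of code. This is a minimal sketch; the function name `hamming_distance` is chosen here for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equally long strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


x, y, z = "hej", "her", "haj"
print(hamming_distance(x, y), hamming_distance(y, z), hamming_distance(x, z))  # 1 2 1
```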
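Levenshtein distance
- |x| = |z| is not required => better
- Also known as edit distance, since the distance is the number of edits (insertions, deletions, exchanges) needed to turn one string into the other
- Cost matrix: C (first row and first column, thereafter one column at a time)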
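$$C[i,j] = \min\Big(\underbrace{C[i-1,j] + 1}_{\text{deletion}},\; \underbrace{C[i,j-1] + 1}_{\text{insertion}},\; \underbrace{C[i-1,j-1] + 1 - \delta(x[i],y[j])}_{\text{no change / exchange}}\Big)$$
$$\delta(x[i],y[j]) = \begin{cases} 1 & \text{if } x[i] = y[j] \\ 0 & \text{otherwise} \end{cases}$$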
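The recurrence can be turned into a short program. The sketch below fills the first row and first column, then the rest of the cost matrix one column at a time, as described above. The function name `levenshtein_distance` is an illustrative choice, and because Python strings are 0-based, x[i] in the formula becomes `x[i - 1]` in the code.

```python
def levenshtein_distance(x: str, y: str) -> int:
    """Edit distance between x and y, computed with the cost matrix C."""
    rows, cols = len(x) + 1, len(y) + 1
    C = [[0] * cols for _ in range(rows)]
    # First column: deleting i elements of x. First row: inserting j elements of y.
    for i in range(rows):
        C[i][0] = i
    for j in range(cols):
        C[0][j] = j
    # Fill the remaining cells one column at a time.
    for j in range(1, cols):
        for i in range(1, rows):
            delta = 1 if x[i - 1] == y[j - 1] else 0
            C[i][j] = min(
                C[i - 1][j] + 1,              # deletion
                C[i][j - 1] + 1,              # insertion
                C[i - 1][j - 1] + 1 - delta,  # no change / exchange
            )
    return C[-1][-1]


print(levenshtein_distance("hej", "her"))  # 1
```

Unlike the Hamming distance, this also accepts strings of different lengths.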
In some system specifications you need technical ...
- HW, SW, real-time, recognition rate, ...
Recognition rate = (number of correctly recognized samples) / (total number of samples)
- Multiply by 100% to express it as a percentage
How do you test a system? How do you present and interpret the results?
Cross-validation
- Train on α % of the samples (α > 50) and test on the rest
- α is typically 90, depending on the number of samples and ...
M-fold cross-validation
- Divide (randomly) all samples into M equally sized groups
- Use M-1 groups to train the system and test on the remaining group
- Do this M times, so each group is used for testing once, and average the results
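The M-fold procedure can be sketched as below. Nothing here comes from the lecture itself: `m_fold_splits`, `cross_validate`, and the `evaluate` callback (which is assumed to train on the given training indices and return the recognition rate on the test indices) are hypothetical names for illustration. When the number of samples is not divisible by M, the group sizes in this sketch differ by at most one.

```python
import random


def m_fold_splits(num_samples: int, m: int, seed: int = 0):
    """Yield (train_indices, test_indices) for each of the M groups."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)          # random division of the samples
    folds = [indices[g::m] for g in range(m)]     # M groups of (nearly) equal size
    for g in range(m):
        test = folds[g]
        train = [i for h in range(m) if h != g for i in folds[h]]
        yield train, test


def cross_validate(num_samples: int, m: int, evaluate, seed: int = 0) -> float:
    """Average the recognition rate returned by `evaluate` over the M runs."""
    rates = [evaluate(train, test) for train, test in m_fold_splits(num_samples, m, seed)]
    return sum(rates) / len(rates)
```

A call such as `cross_validate(100, 10, evaluate)` would train ten times on 90 samples and test on the remaining 10 each time.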
Error % = 100% - (Recognition rate x 100%)
Distribution of errors?
- Confusion matrix
[Figure: confusion matrix example with 3 classes and 25 samples per class; axes: Input (the truth) and Output (from the system)]
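A confusion matrix and the recognition rate derived from it can be computed as in the sketch below. The layout (rows = input, the truth; columns = output from the system) is an assumption, the function names are illustrative, and the labels in the usage example are made-up toy data, not the 3 x 25 samples from the slide.

```python
def confusion_matrix(truth, output, num_classes):
    """matrix[t][o] counts samples of true class t that the system labelled o."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, o in zip(truth, output):
        matrix[t][o] += 1
    return matrix


def recognition_rate(matrix):
    """Correctly recognized samples (the diagonal) divided by all samples."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up toy labels for three classes (0, 1, 2).
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
output = [0, 0, 1, 1, 1, 1, 2, 0, 2]
m = confusion_matrix(truth, output, 3)
print(m)  # [[2, 1, 0], [0, 3, 0], [1, 0, 2]]
```

The off-diagonal cells show how the errors are distributed, which a single recognition rate cannot reveal.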
- Number of errors = incorrectly recognized + not recognized
- The total number of errors can be represented like this:
[Figure: Input (the truth) versus Output (from the system)]
- Incorrectly recognized: Type I error, false positive (FP), false accept (FA), false accept rate (FAR), ghost object, false alarm
- Not recognized: Type II error, false negative (FN), false reject (FR), false reject rate (FRR), miss
[Figure: patients without the disease versus patients with the disease; the system calls some patients "negative" and the others "positive"]
[Figure: plot of True Positive Rate (sensitivity), 0% to 100%, against False Positive Rate (1 - specificity), 0% to 100%]
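The two rates on these axes can be computed for a given decision threshold as sketched below. This assumes each patient has a numeric test score and is called "positive" when the score is at or above the threshold; the function names and the scores in the usage example are made up for illustration.

```python
def roc_point(scores_without, scores_with, threshold):
    """Return (false positive rate, true positive rate) for one threshold."""
    # Patients with the disease that are called "positive".
    true_positives = sum(s >= threshold for s in scores_with)
    # Patients without the disease that are called "positive".
    false_positives = sum(s >= threshold for s in scores_without)
    return (false_positives / len(scores_without),
            true_positives / len(scores_with))


def roc_curve(scores_without, scores_with):
    """Sweep the threshold from high to low and collect the (FPR, TPR) points."""
    thresholds = sorted(set(scores_without) | set(scores_with), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nobody is called positive
    points += [roc_point(scores_without, scores_with, t) for t in thresholds]
    return points


# Made-up toy scores.
without_disease = [1, 2, 3, 4]
with_disease = [3, 4, 5, 6]
print(roc_point(without_disease, with_disease, 4))  # (0.25, 0.75)
```

Lowering the threshold raises both rates together, so each threshold gives one point on the plot and the sweep traces the whole curve from (0, 0) to (1, 1).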