

SLIDE 1

Introduction to Pattern Recognition (PR), in French « Reconnaissance de Formes » (RF)

Dijana Petrovska-Delacrétaz dijana.petrovska@telecom-sudparis.eu

(updated 24 March 2020)

SLIDE 2

Bibliography

1. Pattern Classification and Scene Analysis; R.O. Duda, P.E. Hart; John Wiley & Sons, 2001
2. Introduction to Pattern Recognition: Statistical, Structural, Neural and Fuzzy Logic Approaches; M. Friedman, A. Kandel; World Scientific, 1999
3. Reconnaissance des Formes et Analyse de Scènes, vol. 3; M. Kunt et al.; Presses Polytechniques et Universitaires Romandes, 2000
4. Statistical Pattern Recognition: A Review; A. Jain, R. Duin, J. Mao; IEEE Trans. PAMI, 2000 (most figures are from this reference)
5. Guide to Biometric Reference Systems and Performance Evaluation; D. Petrovska-Delacrétaz, G. Chollet, and B. Dorizzi, editors; Springer-Verlag, 2009. DOI: 10.1007/978-1-84800-292-0
6. Deep Learning with Python; F. Chollet; Manning Publications Company
7. http://www.jmlr.org/papers/volume11/erhan10a/erhan10a.pdf

SLIDE 3

Introduction

  • Pattern recognition (PR) is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns
  • Humans are the best pattern recognizers, but their PR algorithms are mostly unknown
  • Goal: make decisions automatically
  • Many disciplines are concerned with PR: artificial intelligence, computer vision, machine learning, psychology, biology, medicine, …, and the understanding of human PR
  • Pattern (definition, Watanabe 1985): the opposite of chaos; an entity that could be given a name

SLIDE 4

Pattern examples

SLIDE 5

Profusion of “big” data

  • More and more data are available in digital form: on Google, DNA sequences, YouTube videos, astronomy, personal photos, … hence the need to organize the data
  • Supervised classification (e.g., discriminant analysis): the input pattern is identified as a member of a predefined class
  • Unsupervised classification (e.g., clustering): the pattern is assigned to a hitherto unknown class
  • Emerging applications: Google search, biometrics, speech recognition
  • Made possible by rapidly growing computational and storage power
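The supervised/unsupervised distinction above can be illustrated with a small stdlib-only sketch; the 2-D toy points and the class names "A"/"B" are hypothetical. A nearest-centroid classifier stands in for supervised classification (labels known in advance), and a tiny 2-means clustering stands in for unsupervised classification (classes discovered from the data).

```python
import math

# Toy 2-D points: two separable groups (hypothetical data).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),   # group A
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]   # group B

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def centroid(ps):
    return (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))

# --- Supervised classification: labels are known in advance ----------
labels = ["A", "A", "A", "B", "B", "B"]
centroids = {c: centroid([p for p, l in zip(points, labels) if l == c])
             for c in set(labels)}

def classify(x):
    # Assign x to the predefined class with the nearest centroid.
    return min(centroids, key=lambda c: dist(x, centroids[c]))

# --- Unsupervised classification: 2-means clustering, no labels ------
def kmeans2(ps, iters=10):
    c1, c2 = ps[0], ps[-1]                      # crude initialization
    for _ in range(iters):
        g1 = [p for p in ps if dist(p, c1) <= dist(p, c2)]
        g2 = [p for p in ps if dist(p, c1) > dist(p, c2)]
        c1, c2 = centroid(g1), centroid(g2)
    return g1, g2

g1, g2 = kmeans2(points)
print(classify((0.1, 0.1)))   # supervised: assigned to predefined class "A"
print(len(g1), len(g2))       # unsupervised: sizes of the discovered clusters
```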

SLIDE 6

SLIDE 7

Design of PR

  • Modules of a PR system:
    – Data acquisition
    – Segmentation
    – Data representation (feature extraction)
    – Learning (classification)
    – Post-processing (cost, fusion)
    – Decision making
  • Basic approaches to PR:
    – Template matching
    – Statistical classification
    – Syntactic or structural matching
    – Neural networks

SLIDE 8

Design Cycle

  • Data collection
  • Feature choice
  • Model choice
  • Training
  • Evaluation: the split of the data matters!

The split should be done at the very beginning, into disjoint train, development, and evaluation partitions (specific to the PR approach).

  • Common evaluations, reference systems and reproducible results:
    – Guide to Biometric Reference Systems and Performance Evaluation. D. Petrovska-Delacrétaz, G. Chollet, and B. Dorizzi, editors. Springer-Verlag, 2009. DOI: 10.1007/978-1-84800-292-0
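The three-way split described above can be sketched in a few lines of stdlib Python; the fractions and the fixed seed are illustrative assumptions, not prescribed by the slides.

```python
import random

def split_data(samples, train_frac=0.6, dev_frac=0.2, seed=0):
    """Partition the data once, at the very beginning, into three
    disjoint sets: train, development (tuning) and evaluation (test)."""
    items = list(samples)
    random.Random(seed).shuffle(items)       # fixed seed: reproducible split
    n_train = int(train_frac * len(items))
    n_dev = int(dev_frac * len(items))
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    evaluation = items[n_train + n_dev:]
    return train, dev, evaluation

train, dev, ev = split_data(range(100))
print(len(train), len(dev), len(ev))         # 60 20 20
# The three partitions are pairwise disjoint by construction:
assert not (set(train) & set(dev)) and not (set(train) & set(ev)) \
       and not (set(dev) & set(ev))
```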

SLIDE 9

Template matching

  • In template matching, a template (typically a 2D shape) or a prototype of the pattern to be recognized is available. The pattern to be recognized is matched (with a similarity measure) against the stored template, taking into account all allowable pose (translation and rotation) and scale changes.
  • Examples: fingerprints, normalized face images (pixel correlation)
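A minimal sketch of the pixel-correlation idea, assuming a tiny hypothetical grayscale image stored as a list of rows: the template is slid over the image and the normalized correlation is used as the similarity measure (pose and scale search are omitted for brevity).

```python
def ncc(a, b):
    """Normalized correlation between two equally sized gray patches."""
    n = len(a) * len(a[0])
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / n, sum(fb) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    da = sum((x - ma) ** 2 for x in fa) ** 0.5
    db = sum((y - mb) ** 2 for y in fb) ** 0.5
    return num / (da * db) if da and db else 0.0

def match(image, template):
    """Slide the template over the image; return (best score, offset)."""
    th, tw = len(template), len(template[0])
    best = (-2.0, None)
    for i in range(len(image) - th + 1):
        for j in range(len(image[0]) - tw + 1):
            patch = [row[j:j + tw] for row in image[i:i + th]]
            best = max(best, (ncc(patch, template), (i, j)))
    return best

# Hypothetical 5x5 image containing a 2x2 checker pattern at offset (2, 3)
img = [[0] * 5 for _ in range(5)]
img[2][3], img[2][4], img[3][3], img[3][4] = 9, 1, 1, 9
tpl = [[9, 1], [1, 9]]               # template of the sought pattern
print(match(img, tpl))               # (1.0, (2, 3)): perfect match there
```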

SLIDE 10

Statistical approach

Each pattern is represented in terms of d-dimensional features or measurements and is viewed as a point in a d-dimensional space. The goal is to choose those features that allow pattern vectors belonging to different categories to occupy compact and disjoint regions in the d-dimensional feature space. Examples: Bayes classifiers, LDA, …

SLIDE 11

Syntactic approach

For complex patterns, it is more appropriate to adopt a hierarchical perspective, where a pattern is viewed as being composed of simple subpatterns, which are themselves built from yet simpler subpatterns. The simplest (elementary) subpatterns to be recognized are called primitives, and the given complex pattern is represented in terms of the interrelationships between these primitives. This approach assumes that the pattern structure is quantifiable and extractable, so that the structural similarity of patterns can be assessed; grammatical rules are needed for the classification. Examples: data represented by symbols, such as DNA sequences, atomic speech units (phonemes), or data-driven speech units.
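As a hedged illustration of the syntactic idea, the following sketch treats the DNA symbols A, C, G, T as primitives and encodes two entirely hypothetical "grammatical rules" as regular expressions (regular grammars); the class names and rules are invented for the example and carry no biological meaning.

```python
import re

# Primitives: the symbols A, C, G, T. Each hypothetical pattern class is
# defined by a structural rule over these primitives, written here as a
# regular expression (a regular grammar).
RULES = {
    "tata_like": re.compile(r"^[ACGT]*?TATA[AT]A[ACGT]*$"),
    "gc_rich_start": re.compile(r"^(?:GC){3,}[ACGT]*$"),
}

def classify(seq):
    """Return the first class whose structural rule the sequence obeys."""
    for name, rule in RULES.items():
        if rule.match(seq):
            return name
    return "unknown"

print(classify("CCGTATAAAGGT"))   # tata_like
print(classify("GCGCGCTTA"))      # gc_rich_start
print(classify("AAAA"))           # unknown
```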

SLIDE 12

SLIDE 13

Neural networks and co.

Massively parallel computing systems consisting of an extremely large number of simple processors with many interconnections.

  • The main characteristics of neural networks are their ability to learn complex nonlinear input-output relationships, their use of sequential training procedures, and their ability to adapt themselves to the data
  • They need huge amounts of training data and computational power
  • They are beginning to be efficient on real-world applications
  • They are able to learn data structures …

SLIDE 14

Various approaches in statistical PR

SLIDE 15

Our focus: Statistical Pattern Recognition

  • The recognition system operates in two modes: training (learning) and classification (testing)

SLIDE 16

Learning with generalization ability

  • Data: train and test sets should be disjoint
  • If test set = train set, the system only learns "by heart" ("par cœur" learning)
  • Test on the test set only once (no tuning)
  • Partition the data in an optimal way

SLIDE 17

Bad generalization reasons

  • # features >> # train samples
  • Too many parameters to estimate with too little data
  • Optimization on the test set: overtraining

SLIDE 18

Known probability distributions

  • The features are assumed to have a probability density or mass function (depending on whether the features are continuous or discrete) conditioned on the pattern class. Thus, a pattern vector x belonging to class wi is viewed as an observation drawn randomly from the class-conditional probability function p(x|wi)
  • Optimal Bayes decision …
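The Bayes decision rule above can be sketched for a 1-D feature with two hypothetical classes w1 and w2 whose class-conditional densities p(x|wi) are assumed Gaussian with known parameters; the means, standard deviations and priors are invented for the example.

```python
import math

# Two hypothetical classes with known 1-D Gaussian class-conditional
# densities p(x|wi) and priors P(wi).
CLASSES = {
    "w1": {"mean": 0.0, "std": 1.0, "prior": 0.5},
    "w2": {"mean": 3.0, "std": 1.0, "prior": 0.5},
}

def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def bayes_decide(x):
    """Optimal Bayes rule: pick the class maximizing p(x|wi) * P(wi)."""
    return max(CLASSES,
               key=lambda w: gauss_pdf(x, CLASSES[w]["mean"],
                                       CLASSES[w]["std"]) * CLASSES[w]["prior"])

# With equal priors and stds, the decision boundary is midway at x = 1.5:
print(bayes_decide(0.2))   # w1
print(bayes_decide(2.9))   # w2
```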

SLIDE 19

Dichotomy supervised/unsupervised learning

  • If the form of the class-conditional densities is not known, we operate in a nonparametric mode. In this case, we must either estimate the density function (e.g., the Parzen window approach) or use the k-nearest neighbor rule
  • If classification is done by constructing decision boundaries, we directly construct the decision boundary based on the training data: the geometric approach

SLIDE 20

Feature extraction

  • Dimensionality reduction methods: PCA, LDA, neural networks, …
  • See the example on faces
SLIDE 21

2D versus 3D faces

  • 2D faces:
    – easy acquisition
    – inexpensive devices
    – widespread coding formats
SLIDE 22

Necessary modules (steps)

 Modules for:

   face detection
   localization of characteristic points
   extraction of relevant features
   creation of compact models
   evaluation metrics

SLIDE 23

Need for data

  • To have relevant training examples to "educate" the algorithms of each module: the domain of "machine learning"
  • To find good parameters for the operation of the particular modules
  • To evaluate the results
  • To find the operating limits of the algorithms

SLIDE 24

Face detection

  • The most widespread algorithm: AdaBoost (see OpenCV)
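The slide names AdaBoost as used in OpenCV's face detector. The following is only a generic stdlib sketch of AdaBoost over decision stumps on hypothetical 1-D toy data, not the OpenCV Viola-Jones implementation (which boosts Haar-feature classifiers over image windows); it shows how weak stumps combine into a strong classifier.

```python
import math

# Tiny hypothetical 1-D data set: label +1 on both ends, -1 in the middle.
# No single threshold classifier (stump) can separate it; AdaBoost can.
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1, 1, 1, -1, -1, -1, -1, -1, 1, 1]

def stump(theta, sign):
    return lambda x: sign if x < theta else -sign

def train_adaboost(rounds=3):
    w = [1 / len(X)] * len(X)                 # uniform sample weights
    ensemble = []
    for _ in range(rounds):
        # Pick the stump minimizing the weighted training error.
        err, t, s = min((sum(wi for xi, yi, wi in zip(X, Y, w)
                             if stump(t, s)(xi) != yi), t, s)
                        for t in [x + 0.5 for x in X] for s in (1, -1))
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        h = stump(t, s)
        # Re-weight the samples: boost the misclassified ones.
        w = [wi * math.exp(-alpha * yi * h(xi))
             for xi, yi, wi in zip(X, Y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, x):
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

ens = train_adaboost()
print([predict(ens, x) for x in X])   # matches Y after 3 boosting rounds
```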

SLIDE 25

Detection of characteristic points ("landmarks") of the face

  • 58 landmarks

SLIDE 26

Normalization

  • With, in addition, some preprocessing steps:
    – Face extraction with geometric normalization
    – Gray-level extraction from the HSV color system
    – Photometric normalization by anisotropic smoothing
    – 128x128 pixels

SLIDE 27

Dimensionality problem

  • E.g., a 2D image of 150x150 pixels:
    – If 50 dimensions are quantized with 20 levels, the space of possibilities is 20^50
    – Dim = 22500 gray-level or RGB values …
  • High-resolution 3D data: 75000 facets (texture) + shape …
  • They need to be processed in a compact way
  • There are correlations: how to exploit them? With PCA
  • Creation of "manifolds"

SLIDE 28

The "manifold" of faces

  • Pixels are highly correlated in certain regions
  • Common points: eyes, nose, mouth, symmetry about the vertical axis …
  • Variability of faces:
    – Expressions
    – Capture noise (camera resolution)
    – Aging
    – Changes in appearance …
  • Difficulty: how to represent these variabilities
SLIDE 29

Representation of images

(figure: a stack of images fed into PCA)

SLIDE 30

Intrinsic space of image classes

  • An arbitrary image is defined by its a x b = d pixels:
    – Dimensionality d
  • Information extraction to reduce the dimension: exploiting the redundancies
  • Goals:
    – Smaller storage space
    – For transmission: less bandwidth and/or progressive display
    – For pattern recognition: faster recognition on low-dimensional images
  • Problem: what is the intrinsic dimensionality of a class of images?

SLIDE 31

Example of a straight line

A straight line in R^3:

  • Representative points a = (x1, x2, x3): dimension 3
  • The subspace formed by all the points of the line has one degree of freedom
  • Representative subspace:
    – the line f(x1, x2, x3) = a1 x1 + a2 x2 + a3 x3
  • Representation of the points: translation along the line
    – Dimension 1

SLIDE 32

On the importance of data standardization

  • Examples taken from the "data mining" ("fouille de données") lecture notes of Ph. Preux, University of Lille 3
SLIDE 33

Importance of data homogenization

  • Distance between individuals in Euclidean geometry
  • If height is in m:
    – distances between 4-5 = 2, 4-6 = 5, 5-6 = 9
  • If height is in cm:
    – distances between 4-5 = 27, 4-6 = 30, 5-6 = 9
  • Centered and scaled:
    – distances between 4-5 = 2.3, 4-6 = 2.6, 5-6 = 0.8
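The unit effect above can be reproduced with a small stdlib sketch on hypothetical (height, weight) individuals: raw Euclidean distances change completely when heights switch from meters to centimeters, while after centering and scaling both encodings give the same distances.

```python
import math

# Hypothetical individuals: (height, weight in kg). Same three people,
# with height expressed either in meters or in centimeters.
people_m  = [(1.60, 60), (1.80, 58), (1.65, 90)]
people_cm = [(160, 60), (180, 58), (165, 90)]

def euclid(a, b):
    return math.dist(a, b)

def standardize(rows):
    """Center each column and divide it by its standard deviation."""
    cols = list(zip(*rows))
    stats = []
    for c in cols:
        m = sum(c) / len(c)
        s = (sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
        stats.append((m, s))
    return [tuple((v - m) / s for v, (m, s) in zip(r, stats)) for r in rows]

# Raw distances: in meters weight dominates, in centimeters height does.
print(euclid(people_m[0], people_m[1]), euclid(people_cm[0], people_cm[1]))
# After standardization, both encodings yield (nearly) identical distances:
z_m, z_cm = standardize(people_m), standardize(people_cm)
print(euclid(z_m[0], z_m[1]), euclid(z_cm[0], z_cm[1]))
```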

SLIDE 34

Homogenization: centering and scaling

SLIDE 35

Centering and scaling

  • Centering: subtraction of the mean
  • Scaling ("réduction"): division by the standard deviation
  • If the data are homogeneous (pixels of an image), only center the data

SLIDE 36

PCA in summary

  • PCA reduces the dimension of the data for a better representation (a) of the data (beware: it is linear!)
  • This does not imply that it provides a better representation for discrimination (b)
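A minimal sketch of what PCA computes, in stdlib-only Python with invented toy data: center the data, form the covariance matrix, and extract the first principal direction by power iteration. For strongly correlated 2-D points the component points roughly along the diagonal.

```python
def pca_first_component(rows, iters=200):
    """Return the first principal component (unit vector) of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(d)]
    X = [[r[i] - means[i] for i in range(d)] for r in rows]   # centering
    # Covariance matrix C = X^T X / n
    C = [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(d)]
         for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):                    # power iteration on C
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical, strongly correlated 2-D data: the first component should
# point roughly along the diagonal direction (1, 1) / sqrt(2).
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
v = pca_first_component(data)
print(v)   # both coordinates close to 0.71
```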

SLIDE 37

Examples of eigenfaces on the BANCA database

SLIDE 38

Riddle 1

SLIDE 39

Riddle 2

SLIDE 40

If one classifier is not enough

  • Combination of classifiers … or neural networks

SLIDE 41

Where do we stand today (March 2020)?

  • NNs (neural nets) and DNNs (deep neural nets) are arriving in force (e.g., ref. 6)
    – Good results, but at what price?
  • Human ANNOTATIONS
  • Clustering (see ref. 7)
