

  1. High Quality Facial Expression Recognition in Video Streams using Shape Related Information only
Laszlo A. Jeni, Daniel Takacs, Andras Lorincz
The University of Tokyo; Realeyes Data Services Ltd; Eötvös Loránd University, Faculty of Informatics

  2. We are grateful to Jason Saragih for providing his CLM code for our studies.

  3. Outline
 Introduction
 Theory
 Datasets
 Experiments
 Discussion

  4. Introduction
 Goal: recognize discrete facial emotions in video streams.
 We use precise Constrained Local Model based face tracking and shape related information for the emotion classification.
 High quality classification can be achieved.

  5. Outline
 Introduction
 Theory
 Datasets
 Experiments
 Discussion

  6. Overview of the System
 Pipeline: Video Stream → Face Detection → 3D Facial Feature Point Registration → Normalization → Normalized 3D Shape → Support Vector Machine → Emotion Label
 We register a 66-point 3D constrained local model (CLM) on the face.
 In 3D the CLM estimates the rigid parameters, therefore we can remove the rigid transformation.
 We use either AU0 normalization or personal mean shape normalization to remove the personal variation of the face.
 Finally, a multi-class SVM classifies the emotions.

  7. Constrained Local Model
 Point Distribution Model (PDM): where are the landmarks? A 3D model.
 Parameters: scale, projection to 2D, rotation (yaw/pitch/roll), mean shape, non-rigid components (PCA), PCA coefficients, translation.
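The PDM above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the mean shape, the PCA basis, and all names are stand-ins for a learned model.

```python
import numpy as np

def pdm_shape(mean_shape, basis, q, s, R, t):
    """Synthesize a projected 2D shape from 3D PDM parameters.

    mean_shape : (3, N) mean 3D shape
    basis      : (3*N, k) non-rigid PCA components
    q          : (k,) PCA coefficients
    s          : scalar scale
    R          : (3, 3) rotation matrix (yaw/pitch/roll)
    t          : (2,) 2D translation
    """
    n = mean_shape.shape[1]
    # non-rigid deformation: mean shape plus PCA components weighted by q
    shape3d = mean_shape + (basis @ q).reshape(3, n)
    # rigid transform, then weak-perspective projection to 2D
    return s * (R @ shape3d)[:2, :] + t[:, None]

# toy usage with a random 5-point model
rng = np.random.default_rng(0)
mean = rng.standard_normal((3, 5))
B = rng.standard_normal((15, 2))
pts2d = pdm_shape(mean, B, np.zeros(2), 1.0, np.eye(3), np.zeros(2))
# with zero coefficients, identity rotation and unit scale, the
# projection is just the first two rows of the mean shape
```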

  8. Constrained Local Model
 Local: “local experts” locate the landmarks (logit regressors).
 Constrained: the relative position of the landmarks is constrained by the PDM.
 Optimization problem: l_i ∈ {−1, 1} indicates whether the ith marker is in a correct position.

  9. Constrained Local Model
 Positive examples: from an annotated dataset.
 Negative examples: from the neighborhood.
 [Figure: probability estimations for one patch, markers found, and the response map of the corner of the eye]
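A local expert of this kind can be sketched as a logistic regressor trained on positive (landmark-centered) versus negative (neighborhood) patches. The template, patch size, and data below are synthetic stand-ins, not the presenters' training setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
patch_size = 11 * 11  # flattened 11x11 patches

# positive patches: a fixed template plus noise; negatives: pure noise
template = rng.standard_normal(patch_size)
pos = template + 0.3 * rng.standard_normal((200, patch_size))
neg = rng.standard_normal((200, patch_size))

X = np.vstack([pos, neg])
y = np.array([1] * 200 + [-1] * 200)

# the "local expert": a logit regressor scoring landmark alignment
expert = LogisticRegression(max_iter=1000).fit(X, y)

# a response map would score every candidate position this way;
# here we just score one positive and one negative patch
# (classes_ is sorted, so column 1 is the probability of class +1)
p_pos = expert.predict_proba(pos[:1])[0, 1]
p_neg = expert.predict_proba(neg[:1])[0, 1]
```

In the real tracker these scores, evaluated over a search window around each landmark, form the response maps shown on the slide.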

  10. Normalization
 AU0 normalization: the difference between the features of the actual shape and the features of the first (neutral) frame.
 Personal Mean Shape normalization: AU0 normalization is crucial for facial expression recognition; however, it is person dependent and it is not available for a single frame. We assume that we have videos (frame series) of the subject, as in the case of BU-4DFE, and we can compute the personal mean shape. We found that the mean shape is almost identical to the neutral shape, i.e., to AU0.
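The two normalization schemes can be sketched as follows, assuming per-frame shape features stacked in a (T, d) array; the toy frame series is illustrative only.

```python
import numpy as np

def au0_normalize(frames):
    """AU0 normalization: subtract the first (neutral) frame's features.

    frames : (T, d) array of shape features, frame 0 assumed neutral.
    """
    return frames - frames[0]

def mean_shape_normalize(frames):
    """Personal mean shape normalization: subtract the per-subject
    mean shape computed over the whole frame series."""
    return frames - frames.mean(axis=0)

# toy frame series: a neutral frame followed by a growing deformation
frames = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
au0 = au0_normalize(frames)
msn = mean_shape_normalize(frames)
# the two differ only by a per-subject constant offset
# (mean shape minus neutral shape), which the slide notes is small
```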

  11. SVM Based Classification
 The SVM seeks to minimize its cost function.
 Multi-class classification: decision surfaces are computed for all class pairs; for k classes one has k(k − 1)/2 decision surfaces; voting decides the label.
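The pairwise scheme above can be sketched with scikit-learn's SVC, which implements exactly this one-vs-one construction with voting; the data is a synthetic stand-in for the normalized shape features.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
k = 7  # seven universal emotions
# well-separated synthetic clusters, 30 samples per class
X = np.vstack([rng.standard_normal((30, 10)) + 5 * i for i in range(k)])
y = np.repeat(np.arange(k), 30)

# one-vs-one: a decision surface for every pair of classes
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)

# k(k - 1)/2 pairwise decision values per sample
n_surfaces = clf.decision_function(X[:1]).shape[1]
```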

  12. Outline
 Introduction
 Theory
 Datasets
 Experiments
 Discussion

  13. Datasets
 Cohn-Kanade Extended (CK+): 2D images of 118 subjects; annotated with the seven universal emotions; ground truth landmarks; AU validated emotion labels.
 BU-4DFE: high-resolution 3D video sequences of 101 subjects; six prototypic facial expressions; no ground truth landmarks (they were provided by the CLM); posed expressions.

  14. Outline
 Introduction
 Theory
 Datasets
 Experiments
 Discussion

  15. CK+ with original landmarks
 We used the CK+ dataset with the original 68 2D landmarks.
 We calculated the mean shape using the Procrustes method.
 We normalized all shapes by minimizing the Procrustes distance between individual shapes and the mean shape.
 We compared AU0 normalization with Personal Mean Shape normalization.
 We trained a multi-class SVM using leave-one-subject-out cross validation.
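The alignment step can be sketched as a plain orthogonal Procrustes fit (not the presenters' code): each shape is aligned to a reference by removing translation, scale, and rotation.

```python
import numpy as np

def procrustes_align(shape, reference):
    """Align `shape` (N, 2) to `reference` (N, 2) by translation,
    scale and rotation, minimizing the Procrustes distance."""
    A = shape - shape.mean(axis=0)
    B = reference - reference.mean(axis=0)
    norm_a, norm_b = np.linalg.norm(A), np.linalg.norm(B)
    A, Bu = A / norm_a, B / norm_b
    # optimal rotation from the SVD of the cross-covariance matrix
    # (plain orthogonal Procrustes; may include a reflection)
    U, s, Vt = np.linalg.svd(A.T @ Bu)
    R = U @ Vt
    # optimal scale is the sum of singular values; map back into
    # the reference's coordinate frame
    return (A @ R) * s.sum() * norm_b + reference.mean(axis=0)

# sanity check: a rotated, scaled, shifted square aligns back
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
theta = 0.5
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = 2.0 * square @ rot.T + np.array([3.0, -1.0])
aligned = procrustes_align(moved, square)
```

In practice the mean shape is re-estimated and the alignment iterated until convergence; the single-reference fit above is the inner step.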

  16. CK+ with original landmarks
 Emotions with large distortions, such as disgust, happiness and surprise, gave rise to nearly 100% classification performance.
 Even for the worst case, performance was 92% (fear).
 [Confusion matrix: AU0 normalization]

  17. CK+ with original landmarks
 Emotions with large distortions, such as disgust, happiness and surprise, gave rise to nearly 100% classification performance.
 Even for the worst case, performance was 92% (fear).
 Replacing AU0 normalization with personal mean shape slightly decreases average performance: recognition on the CK+ database drops from 96% to 94.8%.
 [Confusion matrices: AU0 normalization vs. Personal Mean Shape normalization]

  18. CLM tracked CK+
 We studied the performance of the multi-class SVM using the CLM method on the CK+ dataset.
 We tracked facial expressions with the CLM tracker and annotated all image sequences from the neutral expression to the peak of the emotion.
 The 3D CLM estimates the rigid and non-rigid transformations: we removed the rigid ones from the faces and projected the frontal view to 2D.

  19. CLM tracked CK+
 Classification performance is affected by the imprecision of the CLM tracking.
 Emotions with large distortions can still be recognized in about 90% of the cases, whereas more subtle emotions are sometimes confused with others.
 With Personal Mean Shape normalization the correct classification percentage rises from 77.57% to 86.82% on the CLM tracked CK+.
 [Confusion matrices: AU0 normalization vs. Personal Mean Shape normalization]

  20. Comparison of results on CK+
 [Table: comparison with published results; T/S – Texture/Shape information]

  21. CLM tracked BU-4DFE (frontal case)
 We characterized the BU-4DFE database using the CLM technique.
 We selected a frame with neutral expression and an apex frame of the same frame series, and used these frames and all frames between them for the evaluations.
 We applied CLM tracking to the intermediate frames in order, since this is more robust than applying the CLM independently to each frame.
 We removed the rigid transformation after the fit and projected the frontal 3D shapes to 2D.
 We applied a 6-class multi-class SVM (this database does not contain contempt) and evaluated the classifiers with the leave-one-subject-out method.
 We compared normalization using the CLM estimation of the AU0 values with normalization based on the personal mean shape.

  22. CLM tracked BU-4DFE (frontal case)
 [Confusion matrices: AU0 normalization vs. Personal Mean Shape normalization]
 We found an 8% improvement on average in favor of the mean shape method.

  23. CLM tracked BU-4DFE (frontal case)
 We executed cross evaluations, using CK+ as the ground truth, since it seems more precise:
 the target expression for each sequence is fully FACS coded,
 emotion labels have been revised and validated, and
 CK+ utilizes FACS coding based emotion evaluation, the method preferred in the literature considered.
 We note, however, that both the CK+ and the BU-4DFE facial expressions are posed, not spontaneous.
 [Table: Cross Evaluation: CK – BU-4DFE]

  24. Pose invariance on BU-4DFE
 Question: the CLM’s performance as a function of pose, i.e., pose invariant emotion recognition for situation analysis.
 We used the BU-4DFE dataset to render 3D faces with the six emotions available in the database (anger, disgust, fear, happiness, sadness, and surprise).
 We randomly selected 25 subjects and rendered rotated versions of every emotion.
 We covered rotation angles between 0 and 44 degrees of anti-clockwise rotation around the yaw axis.
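The yaw rotation used to render the rotated views can be sketched as a standard rotation matrix around the y axis; the toy face and the 4-degree angle grid below are illustrative, not the experiment's actual sampling.

```python
import numpy as np

def rotate_yaw(shape3d, degrees):
    """Rotate a 3D shape (N, 3) anti-clockwise around the yaw (y) axis."""
    a = np.radians(degrees)
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return shape3d @ R.T

# rotated versions of a toy 2-point face over 0..44 degrees
face = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
views = [rotate_yaw(face, d) for d in range(0, 45, 4)]
# 0 degrees leaves the shape unchanged; rotation preserves point norms
```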
