Machine Learning Solutions to Visual Recognition Problems


  1. Machine Learning Solutions to Visual Recognition Problems, Jakob Verbeek. Habilitation à Diriger des Recherches, Université Grenoble Alpes. Jury: Prof. Eric Gaussier, Univ. Grenoble Alpes, Président; Prof. Matthieu Cord, Univ. Pierre et Marie Curie, Rapporteur; Prof. Erik Learned-Miller, Univ. of Massachusetts, Rapporteur; Prof. Andrew Zisserman, Univ. of Oxford, Rapporteur; Dr. Cordelia Schmid, INRIA Rhône-Alpes, Examinateur; Prof. Tinne Tuytelaars, K.U. Leuven, Examinateur. 1 / 45

  2. Learning-based methods to understand natural imagery ◮ Recognition: people, objects, actions, events, . . . ◮ Localization: box, segmentation mask, space-time tube, . . . ◮ A technique-driven rather than application-driven approach 2 / 45

  3. Layout of this presentation ◮ Synthetic review of past activities ◮ Overview of contributions ◮ Perspectives 3 / 45

  4. Part I Synthetic review of past activities 4 / 45

  5. Academic background: 1994 — 2005 — 2016 ◮ 1994-1998: MSc Artificial Intelligence, University of Amsterdam ◮ With honors, Peter Grünwald, Ronald de Wolf, Paul Vitányi ◮ 1999-2000: MSc Logic, ILLC, University of Amsterdam ◮ With honors, Michiel van Lambalgen ◮ 2000-2004: PhD Computer Science, University of Amsterdam ◮ Ben Kröse, Nikos Vlassis, Frans Groen ◮ 2005-2007: Postdoctoral fellow, INRIA Rhône-Alpes ◮ Bill Triggs ◮ Since 2007: Permanent researcher, INRIA Rhône-Alpes ◮ 2009: Promotion to CR1 ◮ 2016: Outstanding research distinction (PEDR) 5 / 45

  6. Supervised PhD students ◮ 2006-2010: Matthieu Guillaumin ◮ Amazon Research, Berlin, Germany ◮ 2008-2011: Josip Krapac ◮ PostDoc Univ. Zagreb, Croatia ◮ 2009-2012: Thomas Mensink, AFRIF best thesis award 2012 ◮ PostDoc Univ. Amsterdam, Netherlands ◮ 2010-2014: Gokberk Cinbis, AFRIF best thesis award 2014 ◮ Assistant Prof. Bilkent Univ. Ankara, Turkey ◮ 2011-2015: Dan Oneață ◮ Data scientist, Eloquentix, Bucharest, Romania ◮ Since 2013: Shreyas Saxena ◮ Since 2016: Pauline Luc 6 / 45

  7. Research funding: ANR, EU, Cifre, LabEx ◮ 2006-2009: Cognitive-Level Annotation using Latent Statistical Structure (CLASS), funded by European Union ◮ 2008-2010: Interactive Image Search, funded by ANR ◮ 2009-2012: Modeling multi-media documents for cross-media access, Cifre PhD with Xerox Research Centre Europe ◮ 2010-2013: Quaero Consortium for Multimodal Person Recognition, funded by ANR ◮ 2011-2015: AXES: Access to Audiovisual Archives, funded by European Union ◮ 2013-2016: Physionomie: Physiognomic Recognition for Forensic Investigation, funded by ANR ◮ 2016-2018: Weakly supervised structured prediction for semantic segmentation, Cifre with Facebook AI Research ◮ 2016-2020: Deep convolutional and recurrent networks for image, speech and text, Laboratory of Excellence Persyval 7 / 45

  8. Publications ◮ 19 journal articles: 14 in TPAMI, IJCV, PR, TIP ◮ 34 conference papers: 25 (6 oral) ECCV, CVPR, ICCV, NIPS ◮ 5723 citations, H-index 36, i10-index 58 (Google scholar) ◮ 3 patents, joint inventions with Xerox Research Centre Europe 8 / 45

  9. Research community service ◮ Associate editor ◮ International Journal of Computer Vision (since 2014) ◮ Image and Vision Computing Journal (since 2011) ◮ Chairs for international conferences ◮ Tutorial chair ECCV 2016 ◮ Area chair CVPR 2015 ◮ Area chair ECCV 2012, 2014. ◮ Area chair BMVC 2012, 2013, 2014. 9 / 45

  10. Part II Overview of contributions 10 / 45

  11. Layout of this presentation ◮ Synthetic review of past activities ◮ Overview of contributions 1. The Fisher vector representation 2. Metric learning approaches 3. Learning with incomplete supervision ◮ Perspectives 11 / 45

  12. The Fisher vector representation ◮ Data representation by Fisher score vector [Jaakkola and Haussler, 1999] $\nabla_\theta \ln p(x; \theta), \quad \theta \in \mathbb{R}^D$ (1) ◮ Useful to represent non-vectorial data, e.g. sets, sequences, . . . ◮ For images: iid GMM for sets of local descriptors [Perronnin and Dance, 2007] $p(x_{1:N}) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n; \mu_k, \sigma_k)$ (2) ◮ Fisher vector contains local first and second order statistics $\nabla_{(\pi_k, \mu_k, \sigma_k)} \ln p(x; \theta) = b + A \sum_{n=1}^{N} p(k \mid x_n) \left[ 1, x_n, x_n^2 \right]^\top$ (3) 12 / 45
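To make Eqs. (2) and (3) concrete, here is a minimal numpy sketch of the Fisher vector of a set of local descriptors under a diagonal-covariance GMM. The function name, the omission of the gradient w.r.t. the mixing weights, and the closed-form normalization by sqrt(π_k) are simplifying assumptions, not the exact pipeline of the cited work.

```python
import numpy as np

def fisher_vector(X, pi, mu, sigma2):
    """Fisher vector of local descriptors X (N x D) w.r.t. a diagonal GMM with
    mixing weights pi (K,), means mu (K x D) and variances sigma2 (K x D).
    Returns concatenated first- and second-order statistics of size 2*K*D.
    Sketch only: the gradient w.r.t. the mixing weights is omitted."""
    N, _ = X.shape

    # Posterior responsibilities p(k | x_n), computed in log-space for stability.
    log_p = (np.log(pi)[None]
             - 0.5 * np.log(2 * np.pi * sigma2[None]).sum(-1)
             - 0.5 * (((X[:, None, :] - mu[None]) ** 2) / sigma2[None]).sum(-1))
    log_p -= log_p.max(axis=1, keepdims=True)
    q = np.exp(log_p)
    q /= q.sum(axis=1, keepdims=True)                     # (N, K)

    # Aggregated gradients w.r.t. means and standard deviations.
    # Note: 'pi' below is the mixing-weight vector, not the constant np.pi.
    diff = (X[:, None, :] - mu[None]) / np.sqrt(sigma2)[None]            # (N, K, D)
    g_mu = (q[:, :, None] * diff).sum(0) / (N * np.sqrt(pi)[:, None])
    g_sigma = (q[:, :, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * pi)[:, None])

    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])
```

In practice the resulting vector is further power- and L2-normalized, as discussed on the following slides.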

  13. Related publications ◮ Fisher vectors for non-iid image models Cinbis, Schmid, Verbeek [CVPR’12, PAMI’16], 40 citations ◮ Approximate power and L2 normalization of FV Oneata, Schmid, Verbeek [CVPR’14], 23 citations ◮ Application for action and event recognition Oneata, Schmid, Verbeek, Wang [ICCV’13, IJCV’15], 158 citations ◮ Application for object localization Cinbis, Schmid, Verbeek [ICCV’13], 64 citations ◮ Fisher vectors for descriptor layout coding Jurie, Krapac, Verbeek [ICCV’11], 135 citations 13 / 45

  14. Fisher vectors for non-iid image models ◮ Independence assumption generates sum-pooling in FV ◮ Bag-of-words [Csurka et al., 2004, Sivic and Zisserman, 2003] and iid GMM FV [Perronnin and Dance, 2007] ◮ Very poor assumption from a modeling perspective ◮ Images are locally self-similar ◮ Representation should discount frequent events ◮ Compensated by power normalization, Hellinger or χ² kernel (see the sketch below) 14 / 45
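The power normalization in the last bullet is simply a signed power (typically a square root) applied element-wise, followed by L2 normalization. A minimal sketch, with the function name and the default α = 0.5 as illustrative choices:

```python
import numpy as np

def power_l2_normalize(fv, alpha=0.5):
    """Signed power normalization z -> sign(z) * |z|**alpha (alpha = 0.5 gives
    the common signed square root), followed by L2 normalization."""
    fv = np.sign(fv) * np.abs(fv) ** alpha
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```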

  15. Replace iid models with non-iid exchangeable counterparts [Graphical models: Gaussian mixture (left) vs. latent Gaussian mixture (right), with plates over local descriptors i = 1, 2, . . . , N and mixture components k = 1, 2, . . . , K] ◮ Bayesian approach treats model parameters as latent variables ◮ Compute Fisher vector w.r.t. hyper-parameters ◮ Variational inference to approximate intractable gradients 15 / 45

  16. Comparison to power normalization [Plots: SqrtMoG or LatMoG FV as a function of the square-root MoG FV, and the Gaussian mean parameter of the FV as a function of bag-of-word counts, for LatMoG-1 to LatMoG-4 and SqrtMoG with α ranging from 1.0e-02 to 1.0e+03] ◮ Fisher vector of non-iid model vs. power normalization ◮ Qualitatively similar monotonic concave transformations ◮ Latent variable model explains effectiveness of power normalization 16 / 45

  17. Layout of this presentation ◮ Synthetic review of past activities ◮ Overview of contributions 1. The Fisher vector representation 2. Metric learning approaches 3. Learning with incomplete supervision ◮ Perspectives 17 / 45

  18. Metric learning approaches ◮ Measures of similarity or distance have many applications ◮ Retrieval and matching of local descriptors or entire images ◮ Nearest neighbor prediction models ◮ Verification: do two objects belong to the same category? ◮ Supervised training to discover the important features ◮ Notion of similarity is task dependent ◮ Methods such as FDA [Fisher, 1936] use only second-order moments [Figure: FDA projection, Mensink et al., 2012] 18 / 45
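As one concrete way to learn a task-dependent similarity, a Mahalanobis distance d_M(x, x') = (x - x')^T M (x - x') can be trained from same/different pairs with a logistic loss, in the spirit of the logistic discriminant metric learning listed among the related publications on the next slide. The factorization M = L^T L and the plain SGD update below are illustrative assumptions, not the exact formulation of that work.

```python
import numpy as np

def ldml_sgd_step(L, b, x1, x2, same, lr=1e-3):
    """One SGD step for a logistic-discriminant-style metric learner.
    L parameterizes M = L^T L (so the metric stays PSD); b is a bias such that
    p(same | x1, x2) = sigmoid(b - d_M(x1, x2)). Illustrative sketch only."""
    diff = L @ (x1 - x2)                    # projected difference, shape (d',)
    d = diff @ diff                         # Mahalanobis distance under M = L^T L
    p = 1.0 / (1.0 + np.exp(-(b - d)))      # probability that the pair matches
    err = p - float(same)                   # gradient of the logistic loss w.r.t. (b - d)
    L -= lr * (-err) * 2.0 * np.outer(diff, x1 - x2)   # chain rule through d = ||L(x1 - x2)||^2
    b -= lr * err
    return L, b
```

Choosing L with fewer rows than columns both reduces the number of parameters and yields a low-dimensional discriminative projection of the features.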

  19. Related publications ◮ Coordinated Local Metric Learning Saxena and Verbeek [ICCV’15 Workshop] ◮ Metric learning for nearest class-mean classifier Csurka, Mensink, Perronnin, Verbeek [PAMI’13, ECCV’12 oral], 126 citations ◮ Multiple instance metric learning Guillaumin, Schmid, Verbeek [ECCV’10], 83 citations ◮ Discriminative metric learning in nearest neighbor models Guillaumin, Mensink, Schmid, Verbeek [ICCV’09 oral], 377 citations ◮ Logistic discriminant metric learning Guillaumin, Schmid, Verbeek [ICCV’09], 420 citations 19 / 45

  20. Instantaneous adaptation to new samples and classes ◮ Consider photo-sharing service: stream of labeled images ◮ Re-training a discriminative model for new data is costly ◮ Generative models easily updated, but often perform worse ◮ KNN classifiers are very costly to evaluate for large datasets 20 / 45

  21. Instantaneous adaptation to new samples and classes ◮ Consider photo-sharing service: stream of labeled images ◮ Re-training a discriminative model for new data is costly ◮ Generative models easily updated, but often perform worse ◮ KNN classifiers are very costly to evaluate for large datasets ◮ Nearest mean classifier is linear and easily updated $\hat{y} = \arg\min_k \| W (x - \mu_k) \|^2$ (4) ◮ Maximum likelihood estimation with softmax loss $p(y = k \mid x) \propto \exp\left( - \| W (x - \mu_k) \|^2 \right)$ (5) ◮ Corresponds to posterior in generative Gaussian mixture model $p(x \mid y = k) = \mathcal{N}(x; \mu_k, \Sigma)$ (6) 20 / 45
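A minimal sketch of the nearest class mean classifier of Eqs. (4) and (5), with a learned linear projection W; the function and variable names are illustrative:

```python
import numpy as np

def ncm_predict(X, W, means):
    """Nearest class mean classification: y = argmin_k ||W (x - mu_k)||^2.
    X: (N, D) samples, W: (d, D) learned projection, means: (K, D) class means."""
    PX, PM = X @ W.T, means @ W.T                       # project samples and class means
    d2 = ((PX[:, None, :] - PM[None]) ** 2).sum(-1)     # squared distances, shape (N, K)
    return d2.argmin(axis=1)

def ncm_posteriors(X, W, means):
    """Softmax posteriors p(y = k | x) proportional to exp(-||W (x - mu_k)||^2)."""
    PX, PM = X @ W.T, means @ W.T
    d2 = ((PX[:, None, :] - PM[None]) ** 2).sum(-1)
    d2 -= d2.min(axis=1, keepdims=True)                 # shift for numerical stability
    p = np.exp(-d2)
    return p / p.sum(axis=1, keepdims=True)
```

Adding a new class or new samples only requires updating the corresponding class mean; the projection W does not need to be retrained, which is what makes the model instantaneously adaptable.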

  22. Experimental evaluation: ImageNet Challenge 2010 ◮ Train 1: metric and means from 1,000 classes ◮ Train 2: metric from 800 classes, means on all 1,000 ◮ Test: 200 classes not used for metric in (Train 2)
Error in %        KNN    NCM
Trained on all    38.4   36.4
Trained on 800    42.4   39.9
◮ Linear NCM classifier better than non-parametric KNN ◮ In both cases metric is learned ◮ Training from other classes moderately impacts performance 21 / 45

  23. Visualization of nearest classes using L2 and learned metric ◮ Classes closest to center of “Palm” in FV image space ◮ Learned Mahalanobis metric semantically more meaningful ◮ Improves prediction accuracy ◮ Remaining errors are more sensible 22 / 45

  24. Layout of this presentation ◮ Synthetic review of past activities ◮ Overview of contributions 1. The Fisher vector representation 2. Metric learning approaches 3. Learning with incomplete supervision ◮ Perspectives 23 / 45
