Machine Learning Solutions to Visual Recognition Problems (Jakob Verbeek, PowerPoint presentation)

SLIDE 1

Machine Learning Solutions to Visual Recognition Problems

Jakob Verbeek
Habilitation à Diriger des Recherches, Université Grenoble Alpes

Jury

  • Prof. Eric Gaussier, Univ. Grenoble Alpes (Président)
  • Prof. Matthieu Cord, Univ. Pierre et Marie Curie (Rapporteur)
  • Prof. Erik Learned-Miller, Univ. of Massachusetts (Rapporteur)
  • Prof. Andrew Zisserman, Univ. of Oxford (Rapporteur)
  • Dr. Cordelia Schmid, INRIA Rhône-Alpes (Examinateur)
  • Prof. Tinne Tuytelaars, K.U. Leuven (Examinateur)

SLIDE 2

Learning-based methods to understand natural imagery

◮ Recognition: people, objects, actions, events, ...
◮ Localization: box, segmentation mask, space-time tube, ...
◮ A technique-driven rather than application-driven approach

SLIDE 3

Layout of this presentation

◮ Synthetic review of past activities
◮ Overview of contributions
◮ Perspectives

SLIDE 4

Part I Synthetic review of past activities

SLIDE 5

Academic background: 1994 — 2005 — 2016

◮ 1994-1998: MSc Artificial Intelligence, University of Amsterdam
  ◮ With honors; Peter Grünwald, Ronald de Wolf, Paul Vitányi
◮ 1999-2000: MSc Logic, ILLC, University of Amsterdam
  ◮ With honors; Michiel van Lambalgen
◮ 2000-2004: PhD Computer Science, University of Amsterdam
  ◮ Ben Kröse, Nikos Vlassis, Frans Groen
◮ 2005-2007: Postdoctoral fellow, INRIA Rhône-Alpes
  ◮ Bill Triggs
◮ Since 2007: Permanent researcher, INRIA Rhône-Alpes
◮ 2009: Promotion to CR1
◮ 2016: Outstanding research distinction (PEDR)

SLIDE 6

Supervised PhD students

◮ 2006-2010: Matthieu Guillaumin
  ◮ Amazon research, Berlin, Germany
◮ 2008-2011: Josip Krapac
  ◮ PostDoc Univ. Zagreb, Croatia
◮ 2009-2012: Thomas Mensink, AFRIF best thesis award 2012
  ◮ PostDoc Univ. Amsterdam, Netherlands
◮ 2010-2014: Gokberk Cinbis, AFRIF best thesis award 2014
  ◮ Assistant Prof. Bilkent Univ. Ankara, Turkey
◮ 2011-2015: Dan Oneață
  ◮ Data scientist, Eloquentix, Bucharest, Romania
◮ Since 2013: Shreyas Saxena
◮ Since 2016: Pauline Luc

SLIDE 7

Research funding: ANR, EU, Cifre, LabEx

◮ 2006-2009: Cognitive-Level Annotation using Latent Statistical Structure (CLASS), funded by European Union
◮ 2008-2010: Interactive Image Search, funded by ANR
◮ 2009-2012: Modeling multi-media documents for cross-media access, Cifre PhD with Xerox Research Centre Europe
◮ 2010-2013: Quaero Consortium for Multimodal Person Recognition, funded by ANR
◮ 2011-2015: AXES: Access to Audiovisual Archives, funded by European Union
◮ 2013-2016: Physionomie: Physiognomic Recognition for Forensic Investigation, funded by ANR
◮ 2016-2018: Weakly supervised structured prediction for semantic segmentation, Cifre with Facebook AI Research
◮ 2016-2020: Deep convolutional and recurrent networks for image, speech and text, Laboratory of Excellence Persyval

SLIDE 8

Publications

◮ 19 journal articles: 14 in TPAMI, IJCV, PR, TIP
◮ 34 conference papers: 25 (6 oral) in ECCV, CVPR, ICCV, NIPS
◮ 5723 citations, H-index 36, i10-index 58 (Google Scholar)
◮ 3 patents, joint inventions with Xerox Research Centre Europe

SLIDE 9

Research community service

◮ Associate editor
  ◮ International Journal of Computer Vision (since 2014)
  ◮ Image and Vision Computing Journal (since 2011)
◮ Chairs for international conferences
  ◮ Tutorial chair ECCV 2016
  ◮ Area chair CVPR 2015
  ◮ Area chair ECCV 2012, 2014
  ◮ Area chair BMVC 2012, 2013, 2014

SLIDE 10

Part II Overview of contributions

SLIDE 11

Layout of this presentation

◮ Synthetic review of past activities
◮ Overview of contributions

  • 1. The Fisher vector representation
  • 2. Metric learning approaches
  • 3. Learning with incomplete supervision

◮ Perspectives

SLIDE 12

The Fisher vector representation

◮ Data representation by the Fisher score vector [Jaakkola and Haussler, 1999]

  ∇_θ ln p(x; θ),  θ ∈ ℝ^D    (1)

◮ Useful to represent non-vectorial data, e.g. sets, sequences, ...
◮ For images: iid GMM for sets of local descriptors [Perronnin and Dance, 2007]

  p(x_{1:N}) = ∏_{n=1}^{N} ∑_{k=1}^{K} π_k N(x_n; µ_k, σ_k)    (2)

◮ The Fisher vector contains local first and second order statistics

  ∇_{(π_k, µ_k, σ_k)} ln p(x; θ) = b + A ∑_{n=1}^{N} p(k|x_n) [1, x_n, x_n²]^⊤    (3)
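As an illustration, the statistics of Eq. (3) can be computed in a few lines of numpy. This is a minimal sketch assuming a diagonal-covariance GMM, using the common closed-form normalization by the mixing weights (standing in for the matrix A and offset b) rather than the exact Fisher information:

```python
import numpy as np

def fisher_vector(X, pi, mu, sigma2):
    # X: (N, D) local descriptors; pi: (K,) mixing weights;
    # mu, sigma2: (K, D) means and diagonal variances of the GMM.
    N, _ = X.shape
    diff = X[:, None, :] - mu[None, :, :]                        # (N, K, D)
    log_gauss = -0.5 * ((diff ** 2) / sigma2 + np.log(2 * np.pi * sigma2)).sum(axis=2)
    log_post = np.log(pi) + log_gauss                            # (N, K), unnormalized
    log_post -= log_post.max(axis=1, keepdims=True)              # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                      # posteriors p(k | x_n)

    sigma = np.sqrt(sigma2)
    s0 = post.sum(axis=0)                                        # soft counts (K,)
    g_pi = (s0 - N * pi) / np.sqrt(pi)                           # weight gradients
    g_mu = np.einsum('nk,nkd->kd', post, diff / sigma) / np.sqrt(pi)[:, None]
    g_sig = np.einsum('nk,nkd->kd', post, (diff / sigma) ** 2 - 1.0) / np.sqrt(2 * pi)[:, None]
    return np.concatenate([g_pi, g_mu.ravel(), g_sig.ravel()])   # length K + 2KD
```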

SLIDE 13

Related publications

◮ Fisher vectors for non-iid image models

Cinbis, Schmid, Verbeek [CVPR’12, PAMI’16], 40 citations

◮ Approximate power and L2 normalization of FV

Oneata, Schmid, Verbeek [CVPR’14], 23 citations

◮ Application for action and event recognition

Oneata, Schmid, Verbeek, Wang [ICCV’13, IJCV’15], 158 citations

◮ Application for object localization

Cinbis, Schmid, Verbeek [ICCV’13], 64 citations

◮ Fisher vectors for descriptor layout coding

Jurie, Krapac, Verbeek [ICCV’11], 135 citations

SLIDE 14

Fisher vectors for non-iid image models

◮ The independence assumption yields sum-pooling in the FV
  ◮ Bag-of-words [Csurka et al., 2004, Sivic and Zisserman, 2003] and iid GMM FV [Perronnin and Dance, 2007]
◮ A very poor assumption from a modeling perspective
  ◮ Images are locally self-similar
  ◮ The representation should discount frequent events
◮ Compensated by power normalization, Hellinger or χ² kernel
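The compensation referred to here is simple to state in code; a minimal sketch of signed power normalization followed by L2 normalization (α = 0.5 gives the common square-rooting):

```python
import numpy as np

def normalize_fv(fv, alpha=0.5):
    # Signed power normalization: sign(z) * |z|^alpha, which discounts
    # frequent (large-magnitude) statistics, followed by L2 normalization.
    fv = np.sign(fv) * np.abs(fv) ** alpha
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```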

SLIDE 15

Replace iid models with non-iid exchangeable counterparts

[Graphical models: Gaussian mixture (left) vs. latent Gaussian mixture (right), with priors (α; a_k, b_k, m_k, β_k) on the mixing weights π and component parameters µ_k, λ_k; plates over i = 1, ..., N and k = 1, ..., K]

◮ Bayesian approach: treat model parameters as latent variables
◮ Compute the Fisher vector w.r.t. the hyper-parameters
◮ Variational inference to approximate intractable gradients

SLIDE 16

Comparison to power normalization

[Plots: (left) transformation of bag-of-word counts under the latent model for α from 10⁻² to 10³, compared to square-rooting; (right) SqrtMoG vs. LatMoG Fisher vector transformations of the Gaussian mean parameter]

◮ Fisher vector of the non-iid model vs. power normalization
◮ Qualitatively similar monotonic concave transformations
◮ The latent variable model explains the effectiveness of power normalization

SLIDE 17

Layout of this presentation

◮ Synthetic review of past activities
◮ Overview of contributions

  • 1. The Fisher vector representation
  • 2. Metric learning approaches
  • 3. Learning with incomplete supervision

◮ Perspectives

SLIDE 18

Metric learning approaches

◮ Measures of similarity or distance have many applications
  ◮ Retrieval and matching of local descriptors or entire images
  ◮ Nearest neighbor prediction models
  ◮ Verification: do two objects belong to the same category?
◮ Supervised training to discover the important features
  ◮ The notion of similarity is task dependent
  ◮ Methods such as FDA [Fisher, 1936] use only second moments

[Illustration: FDA projection, from Mensink et al., 2012]
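Such approaches typically learn a Mahalanobis metric parametrized by a linear projection W, so that d(x, y) = ||W(x − y)||² = (x − y)ᵀ(WᵀW)(x − y). A schematic sketch, illustrative only and not the exact formulation of any of the cited papers; `p_same` follows the general shape of a logistic discriminant model over distances:

```python
import numpy as np

def mahalanobis_dist(W, x, y):
    # Squared Mahalanobis distance induced by a learned projection W:
    # d(x, y) = ||W(x - y)||^2.
    d = W @ (x - y)
    return float(d @ d)

def p_same(W, b, x, y):
    # Logistic model of the probability that x and y belong to the same
    # class: sigma(b - d(x, y)); b acts as a learned distance threshold.
    return 1.0 / (1.0 + np.exp(-(b - mahalanobis_dist(W, x, y))))
```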

SLIDE 19

Related publications

◮ Coordinated Local Metric Learning

Saxena and Verbeek [ICCV’15 Workshop]

◮ Metric learning for nearest class-mean classifier

Csurka, Mensink, Perronnin, Verbeek [PAMI’13, ECCV’12 oral], 126 citations

◮ Multiple instance metric learning

Guillaumin, Schmid, Verbeek [ECCV’10], 83 citations

◮ Discriminative metric learning in nearest neighbor models

Guillaumin, Mensink, Schmid, Verbeek [ICCV’09 oral], 377 citations

◮ Logistic discriminant metric learning

Guillaumin, Schmid, Verbeek [ICCV’09], 420 citations

SLIDE 20

Instantaneous adaptation to new samples and classes

◮ Consider photo-sharing service: stream of labeled images

◮ Re-training a discriminative model for new data is costly ◮ Generative models easily updated, but often perform worse ◮ KNN classifiers are very costly to evaluate for large dataset 20 / 45

SLIDE 21

Instantaneous adaptation to new samples and classes

◮ Consider a photo-sharing service: a stream of labeled images
  ◮ Re-training a discriminative model for new data is costly
  ◮ Generative models are easily updated, but often perform worse
  ◮ KNN classifiers are very costly to evaluate for large datasets

◮ The nearest class-mean classifier is linear and easily updated

  y = arg min_k ||W(x − µ_k)||²    (4)

◮ Maximum likelihood estimation with softmax loss

  p(y = k | x) ∝ exp(−||W(x − µ_k)||²)    (5)

◮ Corresponds to the posterior in a generative Gaussian mixture model

  p(x | y = k) = N(x; µ_k, Σ)    (6)
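Eqs. (4)-(6) translate directly into a tiny classifier; a minimal sketch assuming a fixed learned projection W, where adding a new class only requires storing its mean:

```python
import numpy as np

class NearestClassMean:
    # Nearest class-mean classifier with a learned linear metric W:
    # predict arg min_k ||W (x - mu_k)||^2. New classes are added by
    # storing their mean; the metric W stays fixed.
    def __init__(self, W):
        self.W = W
        self.means = {}

    def add_class(self, label, examples):
        # Instantaneous update: only the class mean is computed.
        self.means[label] = np.mean(examples, axis=0)

    def predict(self, x):
        dists = {k: float(np.sum((self.W @ (x - mu)) ** 2))
                 for k, mu in self.means.items()}
        return min(dists, key=dists.get)
```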

SLIDE 22

Experimental evaluation: ImageNet Challenge 2010

◮ Train 1: metric and means from 1,000 classes
◮ Train 2: metric from 800 classes, means on all 1,000
◮ Test: 200 classes not used for the metric in Train 2

  Error in %        KNN    NCM
  Trained on all    38.4   36.4
  Trained on 800    42.4   39.9

◮ The linear NCM classifier is better than non-parametric KNN
  ◮ In both cases the metric is learned
◮ Training the metric on other classes only moderately impacts performance

SLIDE 23

Visualization of nearest classes using L2 and learned metric

◮ Classes closest to the center of “Palm” in FV image space
◮ The learned Mahalanobis metric is semantically more meaningful
  ◮ Improves prediction accuracy
  ◮ Remaining errors are more sensible

SLIDE 24

Layout of this presentation

◮ Synthetic review of past activities
◮ Overview of contributions

  • 1. The Fisher vector representation
  • 2. Metric learning approaches
  • 3. Learning with incomplete supervision

◮ Perspectives

SLIDE 25

Learning with incomplete supervision

◮ Acquiring labeled training data is often costly
  ◮ Pixel-wise semantic segmentation of images
  ◮ Spatio-temporal localization of actions in video
◮ Latent variable modeling to handle missing labels
  ◮ Iterative learning and inference schemes
  ◮ Expectation-maximization, multiple instance learning

SLIDE 26

Related publications

◮ Multi-fold multiple instance learning

Cinbis, Schmid, Verbeek [CVPR’14, PAMI’16], 48 citations

◮ Tree-structured CRF models for interactive image labeling.

Csurka, Mensink, Verbeek [CVPR’11, PAMI’13], 48 citations

◮ Face recognition from caption-based supervision

Guillaumin, Mensink, Schmid, Verbeek [CVPR’08, ECCV’08 oral, IJCV’12], 172 citations

◮ Unsupervised metric learning for face identification in video

Cinbis, Schmid, Verbeek [ICCV’11], 71 citations

◮ Multimodal semi-supervised learning for image classification

Guillaumin, Schmid, Verbeek [CVPR’10 oral], 246 citations

◮ Web image search using query-relative classifiers

Allan, Jurie, Krapac, Verbeek [CVPR’10], 88 citations

◮ Weakly supervised semantic segmentation

Triggs and Verbeek [CVPR’07, NIPS’08 oral], 348 citations

SLIDE 27

Weakly supervised object localization

◮ Bounding box annotations are expensive and error prone
◮ The image classification response is expected to be strongest on the object
◮ This suggests an iterative refinement procedure
  ◮ Train a model from location hypotheses (initialized from the full image)
  ◮ Update the location hypotheses based on the estimated model

SLIDE 28

Limitation of standard multiple instance learning [Dietterich et al., 1997]

◮ Select the object hypothesis with maximum score

  f(x) = b + ⟨w, x⟩ = b + ∑_{n=1}^{N} a_n ⟨x_n, x⟩    (7)

◮ Data are near orthogonal in high-dimensional feature spaces
◮ This causes immediate and poor convergence of the training process

[Plots: (left) density of inner products between high-dimensional FVs; (right) normalized score frequency of training windows below and above 50% overlap]
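The near-orthogonality claim is easy to verify numerically: inner products between random unit vectors concentrate around zero as the dimension grows, so in Eq. (7) a window's score is dominated by its own dual term and the previously selected hypothesis keeps winning. A small illustrative check, not from the paper:

```python
import numpy as np

def mean_abs_cosine(dim, n_pairs=500, seed=0):
    # Average |cosine similarity| between pairs of random unit vectors;
    # shrinks roughly as 1/sqrt(dim), illustrating near-orthogonality.
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_pairs, dim))
    y = rng.normal(size=(n_pairs, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    y /= np.linalg.norm(y, axis=1, keepdims=True)
    return float(np.mean(np.abs(np.sum(x * y, axis=1))))
```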

SLIDE 29

Multi-fold multiple instance learning

◮ Goal: avoid score domination by the previous hypothesis
◮ Approach: split the data into non-overlapping subsets
  ◮ Train the model on all but one set
  ◮ Update hypotheses on the held-out set

[Plot: CorLoc over training iterations for 2/10/20-fold MIL vs. standard MIL, in the high-dimensional FV and 516-dimensional settings]

◮ Note the difference between the high and low dimensional case
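The update scheme above can be sketched as follows; `train_fn` and `score_fn` are hypothetical hooks standing in for detector training and window scoring, and the fold handling is a simplified illustration of the multi-fold idea:

```python
import numpy as np

def multifold_mil_update(bags, hypotheses, train_fn, score_fn, n_folds=10):
    # One round of multi-fold MIL: each fold is re-localized with a model
    # trained on the remaining folds, so a window never scores itself.
    # bags: list of (n_windows, dim) arrays; hypotheses: selected window
    # index per bag; train_fn(features) -> model; score_fn(model, windows).
    folds = np.array_split(np.arange(len(bags)), n_folds)
    new_hyp = list(hypotheses)
    for fold in folds:
        held_out = set(fold.tolist())
        train_feats = np.stack([bags[i][hypotheses[i]]
                                for i in range(len(bags)) if i not in held_out])
        model = train_fn(train_feats)          # model sees other folds only
        for i in held_out:
            new_hyp[i] = int(np.argmax(score_fn(model, bags[i])))
    return new_hyp
```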

SLIDE 30

Part III Perspectives

SLIDE 31

Layout of this presentation

◮ Synthetic review of past activities
◮ Overview of contributions
◮ Perspectives

  • 1. Interface between graphical models and deep learning
  • 2. Towards a generic visual recognition engine
  • 3. Towards systematic network architecture learning

SLIDE 32

Interface between graphical models and deep learning

◮ Relations between probabilistic inference and recurrent nets
  ◮ Mean-field inference as an RNN [Schwing and Urtasun, 2015, Zheng et al., 2015]
  ◮ Generalize to loopy belief propagation and beyond
  ◮ Trainable special-purpose inference algorithms

SLIDE 33

Interface between graphical models and deep learning

◮ Relations between probabilistic inference and recurrent nets
  ◮ Mean-field inference as an RNN [Schwing and Urtasun, 2015, Zheng et al., 2015]
  ◮ Generalize to loopy belief propagation and beyond
  ◮ Trainable special-purpose inference algorithms

◮ Conditional random field models are extremely useful in vision
  ◮ Semantic segmentation, depth estimation, optical flow, ...
  ◮ CNNs used for unary and pairwise terms [Lin et al., 2016]
  ◮ Flexible higher-order terms are missing; an RNN-based approach building on ideas in [Pinheiro and Collobert, 2014]?

SLIDE 34

Interface between graphical models and deep learning

◮ Relations between probabilistic inference and recurrent nets
  ◮ Mean-field inference as an RNN [Schwing and Urtasun, 2015, Zheng et al., 2015]
  ◮ Generalize to loopy belief propagation and beyond
  ◮ Trainable special-purpose inference algorithms

◮ Conditional random field models are extremely useful in vision
  ◮ Semantic segmentation, depth estimation, optical flow, ...
  ◮ CNNs used for unary and pairwise terms [Lin et al., 2016]
  ◮ Flexible higher-order terms are missing; an RNN-based approach building on ideas in [Pinheiro and Collobert, 2014]?

◮ Hierarchical Bayesian networks to define structured priors
  ◮ Parameter sharing: formalize pre-training and fine-tuning
  ◮ Network topology learning: sparsity, stick-breaking, etc. [Adams et al., 2010, Kulkarni et al., 2015]
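The mean-field-as-RNN connection can be made concrete with the update that such networks unroll. A schematic single step for a pairwise model with a Potts-style label compatibility (an assumption of this sketch, which rewards agreement between coupled nodes):

```python
import numpy as np

def mean_field_step(unary, pairwise, Q):
    # One mean-field update for a pairwise CRF, the operation CRF-as-RNN
    # unrolls as a recurrent layer. unary: (n_nodes, n_labels) energies;
    # pairwise: (n_nodes, n_nodes) coupling weights, zero diagonal;
    # Q: (n_nodes, n_labels) current approximate marginals.
    message = pairwise @ Q                       # expected neighbor labels
    logits = -unary + message                    # Potts: agreement rewarded
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    Q_new = np.exp(logits)
    return Q_new / Q_new.sum(axis=1, keepdims=True)
```

Iterating this map a fixed number of times, with the potentials produced by a CNN, is exactly what makes the whole pipeline trainable end-to-end by back-propagation.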

SLIDE 35

Towards a generic visual recognition engine

◮ Explicit supervision is a bottleneck in recognition
  ◮ Cost effectiveness of acquiring a labeled training set
  ◮ Latency: new categories require time to get labeled
  ◮ Once labeled and trained, new categories must be indexed

SLIDE 36

Towards a generic visual recognition engine

◮ Explicit supervision is a bottleneck in recognition
  ◮ Cost effectiveness of acquiring a labeled training set
  ◮ Latency: new categories require time to get labeled
  ◮ Once labeled and trained, new categories must be indexed

◮ Towards a “generic visual recognition engine”
  ◮ Generalization across semantics, e.g. using word embeddings [Frome et al., 2013, Mikolov et al., 2013]
  ◮ Massive noisy weakly supervised datasets [Chatfield et al., 2015, Joulin et al., 2015]
  ◮ Unsupervised learning as regularization [Doersch et al., 2015, Isola et al., 2016, Kingma and Welling, 2014]
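The word-embedding route can be sketched concisely: map an image into the label-embedding space and pick the nearest label vector by cosine similarity, so an unseen category only needs a word vector, not labeled images. An illustrative sketch in the spirit of DeViSE, not the exact model:

```python
import numpy as np

def zero_shot_predict(image_emb, label_embs):
    # image_emb: image mapped into the word-embedding space (by a learned
    # projection, omitted here); label_embs: dict label -> word vector.
    labels = list(label_embs)
    M = np.stack([label_embs[l] for l in labels])
    M = M / np.linalg.norm(M, axis=1, keepdims=True)   # unit label vectors
    v = image_emb / np.linalg.norm(image_emb)
    return labels[int(np.argmax(M @ v))]               # nearest by cosine
```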

SLIDE 37

Towards systematic network architecture learning

◮ CNN architecture-land is a scary place

  ◮ Hyper-parameters include: number of layers, number of channels per layer, filter size per layer, stride per layer, number of pooling vs. convolutional layers, type of pooling operator per layer, size of the pooling regions, ordering of pooling and convolutional layers, channel connectivity pattern between layers, type of activation per layer (ReLU, MaxOut)

SLIDE 38

Towards systematic network architecture learning

◮ CNN architecture-land is a scary place

  ◮ Hyper-parameters include: number of layers, number of channels per layer, filter size per layer, stride per layer, number of pooling vs. convolutional layers, type of pooling operator per layer, size of the pooling regions, ordering of pooling and convolutional layers, channel connectivity pattern between layers, type of activation per layer (ReLU, MaxOut)

◮ Exponentially large: exhaustive search is intractable
  ◮ Massive reliance on pre-trained nets: AlexNet, VGG-16/19
  ◮ Local search techniques [Chen et al., 2016]

SLIDE 39

Towards systematic network architecture learning

◮ CNN architecture-land is a scary place

  ◮ Hyper-parameters include: number of layers, number of channels per layer, filter size per layer, stride per layer, number of pooling vs. convolutional layers, type of pooling operator per layer, size of the pooling regions, ordering of pooling and convolutional layers, channel connectivity pattern between layers, type of activation per layer (ReLU, MaxOut)

◮ Exponentially large: exhaustive search is intractable
  ◮ Massive reliance on pre-trained nets: AlexNet, VGG-16/19
  ◮ Local search techniques [Chen et al., 2016]

◮ Important recent advances are orthogonal: residual and highway networks [He et al., 2016, Srivastava et al., 2015]

SLIDE 40

Making progress

SLIDE 41

Making progress

◮ Formulate as a continuous optimization problem by relaxation
◮ Recent work with Shreyas Saxena, currently under review

SLIDE 42

Meeting structural desiderata

◮ Should encompass all CNN architectures
  ◮ Up to a certain size
◮ Should consist of few atomic building blocks
  ◮ Recombined in exponentially many configurations
◮ Should be configurable with continuous parameters
  ◮ Learning the architecture from data

SLIDE 43

Meeting structural desiderata

◮ Should encompass all CNN architectures
  ◮ Up to a certain size
◮ Should consist of few atomic building blocks
  ◮ Recombined in exponentially many configurations
◮ Should be configurable with continuous parameters
  ◮ Learning the architecture from data

◮ All conditions are met by a multi-dimensional network
  ◮ Ensembles of all architectures are also included
  ◮ Joint classification and segmentation output is possible

[Figure: multi-dimensional network spanning scales and layers, from input to output]

SLIDE 44

The Convolutional Neural Fabric

◮ Node: one response map at a particular scale
◮ Layer axis: all signal flows in this direction
◮ Scale axis: full scale pyramid down to a 1×1 map, S = log₂ N_pix
◮ Channel axis: different maps of the same scale in the same layer

[Figure: fabric trellis with scale, layer and channel axes, showing input and internal nodes]

SLIDE 45

A homogeneous local connectivity structure

◮ Activations are computed as a sum of convolutions over neighboring scales and channels at the previous layer

  a(s, c, l) = ∑_{i=−1}^{+1} ∑_{j=−1}^{+1} Conv(a(s+i, c+j, l−1); W^{ij}_{s,c,l})    (8)

◮ Trellis → standard back-propagation [Rumelhart et al., 1986]

[Figure: fabric trellis with scale, layer and channel axes, showing input and internal nodes]
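Eq. (8) can be sketched directly; for brevity the convolutions are reduced to scalar (1×1) filters and cross-scale resampling is omitted, both simplifications relative to the actual fabric:

```python
import numpy as np

def fabric_node(acts_prev, s, c, weights):
    # Activation of fabric node (scale s, channel c) at the current layer:
    # sum over the 3x3 neighborhood of scales/channels of the previous
    # layer, cf. Eq. (8). acts_prev: dict (scale, channel) -> 2D map;
    # weights: dict (i, j) -> scalar filter (1x1 convolution for brevity).
    out = None
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            key = (s + i, c + j)
            if key in acts_prev:                 # skip neighbors off the grid
                term = weights[(i, j)] * acts_prev[key]
                out = term if out is None else out + term
    return out
```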

SLIDE 46

Experimental evaluation: Part Labels

  Part Labels                              Year   # Params.   SP Acc.   P Acc.
  Tsogkas et al. [Tsogkas et al., 2015]    2015   >414M       96.97     —
  Liu et al. [Liu et al., 2015]            2015   >33M        —         95.24
  Kae et al. [Kae et al., 2013]            2013   0.7M        94.95     —
  Convolutional Neural Fabric                     0.1M        95.58     94.60

[Figure: example images with predicted part-label segmentations]

◮ Competitive with the best hand-crafted architectures
  ◮ Without a structured prediction model: CRF, RBM, etc.
  ◮ Much fewer (pre-trained) parameters
  ◮ Only using 2,000 training images, instead of >1M from ImageNet

SLIDE 47

Mean-squared filter weight per edge: MNIST classification

[Figure: mean-squared filter weight per fabric edge, over scales and layers from input to output]

◮ Multiple scales are active per layer, and multiple layers per scale

SLIDE 48

Personal perspective: 2005 — 2015 — 2025

◮ 2005: a fresh PhD looking for an applied machine learning field
◮ Machine vision: difficult structured prediction problems
◮ INRIA-LEAR: vision + machine learning + mountains = :-)

40 / 45

SLIDE 49

Personal perspective: 2005 — 2015 — 2025

◮ 2005: a fresh PhD looking for an applied machine learning field
◮ Machine vision: difficult structured prediction problems
◮ INRIA-LEAR: vision + machine learning + mountains = :-)

◮ 2025: Back to the future, back to machine learning?
◮ Convergence of machine perception techniques, following decades of divergence in specialized fields
◮ Fundamental progress from general learning, inference, and memory mechanisms rather than from application specifics

SLIDE 50

Thank you!

In particular to my former and current students: Matthieu, Josip, Thomas, Gokberk, Dan, Shreyas, Pauline.

SLIDE 51

References I

[Adams et al., 2010] Adams, R., Wallach, H., and Ghahramani, Z. (2010). Learning the structure of deep sparse graphical models. In AISTATS.

[Chatfield et al., 2015] Chatfield, K., Arandjelović, R., Parkhi, O., and Zisserman, A. (2015). On-the-fly learning for visual search of large-scale image and video datasets. International Journal of Multimedia Information Retrieval.

[Chen et al., 2016] Chen, T., Goodfellow, I., and Shlens, J. (2016). Net2Net: Accelerating learning via knowledge transfer. In ICLR.

[Csurka et al., 2004] Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV Int. Workshop on Stat. Learning in Computer Vision.

[Dietterich et al., 1997] Dietterich, T., Lathrop, R., and Lozano-Pérez, T. (1997). Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71.

[Doersch et al., 2015] Doersch, C., Gupta, A., and Efros, A. (2015). Unsupervised visual representation learning by context prediction. In ICCV.

[Fisher, 1936] Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188.

[Frome et al., 2013] Frome, A., Corrado, G., Shlens, J., Bengio, S., Dean, J., Ranzato, M., and Mikolov, T. (2013). DeViSE: A deep visual-semantic embedding model. In NIPS.

SLIDE 52

References II

[He et al., 2016] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Identity mappings in deep residual networks. arXiv preprint.

[Isola et al., 2016] Isola, P., Zoran, D., Krishnan, D., and Adelson, E. (2016). Learning visual groups from co-occurrences in space and time. In ICLR.

[Jaakkola and Haussler, 1999] Jaakkola, T. and Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In NIPS.

[Joulin et al., 2015] Joulin, A., van der Maaten, L., Jabri, A., and Vasilache, N. (2015). Learning visual features from large weakly supervised data. arXiv preprint.

[Kae et al., 2013] Kae, A., Sohn, K., Lee, H., and Learned-Miller, E. (2013). Augmenting CRFs with Boltzmann machine shape priors for image labeling. In CVPR.

[Kingma and Welling, 2014] Kingma, D. and Welling, M. (2014). Auto-encoding variational Bayes. In ICLR.

[Kulkarni et al., 2015] Kulkarni, P., Zepeda, J., Jurie, F., Pérez, P., and Chevallier, L. (2015). Learning the structure of deep architectures using l1 regularization. In BMVC.

[Lin et al., 2016] Lin, G., Shen, C., van den Hengel, A., and Reid, I. (2016). Efficient piecewise training of deep structured models for semantic segmentation. In CVPR.

[Liu et al., 2015] Liu, S., Yang, J., Huang, C., and Yang, M.-H. (2015). Multi-objective convolutional learning for face labeling. In CVPR.

SLIDE 53

References III

[Mensink et al., 2012] Mensink, T., Verbeek, J., Perronnin, F., and Csurka, G. (2012). Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV.

[Mikolov et al., 2013] Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In NIPS.

[Perronnin and Dance, 2007] Perronnin, F. and Dance, C. (2007). Fisher kernels on visual vocabularies for image categorization. In CVPR.

[Pinheiro and Collobert, 2014] Pinheiro, P. and Collobert, R. (2014). Recurrent convolutional neural networks for scene labeling. In ICML.

[Rumelhart et al., 1986] Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323:533–536.

[Schwing and Urtasun, 2015] Schwing, A. and Urtasun, R. (2015). Fully connected deep structured networks. arXiv preprint.

[Sivic and Zisserman, 2003] Sivic, J. and Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. In ICCV.

[Srivastava et al., 2015] Srivastava, R., Greff, K., and Schmidhuber, J. (2015). Training very deep networks. In NIPS.

[Tsogkas et al., 2015] Tsogkas, S., Kokkinos, I., Papandreou, G., and Vedaldi, A. (2015). Deep learning for semantic part segmentation with high-level guidance. arXiv preprint.

SLIDE 54

References IV

[Zheng et al., 2015] Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., and Torr, P. (2015). Conditional random fields as recurrent neural networks. In ICCV.