SLIDE 1

Sections: Low-level features · Encoding · High-level features · Fusion · Results · References

LEAR @ TrecVid MED 2012

Dan Oneață¹, Matthijs Douze¹, Jérôme Revaud¹, Jochen Schwenninger², Heng Wang¹, Danila Potapov¹, Zaïd Harchaoui¹, Jakob Verbeek¹, Cordelia Schmid¹

¹LEAR team, INRIA Grenoble, France    ²Fraunhofer, Sankt Augustin, Germany

SLIDE 2

Outline

1. Low-level features: appearance, motion, audio
2. Feature encoding: Fisher vectors
3. High-level features: text
4. Fusion strategies
5. Experiments and results

SLIDE 5


Appearance and audio features

Scale-invariant feature transform (SIFT, Lowe 2004): 21 × 21 patches sampled at 4-pixel steps over 5 scales, on every 60th frame.

Mel-frequency cepstral coefficients (MFCC, Rabiner and Schafer 2007): 25 ms window with a 10 ms step size; 39 coefficients (12 MFCCs and the signal energy, plus their first and second derivatives). Optionally: speech/non-speech separation.
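The dense SIFT sampling above amounts to a regular grid of patch locations per pyramid level. A minimal sketch of the grid bookkeeping (the pyramid scale factor of 1.2 is an assumption; the slide only fixes patch size, step, and number of scales):

```python
def dense_patch_grid(width, height, patch=21, step=4, scales=5, scale_factor=1.2):
    """Count dense patch locations per scale of an image pyramid.

    Illustrative only: 21x21 patches at 4-px steps on 5 scales, as on the
    slide; scale_factor is an assumed pyramid ratio, not from the deck.
    Returns a list of (scale index, number of patches).
    """
    locations = []
    for s in range(scales):
        f = scale_factor ** s
        w, h = int(width / f), int(height / f)
        nx = len(range(0, w - patch + 1, step))  # patch x-positions
        ny = len(range(0, h - patch + 1, step))  # patch y-positions
        locations.append((s, nx * ny))
    return locations

# e.g. a 200 x 150 frame (the width used after rescaling, see later slide)
for scale, n in dense_patch_grid(200, 150):
    print(scale, n)
```

Even at this modest resolution the finest scale alone yields over a thousand patches per sampled frame, which is why only every 60th frame is processed.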

SLIDE 9


Motion features

Dense trajectories (Wang et al., 2011): strong performance on many action recognition datasets (Hollywood2, YouTube, UCF Sports). Idea: MBH descriptors computed along short, densely sampled trajectories.

Pipeline: dense sampling in each spatial scale → tracking in each spatial scale separately → trajectory description (HOG, HOF, MBH).

Wang, H., Kläser, A., Schmid, C., and Cheng-Lin, L. (2011). Action recognition by dense trajectories. In CVPR, pages 3169–3176.
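The MBH descriptor named above histograms the spatial gradients of the optical-flow field, so constant camera motion cancels out. A toy sketch of that core idea (whole-patch aggregation only; the real descriptor bins into spatio-temporal cells along each trajectory):

```python
import numpy as np

def mbh_histogram(flow, nbins=8):
    """Motion Boundary Histogram sketch.

    flow: (H, W, 2) optical-flow field. For each flow component we take
    spatial gradients, then build a magnitude-weighted orientation
    histogram; a constant flow offset (camera motion) has zero gradient
    and so contributes nothing. Returns a (2, nbins) array, each row
    l1-normalized. Illustrative simplification of Wang et al.'s MBH.
    """
    hists = []
    for c in range(2):  # horizontal and vertical flow components
        gy, gx = np.gradient(flow[..., c])
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        bins = (ang / (2 * np.pi) * nbins).astype(int) % nbins
        h = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=nbins)
        hists.append(h / (h.sum() + 1e-10))
    return np.stack(hists)
```

Adding a constant vector to `flow` leaves the output unchanged, which is the property that makes MBH robust to camera motion.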

SLIDE 16


Video rescaling for dense trajectories

Computationally expensive: the cost scales linearly with the size of the video (time × resolution). Speed-ups:

Rescale videos to a width of at most 200 px.
Skip every second frame.
Process descriptors on the fly.
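The rescaling rule above can be sketched as a small helper (rounding to even dimensions is an added assumption, common for video codecs, not stated on the slide):

```python
def rescaled_size(width, height, max_width=200):
    """Target size for dense-trajectory extraction: cap the width at
    max_width px, preserving aspect ratio; even height for codecs."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, int(round(height * scale / 2)) * 2

# A 640x480 input becomes 200x150: (640*480)/(200*150) ~ 10x fewer pixels,
# and skipping every second frame roughly doubles that saving again.
print(rescaled_size(640, 480))
```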

SLIDE 22


Feature encoding: Fisher vectors (Perronnin et al., 2010)

Top feature-encoding technique for:
object recognition (Chatfield et al., 2011)
action recognition (Wang et al., 2012).

Fisher vectors (FV) for a GMM, for each component k:

soft bag-of-words: Σ_x p(k|x)
first moment: Σ_x p(k|x)(x − μ_k)
second moment: Σ_x p(k|x)(x − μ_k)²

FV size: K + 2KD, where K is the number of Gaussians and D the descriptor dimension.

Normalization: zero mean, unit variance; signed square-rooting; ℓ2 normalization.
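The three statistics and the normalization above can be sketched directly in NumPy. This is a simplified illustration, not the full encoder: descriptor whitening and the exact Fisher information scaling of Perronnin et al. (2010) are omitted.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Fisher-vector sketch for a diagonal-covariance GMM.

    Concatenates, per Gaussian k: the soft count sum_x p(k|x), the first
    moment sum_x p(k|x)(x - mu_k), and the second moment
    sum_x p(k|x)(x - mu_k)^2, then signed square-rooting and l2
    normalization, as on the slide.
    X: (N, D) descriptors; weights: (K,); means, variances: (K, D).
    """
    diff = X[:, None, :] - means[None, :, :]                 # (N, K, D)
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
             - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)                # stabilize
    q = np.exp(log_p)
    q /= q.sum(axis=1, keepdims=True)                        # p(k|x), (N, K)
    soft_bow = q.sum(axis=0)                                 # (K,)
    first = np.einsum('nk,nkd->kd', q, diff)                 # (K, D)
    second = np.einsum('nk,nkd->kd', q, diff ** 2)           # (K, D)
    fv = np.concatenate([soft_bow, first.ravel(), second.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                   # signed sqrt
    return fv / (np.linalg.norm(fv) + 1e-10)                 # l2 normalize
```

The output length is K + 2KD, matching the FV size stated above.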

SLIDE 30


High-level features. Optical character recognition

Feature extraction:
Maximally stable extremal regions (MSER; Matas et al. 2004).
Filtering based on boundary gradients and aspect ratio.
HOG descriptors (Dalal and Triggs, 2005).

Recognition: RBF-kernel SVM trained on Windows fonts; n-gram model over characters.

Video representation: bag-of-words.

Pipeline: video frame → all MSERs → gradient filtering → color and stroke-width filtering → pairs filtering → forming words.
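The final bag-of-words step can be sketched as follows. The vocabulary here is a toy stand-in; how the word vocabulary is built is not described on the slide, so `vocabulary` and the lower-casing are assumptions:

```python
from collections import Counter

def ocr_bag_of_words(recognized_words, vocabulary):
    """Bag-of-words video representation over OCR output (sketch).

    recognized_words: words read from all frames of one video.
    vocabulary: assumed fixed word list (hypothetical here).
    Returns one count per vocabulary entry; out-of-vocabulary
    words are simply dropped.
    """
    counts = Counter(w.lower() for w in recognized_words)
    return [counts[w] for w in vocabulary]

vocab = ["birthday", "tire", "repair"]            # toy vocabulary
vec = ocr_bag_of_words(["Birthday", "cake", "birthday"], vocab)
```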

SLIDE 33


Fusion strategies

Early fusion: concatenate feature vectors (equivalently, a sum of kernels).

Late fusion: linear combination of scores. Learn a classifier for each channel, then learn the combination weights by grid search or logistic regression.
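The logistic-regression variant of late fusion can be sketched as below: per-channel classifier scores become the features of a small logistic regression whose weights are the fusion weights. Plain gradient descent; the hyperparameters `lr` and `n_iter` are illustrative choices, not values from the deck.

```python
import numpy as np

def late_fusion_weights(channel_scores, labels, lr=0.1, n_iter=2000):
    """Learn late-fusion weights by logistic regression on channel scores.

    channel_scores: (n_videos, n_channels) scores from per-channel
    classifiers; labels: (n_videos,) binary event labels.
    Returns (weights, bias). Illustrative sketch only.
    """
    X = np.asarray(channel_scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * np.mean(p - y)                 # gradient step on bias
    return w, b

def fuse(channel_scores, w, b):
    """Fused detection score: linear combination of channel scores."""
    return np.asarray(channel_scores) @ w + b
```

At test time only `fuse` is needed, so fusion costs one dot product per video.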

SLIDE 42


MinNDC error on the TrecVid ’11 data

Best result in 2011 by BBN-Viser team (Natarajan et al., 2012).

MinNDC per event (lower is better); the +MFCC and +OCR rows add channels cumulatively to MBH+SIFT.

Event                  | Best 2011 | MBH  | SIFT | MFCC | OCR  | MBH+SIFT | +MFCC | +OCR
birthday party         | 0.45      | 0.77 | 0.71 | 0.65 | 0.95 | 0.62     | 0.49  | 0.46
changing vehicle tire  | 0.47      | 0.79 | 0.63 | 0.93 | 0.94 | 0.54     | 0.48  | 0.45
flash mob gathering    | 0.28      | 0.34 | 0.40 | 0.70 | 0.91 | 0.26     | 0.26  | 0.26
unstuck vehicle        | 0.38      | 0.59 | 0.45 | 0.77 | 0.99 | 0.37     | 0.38  | 0.38
grooming an animal     | 0.62      | 0.75 | 0.75 | 0.96 | 0.93 | 0.67     | 0.59  | 0.54
making a sandwich      | 0.57      | 0.77 | 0.69 | 0.94 | 0.85 | 0.62     | 0.65  | 0.55
parade                 | 0.45      | 0.52 | 0.71 | 0.80 | 0.95 | 0.44     | 0.41  | 0.39
parkour                | 0.31      | 0.25 | 0.57 | 0.94 | 1.00 | 0.22     | 0.21  | 0.23
repairing an appliance | 0.38      | 0.53 | 0.61 | 0.55 | 0.68 | 0.46     | 0.35  | 0.34
sewing project         | 0.57      | 0.65 | 0.77 | 0.82 | 0.88 | 0.60     | 0.51  | 0.41
Average                | 0.45      | 0.60 | 0.63 | 0.81 | 0.91 | 0.48     | 0.43  | 0.41

SLIDE 45


Our submissions

Run           | Features | Late fusion         | Actual NDC (PreSpec) | Actual NDC (AdHoc)
c-LFdnsmall   | small    | grid search         | 0.544                | 0.711
c-LFjrlrsmall | small    | logistic regression | 0.536                | 0.749
p-LFdnbig     | big      | grid search         | 0.516                | 0.559
c-LFjrlrbig   | big      | logistic regression | 0.515                | 0.536

Modality | Descriptor | small: dim | small: ×RT | big: dim | big: ×RT
Motion   | MBH        | 33k        | 2.4        | 131k     | 3.0
Image    | SIFT       | 16k        | 2.5        | 66k      | 6.6
Audio    | MFCC       | 40k        | 0.2        | 81k      | 0.2
Text     | OCR        | 200k       | 1.4        | 200k     | 1.4
Total    |            | 289k       | 6.5        | 478k     | 11.2

×RT: processing time on a single CPU, as a multiple of the video's real-time duration.

Computation on the MED test set: 4,000 h of video × 11.2 / 400 CPUs ≈ 112 h, i.e. about 4.7 days.

16 / 17

SLIDE 46

Conclusions

Excellent results while remaining compact:
Small set of low-level features: MBH, SIFT, MFCC.
High-dimensional Fisher-vector encoding.
One type of high-level feature: OCR.
Linear classifiers + late fusion.

Code for MBH, SIFT and Fisher vectors available at http://lear.inrialpes.fr/software/

SLIDE 47

References

Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman, A. (2011). The devil is in the details: an evaluation of recent feature encoding methods. In BMVC.

Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR, volume 1, pages 886–893.

Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110.

Matas, J., Chum, O., Urban, M., and Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10):761–767.

Natarajan, P., Wu, S., Vitaladevuni, S., Zhuang, X., Tsakalidis, S., Park, U., and Prasad, R. (2012). Multimodal feature fusion for robust event detection in web videos. In CVPR.

Perronnin, F., Sánchez, J., and Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In ECCV, volume 6314 of LNCS, pages 143–156.

Rabiner, L. R. and Schafer, R. W. (2007). Introduction to digital speech processing. Foundations and Trends in Signal Processing.

Wang, H., Kläser, A., Schmid, C., and Cheng-Lin, L. (2011). Action recognition by dense trajectories. In CVPR, pages 3169–3176.

Wang, X., Wang, L., and Qiao, Y. (2012). A comparative study of encoding, pooling and normalization methods for action recognition. In ACCV.