Concept Detection: Convergence to Local Features and Opportunities Beyond



  1. Concept Detection: Convergence to Local Features and Opportunities Beyond
     Shih-Fu Chang 1, Junfeng He 1, Yu-Gang Jiang 1,2, Elie El Khoury 3, Chong-Wah Ngo 2, Akira Yanagawa 1, Eric Zavesky 1
     1 DVMM Lab, Columbia University; 2 City University of Hong Kong; 3 IRIT, Toulouse, France
     TRECVID 2008 workshop, NIST

  2. Overview: 5 components & 6 runs
     [Diagram: five components (Local Feature, Global Feature, CU-VIREO374 as 374-d features, Web Images, Face & Audio Filtering) feed SVM classifiers to produce runs 1 to 6]

  3. Overview: overall performance
     [Bar chart: Mean Average Precision (0 to 0.18) of the 161 TRECVID 2008 Type-A submissions]
     – Local feature alone already achieves near-top performance
     – Every other component contributes incrementally to the final detection
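
The charts in this deck report Mean Average Precision (MAP). As a reminder of what that number measures, here is a minimal sketch of plain, non-interpolated average precision over a fully judged ranked list. The function names and the optional `num_relevant` parameter are ours, chosen for illustration; the official TRECVID evaluation may differ (for example by estimating AP from sampled judgments), so this is not a reimplementation of the official scoring.

```python
def average_precision(ranked_labels, num_relevant=None):
    """Non-interpolated average precision for one concept.

    ranked_labels: 0/1 relevance flags ordered by descending detector score.
    num_relevant:  total number of relevant shots in the collection; if None,
                   the ranked list is assumed to contain all of them.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    total = hits if num_relevant is None else num_relevant
    return precision_sum / total if total else 0.0


def mean_average_precision(per_concept_rankings):
    """Mean of the per-concept average precisions."""
    aps = [average_precision(r) for r in per_concept_rankings]
    return sum(aps) / len(aps)
```

For example, a ranking with relevant shots at ranks 1 and 3 scores (1/1 + 2/3) / 2.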

  4. Overview: per-concept performance
     [Bar chart, per concept, y-axis 0 to 0.4; series: CU_2_run4+face&audio, CU_4_run5+cu-vireo374, CU_5_local_global, CU_6_local_only, MEAN, MAX]

  5. Outline
     [System diagram repeated from the overview on slide 2]

  6. Bag-of-Visual-Words (BoW)

  7. Representation Choices of BoW
     • Word weighting scheme
       – How to weight the importance of a word to an image?
     • Spatial information
       – Are the spatial locations of keypoints useful?

  8. Weighting Scheme
     • Traditional
       – Binary, term frequency (TF), inverse document frequency (IDF), ...
     • Our method: soft weighting
       – Assign a keypoint to multiple visual words
       – Weights are determined by keypoint-to-word similarity
     Details in: Jiang et al., CIVR 2007.
     Image from http://www.cs.joensuu.fi/pages/franti/vq/lkm15.gif
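
The soft-weighting idea on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each keypoint votes for its `top_n` nearest visual words, the vote for the i-th nearest word is discounted by 1/2^(i-1) (our reading of Jiang et al., CIVR 2007), and the similarity function 1 / (1 + distance) is a placeholder we picked because the slide does not define it. The function name and `top_n` default are ours.

```python
import math


def soft_weight_histogram(keypoints, vocabulary, top_n=4):
    """Build a soft-weighted bag-of-visual-words histogram.

    keypoints:  list of descriptor vectors (tuples of floats).
    vocabulary: list of visual-word centroids (same dimensionality).
    top_n:      number of nearest words each keypoint contributes to.
    """
    hist = [0.0] * len(vocabulary)
    for kp in keypoints:
        # Rank all visual words by Euclidean distance to this keypoint.
        ranked = sorted((math.dist(kp, word), k) for k, word in enumerate(vocabulary))
        for i, (distance, k) in enumerate(ranked[:top_n]):
            similarity = 1.0 / (1.0 + distance)  # assumed similarity function
            hist[k] += similarity / (2 ** i)     # halve the vote at each rank
    return hist
```

With `top_n=1` and a constant similarity this would reduce to ordinary term frequency, which is the baseline the slide compares against.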

  9. Vocabulary Size & Weighting Scheme
     [Chart, TRECVID 2006 test data: Mean Average Precision (0 to 0.12) vs. vocabulary size (500; 1,000; 5,000; 10,000) for Binary, TF, TF-IDF, and Soft weighting]
     – Soft weighting
       • Improves on TF by 10%-20%
       • Assesses the importance of a keypoint more accurately

  10. Spatial Information
      • Partition image into equal-sized regions
      • Concatenate BoW features from the regions
        – Poor generalizability
      F = (f_11, f_12, f_13, f_21, f_22, f_23, f_31, f_32, f_33)
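
A minimal sketch of the region-wise concatenation F = (f_11, ..., f_33) described on this slide. It assumes hard-assigned visual words and keypoints given as hypothetical `(x, y, word_id)` triples; the function name and argument layout are ours for illustration.

```python
def spatial_bow(keypoints, width, height, rows, cols, vocab_size):
    """Concatenate per-region BoW histograms in row-major order.

    keypoints: list of (x, y, word_id) with 0 <= x < width, 0 <= y < height.
    Returns a flat list of length rows * cols * vocab_size.
    """
    hists = [[0] * vocab_size for _ in range(rows * cols)]
    for x, y, word in keypoints:
        # Map the keypoint location to its grid cell (clamped to the last cell).
        r = min(int(y * rows / height), rows - 1)
        c = min(int(x * cols / width), cols - 1)
        hists[r * cols + c][word] += 1
    # f_11, f_12, ..., one region after another.
    return [count for hist in hists for count in hist]
```

The feature length grows with the number of regions (a 3x3 grid over a 5,000-word vocabulary gives 45,000 dimensions), and the same object landing in a different cell changes the vector, which is one way to read the "poor generalizability" remark.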

  11. Spatial Information
      [Chart, TRECVID 2006 test data (soft weighting): mean average precision (0 to 0.14) vs. vocabulary size (500; 1,000; 5,000; 10,000) for 1×1, 2×2, 3×3, and 4×4 regions]
      – Spatial information does not help much for concept detection
        • 2×2 is a good choice
        • 3×3 and 4×4 may cause a mismatch problem

  12. Local Feature Representation Framework
      K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, L. Van Gool, "A comparison of affine region detectors", IJCV, vol. 65, pp. 43-72, 2005.

  13. Internal Results – Local Features
      • Over TRECVID 2008 test data
      [Bar chart: MAP (0 to 0.16) for 1x1 (3 detectors), 1x1 (2 detectors), 2x2, 1x3, and Run6: Fusion; annotations: "MAP: 13%", "Similar!", "0.157"]

  14. Failure Cases - I
      [Example images: misses]
      • Flower
        – Small visual area
        – Coloration/texture too similar to background scene
      • Possible solutions
        – Color descriptor
        – Class-specific visual words

  15. Failure Cases - II
      [Example images: misses]
      • Boat_Ship, Airplane_flying
        – Learning biased by background scene
        – Difficulty from occlusion
      • Possible solution
        – Feature selection

  16. Summary – Local Features
      • BoW with good representation choices achieved very impressive performance
      • Soft weighting is very effective
      • Multiple spatial layouts are useful
      • Multiple detectors do not help much
      • Room for future improvement
        – Class-specific visual words, feature selection, color descriptor, etc.

  17. Outline
      [System diagram repeated from the overview on slide 2]

  18. Global Features
      • Grid-based Color Moments (225-d)
      • Wavelet texture (81-d)
      [Bar chart, per concept, y-axis 0 to 0.4: A_CU_6_local_only_6 vs. A_CU_5_local_global_5]
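
To make the 225-d figure concrete, here is a sketch of grid-based color moments. The slide gives only the dimensionality; the breakdown used here (5×5 grid × 3 color channels × 3 moments: mean, standard deviation, and the signed cube root of the third central moment) is an assumption that happens to yield 225. Color-space conversion is omitted, and the function name and input layout are ours.

```python
import math


def grid_color_moments(image, grid=5):
    """Compute per-cell color moments and concatenate them.

    image: list of rows, each row a list of (c1, c2, c3) pixel tuples;
           must be at least grid x grid pixels so no cell is empty.
    Returns grid * grid * 3 * 3 values (225 for the default grid).
    """
    height, width = len(image), len(image[0])
    feature = []
    for gr in range(grid):
        for gc in range(grid):
            r0, r1 = gr * height // grid, (gr + 1) * height // grid
            c0, c1 = gc * width // grid, (gc + 1) * width // grid
            pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            for ch in range(3):
                vals = [p[ch] for p in pixels]
                n = len(vals)
                mean = sum(vals) / n
                var = sum((v - mean) ** 2 for v in vals) / n
                third = sum((v - mean) ** 3 for v in vals) / n
                # Signed cube root keeps the third moment in the data's units.
                skew = math.copysign(abs(third) ** (1 / 3), third)
                feature.extend([mean, math.sqrt(var), skew])
    return feature
```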

  19. Outline
      [System diagram repeated from the overview on slide 2]

  20. CU-VIREO374
      • Fusion of Columbia374 and VIREO374
      Set          Feature                               Dimension
      Columbia374  Grid-based color moment (LUV)         225
      Columbia374  Gabor texture                         48
      Columbia374  Edge direction histogram              73
      VIREO374     Bag-of-visual-words (soft weighting)  500
      VIREO374     Grid-based color moment (Lab)         225
      VIREO374     Grid-based wavelet texture            81
      [Chart: performance of CU-VIREO374 over TRECVID 2006 test data, comparing CU-VIREO374, VIREO374, and Columbia374]
      Scores on the TRECVID2008 corpora: http://www.ee.columbia.edu/ln/dvmm/CU-VIREO374/
      Yu-Gang Jiang, Akira Yanagawa, Shih-Fu Chang, Chong-Wah Ngo, "Fusing Columbia374 and VIREO-374 for Large Scale Semantic Concept Detection", Columbia University ADVENT Technical Report #223-2008-1, Aug. 2008.

  21. Concept Fusion Using CU-VIREO374
      • Train an SVM for each concept
        – Using CU-VIREO374 scores as features
      [Bar chart, TRECVID 2008 test data: Mean Average Precision (0 to 0.18) for CU-VIREO374, Run5: Local+global, and Run4: run5+CU-VIREO374; annotation: "2.2%"]
      – Performance improvement is merely 2%
        • Need a better concept fusion model!
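
The fusion step on this slide (one SVM per target concept, with the vector of CU-VIREO374 detector scores as input) might be sketched as below. The slide does not state the kernel or solver, so this uses a linear SVM trained by plain full-batch subgradient descent purely to stay dependency-free; a library SVM would be the usual choice in practice. All names, hyperparameters, and the toy 3-d score vectors (standing in for the 374-d ones) are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by subgradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(model, x):
    """Signed score of a shot under one concept's fusion model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_concept_fusion(score_vectors, labels_by_concept):
    """Train one SVM per target concept on the shared score vectors."""
    return {concept: train_linear_svm(score_vectors, labels)
            for concept, labels in labels_by_concept.items()}


# Toy usage: four shots, each described by three base-detector scores.
scores = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.9, 0.3], [0.2, 0.8, 0.2]]
models = train_concept_fusion(scores, {"Bus": [1, 1, -1, -1]})
```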

  22. Outline
      [System diagram repeated from the overview on slide 2]

  23. Exploring External Images from Web
      • Problem
        – Sparsity of positive data
      Concept Name       # Positive shots   Concept Name           # Positive shots
      Classroom          224                Harbor                 195
      Bridge             158                Telephone              184
      Emergency_Vehicle  88                 Street                 1551
      Dog                122                Demonstration/Protest  134
      Kitchen            250                Hand                   1515
      Airplane_flying    72                 Mountain               239
      Two_people         3630               Nighttime              424
      Bus                87                 Boat_Ship              437
      Driver             258                Flower                 582
      Cityscape          288                Singing                366
      Total # of shots in TV'08 Dev: 36,262

  24. Challenging Issues
      • How to make use of the large amount of "noisily labeled" web images for concept detection?
        – Issue 1: filter the false positive samples
      [Example Flickr images: good vs. bad]

  25. Challenging Issues
      • How to make use of the large amount of "noisily labeled" web images for concept detection?
        – Issue 1: filter the false positive samples
        – Issue 2: overcome the cross-domain problem
      [Example images: Flickr vs. TRECVID]

  26. Preliminary Results
      • Web image set: 18,000 from Flickr
        – Issue 1: filter the false positive samples
          • Graph-based semi-supervised learning
        – Issue 2: overcome the cross-domain problem
          • Weighted SVM
      • Results
        – MAP: no difference
        – "Bus": improves by 50%
      [Bar chart, per concept, y-axis 0 to 0.3: A_CU_Run-4 vs. C_CU_Run-3 (bug free)]
      • Open problem!
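
The slide names "Weighted SVM" for the cross-domain issue without giving the formulation. One common form, shown here only as an assumed illustration, gives the TRECVID samples and the web samples separate misclassification costs in the soft-margin objective. The symbols $\mathcal{T}$ (TRECVID training set), $\mathcal{W}$ (web image set), $C_T$, $C_W$, and the feature map $\phi$ are our notation, not the authors'.

```latex
\min_{w,\,b,\,\xi}\;
  \frac{1}{2}\lVert w \rVert^{2}
  + C_T \sum_{i \in \mathcal{T}} \xi_i
  + C_W \sum_{j \in \mathcal{W}} \xi_j
\quad \text{s.t.} \quad
  y_k \bigl( w^{\top} \phi(x_k) + b \bigr) \ge 1 - \xi_k,
  \;\; \xi_k \ge 0
  \;\; \forall k \in \mathcal{T} \cup \mathcal{W}
```

Choosing $C_W < C_T$ would make errors on the noisier, out-of-domain web images cheaper than errors on in-domain data; whether the authors weighted per domain or per sample is not stated on the slide.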

  27. Outline
      [System diagram repeated from the overview on slide 2]

  28. Face Detection and Tracking
      • Face detection (OpenCV toolbox)
      • Tracking based on face location and skin color
      [Diagram: backward and forward tracking of Character 1 between a start frame and an end frame; labels: Face Detection, Tracking, Pt1, Pt2]
