Kobe University at TRECVID 2009 Search Task
Topic Retrieval based on Rough Set Theory and Partially Supervised Learning
Kimiaki Shirahama, Chieri Sugihara, Yuta Matsuoka and Kuniaki Uehara
System Overview
  Difficulty of preparing indexing and retrieval models for all possible topics
  → Define a topic based on examples provided by a user
  Example: Topic 289: one or more people, each sitting in a chair, talking
  [Figure: system diagram with the labels Positives, Negatives, Partially supervised learning, Rough set theory, Topic definition, TRECVID video collection, Retrieved]
Features
  1. Grid-based color, edge and visual word histograms
  2. Moving regions: R = (x, y, size, h_move, v_move)
  3. # of faces with a certain size (e.g. one large-size face, two small-size faces)
  One shot is represented by a total of 94 features!
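The grid-based histograms in item 1 can be illustrated with a small sketch. It is not the authors' implementation: the function name `grid_histogram`, the 2x2 grid, the 4 bins and the grayscale input are all assumptions chosen to keep the example short.

```python
def grid_histogram(image, rows=2, cols=2, bins=4, max_value=256):
    """Concatenate one normalized intensity histogram per grid cell.

    `image` is a list of equal-length rows of integers in [0, max_value).
    Grid size and bin count are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    feature = []
    for r in range(rows):
        for c in range(cols):
            # Pixel bounds of the current grid cell.
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            hist = [0] * bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[image[y][x] * bins // max_value] += 1
            # Guard against empty cells when the image is smaller than the grid.
            total = max(1, (y1 - y0) * (x1 - x0))
            feature.extend(count / total for count in hist)
    return feature
```

Because each cell contributes its own histogram, the resulting vector keeps coarse spatial layout (for example, "few edges in the upper part") that a single global histogram would lose.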
Rough Set Theory
  Large variation of features in the same topic
  → Extract subsets where positives can be correctly discriminated from all negatives
  Example: Topic 271: A view of one or more tall buildings … (positives vs. negatives)
  Subsets are computed by boolean algebra of features and described by decision rules:
  IF color hist. is similar to [example image] ∧ edge hist. is similar to [example image], THEN Positive
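The idea of a decision rule can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: each feature is reduced to a single number, "similar" means an absolute difference within a per-feature threshold, and the names `discerning_features`, `minimal_rules` and `matches` are hypothetical. It follows the standard rough-set idea of keeping, for one positive, the smallest feature sets that still differ from every negative.

```python
from itertools import combinations


def discerning_features(positive, negative, thresholds):
    """Features on which the negative is NOT similar to the positive."""
    return {
        name
        for name, limit in thresholds.items()
        if abs(positive[name] - negative[name]) > limit
    }


def minimal_rules(positive, negatives, thresholds):
    """Smallest feature sets that separate `positive` from every negative.

    A feature set is valid when each negative differs from the positive
    on at least one feature in the set.
    """
    discern = [discerning_features(positive, n, thresholds) for n in negatives]
    names = sorted(thresholds)
    for size in range(1, len(names) + 1):
        rules = [
            set(combo)
            for combo in combinations(names, size)
            if all(set(combo) & d for d in discern)
        ]
        if rules:
            return rules
    return []  # some negative is similar to the positive on every feature


def matches(rule, positive, shot, thresholds):
    """IF the shot is similar to the positive on every rule feature, THEN positive."""
    return all(abs(positive[f] - shot[f]) <= thresholds[f] for f in rule)
```

The brute-force search over feature subsets is only practical for a handful of features; it is used here because it makes the "discriminated from all negatives" condition explicit.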
Difficulty of Selecting Negative Examples
  A great variety of shots can be negatives.
  Topic 271: A view of one or more tall buildings (more than 4 stories) and the top story visible
  • Negative too dissimilar to the positive → Many irrelevant features are included in decision rules
  • Negative neither similar nor dissimilar → Many relevant features are included in decision rules, e.g. long vertical edges, few edges in the upper part, etc.
  • Negative too similar (e.g. a building with two stories) → Many relevant features are ignored
  How to select effective negatives for defining a topic?
Partially Supervised Learning
  Build a classifier only from positives by selecting negatives from unlabeled examples
  • Web document classification → Documents on the Web as unlabeled examples
  • Our topic retrieval → Shots except for positives as unlabeled examples
  Similarity-based approach (Fung et al. TKDE 2006)
  → Effective in the case where only a small number of positives are available
  1. Reliable negative selection
  2. Clustering-based additional negative selection
  [Figure: positives, reliable negatives and additional negatives]
  How to calculate similarities in a high-dimensional feature space?
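Step 1, reliable negative selection, can be sketched as follows. This is a deliberately simplified stand-in and not the method of Fung et al.: cosine similarity, the `max_similarity` threshold of 0.5 and both function names are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_reliable_negatives(positives, unlabeled, max_similarity=0.5):
    """Keep unlabeled examples that are not similar to any positive."""
    reliable = []
    for example in unlabeled:
        # Similarity to the closest positive decides whether the example is safe.
        closest = max(cosine(example, p) for p in positives)
        if closest < max_similarity:
            reliable.append(example)
    return reliable
```

Examples that fall between the positives and these reliable negatives are the ones a second, clustering-based step would have to decide on.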
Subspace Clustering
  Due to many irrelevant features, we cannot appropriately calculate similarities
  → Find the features specific to each example
  Subspace clustering (PROCLUS, proposed by C. Aggarwal et al., SIGMOD 99)
  → Group examples into clusters in different subspaces of the high-dimensional space
  [Figure: Clusters 1 to 4]
  Calculate similarities of an example to the other examples using only its set of associated features!
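Restricting a comparison to the associated features can be written in a few lines. The sketch below uses the Manhattan segmental distance, which PROCLUS uses to compare points within a subspace; how the authors turn distances into similarities is not stated in the slides, so the function name and the example vectors are assumptions.

```python
def segmental_distance(a, b, dimensions):
    """Average absolute difference over the selected dimensions only."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    return sum(abs(a[d] - b[d]) for d in dimensions) / len(dimensions)


# Two hypothetical 3-dimensional examples that differ mainly in dimension 1.
shot_a = [0.1, 0.9, 0.5]
shot_b = [0.2, 0.1, 0.5]

# Distance restricted to the subspace {0, 2}, then over the full space.
near = segmental_distance(shot_a, shot_b, [0, 2])
far = segmental_distance(shot_a, shot_b, [0, 1, 2])
```

Dividing by the number of dimensions keeps distances comparable between clusters whose subspaces have different sizes.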
Submitted Runs
  1. M_A_N_cs24_kobe1_1
     • Positives selected manually, negatives selected randomly
  2. M_A_N_cs24_kobe2_2
     • Positives selected manually, negatives selected by Partially Supervised Learning
  3. I_A_N_cs24_kobeS_3 (supplemental)
     • Positives selected manually, negatives selected randomly
     • Positives and negatives interactively selected from each retrieval result
  Experimental purposes
  • Examine the effectiveness of rough set theory
  • Examine the effectiveness of partially supervised learning
  • Examine the influence of positives and negatives on the performance
Example of Good Retrieval
  • Topic 277: A person talking behind a microphone
  • Topic 285: Printed, typed, or handwritten text, filling more than half of the frame area
  • Topic 289: One or more people, each sitting in a chair, talking
  Rough set theory can cover a large variation of features in the same topic!
Comparison to Automatic Runs
  [Figure: bar chart of MAP (axis from 0 to 0.14); labeled runs: M_A_N_cs24_kobe2_2, M_A_N_cs24_kobe1_1, I_A_N_cs24_kobeS_3]
  NOTE: Only three runs were submitted for the manually-assisted category.
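The MAP values compared here are mean average precision over topics. A minimal sketch of the usual non-interpolated definition follows; the function names are hypothetical, and the official TRECVID evaluation may differ in details such as the depth of the result list.

```python
def average_precision(ranked_shots, relevant):
    """Non-interpolated average precision of one ranked list."""
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant shot
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked_shots, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because the sum is divided by the total number of relevant shots, relevant shots that are never retrieved lower the score just like relevant shots ranked late.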
Comparison to Interactive Runs
  [Figure: bar chart of MAP (axis from 0 to 0.3); labeled runs: M_A_N_cs24_kobe2_2, M_A_N_cs24_kobe1_1, I_A_N_cs24_kobeS_3]
  It is difficult to draw an accurate conclusion about partially supervised learning.
  Why are our runs so bad?
Additional Experiment
  Our assumption: Features in submitted runs are ineffective
  Additional experiment
  • Select 50 positives and 50 negatives from TRECVID 2008 test videos
  • Use various combinations of features
    • Features used in submitted runs: color, edge and visual word histograms, moving regions, # of faces with a certain size
    • Additional features: grid-based color moment, Gabor texture, concept detection scores (provided by MediaMill), HOG, camera work
  • Retrieve shots of a topic in 200 of the TRECVID 2009 test videos
Main Reason for Our Bad Runs
  Topic ID                    271  272  287  291  292
  Same features                14    3    5    2    9
  Effective features           90   11   50   12   38
  *Estimated best values*      70   22   86   22   10
  Best values in TRECVID '09  209   66  257   66   30
  Using ineffective features is the main reason for our bad runs!
  • Promising performance when effective features can be selected
  • Effectiveness of camera work feature
  Camera work: zoom in/out estimation by the split tensor histogram method (Kumano et al., ITE (in Japanese))
What is an Effective Feature?
  Topic ID              271          272          287          291                   292
  Original result       72           8            34           9                     24
  Original features     Concept      Concept      Concept      Concept + Color mom.  Camera work + # of faces
  Best result           90           11           50           12                    38
  Effective features    Color hist.  Color hist.  Camera work  Gabor tex.            Concept
  Worst result          76           2            16           3                     24
  Ineffective features  Gabor tex.   Edge hist.   Edge hist.   Visual words          Gabor tex.
  All features          66           7            19           1                     7
  Posteriori comb.      80           4            36           4                     37
  Features (posteriori comb.): Color hist. Color hist. Edge hist. + Edge hist. Color hist. + Moving reg. Concept + Moving reg. + Color mom. + Gabor tex. + Gabor tex. + Color mom. + Gabor tex. + Camera work + Camera work
  Rather than many features, using two or three features leads to the best performance!
  Neither visual words nor HOG is an effective feature.
How Do Retrieved Shots Change Depending on Features?
  Topic ID           271  272  292
  Original result     72    8   24
  Original feature   Concept; Concept; Camera work + # of faces
  Color hist.  Gabor tex.  Camera work  Edge hist.  Gabor tex.  Concept
  (Effective) (Effective) (Ineffective) (Effective) (Ineffective) (Ineffective)
  Overlapping shots    66   61   28    9   22   14
  Removed shots         6   11    6   25    2   10
  Added shots          24   15   16    7   16   10
  NOTE: Similar results are obtained for Topics 287 and 291.
  ++ Effective features preserve many relevant shots retrieved by original features, and add more relevant shots.
  -- Ineffective features remove many relevant shots retrieved by original features.
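The three counts in this table are plain set operations on the relevant shots retrieved before and after a feature is added. A minimal sketch, with a hypothetical function name and made-up shot IDs:

```python
def compare_results(original, modified):
    """Split relevant shots into overlapping, removed and added sets."""
    original, modified = set(original), set(modified)
    return {
        "overlapping": original & modified,  # retrieved by both feature sets
        "removed": original - modified,      # lost after changing the features
        "added": modified - original,        # newly retrieved
    }


# Hypothetical shot IDs, for illustration only.
before = {"shot1", "shot2", "shot3"}
after = {"shot2", "shot3", "shot4", "shot5"}
diff = compare_results(before, after)
```

The counts are tied together by two identities: overlapping + removed equals the original result, and overlapping + added equals the new result. In the first column, 66 + 6 = 72 matches the original result for Topic 271, and 66 + 24 = 90.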
How Do Decision Rules Change Depending on Features?
  Topic 271: Tall building (concepts: Building, Sky, Urban)
    Concept (Original)     357  210  385
    Concept + Color hist.  361  204  342
    Concept + Gabor tex.   241  152  327
  Topic 287: People, table and computer (concepts: Face, Office, Computer or Television)
    Concept (Original)     177  284  235
    Concept + Camera work  138  355  174
    Concept + Edge hist.    77   86  303
  [Figure: two items marked "Wrong match"]
  ++ Effective features preserve most of the useful decision rules
  -- Ineffective features substitute useful decision rules with inaccurate ones
How to Select Negatives?
  Topic ID       271          272          287          291                   292
  Baseline       80 (+8)      3 (-5)       58 (+24)     12 (+3)               33 (+9)
  Features       Concept      Concept      Concept      Concept + Color mom.  Camera work + # of faces
  Best result    92 (+2)      8 (-3)       56 (+6)      15 (+3)               36 (-2)
  Added feature  Color hist.  Color hist.  Moving reg.  Camera work           Visual words
  Topic 287: one or more people, each at a table or desk with computer visible
  Random:
  • Many edges in the upper part
  • Many shots where a person appears
  Partially supervised learning:
  • Few edges in the upper part
  • Small number of shots where a person appears
  Near-miss negatives are not useful for defining a topic in videos!