Weakly-supervised Video Recognition

Pascal Mettes, University of Amsterdam

Video recognition pipeline Learn video labels from spatio-temporal inputs. Double scale challenge: order(s) of magnitude more inputs, order(s) of magnitude fewer training samples. [Feichtenhofer et al. NeurIPS 2016] 04-04-2019 Weakly-supervised action recognition 1

Spatio-Temporal Action Localization Discover what, when, and where actions occur in videos. [Figure: example actions, diving and skateboarding.] 06-02-2019 Understanding Actions with Minimal Supervision 2

The annotation burden of localization Same double burden as video recognition in general. Additional burden from box annotation in each video frame. How can we learn action locations without all these burdens?

Part I: Towards action localization from video labels

Spatio-Temporal Proposals At test time, actions can be anywhere in a video. Split videos into action tubes, such that at least one tube matches with each action. Apply model to all proposals and select the best ones during testing. Illustration courtesy of Jan van Gemert

Pointly-Supervised Action Localization Goal: Alleviate the need for expensive bounding box annotations per frame. Hypothesis: Spatio-temporal proposals can be used as training examples, if guided properly with minimal extra supervision. P. Mettes, J.C. van Gemert, and C.G.M. Snoek, “Spot On: Action Localization from Pointly-Supervised Proposals”, ECCV, 2016. P. Mettes and C.G.M. Snoek, “Pointly-Supervised Action Localization”, IJCV, 2018 (in press).

Multiple Instance Learning [Figure: traditional supervised learning (negative and positive instances) versus multiple-instance learning (negative and positive bags).] [Dietterich et al. 1997]

Multiple Instance Learning [Figure: two panels, each showing positive bags and negative bags.] Compute optimal hyper-plane with sparse MIL. Re-weight positive instances. [Slide by Vijayanarasimhan and Grauman]
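
The alternation behind multiple instance learning can be sketched in a few lines. This is a minimal illustration, not the sparse MIL method referenced on the slide: the mean-difference scorer stands in for a max-margin classifier, and the function names and toy bags are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def fit_mean_difference(pos, neg):
    """Linear scorer pointing from the negative mean to the positive mean.

    Assumption: used here only as a simple stand-in for an SVM.
    """
    mp, mn = mean(pos), mean(neg)
    w = [p - q for p, q in zip(mp, mn)]
    # Place the decision boundary halfway between the two means.
    b = -sum(wi * (p + q) / 2.0 for wi, p, q in zip(w, mp, mn))
    return w, b


def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mil_select(pos_bags, neg_bags, max_iters=10):
    """Alternate between fitting a scorer and picking one instance per positive bag."""
    neg = [x for bag in neg_bags for x in bag]
    # Initialisation: treat every instance in a positive bag as positive.
    pos = [x for bag in pos_bags for x in bag]
    selected, model = None, None
    for _ in range(max_iters):
        model = fit_mean_difference(pos, neg)
        new_selected = [
            max(range(len(bag)), key=lambda i: score(model, bag[i]))
            for bag in pos_bags
        ]
        if new_selected == selected:  # selection is stable, stop
            break
        selected = new_selected
        pos = [bag[i] for bag, i in zip(pos_bags, selected)]
    return selected, model


# Toy data: each positive bag hides one instance far from the origin.
pos_bags = [[(0.0, 0.0), (5.0, 5.0), (0.5, 0.0)], [(4.0, 5.0), (0.0, 1.0)]]
neg_bags = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (0.5, 0.5)]]
selected, model = mil_select(pos_bags, neg_bags)
print(selected)
```

In the localization setting, a bag would correspond to a video and an instance to a spatio-temporal proposal; the selected index marks the proposal treated as the positive training example.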

Proposed approach [Figure: proposal mining, point supervision, proposal scoring.] Annotate actions simply by pointing at action centers. Use point supervision to help find the best proposals to train on.

Multiple Instance Learning with priors Iteratively learn to discriminate actions (likelihood) and learn the spatio-temporal extent of actions (point priors).

Point and proposal overlap We introduce a new overlap measure between points and proposals. Proposals should be small and their centers should match with points.
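
The two requirements on this slide (small proposals, centers that match the points) can be turned into a toy score. The sketch below is an assumption for illustration only and is not the measure from the papers: the linear distance falloff, the area-based size penalty, and all function names are hypothetical.

```python
import math


def center_match(box, point):
    """1.0 when the point sits on the box center, falling to 0.0 at the box corner distance."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_diagonal = 0.5 * math.hypot(x2 - x1, y2 - y1)
    if half_diagonal == 0:
        return 0.0
    distance = math.hypot(point[0] - cx, point[1] - cy)
    return max(0.0, 1.0 - distance / half_diagonal)


def size_penalty(box, frame_width, frame_height):
    """Fraction of the frame covered by the box."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / float(frame_width * frame_height)


def point_proposal_overlap(tube, points, frame_width, frame_height):
    """Score a proposal tube against point annotations.

    tube:   {frame_index: (x1, y1, x2, y2)}
    points: {frame_index: (x, y)}, possibly only for a few frames
    """
    annotated = [f for f in points if f in tube]
    if not annotated:
        return 0.0
    match = sum(center_match(tube[f], points[f]) for f in annotated) / len(annotated)
    size = sum(size_penalty(b, frame_width, frame_height) for b in tube.values()) / len(tube)
    return match - size


points = {0: (50, 50), 1: (52, 50)}
small_centered = {0: (40, 40, 60, 60), 1: (42, 40, 62, 60)}
whole_frame = {0: (0, 0, 100, 100), 1: (0, 0, 100, 100)}
off_center = {0: (0, 0, 20, 20), 1: (0, 0, 20, 20)}

score_small = point_proposal_overlap(small_centered, points, 100, 100)
score_whole = point_proposal_overlap(whole_frame, points, 100, 100)
score_off = point_proposal_overlap(off_center, points, 100, 100)
print(score_small, score_whole, score_off)
```

Under these assumptions, a tight tube centered on the points scores highest, while a tube covering the whole frame or one far from the points is penalized.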

Experiments Datasets: UCF-101, UCF Sports, J-HMDB, Hollywood2Tubes. Evaluation: Rank detections based on classifier score. Positive if correct class and enough overlap. Results reported in (mean) Average Precision.
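
The evaluation protocol can be sketched as follows. This is a minimal, non-interpolated Average Precision under stated assumptions: the overlap value is taken as given (how spatio-temporal overlap is computed is not specified on the slide), the handling of duplicate detections is omitted, and the function names are hypothetical.

```python
def is_positive(predicted_label, true_label, overlap, threshold):
    """A detection counts as positive if the class is correct and the overlap is high enough."""
    return predicted_label == true_label and overlap >= threshold


def average_precision(detections, num_ground_truth):
    """Non-interpolated AP.

    detections: list of (classifier_score, is_positive) pairs
    num_ground_truth: number of ground-truth action instances for the class
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections by classifier score, highest first.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_ground_truth


# Three detections for the class "diving" at overlap threshold 0.5.
detections = [
    (0.9, is_positive("diving", "diving", 0.7, 0.5)),  # correct
    (0.8, is_positive("diving", "diving", 0.2, 0.5)),  # too little overlap
    (0.7, is_positive("diving", "diving", 0.6, 0.5)),  # correct
]
ap = average_precision(detections, num_ground_truth=2)
print(ap)
```

Mean Average Precision is then the mean of this value over all action classes.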

Can we train on spatio-temporal proposals? [Figure: results on UCF Sports when trained on box annotations, trained on the best proposal, and trained on point annotations.] Training on our approach with points is as effective as training on boxes.

How much point supervision is required? UCF Sports Points are 15 times faster to annotate than boxes. Similar performance at 50-100x speed-up with fewer annotations.

Discovered actions [Figure: actions from box supervision versus actions from point supervision.] Similar performance, different behavior.

Replacing manual point supervision New hypothesis: Point supervision can be replaced with automatic visual cues that correlate with the action location. P. Mettes, C.G.M. Snoek, and S.F. Chang, “Localizing Actions from Video Labels and Pseudo-Annotations”, BMVC, 2017.

Pseudo-annotations
Action proposals: Actions occur where proposals agree. Van Gemert et al. “APT: Action localization proposals from dense trajectories”, BMVC, 2015.
Objects: Actions occur where objects occur. Zitnick and Dollar, “Edge boxes: Locating object proposals from edges”, ECCV, 2014.
Center bias: Actions are central. Tseng et al. “Quantifying bias of observers in free viewing of dynamic natural scenes”, JOV, 2004.
Independent motion: Actions occur where motion deviates from global motion. Jain et al. “Action localization with tubelets from motion”, CVPR, 2014.
Person detection: Action location correlates with the actor location. Ren et al. “Faster R-CNN: Towards real-time object detection with region proposal networks”, NeurIPS, 2015.

Individual pseudo-annotation performance UCF Sports Supervision bounds: Upper: full box supervision. Lower: video labels, no priors. Axes: X-axis: minimal overlap threshold for positives. Y-axis: localization performance.

Individual pseudo-annotation performance UCF Sports All pseudo-annotations are informative. Most helpful: person detection and motion. Least helpful: center bias and objects.

Combining pseudo-annotations We rank pseudo-annotations based on correlation to person detection. Correlation matches with individual performance. Performance akin to full box supervision using top 3 pseudo-annotations.
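
One way to picture the ranking step is to score the same set of proposals with every pseudo-annotation and order the cues by their correlation with the person-detection scores. The sketch below assumes Pearson correlation over per-proposal scores; the slide does not state which quantity is correlated, and the score values and names are made up for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length score lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_by_correlation(reference, cues):
    """Sort cue names by correlation with the reference scores, highest first."""
    return sorted(cues, key=lambda name: pearson(reference, cues[name]), reverse=True)


# Hypothetical scores that each cue assigns to the same four proposals.
person_detection = [0.1, 0.9, 0.3, 0.7]
cues = {
    "center": [0.5, 0.4, 0.6, 0.5],
    "motion": [0.2, 0.8, 0.4, 0.6],
    "objects": [0.3, 0.6, 0.2, 0.4],
}

ranking = rank_by_correlation(person_detection, cues)
top_two = ranking[:2]
print(ranking)
```

The top-ranked cues would then be combined with person detection as priors during proposal mining.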

On point- and pseudo-annotations The good: In two papers from box supervision to video label supervision only. The bad: Performance gap to state-of-the-art is increasing; spatio-temporal proposals are a limiting factor. State-of-the-art approach: Train at box-level, link boxes over time. Kalogeiton et al. ICCV, 2017. Singh et al. ICCV, 2017.
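
Linking boxes over time is commonly done with dynamic programming. The sketch below is a generic Viterbi-style linker, assuming a path score equal to the sum of box scores plus the IoU between consecutive boxes; it is not taken from the cited papers, and the function names and example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_boxes(frames, link_weight=1.0):
    """Pick one box per frame, maximizing box scores plus temporal overlap.

    frames: list over time of lists of (box, score) pairs
    returns: list with the chosen box index for every frame
    """
    best = [score for _, score in frames[0]]
    back = []  # back[t][i]: best predecessor in frame t for box i in frame t + 1
    for prev, cur in zip(frames, frames[1:]):
        new_best, pointers = [], []
        for box, score in cur:
            candidates = [
                best[j] + link_weight * iou(prev_box, box)
                for j, (prev_box, _) in enumerate(prev)
            ]
            j_star = max(range(len(candidates)), key=candidates.__getitem__)
            new_best.append(candidates[j_star] + score)
            pointers.append(j_star)
        best = new_best
        back.append(pointers)
    # Trace the best path back from the last frame.
    index = max(range(len(best)), key=best.__getitem__)
    path = [index]
    for pointers in reversed(back):
        index = pointers[index]
        path.append(index)
    return path[::-1]


frames = [
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
    [((1, 1, 11, 11), 0.5), ((50, 50, 60, 60), 0.6)],
    [((2, 2, 12, 12), 0.9), ((80, 80, 90, 90), 0.95)],
]
path = link_boxes(frames)
print(path)
```

In the example, the smoothly moving box wins over the higher-scoring but spatially inconsistent alternative in the last frame.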

New goal in weakly-supervised localization Localize actions spatio-temporally from video labels. Now, directly from boxes, instead of spatio-temporal proposals. P. Mettes and C.G.M. Snoek, “Spatio-Temporal Instance Learning: Action Tubes from Class Supervision”, arXiv, 2019.

Spatio-Temporal Instance Learning An action consists of a set of boxes; MIL is no longer directly applicable. We propose a new form of instance learning for actions:
Condition 1: Each positive video contains at least one positive action instance, which can occur in precisely one tube.
Condition 2: For each positive video V, the positive action instance is a set of connected boxes of minimal length 1 and maximal length F_V, where F_V denotes the total video length.
Condition 3: For each negative video, all tubes and boxes are negative.
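
The conditions can be read as constraints on which box selections count as a candidate positive instance. The check below is a minimal sketch under one assumption that the slide does not spell out: "connected" is interpreted as one box in each of a run of consecutive frames. The function names are hypothetical.

```python
def is_valid_instance(frame_indices, num_frames):
    """Check Condition 2 for a candidate instance in a positive video.

    frame_indices: frames in which the candidate instance has a box
    num_frames:    total video length F_V
    """
    frames = sorted(frame_indices)
    # Length between 1 and F_V, with at most one box per frame.
    if not 1 <= len(frames) <= num_frames:
        return False
    if len(set(frames)) != len(frames):
        return False
    # All frames must lie inside the video.
    if frames[0] < 0 or frames[-1] >= num_frames:
        return False
    # Assumed meaning of "connected": consecutive frames without gaps.
    return all(b - a == 1 for a, b in zip(frames, frames[1:]))


def instance_labels(video_is_positive, frame_indices, num_boxes):
    """Box labels implied by the conditions: negative videos contain only negatives."""
    if not video_is_positive:
        return [-1] * num_boxes
    selected = set(frame_indices)
    return [1 if i in selected else -1 for i in range(num_boxes)]


print(is_valid_instance([3, 4, 5], num_frames=10))
print(is_valid_instance([3, 5], num_frames=10))
```

For simplicity, `instance_labels` assumes one candidate box per frame, so box indices and frame indices coincide.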

Spatio-Temporal Instance Learning Max margin optimization. New objective and optimization with latent variables. Constraints: no more positive box proposals than the number of frames in the video; all boxes from one tube; boxes are connected over time.

STIL on boxes versus MIL on tubes State-of-the-art results, especially at high overlap thresholds.

Capturing spatio-temporal action extent Action extent during training uncovered for long actions, problematic for short ones. At test time, success in single-action videos, confusion in multi-action videos.

Conclusions on action localization Action localization with box annotations does not scale to large settings. We can adapt Multiple Instance Learning to help solve this problem. The gap to fully supervised approaches is still large and remains an open problem.

Part II: Action localization and recognition without examples

Zero-shot recognition [Slide by Zeynep Akata]

On objects and actions An action involves an actor trying to act on its environment. Objects serve as tools to make changes. To what extent can we recognize and localize actions only from objects?
