Event Recognition by Learning Amir Habibian Qualcomm Research, - PowerPoint PPT Presentation

Event Recognition by Learning Amir Habibian Qualcomm Research, Amsterdam 27 Feb 2017 1

What is an event? Interaction of people and objects under a certain scene [Diagram labels: Event, Object, People, Action, Scene] Examples • Personal events: marriage proposal, grooming an animal • Traffic events: accident, traffic jam • Security events: breaking a lock, leaving a bag unattended Event: Winning a race without a vehicle 2

Why is event recognition hard? Large variation in examples (semantic variance) • Depending on the context, an event may involve various objects, actions, and scenes Event: Feeding an animal Limited number of training examples • Events are more specific than individual objects, actions, and scenes 3

Video representations for event recognition Neither shallow BoW nor deep learned representations fit well • BoW representations are not discriminative enough to handle the large variations • There are not enough training examples to train a deep neural network State-of-the-art methods rely on pre-trained semantic encoders to represent videos Semantic representation Event: Making a sandwich Non-semantic representation 4

Video representations for event recognition Handcrafted Learned Early work Non-semantic Semantic Research trend 5

Non-semantic representation (handcrafted) Aggregation of handcrafted descriptors over video Pipeline: Decoding video → Extracting descriptors (Appearance: SIFT, GIST, …; Motion: HOF, MBH, …) → Quantizing descriptors (Bag-of-words, VLAD, Fisher vector) [Jiang et al., TRECVID 2010] [Natarajan et al., CVPR 2012] [Wang et al., ICCV 2013] and many others 6

Non-semantic representation (learned) Aggregation of CNN descriptors over video Pipeline: Decoding video → Extracting CNN descriptors (trained on images: VGG, Inception) → Video pooling (Averaging, Fisher vector, VLAD) More effective and efficient than the handcrafted descriptors [Xu et al., CVPR 2015] [Nagel et al., BMVC 2015] 7

Video representations for event recognition Handcrafted Learned Non-semantic Semantic 8

Semantic representation (handcrafted) Handcraft a vocabulary of concept detectors 9

Handcrafting concept vocabulary The vocabulary is created in three steps: 1. Identifying the concepts to be included in the vocabulary 2. Providing training examples per concept 3. Training concept classifiers Involves lots of annotation effort • To identify which concepts to include • To provide training examples per concept 10

Handcrafted vocabulary Key questions • How many concepts to include in the vocabulary? • How accurate should the concept detectors be? • What concept types to include in the vocabulary? • Which concepts to include in the vocabulary? • ... A. Habibian, K. van de Sande, and C. Snoek, ICMR’13 A. Habibian and C. Snoek, CVIU’14 11

Quantity vs Quality Impact of concept detector accuracies on event recognition Impose noise on concept detector predictions [Plot: Mean Average Precision vs. Imposed Detection Noise (in %), one curve per Vocabulary Size: 50, 100, 200, 300, 500, 1346] 12

Quantity vs Quality Impact of concept detector accuracies on event recognition Impose noise on concept detector predictions [Plot: Mean Average Precision vs. Imposed Detection Noise (in %), one curve per Vocabulary Size: 50, 100, 200, 300, 500, 1346] Make the vocabulary larger rather than more accurate 13

Conclusion A comprehensive set of concepts of various types is needed It requires lots of annotation effort … 14

Label composition trick Expanding the labels by logical operations • AND, OR, … A. Habibian, T. Mensink, and C. Snoek, ICMR’14 15

Label composition trick Expanding the labels by logical operations • AND, OR, … … 16

Motivation Expanding the vocabulary for free Composite concepts can be easier to detect • boat-AND-sea • bear-AND-cage • man-OR-woman Composite concepts can be more indicative of the event • bike-AND-ride for attempting a bike trick 17
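The label composition idea can be illustrated with a minimal sketch, assuming each concept comes with a binary label per training video. The concept names follow the slide; the label values and the helper names `compose_and` and `compose_or` are hypothetical and only serve the illustration.

```python
from typing import Dict, List

def compose_and(a: List[int], b: List[int]) -> List[int]:
    """Label of the composite concept a-AND-b: positive only if both are present."""
    return [x & y for x, y in zip(a, b)]

def compose_or(a: List[int], b: List[int]) -> List[int]:
    """Label of the composite concept a-OR-b: positive if either is present."""
    return [x | y for x, y in zip(a, b)]

# Hypothetical binary labels for four training videos (1 = concept present).
labels: Dict[str, List[int]] = {
    "boat":  [1, 0, 1, 0],
    "sea":   [1, 1, 0, 0],
    "man":   [0, 1, 0, 0],
    "woman": [0, 0, 1, 0],
}

# New vocabulary entries obtained without any extra annotation.
boat_and_sea = compose_and(labels["boat"], labels["sea"])
man_or_woman = compose_or(labels["man"], labels["woman"])
print(boat_and_sea, man_or_woman)
```

A detector for the composite concept would then be trained on the composed labels exactly as for any other concept.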

Learning composite concepts For a vocabulary of n concepts, there are B_n disjoint compositions • B_n is the n-th Bell number • Not all of them are useful Which concepts should be composed together? • NP-hard problem, equivalent to set-partitioning • Approximated by a greedy search algorithm 18
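To give a feel for how quickly the number of disjoint compositions grows, here is a small sketch that computes Bell numbers with the standard Bell triangle. The formula shown on the original slide is not recoverable from this transcript, so this is one common way to compute B_n, not necessarily the form the talk used; the function name `bell_numbers` is an illustrative choice.

```python
from typing import List

def bell_numbers(n_max: int) -> List[int]:
    """Return [B_0, ..., B_n_max] using the Bell triangle."""
    bells = [1]   # B_0 = 1
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left in the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

print(bell_numbers(10))
```

Already for ten concepts there are more than a hundred thousand partitions, which is consistent with the slide's point that exhaustive search is impractical and a greedy approximation is used.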

Qualitative results Top ranked videos for flash mob gathering Most dominant concepts in the video representation 19

Conclusion More comprehensive vocabulary by composing the concepts Still grounded in the handcrafted concepts … 20

Video representations for event recognition Handcrafted Learned Non-semantic Semantic 21

Discovering concepts from the web [Wu et al., CVPR’14] [Chen et al., ICMR’14] 22

Video2Vec embedding Learn the mutual underlying subspace between videos and descriptions Videos Descriptions A woman folds and packages a scarf she has made. A woman points out bones on a skeleton for lab practical for an anatomy class. A mother at a fountain tries to get her daughter to step on the water jets. … Semantic space A. Habibian, T. Mensink, and C. Snoek, PAMI, In press 23

Autoencoder Learn a compact representation by which the input could be reconstructed • Codes as data representation Autoencoder for visual data: [video] → Encoder → Codes → Decoder → [video] Autoencoder for textual data: “Crazy guy doing insane stunts on bike.” → Encoder → Codes → Decoder → “Crazy guy doing insane stunts on bike.” 24

Video2Vec embedding Reconstruct the other view of data • Reconstruct the textual view from visual view: [video] → Encoder → Codes → Decoder → “Crazy guy doing insane stunts on bike.” • Reconstruct the visual view from textual view: “Crazy guy doing insane stunts on bike.” → Encoder → Codes → Decoder → [video] 25

Video2Vec embedding Reconstruct the other view of data • Reconstruct the textual view from visual view: $\mathcal{L}(y, \hat{y}) = \lVert y_i - A W x_i \rVert_2^2$ • W: encodes visual features into codes • A: decodes codes into textual features Diagram: $x_i$ → Encoder ($W$) → Codes ($s_i$) → Decoder ($A$) → $y_i$ (“Crazy guy doing insane stunts on bike.”) 26
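The cross-view reconstruction on this slide can be sketched in a few lines, assuming a linear encoder W (visual features to codes) and a linear decoder A (codes to textual features) with a squared ℓ2 reconstruction error. The matrices, dimensions, and values below are made up for illustration, and the helper names `matvec` and `video2vec_loss` are hypothetical; this only evaluates the loss and does not train the embedding.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def video2vec_loss(W: Matrix, A: Matrix, x: Vector, y: Vector) -> float:
    """Squared l2 error of reconstructing the term vector y from visual features x."""
    s = matvec(W, x)        # encode: visual features -> codes
    y_hat = matvec(A, s)    # decode: codes -> textual features
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Toy setup: 3-d visual feature, 2-d code, 4-term textual vocabulary.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [0.0, 0.0]]
x = [0.5, 1.0, 0.5]

perfect = video2vec_loss(W, A, x, [1.0, 1.0, 2.0, 0.0])    # description fully reconstructed
imperfect = video2vec_loss(W, A, x, [1.0, 0.0, 2.0, 1.0])  # two terms off by one
print(perfect, imperfect)
```

Training would search for W and A that minimize this loss summed over all video and description pairs.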

Multimodal encoding Train a different encoder to encode every video channel • Appearance, motion, and audio Share the codes to enforce the common structures across modalities • Acts as a regularizer Diagram: Appearance, Motion, Audio → one Encoder per modality → shared Codes → Decoder → “Crazy guy doing insane stunts on bike.” 27

Multimodal encoding Visualizing the decoder (A) as $A A^{\top}$ Panels: Unimodal encoder (Appearance), Unimodal encoder (Motion), Unimodal encoder (Audio), Multimodal encoder The multimodal encoder better learns the semantic relations 28

Impact of multimodal encoding Joint encoding of multiple modalities leads to a better representation 29

Task specific decoding Autoencoders rely on the $\ell_2$ loss to measure reconstruction error: $\mathcal{L}(y, \hat{y}) = \lVert y - \hat{y} \rVert_2^2$ Errors in reconstructing any of the words are treated equally We replace the $\ell_2$ loss with: $\mathcal{L}(y, \hat{y}) = \lVert H_t (y - \hat{y}) \rVert_2^2$ $H_t$ is a diagonal matrix determining the importance of each word per task 30
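The difference between the two losses can be seen on a toy example. This sketch assumes squared ℓ2 errors and represents the diagonal of H_t as a plain list of per-word weights; the vocabulary, the weights, and the function names are hypothetical and chosen only to illustrate the effect of the weighting.

```python
from typing import List

def l2_loss(y: List[float], y_hat: List[float]) -> float:
    """Standard reconstruction error: every word counts equally."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def task_weighted_loss(y: List[float], y_hat: List[float], h_diag: List[float]) -> float:
    """Task specific error: each word's residual is scaled by its importance."""
    return sum((w * (a - b)) ** 2 for a, b, w in zip(y, y_hat, h_diag))

terms = ["bike", "stunt", "the", "video"]   # hypothetical vocabulary
y     = [1.0, 1.0, 1.0, 1.0]                # true term vector
y_hat = [0.0, 1.0, 0.0, 0.0]                # reconstruction misses three words

# Hypothetical diagonal of H_t for a bike-related event: only the
# event-relevant words contribute to the error.
h_bike_event = [1.0, 1.0, 0.0, 0.0]

plain = l2_loss(y, y_hat)
weighted = task_weighted_loss(y, y_hat, h_bike_event)
print(plain, weighted)
```

Under the plain loss, missing "the" costs as much as missing "bike"; under the weighted loss only the miss on "bike" is penalized.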

Task specific decoding Middle: standard decoder Bottom: task specific decoder 31

Impact of event specific decoding Event specific decoding leads to a better representation • For both unimodal and multimodal encoders Zero-shot event recognition 32

Event recognition with video examples 1. Train the embedding on a collection of videos and their descriptions − Videos and their captions downloaded from YouTube 2. Use the trained embedding to encode event videos 3. Train and use the event classifier on the encoded representations − SVM 33

Event recognition without video examples Event description → Term extraction → Term Vector Test videos → Video2Vec → Term Vector The two term vectors are compared by Text Matching 34
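The matching step of this zero-shot pipeline can be sketched as follows. The slide only says "text matching", so the use of cosine similarity here is an assumption, and the vocabulary, the term vectors, and the function names are hypothetical. The video term vectors stand in for what a trained Video2Vec embedding would predict.

```python
import math
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_videos(event_vec: List[float], video_vecs: Dict[str, List[float]]) -> List[str]:
    """Rank test videos by similarity between their term vector and the event's."""
    scores = {name: cosine(event_vec, vec) for name, vec in video_vecs.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)

vocabulary = ["bike", "trick", "dog", "feed"]   # hypothetical terms
event_vec = [1.0, 1.0, 0.0, 0.0]                # terms extracted from the event description

# Hypothetical term vectors predicted for three test videos.
video_vecs = {
    "v1": [0.9, 0.8, 0.1, 0.0],
    "v2": [0.0, 0.1, 0.9, 0.7],
    "v3": [0.5, 0.0, 0.5, 0.0],
}

ranking = rank_videos(event_vec, video_vecs)
print(ranking)
```

No event video is needed at any point: the event enters only through its textual description.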

Applications 35

Application 1: Cross-modal retrieval Represent all the modalities in a mutual semantic space Speech Images Text Videos A. Habibian, T. Mensink, and C. Snoek, ICMR’15 36

Application 1: Cross-modal retrieval A. Habibian and C. Snoek, MM’13 37

Application 1: Cross-modal retrieval A. Habibian and C. Snoek, MM’13 38

Application 2: On-the-fly event search Efficiency • Representing videos by a compact set of concepts Few exemplars • Transfer learning from vocabulary training examples Recounting • Interpretable video representation A. Habibian, M. Mazloom, and C. Snoek, ICMR’14 M. Mazloom, A. Habibian, and C. Snoek, MM’13 39

Application 2: On-the-fly event search 40

Application 3: Video summarization Localizing the event over time by following its concepts Summarizing long videos, e.g. GoPro footage Changing a vehicle tire M. Mazloom, A. Habibian, and C. Snoek, ICMR’15 43

Thanks ! habibian.a.h@gmail.com 44
