

SLIDE 1

Learning from actions
Temporal structures for human action recognition

Hilde Kuehne
Computer Vision Group, Prof. Juergen Gall, Institute of Computer Science III
Deep Learning for Computer Vision – Dagstuhl Seminar 1739

SLIDE 2

Overview

  • Motivation: sequence models for activity recognition
  • Weak action learning
    – Weak learning of sequential data
    – Weak learning with CNNs/RNNs
  • Outlook: current projects
    – Learning of unordered action sets
    – Mining YouTube


SLIDE 3

Why activity recognition?

  • Human-machine interaction, e.g. robotics
  • Services, e.g. assisted living, entertainment, …
  • Video transcription, movie labeling and indexing
  • Surveillance: who does what, and when?
  • Scientific studies, e.g. behavior and motion analysis, sport science, …


Image examples: SFB 588 – Humanoid Robot Armar III; HMDB51 [Kuehne2011]; Project AutoTIP – GoHuman

SLIDE 4

Activity recognition - Problem Statement

Action recognition

  • … (usually) means one label per image or per clip

This doesn't work for complex activities:

  • One image may not be enough for reliable recognition
  • One label per video can be too coarse

→ Look for a representation that captures the structure of complex action sequences:
    – human actions as time series (robotics)
    – models of complex relations between entities (speech)

Problem: find representations that fit the structure of human actions.


Dataset examples: Weizmann [Blank2005], BKT [Kuehne2012], Pascal [Everingham2010]

SLIDE 5

Action primitives

Action primitives (units):

  • A motion that is performed continuously and without interruption.
  • The smallest entity whose order can be changed during execution.
  • Complex tasks, e.g. in the household domain, consist of concatenated action primitives.
  • An action primitive is usually made up of a set of motion phases:


Motion phases (from start position to end position):
start → preparation → main action → finalize → adjust
(adjust: energy compensation, preparation for the following action)

SLIDE 6

Action Grammar

  • All tasks, as long as they have a meaningful aim, are executed in a certain order.
  • The order in which the tasks are executed is not random.
  • It is possible to formulate a grammar which has to be followed.
  • The action grammar defines the action sequences, which are concatenations of action primitives that result in a meaningful task.
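As an illustration, such an action grammar can be written down as context-free production rules. The following is a minimal Python sketch for a hypothetical "making coffee" task; the production rules and action names are invented for the example:

```python
# Hypothetical context-free action grammar: non-terminals map to lists of
# alternative right-hand sides; lower-case symbols are action primitives.
grammar = {
    "ACTIVITY": [["take_cup", "POUR", "stir_coffee"]],
    "POUR":     [["pour_coffee"], ["pour_coffee", "pour_milk"]],
}

def expansions(symbol):
    """Enumerate every primitive sequence a symbol can generate."""
    if symbol not in grammar:                    # terminal: action primitive
        return [[symbol]]
    results = []
    for rhs in grammar[symbol]:
        seqs = [[]]
        for s in rhs:                            # concatenate expansions
            seqs = [a + b for a in seqs for b in expansions(s)]
        results.extend(seqs)
    return results

print(expansions("ACTIVITY"))
# [['take_cup', 'pour_coffee', 'stir_coffee'],
#  ['take_cup', 'pour_coffee', 'pour_milk', 'stir_coffee']]
```

Every sequence the grammar generates is a meaningful task; any other ordering of the same primitives is rejected.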


SLIDE 7

Time-based modeling

1. Action units: a linear n-state model, e.g. the action unit "picking a bowl" with states s1 (move hand towards the bowl), s2 (grasp the bowl) and s3 (take the bowl to the target position), connected by transition states [Gehrig2008]

2. Activity: a context-free grammar over action units, e.g. sequences such as idle_position, picking_bottle, idle_position, picking_bowl, idle_position, … [Gehrig2008]

SLIDE 8

Modelling of action units

The task of recognizing an action unit is defined by the best match of the input sequence x, with x_i representing the feature vector at frame i, to a set of action units u. This corresponds to maximizing the probability of an action unit u_i given the input sequence x:


The input sequence and the set of action units are

$$\mathbf{x} = x_1, x_2, x_3, \ldots, x_T \qquad \mathbf{u} = u_1, u_2, u_3, \ldots, u_N$$

and the recognized unit follows from Bayes' rule:

$$\arg\max_i P(u_i \mid \mathbf{x}) = \arg\max_i \frac{P(\mathbf{x} \mid u_i)\, P(u_i)}{P(\mathbf{x})}$$
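In code, this is simply an argmax over per-unit model scores, usually computed in the log domain. A minimal sketch, assuming each unit model exposes a log-likelihood function (this interface is an assumption, not part of the slides):

```python
import math

def recognize_unit(x, unit_models, priors):
    """Pick the action unit u_i maximizing P(x | u_i) * P(u_i).

    x           -- list of per-frame feature vectors x_1 ... x_T
    unit_models -- dict: unit name -> model with log_likelihood(x)
    priors      -- dict: unit name -> prior probability P(u_i)

    P(x) is constant over all units, so it drops out of the argmax.
    """
    return max(
        unit_models,
        key=lambda u: unit_models[u].log_likelihood(x) + math.log(priors[u]),
    )
```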

SLIDE 9

Modelling of action units

The joint probability of the model M_{u_i} moving through the state sequence S_x can be calculated as the product of transition probabilities and observation probabilities given the input x:
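The formula itself appeared as an image on the slide; in standard HMM notation, with transition probabilities $a_{s_{t-1}s_t}$ and observation probabilities $b_{s_t}(x_t)$, the product reads:

$$P(\mathbf{x}, S_x \mid M_{u_i}) = \prod_{t=1}^{T} a_{s_{t-1}s_t}\, b_{s_t}(x_t)$$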


SLIDE 10

Modelling of action sequences

Action sequences are realized as a concatenation of action units:

  • Computation of probabilities with a combination of Viterbi and pruning
  • Can include grammar specification


SLIDE 11

Practical realization

Recognition of action units:

  • on the level of action units: an n-state left-to-right HMM
  • per state: two equally likely transitions, one to the current state and one to the next state
  • number of states: adaptive to the mean unit length
  • initialization: equal distribution of samples

Recognition of sequences:

  • action sequences are defined by a context-free grammar
  • built by automatic parsing of labels, or defined by hand (see the sketch after this list)


(The example grammar describes stirring, mashing and pouring.)
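A minimal sketch of how such a left-to-right transition structure can be built. The slide only states that the number of states adapts to the mean unit length, so the fixed frames-per-state ratio below is an assumption:

```python
import numpy as np

FRAMES_PER_STATE = 10  # assumption: roughly one state per 10 frames

def left_to_right_transitions(mean_unit_length):
    """Transition matrix of an n-state left-to-right HMM where each state
    has two equally likely transitions: a self-loop and a step forward."""
    n = max(1, round(mean_unit_length / FRAMES_PER_STATE))
    A = np.zeros((n, n))
    for s in range(n - 1):
        A[s, s] = 0.5        # stay in the current state
        A[s, s + 1] = 0.5    # move on to the next state
    A[n - 1, n - 1] = 1.0    # final state loops until the unit ends
    return A

print(left_to_right_transitions(35))   # yields a 4-state unit model
```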

SLIDE 12

Properties

  • Implicit segmentation: the output sequence contains semantic and temporal information in addition to the overall label
  • Continuous recognition: hypotheses are based on beams of (theoretically) unlimited length
  • Temporal variations are handled by the HMMs:
    – temporal flexibility without the need for more training samples
    – only constrained by the number of states
    – handles large variations


Recognition result vs. ground truth (HTK master label files):

Result:
#!MLF!#
"bend.rec"
0 3700000 bend_down 45358.023438
3700000 6200000 bend_up 35816.691406
.
"jack.rec"
0 1700000 jack 6247.286621
1700000 2700000 jack -544.383606
2700000 5000000 jack 10465.790039
.
"pjump.rec"
0 1400000 pjump 11971.578125
1400000 2800000 pjump 15659.549805
2800000 4100000 pjump -25356.494141
.

Ground truth:
#!MLF!#
"bend.lab"
0 3800000 bend_down
3700000 6200000 bend_up
.
"jack.lab"
0 2800000 jack
2700000 5000000 jack
.
"pjump.lab"
0 2300000 pjump
2200000 4100000 pjump

SLIDE 13

Example


SLIDE 14

Example


SLIDE 15

Weak learning of sequential data

Idea: given

  • sequences of input data, and
  • transcripts, i.e. a list of the order in which the actions occur in the videos,

infer the scripted actions and train the related action models without any boundary information.

This setup is usually applied for training ASR systems, which have:

  • lots of training data (e.g. TIMIT: ~6300 sentences × ~8.2 words per sentence × ~3.9 phones per word ≈ 201,474 phone samples; Breakfast: ~11,000 samples)
  • a well-defined vocabulary
  • low signal variance


Figure: segmentation from video input plus transcripts, e.g. "pour milk, take cup, stir coffee, …" / "pour coffee, stir coffee, pour milk, …".

SLIDE 16

Segment Annotation vs. Transcript Annotation


Full segmented annotation requires the start and end frames for each action; transcript annotations contain only the actions within a video and the order in which they occur.

Cost of the two annotation techniques (annotators labeled 11 videos of making coffee, with 7 possible actions, in both styles):

  • Full segmented annotation: real-time factor 3.85 (= 3.85 × video duration)
  • Transcript annotation: real-time factor 1.36

→ about a third of the time compared to a full annotation (a sketch of the two formats follows below)
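Concretely, the two annotation styles might look like the following sketch (the frame numbers are invented for illustration):

```python
# Full segmented annotation: one (action, start frame, end frame) per segment.
segment_annotation = [
    ("take_cup",    0,   180),
    ("pour_coffee", 181, 520),
    ("pour_milk",   521, 700),
    ("stir_coffee", 701, 940),
]

# Transcript annotation: only the ordered list of actions, no boundaries.
transcript_annotation = ["take_cup", "pour_coffee", "pour_milk", "stir_coffee"]

# A transcript can always be derived from a full segmentation:
assert transcript_annotation == [action for action, _, _ in segment_annotation]
```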

SLIDE 17

Video Segmentation given the Action Transcripts

Given the action transcripts, a large sequence-HMM can be built as the concatenation of the HMMs of each action class, in the order in which they occur in the transcript of that sequence. Video segmentation then amounts to finding the best alignment of video frames to the sequence-HMM (e.g. with the Viterbi algorithm), as in the sketch below.
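A minimal NumPy sketch of this alignment step. It assumes per-frame log observation scores are already available for every action (e.g. from the GMMs) and that each unit HMM provides its log transition matrix; both interfaces are assumptions made for the example:

```python
import numpy as np

def align(frame_log_probs, transcript, unit_hmms):
    """Viterbi-align frames to the sequence-HMM built from a transcript.

    frame_log_probs -- dict: action -> (T x n_states) log observation scores
    transcript      -- ordered list of the actions occurring in the video
    unit_hmms       -- dict: action -> (n x n) log transition matrix
    Returns one action label per frame.
    """
    labels, obs, blocks = [], [], []
    for action in transcript:                    # concatenate the unit HMMs
        n = unit_hmms[action].shape[0]
        labels += [action] * n
        obs.append(frame_log_probs[action])
        blocks.append(unit_hmms[action])
    log_B = np.concatenate(obs, axis=1)          # T x S observation scores
    T, S = log_B.shape

    # Block-diagonal transitions plus a jump from each unit's last state
    # to the next unit's first state (left unnormalized in this sketch).
    log_A = np.full((S, S), -np.inf)
    off = 0
    for i, blk in enumerate(blocks):
        n = blk.shape[0]
        log_A[off:off + n, off:off + n] = blk
        if i + 1 < len(blocks):
            log_A[off + n - 1, off + n] = np.log(0.5)
        off += n

    # Standard Viterbi recursion with backtracking.
    delta = np.full((T, S), -np.inf)
    psi = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_B[0, 0]                    # must start in first state
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # S x S candidate scores
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]

    state = S - 1                                # must end in last state
    path = [state]
    for t in range(T - 1, 0, -1):
        state = psi[t, state]
        path.append(state)
    return [labels[s] for s in reversed(path)]
```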


SLIDE 18

System overview


SLIDE 19

Example

Example of the segmentation across the training iterations:


SLIDE 20

Alignment vs. Segmentation

Alignment: video and transcript are given; the result is the segment boundaries.

Segmentation: only the video is given; the result is the action classes plus the segment boundaries.

SLIDE 21

Evaluation


Evaluation on four large-scale datasets:

  • Hollywood Extended [Bojanowski2014]: 937 clips extracted from Hollywood movies; 15 action classes with a mean of 2.5 segments per video.
  • Breakfast [Kuehne2014]: a large-scale database with 77 h of video (4 million frames); 10 breakfast-related activities with 48 action classes.
  • MPII Cooking [Rohrbach2012]: a large database of fine-grained cooking activities; 8 h of video data, 12 persons, 65 action classes.
  • CRIM13 [Burgos2012]: a large-scale mice-behavior dataset; 50 h of annotated mice activities, 13 different action classes.

SLIDE 22

Example – Segmentation


SLIDE 23

Example – Segmentation


SLIDE 24

Extension by CNN

Problem:

  • CNNs need to be fine-tuned for each dataset/split separately → unsuitable for iterative methods
  • Generic features are based on fixed algorithms and do not depend on any training information

The resulting segmentation of the training data can be used to train any other model in a fully supervised manner, e.g. to replace the low-level GMM-based observation probabilities by ones gained from other models such as CNNs:

  • The softmax layer of a CNN generates a posterior distribution over all output classes.
  • Given the sequence input x at frame t, the output for class s is p(s | x_t); to get conditional observation probabilities, we transform the softmax output using Bayes' rule (shown below):
    – p(s) is the class prior probability (the relative frequency of state s)
    – p(x_t) can be omitted (it does not affect the maximizing arguments)
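The transformation was shown as a formula on the slide; reconstructed from the surrounding definitions it reads:

$$p(x_t \mid s) = \frac{p(s \mid x_t)\, p(x_t)}{p(s)} \;\propto\; \frac{p(s \mid x_t)}{p(s)}$$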


SLIDE 25

Implementation

Two-stream architecture (VGG, pretrained on UCF101), with every state as one class:

  • Fine-tune with the respective state classes, e.g. 238 for Hollywood Extended and 1392 for Breakfast
  • Train for 30,000 iterations (batch size 50)

→ Evaluated on Hollywood Extended and Breakfast


SLIDE 26

Results – Alignment

  • Breakfast improves, Hollywood Extended gets worse
  • The mean over classes always improves with CNNs:
    → better at classifying underrepresented classes
    → less prone to background


SLIDE 27

Weak learning with RNNs

Same idea: replace the GMM observation probabilities by RNNs:

  • GMMs are just distributions, without temporal information
  • RNNs are good at capturing short temporal information (≈ 10-20 frames, roughly the span of an HMM state); train the RNNs with 20 chunks of 20 frames each → no need for more temporal context
  • HMMs can handle the long-term inference (see the sketch below)
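A minimal PyTorch sketch of such a chunk-based recurrent observation model. Layer sizes and the feature dimension are illustrative assumptions; only the 20-frame chunks and the number of state classes (1392 for Breakfast, from the CNN experiments) come from the slides:

```python
import torch
import torch.nn as nn

class ChunkRNN(nn.Module):
    """Recurrent model predicting an HMM-state posterior for every frame."""

    def __init__(self, feat_dim=64, hidden=128, n_states=1392):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_states)

    def forward(self, x):             # x: (batch, 20 frames, feat_dim)
        h, _ = self.gru(x)
        return self.out(h)            # (batch, 20, n_states) state logits

model = ChunkRNN()
chunks = torch.randn(20, 20, 64)      # 20 chunks of 20 frames each
state_logits = model(chunks)          # short-term temporal context only
```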


SLIDE 28

Weak learning with RNNs

Add-on: how to find the right number of states? The GMM-based modeling is not necessarily best for RNNs. Idea: re-estimate the number of states after each iteration:

1. Start with a uniform distribution
2. Train the RNN states
3. Align the transcripts to the training videos
4. Recompute the number of states
5. Split the new segments according to the number of states

Repeat steps 2-5 until less than 2% of all frames are assigned to a new class.
→ The re-estimation helps to approximate a length prior (see the sketch below).
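A sketch of this training loop. The recomputation rule in step 4 appeared as a formula on the slide, so the mean-segment-length heuristic below is an assumption, as are the callable interfaces:

```python
FRAMES_PER_STATE = 10   # assumption; the slide's exact formula is not shown

def reestimation_training(videos, transcripts, n_states, train_rnn, align):
    """Alternate RNN training and alignment, re-estimating states per action.

    n_states  -- dict: action -> current number of HMM states
    train_rnn -- trains the RNN on the current state alignment (step 2)
    align     -- aligns transcripts to videos, returns (action, start, end)
                 segments over all training videos (step 3)
    """
    while True:
        rnn = train_rnn(videos, transcripts, n_states)           # step 2
        segments = align(rnn, videos, transcripts)               # step 3

        lengths = {}
        for action, start, end in segments:
            lengths.setdefault(action, []).append(end - start)

        changed = 0
        total = sum(end - start for _, start, end in segments)
        for action, lens in lengths.items():                     # step 4
            new_n = max(1, round(sum(lens) / len(lens) / FRAMES_PER_STATE))
            if new_n != n_states[action]:
                changed += sum(lens)   # frames whose action was re-split
            n_states[action] = new_n                             # step 5

        if changed / total < 0.02:     # stop: <2% of frames re-assigned
            return rnn, n_states
```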


SLIDE 29

Overview training


SLIDE 30

Evaluation


SLIDE 31

Evaluation


SLIDE 32

Outlook

Temporal Action Labeling using Action Sets. Idea: forget the temporal ordering, just train the classes and learn a grammar.


Weak learning from action sets:

  • Each video is associated with a set of tags/classes
  • All classes are trained based on the tags
  • A grammar is generated based on the tag combinations

SLIDE 33

Conclusion

  • Weakly supervised learning of human actions from video transcriptions with structured models
  • Given sequences of input data and transcripts, infer the scripted actions and train the related action models without any boundary information
  • Address the field of action recognition by using less annotation but more data, for a better understanding of action concepts


SLIDE 34

Thank you for your attention.
