Impact of Eye Fixated Regions in Visual Action Recognition
Deepak Pathak (deepakp@)
Dept. of Computer Science & Eng., IIT Kanpur
Mentor: Dr. Amitabha Mukerjee
Introduction
Human Action Recognition
(What ?)
in movies , news etc.
[UCF Sports Action Dataset] 150,000 uploads every day! [Live Snapshot – earthcam.com]
performance on similar tasks.
[Poggio 2007]
Suggested a hierarchical model with simplex features that are consistent with the ventral stream of the visual cortex.
[Mathe 2012]
dynamic visual action recognition tasks.
Hollywood-2 and UCF Sports Action Dataset
: Free viewing – 4 subjects : Action Recognition – 12 subjects
sample rate.
saccadic movement of eyes.
Experimental Setup [Mathe 2012]
* Realistic human actions in unconstrained video clips of Hollywood movies.
* 12 Action Classes
* 823 Training Video clips
* 884 Test Video clips
Eye “fixation” points as Interest Points
Get HoG3D descriptor centered at these interest points
K-means clustering to map it to Visual Vocabulary
Histogram of Visual words
Train Classifier
Done !!
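The vocabulary and histogram steps of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the project: the function names, the toy 2-D "descriptors", the vocabulary size of 2, and the choice to initialise k-means from the first k vectors are all hypothetical, picked only to keep the example small and deterministic. Real HoG3D descriptors are much longer vectors, and the resulting histograms would then be passed to a classifier.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(vector, centers):
    """Index of the cluster center closest to `vector` (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: squared_distance(vector, centers[i]))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the returned centers act as the visual vocabulary.

    Assumption for this sketch: centers are initialised from the first k vectors.
    """
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest_center(v, centers)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bag_of_words_histogram(descriptors, centers):
    """Normalised histogram of visual-word counts for one video."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest_center(d, centers)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(centers)


# Toy stand-ins for descriptors sampled around fixation points in two videos.
video_a = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]]
video_b = [[5.1, 5.0], [4.9, 5.0], [0.0, 0.0]]

vocabulary = kmeans(video_a + video_b, k=2)
hist_a = bag_of_words_histogram(video_a, vocabulary)
hist_b = bag_of_words_histogram(video_b, vocabulary)
```

Each video ends up as one fixed-length histogram regardless of how many fixation points it contains, which is what lets a standard classifier be trained on top.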
histograms
approach with other state-of-the-art performances.
subject with 12 frame overlap. (computational reasons)
(dimension=300)
videos feature histogram. Test over 884 test videos.
Frame showing Interest point (From Eye Gaze data)
Action – GetOutCar
Action – FightPerson
Visual Saliency Predictor ?
descriptor.
Input: HoG3D feature descriptor around each pixel of the video data.
Output: Yes or No (whether the pixel is salient)
datasets and learnt saliency models for visual action recognition." Computer Vision-ECCV 2012. Springer Berlin Heidelberg, 2012. 842-856.
dynamic gaze datasets and learnt saliency models for visual
Romanian Academy and University of Bonn (February 2012), 2012.
descriptor based on 3D-gradients." (2008).
Computer Vision 64.2 (2005): 107-123. – Hollywood-2 Dataset
[CVPR '08] [Klaser 2008] This involves gradient computation and orientation binning. Gradient computation requires filtering the image with the kernels [-1,0,1] and [-1,0,1]' (the transpose).
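The two steps named above, gradient computation and orientation binning, can be illustrated with a small sketch. It is a deliberately simplified 2-D, single-frame version and not the HoG3D descriptor itself, which works on spatio-temporal (3-D) gradients. The function names, the 8-bin layout, the magnitude weighting, and the zeroed border pixels are assumptions made for this example.

```python
import math


def gradients(image):
    """Centered differences: filter rows with [-1, 0, 1] and columns with its transpose.

    Border pixels are left at zero to keep the sketch short.
    """
    h, w = len(image), len(image[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if 0 < c < w - 1:
                gx[r][c] = image[r][c + 1] - image[r][c - 1]
            if 0 < r < h - 1:
                gy[r][c] = image[r + 1][c] - image[r - 1][c]
    return gx, gy


def orientation_histogram(gx, gy, bins=8):
    """Magnitude-weighted histogram of gradient orientations over [0, 2*pi)."""
    hist = [0.0] * bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue  # flat pixels carry no orientation
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist


# Toy image: intensity increases left to right, so gradients point along +x.
image = [[0, 1, 2, 3],
         [0, 1, 2, 3],
         [0, 1, 2, 3]]
gx, gy = gradients(image)
hist = orientation_histogram(gx, gy)
```

For this toy image all gradient energy falls into the first bin, since every non-border pixel has a purely horizontal gradient.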