
Describing Changes in Human Appearance Over Time: Video Analysis for Sociology



  1. Describing Changes in Human Appearance Over Time: Video Analysis for Sociology
     Neva Cherniavsky, Ivan Laptev, Josef Sivic, Andrew Zisserman

     Video Analysis for Sociology
     Charlie's Angels: 1976 and 2000
     Miami Vice: 1984 and 2006
     Dukes of Hazzard: 1979 and 2005

     Sociology Research
     • Typical data sets: 250 movies, 617 commercials, 195 television episodes
     • Coders (usually students) view video in entirety twice and view each incidence multiple times; usually 10% overlap for inter-coder reliability
     Preventative Medicine Vol 34, 2002; Sex Roles Vol 35 Nos 3/4, 1996; Journal of Alcohol Studies Vol 51 No 5, 1990

  2. Sociology Research
     • Typical data sets: 250 movies, 617 commercials, 195 television episodes, 900 movies
     • Raters (usually students) view video in entirety twice and view each incidence multiple times; usually 10% overlap for inter-rater reliability
     Tobacco Control Vol 15, 2006

     Goal: Video to Statistics
     • Automatically find attributes, and number of occurrences, in video data
     • Minimize supervision (many different possible attributes)

     Data
     • Hollywood movies from different time periods
       – The Graduate, Roman Holiday, When Harry Met Sally, Love, Actually
     • Institut National de l'Audiovisuel
       – R&D: L. Laborelli and D. Teruggi
       – 1.5 Mhours of annotated audiovisual archives, 50 years of TV

     Currently: focus on facial attributes
     Face Pipeline: Detection, Description, Tracking, Classification
     Annotated Training data
     • Gender: Males (108): 86.2%; Females (19): 13.8%
     • Facial hair: Mustache (11): 8.0%; None (115): 92.0%
     • Expression: Smiling (29): 21.0%; Unsmiling (96): 79.0%
     • Hair color: Blond (4): 2.9%; Not blond (124): 97.1%
     • …
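The "Video to Statistics" goal above amounts to turning per-track attribute labels into counts and percentages, in the same "value (count): percent" form as the annotated training data. A minimal sketch, assuming each face track already carries one predicted value for an attribute; the function name `attribute_statistics` and the toy labels are hypothetical and not taken from the slides:

```python
from collections import Counter


def attribute_statistics(track_labels):
    """Summarize one attribute over a list of face tracks.

    track_labels: one predicted value per face track (hypothetical input).
    Returns a dict mapping value -> (count, percent of all tracks).
    """
    counts = Counter(track_labels)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {value: (n, round(100.0 * n / total, 1)) for value, n in counts.items()}


# Toy labels for illustration only, not data from the slides.
gender = ["male", "male", "female", "male"]
stats = attribute_statistics(gender)
for value, (n, pct) in sorted(stats.items()):
    print(f"{value} ({n}): {pct}%")
```

Here the percentages are shares of tracks; weighting by track length in frames would be an equally plausible choice, and the slides do not say which one they report.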

  3. Face Pipeline: Detection
     • Run face detection on each frame (Viola-Jones)

     Face Pipeline: Description
     • Face representation: local image descriptors at facial feature points
     • Extended pictorial structure model [Everingham, Sivic, Zisserman, 2006]

     Face Pipeline: Tracking
     • Measure "connectedness" of a pair of faces by point tracks intersecting both
     • Doesn't require contiguous detections
     • Independent evidence – no drift
     • Faces into tracks

     Face Pipeline: Classification
     • Classify tracks using SVM
     • Distance between tracks is the minimum distance between facial features (not a kernel): $D(T_i, T_j) = \min\{\, d(x, y) \mid x \in T_i,\ y \in T_j \,\}$ [Everingham et al. 2006]

     Classification: Matching face sets

     Training data
     Face Pipeline: Detection, Description, Tracking, Classification
     Annotated Training data
     • Gender: Males (108): 86.2%; Females (19): 13.8%
     • Facial hair: Mustache (11): 8.0%; None (115): 92.0%
     • Expression: Smiling (29): 21.0%; Unsmiling (96): 79.0%
     • Hair color: Blond (4): 2.9%; Not blond (124): 97.1%
     • …
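The track distance on the Classification slide takes the minimum of d(x, y) over every pair of descriptors drawn from the two tracks. A minimal sketch of that min-min distance, assuming each track is a list of equal-length descriptor vectors and that d is Euclidean distance; the slides do not specify d or the descriptor layout, so both are assumptions, and `track_distance` is a hypothetical name:

```python
import math


def euclidean(x, y):
    """Assumed base distance d(x, y); the slides leave d unspecified."""
    return math.dist(x, y)  # available in Python 3.8+


def track_distance(track_i, track_j, d=euclidean):
    """D(T_i, T_j): minimum of d(x, y) over x in T_i and y in T_j."""
    if not track_i or not track_j:
        raise ValueError("both tracks must contain at least one descriptor")
    return min(d(x, y) for x in track_i for y in track_j)


# Toy 2-D descriptors for illustration only.
track_a = [(0.0, 0.0), (1.0, 0.0)]
track_b = [(4.0, 4.0), (1.0, 2.0)]
print(track_distance(track_a, track_b))  # 2.0, from the pair (1, 0) and (1, 2)
```

The brute-force loop costs one evaluation of d per descriptor pair, which is fine for a sketch but grows with the product of the two track lengths.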

  4. Training data
     • Need annotated training data
     • Ideally we would train on a large number of attributes with limited supervision
     • Looked at two sources: video or still images
     • Mechanical Turk (Amazon)
       – Large scale coordination of manual tasks
       – Turks label one frame of the track or a single still image

     Training from still images vs video
     • Still images:
       + Variation across people
       + Potentially labeled data from web for free
       + Higher quality (resolution, no motion blur)
       – Not much variation in expression
     • Videos:
       + Variation across viewpoint/expression
       + Same domain as the testing set
       – Not much variation in people

     Current results: gender

     Automatically tagged video

     Current work
     • Preliminary conclusions: Better to train on videos
     • Ongoing work: Study how to combine still images and videos to improve attribute labeling
     • More attributes:
       – Race, age, hair color, eye wear
       – Use upper body detection to capture clothing, hairstyles
       – Dynamic attributes: smoking, drinking, smiling
     • Video to Statistics
       – Understand where we fail so even when we miss faces, we can report statistics
