
SLIDE 1

September 4th, 2012

British Machine Vision Conference (BMVC)

Automatic and Efficient Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts

Tomas Pfister1, James Charles2, Mark Everingham2, Andrew Zisserman1

1 Visual Geometry Group, University of Oxford

2 School of Computing, University of Leeds

SLIDE 2

Motivation

§ Exploit correspondences between signs and subtitles to automatically learn signs.
§ Use the resulting sign-video pairs to train a sign language classifier.

Automatic sign language recognition:

§ We want a large set of training examples to learn a sign classifier.

§ We obtain them from signed TV broadcasts.

SLIDE 3

Objective

Find the position of the head, arms and hands

§ Use arms to disambiguate where hands are

SLIDE 4

Difficulties

§ Overlapping hands
§ Hand motion blur
§ Faces and hands in the background
§ Colour of signer similar to background
§ Changing background

SLIDE 5

Overview

Our approach:

§ First: Automatic signer segmentation
§ Second: Joint detection

Pipeline: image (input) → co-segmentation (intermediate step 1) → colour model (intermediate step 2) → random forest regressor (joint detection) → hand and arm locations

SLIDE 6

Hand detection for sign language recognition

State of the art: “Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts” [Buehler et al., BMVC’08]

Method: a generative model of foreground & background using a layered pictorial structure (11 DOF); colour information from pixel-wise labelling; find the pose with minimum cost.


Related work

Necessary user input: 75 annotated frames per hour of video (5 + 40 + 15 + 15 frames across the colour & shape model, HOG templates, and head and body segmentation) – about 3 hours of work.

Performance: accurate tracking of hour-long videos, but at a cost of 100 seconds per frame.

SLIDE 7

Our work – automatic and fast!

SLIDE 8

Overview

Our approach:

§ First: Automatic signer segmentation
§ Second: Joint detection

Pipeline: image (input) → co-segmentation (intermediate step 1) → colour model (intermediate step 2) → random forest regressor (joint detection) → hand and arm locations

SLIDE 9

The problem

§ How do we segment the signer out of a TV broadcast?

SLIDE 10

One solution: depth data (e.g. Kinect)


§ Using depth data, segmentation is easy

Shotton et al. CVPR’11

§ But we only have 2D data from TV broadcasts…
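The point that depth makes segmentation easy can be illustrated with a minimal sketch. This is not Shotton et al.'s method (which classifies depth features with random forests); it simply thresholds a depth band assumed to contain the person, with arbitrary example values in millimetres:

```python
import numpy as np

def segment_by_depth(depth_map, near=500, far=1500):
    """Boolean foreground mask: pixels whose depth (mm) falls inside
    the [near, far] band assumed to contain the person."""
    return (depth_map >= near) & (depth_map <= far)

# Toy 2x3 depth map: only the centre column lies within the band.
depth = np.array([[2000, 1000, 2000],
                  [2000,  900, 2000]])
mask = segment_by_depth(depth)
```

With a TV broadcast there is no such depth channel, which is why the co-segmentation approach below is needed.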

SLIDE 11

Constancies

§ How do we segment a signed TV broadcast?


Clearly there are many constancies in the video

§ Same signer throughout
§ Part of the background is always static
§ A fixed box contains the changing background
§ The signer never crosses a fixed line

SLIDE 12

Co-segmentation

§ Exploit constancies to help find a generative model that describes all layers in the video

SLIDE 13

Co-segmentation – overview

Method: co-segmentation – consider all frames together.

§ For a sample of frames, obtain the background (clean plate) and a foreground colour model (histogram).
§ Use the background and the foreground colour model to obtain per-frame segmentations.

SLIDE 14

Backgrounds

Find a “clean plate” of the static background:
§ Roughly segment a sample of frames using GrabCut
§ Combine background regions with a median filter
Use this to refine the final foreground segmentation.
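The median-combination step could be sketched as follows, assuming the rough per-frame foreground masks come from an upstream GrabCut pass (e.g. OpenCV's `cv2.grabCut`); the per-pixel median simply ignores those pixels, so the signer is "erased" from the plate:

```python
import numpy as np

def clean_plate(frames, fg_masks):
    """Median-combine background pixels over a sample of frames.
    frames: (N, H, W, 3) uint8; fg_masks: (N, H, W) bool, True where a
    rough per-frame segmentation marked foreground."""
    stack = frames.astype(float)
    stack[fg_masks] = np.nan             # drop rough foreground pixels
    plate = np.nanmedian(stack, axis=0)  # per-pixel median over frames
    return plate.astype(np.uint8)

# Toy example: 3 one-pixel frames; the middle frame's pixel is foreground,
# so the recovered plate takes the background value of the other two.
frames = np.array([[[[100, 100, 100]]],
                   [[[200, 200, 200]]],
                   [[[100, 100, 100]]]], dtype=np.uint8)
fg = np.array([[[False]], [[True]], [[False]]])
plate = clean_plate(frames, fg)
```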

SLIDE 15

Foreground colour model

Find a colour model for the foreground in a sample of frames:
§ Find faces in a sub-region of the video
§ Extract a colour model from a region based on the face position
Use this as a global colour model for the final GrabCut segmentation.
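A minimal sketch of building such a colour model, assuming a face bounding box from an upstream detector (e.g. an OpenCV cascade); the torso-box geometry and the 8-bin-per-channel histogram are illustrative choices, not taken from the paper:

```python
import numpy as np

def colour_model_from_face(frame, face_box, bins=8):
    """Normalised RGB histogram of the region below a detected face.
    face_box = (x, y, w, h). The region below the face roughly covers
    the torso; its histogram serves as the global foreground colour
    model for the final GrabCut segmentation."""
    x, y, w, h = face_box
    x0, x1 = max(0, x - w), x + 2 * w    # widen the box sideways
    y0, y1 = y + h, y + 3 * h            # take the region below the chin
    pixels = frame[y0:y1, x0:x1].reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist / hist.sum()

# Toy frame of uniform colour (50, 50, 50); face at (8, 2) of size 4x4.
frame = np.full((20, 20, 3), 50, dtype=np.uint8)
model = colour_model_from_face(frame, (8, 2, 4, 4))
```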

SLIDE 16

Qualitative co-segmentation results

SLIDE 17

Overview

Our approach:

§ First: Automatic signer segmentation § Second: Joint detection

Joint detection Image Intermediate step 1 Hand and arm location Intermediate step 2

Input Co-segmentation Colour Model Random Forest Regressor

SLIDE 18

Colour model

§ Segmentations are not always useful for finding the exact location of the hands
§ Skin regions give a strong clue about hand location
§ Solution: find a colour model of the skin/torso
§ Method:
  § skin colour from a face detector
  § torso colour from foreground segmentations (face colour removed)
§ Improves generalisation to unseen signers
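One hedged sketch of how the skin and torso/background colour models might be combined into a per-pixel skin posterior via Bayes' rule; the prior value and histogram bin layout are assumptions for illustration, not the paper's:

```python
import numpy as np

def skin_posterior(frame, skin_hist, bg_hist, prior=0.3, bins=8):
    """Per-pixel posterior P(skin | colour) from two normalised RGB
    histograms: skin (e.g. sampled from the detected face) and
    background/torso. `prior` is an assumed prior probability of skin."""
    idx = np.minimum(frame.astype(int) * bins // 256, bins - 1)
    p_skin = skin_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    p_bg = bg_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    num = prior * p_skin
    den = num + (1 - prior) * p_bg
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)

# Toy histograms: all skin mass in bin (1,1,1), background in (7,7,7).
skin_hist = np.zeros((8, 8, 8)); skin_hist[1, 1, 1] = 1.0
bg_hist = np.zeros((8, 8, 8)); bg_hist[7, 7, 7] = 1.0
frame = np.array([[[50, 50, 50], [250, 250, 250]]], dtype=np.uint8)
post = skin_posterior(frame, skin_hist, bg_hist)
```

This posterior image is what the random forest stage below takes as input.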

SLIDE 19

Overview

Our approach:

§ First: Automatic signer segmentation § Second: Joint detection

Joint detection Image Intermediate step 1 Hand and arm location Intermediate step 2

Input Co-segmentation Colour Model Random Forest Regressor

SLIDE 20

Joint position estimation

§ Aim: find joint positions of head, shoulders, elbows and wrists
§ Train from Buehler et al.’s joint output


SLIDE 21

Random Forests

§ Method: Random Forest multi-class classification
§ Input: skin/torso colour posterior
§ Classify each pixel into one of 8 categories describing the body joints
§ Efficient simple node tests


Pipeline: colour posterior → random forest → PDF of joints → estimated joints
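The per-pixel classification step could be sketched with scikit-learn's `RandomForestClassifier`. Note the paper's forest uses efficient simple node tests on the colour posterior; the window features and two-class toy labels below are illustrative stand-ins for those tests and for the 8 joint classes:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(posterior, coords, half=2):
    """Per-pixel feature vector: the colour-posterior values in a
    (2*half+1)^2 window around the pixel (a stand-in for the paper's
    efficient node tests)."""
    padded = np.pad(posterior, half, mode="edge")
    return np.array([padded[y:y + 2 * half + 1, x:x + 2 * half + 1].ravel()
                     for y, x in coords])

# Toy training set: the top half of the posterior map is "hand-like"
# (class 1), the bottom half is background (class 0).
posterior = np.zeros((10, 10)); posterior[:5] = 1.0
coords = [(y, x) for y in range(10) for x in range(10)]
labels = np.array([1 if y < 5 else 0 for y, x in coords])
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(window_features(posterior, coords), labels)
```

Averaging the per-tree votes over the image yields the per-joint probability density maps the pipeline then turns into joint estimates.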

SLIDE 22

Evaluation: comparison to Buehler et al.

§ Joint estimates are compared against the joint tracking output of Buehler et al.


SLIDE 23

Evaluation: comparison to Buehler et al.

SLIDE 24

Evaluation: quantitative results

Our method vs. Buehler et al., both compared against manual ground truth.

e.g. 80% of wrist predictions are within 5 pixels of ground truth
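The accuracy measure quoted here – the fraction of predictions within d pixels of ground truth – can be computed as:

```python
import numpy as np

def accuracy_within(pred, gt, d):
    """Fraction of predicted joint positions lying within d pixels
    (Euclidean distance) of manual ground truth. pred, gt: (N, 2)."""
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= d))

# Toy example: distances 0, 10 and 5 from ground truth; threshold d = 5,
# so two of the three predictions count as correct.
pred = np.array([[0, 0], [10, 0], [3, 4]], dtype=float)
gt = np.zeros((3, 2))
acc = accuracy_within(pred, gt, 5)
```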

SLIDE 25

Evaluation: problem cases

§ Left and right hands are occasionally mixed up
§ Occasional failures due to a person standing behind the signer


SLIDE 26

Evaluation: generalisation to new signers


§ Trained & tested on the same signer
§ Trained & tested on different signers

Generalises to new signers

SLIDE 27

Conclusion

§ Presented a method which finds the position of hands and arms automatically and in real time
§ The method achieves reliable results over hours of tracking and generalises to new signers

Future work:

§ Adding a spatial model to avoid mixing up the left and right hands

Web page:

§ This presentation is online at: http://www.robots.ox.ac.uk/~vgg/research/sign_language