Learning How to Move and Where to Look from Unlabeled Video
Kristen Grauman
Department of Computer Science, University of Texas at Austin

Visual recognition
[Figure: an amusement-park scene (Cedar Point) densely annotated with regions such as sky, water, Ferris wheel, carousel, trees, rides ("The Wicked Twister", "maxair"), people waiting in line, umbrellas, pedestrians, Lake Erie, people sitting on a ride]
Objects, Activities, Scenes, Locations, Text / writing, Faces, Gestures, Motions, Emotions…
Visual recognition: applications
AI and autonomous robotics; personal photo/video collections; surveillance and security; science and medicine; organizing visual content; gaming, HCI, augmented reality
Significant recent progress in the field
Big labeled datasets + deep learning + GPU technology
[Chart: ImageNet top-5 error (%) dropping sharply over successive challenge years]
Recognition benchmarks
Caltech 101 (2004), Caltech 256 (2006) PASCAL (2007-12) ImageNet (2009) LabelMe (2007) MS COCO (2014) SUN (2010) Places (2014) BSD (2001) Visual Genome (2016)
How do our systems learn about the visual world today?
[Figure: collections of labeled snapshots, e.g. "boat", "dog", …]
Expensive and restrictive in scope
Big picture goal: Embodied visual learning
Status quo: Learn from a “disembodied” bag of labeled snapshots.
Our goal: Visual learning in the context of acting and moving in the world (inexpensive and unrestricted in scope).
Talk overview
- 1. Learning representations tied to ego-motion
- 2. Learning representations from unlabeled video
- 3. Learning how to move and where to look
Towards embodied visual learning
The kitten carousel experiment
[Held & Hein, 1963]
[Figure: active kitten vs. passive kitten on the carousel]
Key to perceptual development: self-generated motion + visual feedback
Goal: Teach the computer vision system the connection: “how I move” ↔ “how my visual surroundings change”
Our idea: Ego-motion vision
Ego-motion motor signals + unlabeled video
[Jayaraman & Grauman, ICCV 2015]
Ego-motion vision: view prediction
After moving:
Ego-motion vision for recognition
Learning this connection requires:
- Depth, 3D geometry
- Semantics
- Context
Can be learned without manual labels! Also key to recognition!
Our approach: unsupervised feature learning using egocentric video + motor signals
[Jayaraman & Grauman, ICCV 2015]
Approach idea: Ego-motion equivariance
Invariant features: unresponsive to some classes of transformations
Simard et al, Tech Report, ’98 Wiskott et al, Neural Comp ’02 Hadsell et al, CVPR ’06 Mobahi et al, ICML ’09 Zou et al, NIPS ’12 Sohn et al, ICML ’12 Cadieu et al, Neural Comp ’12 Goroshin et al, ICCV ’15 Lies et al, PLoS Computational Biology ’14 …
Approach idea: Ego-motion equivariance
Invariant features: unresponsive to some classes of transformations.
Equivariant features: predictably responsive to some classes of transformations, through simple mappings (e.g., linear).
Invariance discards information; equivariance organizes it.
“equivariance map”
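In symbols (a paraphrase of the slide, with z for the learned feature map, g for a transformation, and M_g for the equivariance map; the notation is assumed here rather than taken from the slide):

```latex
% Invariance: features do not respond to the transformation g
z(g x) \approx z(x)

% Equivariance: features respond predictably, via a simple (e.g., linear)
% "equivariance map" M_g that depends only on the transformation g
z(g x) \approx M_g \, z(x)
```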
Approach idea: Ego-motion equivariance
Equivariant embedding, organized by ego-motions (left turn, right turn, forward).
Pairs of frames related by similar ego-motion should be related by the same feature transformation.
Training data: unlabeled video + motor signals (frames over time, each paired with its motor signal).
[Jayaraman & Grauman, ICCV 2015]
Ego-motion equivariant feature learning
Given: unlabeled video with motor signals (unsupervised training).
Desired: for all motions g and all images x, z(g x) ≈ M_g z(x) in the learned feature space.
[Jayaraman & Grauman, ICCV 2015]
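As a rough illustration of how such an objective could be set up in practice, here is a minimal PyTorch-style sketch (not the released code; the encoder, the contrastive margin, and all names are assumptions). Frame pairs from unlabeled video are bucketed by their motor signal into a few ego-motion classes, and one learned linear map per class must carry the first frame's feature onto the second frame's feature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EquivariantEmbedding(nn.Module):
    """Sketch: CNN feature z(x) plus one linear 'equivariance map' M_g per
    discretized ego-motion g (e.g., left turn / right turn / forward)."""
    def __init__(self, feat_dim=128, num_motions=3):
        super().__init__()
        self.cnn = nn.Sequential(                       # small placeholder encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim))
        # one learned linear map M_g per motion class
        self.maps = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim, bias=False) for _ in range(num_motions)])

    def forward(self, x):
        return self.cnn(x)

def equivariance_loss(model, x1, x2, motion, margin=1.0):
    """Contrastive-style loss: M_g z(x1) should land near z(x2) for frame
    pairs related by ego-motion g, and away from features of unrelated frames."""
    z1, z2 = model(x1), model(x2)
    pred = torch.stack([model.maps[g](f) for g, f in zip(motion.tolist(), z1)])
    pos = F.mse_loss(pred, z2)                           # pull related pairs together
    neg = F.relu(margin - (pred - z2.roll(1, dims=0)).norm(dim=1)).mean()  # push mismatched pairs apart
    return pos + neg
```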
Ego-motion equivariant feature learning
Unsupervised training: the equivariance objective above, on unlabeled video + motor signals.
Supervised training: softmax loss on a small set of class-labeled images.
Both objectives are trained jointly on a shared feature representation.
[Jayaraman & Grauman, ICCV 2015]
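Read as an equation, the joint training on this slide can be sketched as a weighted sum of the supervised softmax loss (few labeled images, classifier W) and the unsupervised equivariance term (trade-off weight λ); the notation is a paraphrase, not copied from the paper:

```latex
\min_{\theta,\,\{M_g\},\,W}\;
\sum_{(x_i,\,y_i)} \mathcal{L}_{\mathrm{softmax}}\!\big(W z_\theta(x_i),\, y_i\big)
\;+\;
\lambda \sum_{(x_j,\,g)} \big\| M_g\, z_\theta(x_j) - z_\theta(g\, x_j) \big\|^2
```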
Results: Recognition
Learn features from unlabeled car video (KITTI); exploit the features for static scene classification (SUN, 397 classes).
Geiger et al, IJRR ’13 Xiao et al, CVPR ’10
[Chart: recognition accuracy (%) on SUN scenes for several learned representations]
+ Hadsell, Chopra, LeCun, “Dimensionality Reduction by Learning an Invariant Mapping”, CVPR 2006 * Agrawal, Carreira, Malik, “Learning to see by moving”, ICCV 2015
Results: Recognition
Ego-equivariance for unsupervised feature learning
Pre-trained models available
Egomotion-equivariance induces the strongest representations
SUN scenes: 397 multi-class accuracy
Talk overview
- 1. Learning representations tied to ego-motion
- 2. Learning representations from unlabeled video
- 3. Learning how to move and where to look
Towards embodied visual learning
Learning from arbitrary unlabeled video?
From unlabeled video + ego-motion to unlabeled video alone
Background: Slow feature analysis
[Wiskott & Sejnowski, 2002]
Figure: Laurenz Wiskott, http://www.scholarpedia.org/article/File:SlowFeatureAnalysis-OptimizationProblem.png
Find functions g(x) that map a quickly varying input signal x(t) to slowly varying features y(t)
Prior work: Slow feature analysis
Wiskott et al, 2002 Hadsell et al. 2006 Mobahi et al. 2009 Bergstra & Bengio 2009 Goroshin et al. 2013 Wang & Gupta 2015 …
Learn a feature map z such that z(x_t) ≈ z(x_{t+1}) for temporally adjacent frames (invariance / slowness).
Our idea: Steady feature analysis
Learn a feature map z such that, for consecutive frames x_t, x_{t+1}, x_{t+2}:
z(x_t) ≈ z(x_{t+1}) (invariance: first-order temporal coherence)
z(x_{t+2}) - z(x_{t+1}) ≈ z(x_{t+1}) - z(x_t) (equivariance: higher-order temporal coherence)
[Jayaraman & Grauman, CVPR 2016]
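To make the first- and second-order coherence terms concrete, here is a minimal sketch (the margin, weights, and handling of negatives are assumptions; the contrastive treatment of the second-order term is omitted for brevity):

```python
import torch.nn.functional as F

def slow_loss(z_t, z_t1, z_neg, margin=1.0):
    """First-order temporal coherence (slowness): features of adjacent frames
    stay close, while features of unrelated frames are pushed at least
    `margin` apart (contrastive term, as in Hadsell et al. 2006)."""
    pos = (z_t - z_t1).pow(2).sum(dim=1)
    neg = F.relu(margin - (z_t - z_neg).norm(dim=1)).pow(2)
    return (pos + neg).mean()

def steady_loss(z_t, z_t1, z_t2):
    """Second-order temporal coherence (steadiness): for three consecutive
    frames, the feature-space step from t to t+1 should resemble the step
    from t+1 to t+2, so feature trajectories change at a steady rate."""
    step1 = z_t1 - z_t
    step2 = z_t2 - z_t1
    return (step1 - step2).pow(2).sum(dim=1).mean()

# Combined unsupervised loss for a batch of frame triples (weight lam assumed):
# loss = slow_loss(z_t, z_t1, z_neg) + lam * steady_loss(z_t, z_t1, z_t2)
```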
Datasets
Unlabeled video → Target task (few labels):
- Human Motion Database (HMDB) → PASCAL 10 Actions
- KITTI video → SUN 397 Scenes
- NORB → NORB 25 Objects
(32 x 32 or 96 x 96 images)
Results: Steady feature analysis
**Mobahi et al., Deep Learning from Temporal Coherence in Video, ICML’09 *Hadsell et al., Dimensionality Reduction by Learning an Invariant Mapping, CVPR’06
[Chart: multi-class recognition accuracy on the three target tasks, comparing slow/steady feature learning against the cited baselines]
Pre-training a representation
Supervised pre-training: labeled images from a related domain → fine-tune → few labeled images for target task
Unsupervised “pre-training”: unlabeled video → fine-tune → few labeled images for target task
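For concreteness, the unsupervised "pre-training" path could be wired up roughly like this (a sketch with placeholder names and sizes, not the paper's experimental setup): take the encoder trained with the unsupervised video objective, attach a fresh classification head, and fine-tune on the few labeled target-task images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def finetune(encoder: nn.Module, feat_dim: int, num_classes: int,
             loader, epochs: int = 10, lr: float = 1e-3):
    """Fine-tune an unsupervised-pretrained encoder on a small labeled set.
    `loader` yields (images, labels) batches for the target task."""
    model = nn.Sequential(encoder, nn.Linear(feat_dim, num_classes))
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            opt.step()
    return model
```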
Results: Can we learn more from unlabeled video than “related” labeled images?
CIFAR-100 (labeled for other categories) vs. HMDB (unlabeled video) → PASCAL (few image labels)
Better even than providing 50,000 extra manual labels for an auxiliary classification task!
Talk overview
- 1. Learning representations tied to ego-motion
- 2. Learning representations from unlabeled video
- 3. Learning how to move and where to look
Towards embodied visual learning
Current recognition benchmarks
Caltech 101 (2004), Caltech 256 (2006) PASCAL (2007-12) ImageNet (2009) LabelMe (2007) MS COCO (2014) SUN (2010) Places (2014) BSD (2001) Visual Genome (2016)
Passive, disembodied snapshots at test time, too
Scene recognition and object recognition, posed on single static snapshots
Moving to recognize
Time to revisit active recognition in challenging settings!
Bajcsy 1985, Aloimonos 1988, Ballard 1991, Wilkes 1992, Dickinson 1997, Schiele & Crowley 1998, Tsotsos 2001, Denzler 2002, Soatto 2009, Ramanathan 2011, Borotschnig 2011, …
Moving to recognize
Difficulty: unconstrained visual input (vs. curated ImageNet Web images)
Moving to recognize
Difficulty: unconstrained visual input. Opportunity: ability to move to change the input.
[Figure: an ambiguous view ("mug? bowl? pan?") resolved to "mug" after moving]
Components of active recognition
Perception, action selection, evidence fusion
Perception: train for 1-view recognition
Action selection: navigate to a pre-selected viewpoint; greedily maximize information gain; reinforcement learning
Evidence fusion: verification; averaging; Bayes / Naïve Bayes
Dickinson 1997 Schiele 1998 Denzler 2002 Borotschnig 1998 Ramanathan 2011 Wu 2015 Jayaraman 2015 Paletta 2000, Malmir 2015 Johns 2016 Paletta 2000 Denzler 2002 Ramanathan 2011 Malmir 2015 Wilkes 1992 Dickinson 1997 Schiele 1998 Denzler 2002 Soatto 2009 Ramanathan 2011 Aloimonos 2011 Borotschnig 2011 Wu 2015 Jayaraman 2015 Johns 2016 Dickinson 1997 Schiele 1998
Independent solutions for the three components
Prior approaches to active recognition
Our idea: end-to-end active recognition
Perception + action selection + evidence fusion: joint training
Look-ahead: forecasting the effects of actions
Multi-task training of active recognition components + look-ahead
Jayaraman and Grauman, ECCV 2016
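One way to picture the end-to-end wiring (an illustrative PyTorch-style sketch with assumed sizes and a greedy action choice; the actual system trains the motion policy with reinforcement and supervises the look-ahead module to predict the features of the next view):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActiveRecognizer(nn.Module):
    """Sketch: per-view perception, recurrent evidence fusion, action
    selection, and classification trained together (hypothetical sizes)."""
    def __init__(self, feat_dim=256, num_actions=8, num_classes=397):
        super().__init__()
        self.perception = nn.Sequential(                # per-view features
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.fusion = nn.GRUCell(feat_dim, feat_dim)    # evidence fusion across views
        self.policy = nn.Linear(feat_dim, num_actions)  # where to move next
        self.classifier = nn.Linear(feat_dim, num_classes)
        # look-ahead head: predict the next view's features from (state, action);
        # trained with an auxiliary regression loss (not shown here)
        self.lookahead = nn.Linear(feat_dim + num_actions, feat_dim)

    def forward(self, get_view, T=3):
        """`get_view(action)` stands in for the environment: it returns the
        image observed after executing `action` (a one-hot motion)."""
        h = torch.zeros(1, self.fusion.hidden_size)
        action = torch.zeros(1, self.policy.out_features)
        for _ in range(T):
            x = get_view(action)                        # observe a new view
            h = self.fusion(self.perception(x), h)      # fuse the new evidence
            logits = self.policy(h)                     # choose the next motion
            action = F.one_hot(logits.argmax(dim=1),
                               self.policy.out_features).float()
        return self.classifier(h)                       # label after T views
```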
Experiments
How to evaluate active recognition? Previously: object instances on turntables [Nene 1996, Schiele 1998, Denzler 2003, Ramanathan 2011…] and custom robot settings [Malmir 2015]
Jayaraman and Grauman, ECCV 2016
Experiments
SUN 360 panoramas [Xiao 2012], GERMS toy manipulation [Malmir 2015], ModelNet-10 CAD models [Wu 2015]
Jayaraman and Grauman, ECCV 2016
Strongly outperform traditional active recognition approaches.
[Charts: recognition accuracy (%) vs. number of views on SUN 360, ModelNet-10, and GERMS, comparing the end-to-end approach to baselines]
End-to-end active recognition: results
Jayaraman and Grauman, ECCV 2016
[Example: as views are added, the probability of the correct class rises and the top-3 guesses converge.
"Church" panorama: P("Church") 0.53 → 5.00 → 37.89; top-3 guesses go from (Forest, Cave, Beach) to (Street, Cave, Plaza courtyard) to (Church, Lobby atrium, Street).
"Plaza courtyard" panorama: P("Plaza courtyard") 6.28 → 11.95 → 68.38; top-3 guesses go from (Restaurant, Train interior, Shop) to (Theater, Restaurant, Plaza courtyard) to (Plaza courtyard, Street, Theater).]
[Jayaraman and Grauman, ECCV 2016]
End-to-end active recognition: example
[Figure: a GERMS object manipulated over views T=1, T=2, T=3, with the predicted label shown at each step]
End-to-end active recognition: example
GERMS dataset: Malmir et al. BMVC 2015
[Jayaraman and Grauman, ECCV 2016]
Talk overview
- 1. Learning representations tied to ego-motion
- 2. Learning representations from unlabeled video
- 3. Learning how to move and where to look
Towards embodied visual learning
360° cameras and panoramic video
Challenge of viewing 360° videos
How to find the right direction to watch?
Control by mouse
New problem: Pano2Vid automatic videography
Pano2Vid definition. Input: 360° video. Task: control the virtual camera direction. Output: natural-looking normal-field-of-view (NFOV) video.
[Su et al. ACCV 2016]
Our approach – AutoCam
Learn videography tendencies from unlabeled Web videos
- Diverse capture-worthy content
- Proper composition
[Su et al. ACCV 2016]
[Figure: candidate spatio-temporal "ST-glimpses" (65.5°) sampled on the viewing sphere (θ, φ) over time (T=5), each scored by how close it looks to human-captured NFOV videos ("HumanCam") from unlabeled Web video]
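The trajectory-stitching step can be pictured as a simple dynamic program over a discrete grid of viewing directions. The sketch below is only an illustration under assumptions (the glimpse-scoring model, the penalty form, and the function name are hypothetical): `scores[t, d]` is the capture-worthiness of direction d at time t, and camera movement between adjacent time steps is penalized so the virtual camera moves smoothly.

```python
import numpy as np

def autocam_trajectory(scores, motion_penalty=0.5):
    """Viterbi-style stitching of a virtual camera path.
    scores[t, d]: capture-worthiness of glimpse direction d at time t
    (e.g., from a classifier trained to match human-captured NFOV video).
    Moving the camera between adjacent time steps costs `motion_penalty`
    per step of movement (no wrap-around handling, for brevity)."""
    T, D = scores.shape
    best = scores[0].copy()
    back = np.zeros((T, D), dtype=int)
    for t in range(1, T):
        # move[p, d]: cost of moving from previous direction p to current d
        move = motion_penalty * np.abs(np.arange(D)[None, :] - np.arange(D)[:, None])
        cand = best[:, None] - move            # (prev D, cur D)
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + scores[t]
    # backtrack the best direction sequence
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```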
Example AutoCam Output 1
[Videos: input 360° video + camera trajectory, and the AutoCam output video]
[Su et al. ACCV 2016]
Example AutoCam Output 2
[Videos: AutoCam output vs. an eye-level prior baseline]
[Su et al. ACCV 2016]
Example AutoCam Output 3
[Videos: input 360° video + camera trajectories, with and without zooming]
[Su et al. ACCV 2016]
Results: Quantitative evaluation
Measured by similarity to human-selected camera trajectories and similarity to user-uploaded standard Web videos.
AutoCam creates plausible videos by learning “where to look” from unlabeled video.
[Su et al. ACCV 2016]
Next steps
- Active observations for representation learning
- Explore varied space of egomotions
- Multi-sensor active recognition
- Learning where to look +/- recognition
- 360° video summaries
Summary
- Visual learning benefits from:
  - the context of action and motion in the world
  - continuous unsupervised observations
- New ideas:
  - “Embodied” feature learning via visual and motor signals
  - Feature learning from unlabeled video via higher-order temporal coherence
  - Active policies for view selection and camera control
Code and pre-trained models available
http://www.cs.utexas.edu/~grauman/research/pubs.html
Ruohan Gao, Yu-Chuan Su, Dinesh Jayaraman