Video Object Recognition
Chenyi Chen

Motion is important
• How important?
• Let’s first look at “Visual Parsing After Recovery From Blindness”
• This is a real “vision” paper

Background
• Study how three Indian patients (subjects) develop object recognition ability after long-term blindness
• Give treatment to the subjects
• During recovery, test the subjects to see how they perform on recognition tasks

Background
• The subjects are:
• S.K.: age 29, male, blind from birth, M.A. in political science
• J.A.: age 13, male, blind from birth, never received education
• P.B.: age 7, male, blind from birth
• Control group: 4 normally sighted adults, similar social background

Subjects’ parsing of static images

S.K. versus simple region partition algorithm

Dynamic information in object segregation

Motility rating and object recognition results

Follow-up testing after several months

What do we learn about developing visual parsing skills?
• Early stages: integrative impairments and overfragmentation of images compromise recognition performance
• However, motion effectively mitigates these integrative difficulties
• Motion appears to be instrumental both in segregating objects and in binding their constituents into representations for recognition

• So we have some insight into how people develop visual recognition ability
• Can we reproduce the visual learning process on a robot?
• Let’s look at “Learning about Humans During the First 6 Minutes of Life”

A baby robot

Hypothesis in social development
• The infant brain is particularly sensitive to the presence of contingencies
• The contingency drives the definition and recognition of caregivers
• Human faces become attractive because they tend to occur in high-contingency situations

Goal
• Test whether acoustic contingency information (sound) would be sufficient for the robot to develop preferences for human faces
• If so, get a sense of the time scale of the learning problem

A baby robot

Settings
• The baby robot interacted with the lab members while recording the images it saw
• A contingency detection engine analyzes the sound signal for the presence of contingencies
• Whether people were present is not specified
• Whether people were of any particular relevance is not specified
• The only training label is the acoustic contingency signal

Visual learning engine
• Probabilistic model
• Only needs the images to be weakly labeled as containing the object of interest with high or low probability; the labels do not need to indicate where the objects are located on the image plane
• Implementable in a neural network
• Runs in real time at video frame rate

Hardware
• Plush baby doll
• IEEE 1394a webcam (captures images; only grayscale images used for training)
• Microphone (receives auditory signal)
• Loudspeaker (baby makes excited noise)

Collecting data
• Record the auditory and visual signals for 88 minutes
• 2877 positive examples
• 824 negative examples
• The baby robot was placed in a chair, a stroller, and a crib, with bright or dim lighting conditions
• 9 people interacted with the baby robot

Collecting data
• Select 34 positive examples and 200 negative examples for training (approx. 5 min 34 sec). The rest are used for testing
• The labels are noisy

Results
• Evaluation: 2-Alternative Forced Choice Task (2AFC)
• 86.17% correct on the face detection task (i.e., deciding which of two images contained a face)
• 89.7% correct on the contingency task (i.e., deciding which of two images was more likely to be associated with an auditory contingency)
• 92.3% correct on the person detection task (i.e., deciding which image contained a person)
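A 2AFC score can be computed from per-image detector scores: over every (positive, negative) image pair, count how often the positive image receives the higher score. The sketch below assumes that reading of the task; the function name and the example scores are hypothetical and are not data from the slides. Ties are counted as half correct, which makes the result equal to the area under the ROC curve.

```python
from itertools import product


def two_afc_accuracy(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one score in each group")
    correct = 0.0
    for pos, neg in product(positive_scores, negative_scores):
        if pos > neg:
            correct += 1.0
        elif pos == neg:
            correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


# Hypothetical detector scores, for illustration only.
face_scores = [0.9, 0.8, 0.4]
non_face_scores = [0.5, 0.3]
print(two_afc_accuracy(face_scores, non_face_scores))
```

Five of the six pairs in this toy example are ranked correctly, so the score is 5/6.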

Results
• Example images and their pixel-wise probability images

Results
• Infants showed a significant order of tracking preference in favor of the face stimulus, followed by the scrambled stimulus, followed by the empty stimulus
• The robot reproduces the preference order

• Video usually contains more data for object detector training
• There is a domain difference between video and still images
• So “Analysing domain shift factors between videos and images for object detection” is necessary

Goal
• For a given target test domain (image or video), the performance of the detector depends on the domain it was trained on
• Examine the reasons behind this performance gap
• Train an object detector with samples either from still images or from video frames, then test the detector on both domains

Dataset
• Still images (VOC)
• PASCAL VOC 2007
• 10 classes of moving objects chosen

Dataset
• Video frames (VID)
• YouTube-Objects dataset
• 10 classes of moving objects
• Further annotated a few images so that the dataset’s labels are comparable with VOC

Equalizing the number of samples per class
• Equalize the training samples of VOC and VID
• 3097 in total over the 10 classes (Table 1)
• Only the equalized training sets are used
• trainVOC
• trainVID
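One way to equalize per-class sample counts is to subsample the larger set of each class down to the size of the smaller one. The slides do not say how the equalization was performed, so the random subsampling, the function name, and the toy data below are assumptions for illustration.

```python
import random
from collections import defaultdict


def equalize_per_class(samples_a, samples_b, seed=0):
    """Subsample two lists of (class_name, sample_id) to equal per-class counts.

    Assumption: each class is reduced to the smaller of its two counts by
    uniform random sampling; classes missing from either list are dropped.
    """
    rng = random.Random(seed)
    by_class_a, by_class_b = defaultdict(list), defaultdict(list)
    for cls, sample_id in samples_a:
        by_class_a[cls].append(sample_id)
    for cls, sample_id in samples_b:
        by_class_b[cls].append(sample_id)

    out_a, out_b = [], []
    for cls in sorted(set(by_class_a) & set(by_class_b)):
        n = min(len(by_class_a[cls]), len(by_class_b[cls]))
        out_a += [(cls, s) for s in rng.sample(by_class_a[cls], n)]
        out_b += [(cls, s) for s in rng.sample(by_class_b[cls], n)]
    return out_a, out_b


# Toy stand-ins for the VOC and VID training pools (hypothetical data).
voc = [("cat", i) for i in range(5)] + [("dog", i) for i in range(2)]
vid = [("cat", i) for i in range(3)] + [("dog", i) for i in range(4)]
train_voc, train_vid = equalize_per_class(voc, vid)
print(len(train_voc), len(train_vid))
```

After this step both sets hold the same number of samples in every class, so a later performance difference cannot be attributed to training set size.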

Domain shift factors
• Spatial location accuracy: accuracy of bounding box
• Appearance diversity: consecutive frames in video are similar, thus less diverse
• Image quality: compression, motion blur, etc. in video images
• Object detector: DPM

Spatial location accuracy
• Method of getting bounding boxes on video:
• PRE: worst
• FVS: better
• Manual label: best

Spatial location accuracy
• Reduces the gap by almost 4% (tested on VOC)

Spatial location accuracy
• Equalization: using the ground-truth (human-labeled) bounding boxes on trainVID
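A common way to quantify how closely an automatically produced bounding box matches a human-labeled one is intersection over union (IoU). The slides do not state which measure was used, so IoU is an assumption here, as are the box format (x_min, y_min, x_max, y_max) and the example coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: an automatic proposal against a manual label.
automatic_box = (0.0, 0.0, 2.0, 2.0)
manual_box = (1.0, 1.0, 3.0, 3.0)
print(iou(automatic_box, manual_box))
```

An IoU of 1.0 means the two boxes coincide, and 0.0 means they do not overlap at all.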

Appearance diversity
• Near-identical samples of an object in video

Appearance diversity
• Measure diversity:
• Clustering (agglomerative clustering, L2 distance of HOG features): each cluster contains visually very similar samples
• Measure appearance diversity by counting the number of clusters
• Equalization: resample the training sets so that the numbers of images and clusters (of trainVOC and trainVID) are equal
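The cluster-counting measure can be sketched as follows. The slides name agglomerative clustering with L2 distance on HOG features but give no linkage rule or stopping threshold, so single linkage and the `threshold` value are assumptions, and the short vectors below are hypothetical stand-ins for real HOG descriptors. Cutting a single-linkage hierarchy at a distance threshold is equivalent to finding the connected components of the graph that links every pair of samples closer than the threshold, which is what the union-find loop does.

```python
import math


def count_clusters(features, threshold):
    """Count single-linkage clusters: samples closer than `threshold` in L2 merge."""
    parent = list(range(len(features)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if math.dist(features[i], features[j]) < threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(features))})


# Hypothetical descriptors: near-identical video frames against varied still images.
video_like = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
image_like = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
print(count_clusters(video_like, 0.5), count_clusters(image_like, 0.5))
```

With the same number of samples, the video-like set collapses into fewer clusters, which is the sense in which it is less diverse.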

Appearance diversity
