  1. Telling What-Is-What in Video, Gerard Medioni, medioni@usc.edu

  2. Tracking • The essential problem • Establishes correspondences between elements in successive frames • The basic problem is easy
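To make the correspondence step concrete, here is a minimal sketch that matches object centroids in one frame to those in the next by greedy nearest-neighbor assignment under a gating distance. This is an illustrative baseline, not the matcher used in the systems described later; the `gate` threshold is an assumed parameter.

```python
import numpy as np

def match_frames(prev_pts, curr_pts, gate=30.0):
    """Greedy nearest-neighbor correspondence between two frames.

    prev_pts, curr_pts: (N, 2) and (M, 2) arrays of object centroids.
    Returns a list of (i, j) index pairs with distance below `gate` (pixels).
    """
    pairs = []
    if len(prev_pts) == 0 or len(curr_pts) == 0:
        return pairs
    # Pairwise Euclidean distances between all centroids.
    d = np.linalg.norm(prev_pts[:, None, :] - curr_pts[None, :, :], axis=2)
    used = set()
    # Assign the globally closest pairs first (greedy approximation).
    for i, j in zip(*np.unravel_index(np.argsort(d, axis=None), d.shape)):
        if i in {p for p, _ in pairs} or j in used or d[i, j] > gate:
            continue
        pairs.append((i, j))
        used.add(j)
    return pairs

# Example: two objects move a few pixels between frames.
prev_pts = np.array([[10.0, 10.0], [100.0, 50.0]])
curr_pts = np.array([[103.0, 52.0], [12.0, 11.0]])
print(match_frames(prev_pts, curr_pts))   # [(0, 1), (1, 0)]
```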

  3. Many issues • One target (pursuit) vs. • A few objects vs. • Lots of objects

  4. More issues: motion type – Rigid – Articulated – Non-rigid (facial expressions)

  5. Tag & Track - The problem • Select any object and follow it in real time • An object tracking problem; current work

  6. Challenges • Unknown type of object • Changes in viewpoint • Changes in lighting • Cluttered background • All of these vs. running time

  7. Context Tracker • Motivation – Context information is overlooked (online processing requirement, speed trade-off) – Existing trackers focus on building an appearance model and do not take advantage of background information, which requires a very complicated model when similar objects appear – They treat every region of the background in the same way • Our idea: explore distracters and pay more attention to them

  8. Context Tracker • Motivation: what else is there to explore? Supporters!

  9. Context Tracker • Pipeline diagram: each new input image goes through short-term tracking and detection (a detector) inside the tracking loop, with online model evaluation

  10. Context Tracker • Distracter – Detection: passes the (shared) classifier with high confidence, i.e. it looks similar to our object – Tracking: same as tracking our target, BUT it is killed when it is lost or no longer looks like our target – Heuristic data association: higher confidence gets higher priority in the association queue
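The confidence-ordered association heuristic above can be sketched as follows. The IoU gating and the `min_overlap` threshold are illustrative assumptions; this is not the actual Context Tracker implementation.

```python
import heapq

def associate_by_confidence(candidates, tracks, min_overlap=0.3):
    """Assign candidate detections to tracks, most confident first.

    candidates: list of (confidence, box) tuples from the shared classifier.
    tracks:     dict track_id -> predicted box.
    A candidate claims the best-overlapping free track; leftovers can be
    spawned as distracter tracks (they passed the classifier but were not
    associated with the target).
    """
    # Max-heap keyed on confidence (negate because heapq is a min-heap).
    heap = [(-conf, i) for i, (conf, _) in enumerate(candidates)]
    heapq.heapify(heap)
    free = set(tracks)
    assignments, distracters = {}, []
    while heap:
        _, i = heapq.heappop(heap)
        conf, box = candidates[i]
        best = max(free, key=lambda t: iou(box, tracks[t]), default=None)
        if best is not None and iou(box, tracks[best]) >= min_overlap:
            assignments[best] = box
            free.remove(best)
        else:
            distracters.append(box)   # similar-looking region: track it too
    return assignments, distracters

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter or 1)
```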

  11. Context Tracker • Experiment settings – 8 ferns and 4 6-bit BP features – Minimum search region 20x20 – Maximum of 15 distracters and 40 supporters – System: 3.0 GHz (one core), 8 GB memory – Runs at 10-25 fps depending on the number of distracters and supporters
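For context, a generic random-fern classifier built from binary pixel comparisons looks roughly like the sketch below. It only illustrates the flavor of the "8 ferns, 6-bit features" setting; the actual 6bitBP feature definition and posterior updates of the tracker are not reproduced, and all sampling choices here are assumptions.

```python
import numpy as np

class Fern:
    """One random fern: 6 binary pixel-pair comparisons -> a 6-bit code."""
    def __init__(self, patch_shape, n_bits=6, rng=None):
        rng = rng or np.random.default_rng(0)
        h, w = patch_shape
        # Each bit compares the intensities at two random pixel locations.
        self.pairs = rng.integers(0, [h, w, h, w], size=(n_bits, 4))
        self.pos = np.ones(2 ** n_bits)   # Laplace-smoothed positive counts
        self.neg = np.ones(2 ** n_bits)

    def code(self, patch):
        bits = [patch[y1, x1] > patch[y2, x2] for y1, x1, y2, x2 in self.pairs]
        return int("".join("1" if b else "0" for b in bits), 2)

    def update(self, patch, is_positive):
        c = self.code(patch)
        (self.pos if is_positive else self.neg)[c] += 1

    def confidence(self, patch):
        c = self.code(patch)
        return self.pos[c] / (self.pos[c] + self.neg[c])

# An ensemble of 8 ferns averages the per-fern posteriors.
ferns = [Fern((20, 20), rng=np.random.default_rng(k)) for k in range(8)]
patch = np.random.default_rng(1).integers(0, 255, (20, 20))
score = np.mean([f.confidence(patch) for f in ferns])
print(f"ensemble confidence: {score:.2f}")
```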

  14. Active Surveillance • Combine a real-time tracker and camera control – To keep the object of interest in the field of view of the camera – To zoom in (on the face)
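A minimal sketch of the camera-control idea: pan and tilt proportionally to the tracked box's offset from the image center, and zoom in while the face is small. The gains, dead zone, and target face size are illustrative assumptions, not the control law actually used here.

```python
def control_step(box, frame_w, frame_h, face_h=None,
                 k_pan=0.05, k_tilt=0.05, dead_zone=0.05, target_face_frac=0.15):
    """Compute pan/tilt/zoom commands to keep a tracked box centered.

    box: (x1, y1, x2, y2) of the tracked object in pixels.
    Returns (pan, tilt, zoom) commands; 0 means "hold".  Gains and
    thresholds are illustrative, not tuned values.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    # Normalized offset of the target from the image center, in [-0.5, 0.5].
    ex = cx / frame_w - 0.5
    ey = cy / frame_h - 0.5
    pan = k_pan * ex if abs(ex) > dead_zone else 0.0
    tilt = k_tilt * ey if abs(ey) > dead_zone else 0.0
    # Zoom in until the face occupies roughly target_face_frac of the frame.
    zoom = 0.0
    if face_h is not None and face_h / frame_h < target_face_frac:
        zoom = 0.5
    return pan, tilt, zoom

# Example: target sits in the upper-right quadrant with a small face.
print(control_step((500, 100, 560, 220), 640, 480, face_h=30))
```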

  15. Challenges • Tracking: unknown type of object, changes in viewpoint, changes in lighting, cluttered background, all vs. running time • Control: limited support from commercial cameras with discrete speed control, due to the use of stepping motors • Network: delay because of communication through TCP/IP  abrupt motion and motion blur

  17. Challenges • Practical issues – Pedestrians far away: the face covers only a few pixels, even in a 100% crop – At long focal lengths, a small movement can take people out of the field of view

  18. Overview • Tracking control loop diagram: pedestrian detector and face detector  face tracker  camera control; a "Face tracked?" decision closes the loop; output: tagged high-resolution face sequences

  19. Experimental setup • Settings – Sony SNC-RZ30N PTZ network camera with wireless card – 14 levels of speed control for panning and 18 levels for tilting – 25x optical zoom, 300x digital zoom – Pan angle: -170 to +170 degrees – Tilt angle: -90 to +25 degrees
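Because the camera only exposes discrete speed levels (14 for pan, 18 for tilt), a continuous velocity command has to be quantized. A hedged sketch of that mapping is below; the scaling constants are assumptions, and the real SNC-RZ30N command protocol is not shown.

```python
def to_speed_level(desired, max_desired, n_levels):
    """Quantize a signed, continuous speed command onto discrete levels.

    desired:     signed command (e.g. degrees/sec or a controller output)
    max_desired: command magnitude that should map to the fastest level
    n_levels:    number of discrete speed levels the camera supports
    Returns a signed integer level in [-n_levels, n_levels]; 0 means stop.
    """
    frac = max(-1.0, min(1.0, desired / max_desired))
    level = round(abs(frac) * n_levels)
    return int(level) if frac >= 0 else -int(level)

# Pan has 14 levels and tilt 18 on the camera described above.
pan_level = to_speed_level(desired=12.0, max_desired=40.0, n_levels=14)
tilt_level = to_speed_level(desired=-5.0, max_desired=30.0, n_levels=18)
print(pan_level, tilt_level)   # 4 -3
```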

  20. Results

  21. Tracking from a security PTZ camera @ USC • The face cannot be seen in the 100% cropped image • Diagram: pedestrian detector  tracking  zooming (11x)  frontal face detector  face track

  22. Tracking many objects • Useful for persistent surveillance • WAAS (Wide Area Aerial Surveillance) • Very large images (60 MPix to 1 GPix) • 2 frames per second

  23. Video Stabilization

  24. Video Stabilization Results (Close-Up)

  25. Tracking • Motivation – Moving objects tell us a lot about the "life" in the geographic area – Important for activity recognition • Challenges – Small number of pixels on target – Large number of targets

  26. Approach • Goal: infer tracklets, each representing one object, over a sliding window of frames • 4-8 second window (length depends on frame rate) • Input: object detections (from background subtraction or otherwise)
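A minimal sketch of the tracklet-inference idea: link per-frame detections into tracklets across the sliding window by greedy nearest-neighbor gating. The actual inference procedure is not specified on the slide, so the window handling, gating distance, and greedy linking here are illustrative assumptions.

```python
import numpy as np

def build_tracklets(window, gate=15.0):
    """Link detections into tracklets across a sliding window of frames.

    window: list of frames; each frame is an (N_t, 2) array of detection
            centroids (e.g. centroids of background-subtraction blobs).
    Returns a list of tracklets, each a list of (frame_index, point).
    """
    tracklets = []
    active = []   # tracklets still being extended
    for t, dets in enumerate(window):
        claimed = set()
        still_active = []
        for tr in active:
            _, last = tr[-1]
            if len(dets):
                d = np.linalg.norm(dets - last, axis=1)
                j = int(np.argmin(d))
                if d[j] < gate and j not in claimed:
                    tr.append((t, dets[j]))
                    claimed.add(j)
                    still_active.append(tr)
                    continue
            tracklets.append(tr)          # tracklet ends here
        active = still_active
        for j, p in enumerate(dets):      # unclaimed detections start tracklets
            if j not in claimed:
                active.append([(t, p)])
    return tracklets + active

# Example: a 2 fps window of 8 frames (~4 s) with one object drifting right.
window = [np.array([[10.0 + 5 * t, 20.0]]) for t in range(8)]
print(len(build_tracklets(window)))   # 1
```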

  27. Results (CLIF 2006)

  28. Tracking Results (CLIF 2006)
      – Object detection rate: 0.72 – False alarm rate: 0.04 – Normalized track fragmentation: 1.01 – ID consistency: 0.84
      • Manually generated ground truth: 168 tracks, 80 frames • Low track fragmentation • Low false alarm rate • Efficient: > 40 objects tracked at 2 fps • Comparison with MCMC tracker (Yu 2009): did not converge to a reasonable solution, requires good initialization, does not scale to our domain
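For reference, detection rate and false alarm rate in this kind of evaluation can be computed roughly as in the sketch below; the exact definitions behind the numbers above may differ, and the gating distance is an assumption.

```python
import math

def detection_metrics(gt_by_frame, pred_by_frame, gate=10.0):
    """Detection rate and false alarm rate against point-level ground truth.

    gt_by_frame / pred_by_frame: lists of per-frame point lists.
    A prediction within `gate` pixels of an unmatched ground-truth point
    counts as a detection; the rest count as false alarms.
    """
    tp = fa = n_gt = 0
    for gts, preds in zip(gt_by_frame, pred_by_frame):
        n_gt += len(gts)
        unmatched = list(gts)
        for p in preds:
            hit = next((g for g in unmatched if math.dist(p, g) <= gate), None)
            if hit is not None:
                unmatched.remove(hit)
                tp += 1
            else:
                fa += 1
    total_pred = sum(len(p) for p in pred_by_frame)
    return tp / max(n_gt, 1), fa / max(total_pred, 1)

gt = [[(10, 10)], [(15, 10)]]
pred = [[(11, 11)], [(40, 40)]]
print(detection_metrics(gt, pred))   # (0.5, 0.5)
```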

  29. Tracking VERY MANY Objects • As surveillance systems develop, more and more attention is paid to analyzing people in crowded scenes (sports events, political gatherings, etc.)

  30. Crowded Scenes • Challenges – Hundreds of similar objects – Cluttered background – Small object size – Occlusions •  Detect-then-track methods fail: both appearance-based detectors and background-modeling-based motion blob detectors break down

  31. Tracking Using Motion Patterns for Very Crowded Scenes • We solve the problem of tracking in structured crowded scenes with the Motion Structure Tracker (MST) • MST is a combination of visual tracking, motion pattern learning, and multi-target tracking • In MST, tracking and detection are performed jointly, and motion pattern information is integrated in both steps to enforce the scene structure constraint • MST is initially used to track a single target, and is further extended to a simplified version of the multi-target tracking problem
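One way to read "motion pattern information is integrated in both steps" is that candidate locations whose displacement disagrees with the learned scene flow are down-weighted. The sketch below illustrates that idea; the flow-field representation, the Gaussian weighting, and `sigma` are assumptions, not the published MST formulation.

```python
import numpy as np

def motion_weighted_score(appearance_score, candidate, previous,
                          flow_field, sigma=2.0):
    """Combine an appearance score with a learned motion-pattern prior.

    appearance_score: classifier/tracker confidence at `candidate` (0..1).
    candidate, previous: (x, y) positions in the current and last frame.
    flow_field: H x W x 2 array with the learned dominant motion
                (pixels/frame) at every location, e.g. averaged optical flow.
    Candidates whose displacement disagrees with the local motion pattern
    are penalized with a Gaussian weight.
    """
    px, py = int(previous[0]), int(previous[1])
    expected = flow_field[py, px]                 # learned motion at old pos
    observed = np.asarray(candidate, float) - np.asarray(previous, float)
    deviation = np.linalg.norm(observed - expected)
    prior = np.exp(-0.5 * (deviation / sigma) ** 2)
    return appearance_score * prior

# Example: the crowd at (50, 80) flows right at ~3 px/frame.
flow = np.zeros((120, 120, 2)); flow[..., 0] = 3.0
good = motion_weighted_score(0.9, candidate=(53, 80), previous=(50, 80), flow_field=flow)
bad = motion_weighted_score(0.9, candidate=(50, 74), previous=(50, 80), flow_field=flow)
print(round(good, 3), round(bad, 3))   # the against-the-flow candidate scores lower
```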

  32. An Overview of the Motion Structure Tracker • Diagram: starting from the first frame of the input, the user tags a target for single-target tracking (detection & tracking), supported by online unsupervised learning and motion pattern inference; similar objects are then detected and handed to multi-target tracking (detection & tracking) with online tracking

  33. Motion Structure Tracker for Single Target Tracking • Tag & Track • Results for temporally stationary scenes (motion patterns do not change with time), reported as ATR (Average Track Ratio) / ACLE (Average Center Location Error):
      – Marathon-1: IVT Tracker 35.21% / 62.8; P-N Tracker 56.16% / 35.1; Ours 81.40% / 6.7
      – Marathon-2: IVT Tracker 33.47% / 86.5; P-N Tracker 68.60% / 56.4; Ours 73.12% / 28.5
      – Marathon-3: IVT Tracker 40.03% / 64.1; P-N Tracker 67.16% / 33.9; Ours 92.08% / 4.8

  34. Motion Structure Tracker for Single Target Tracking • Results for temporally non-stationary scenes (motion patterns change with time), reported as ATR (Average Track Ratio) / ACLE (Average Center Location Error):
      – Hongkong: IVT Tracker 27.63% / 58.9; P-N Tracker 39.58% / 42.3; Ours 62.31% / 28.5
      – Motorbike: IVT Tracker 31.56% / 69.7; P-N Tracker 47.22% / 55.4; Ours 90.75% / 5.6

  35. Motion Structure Tracker for Multi-Target Tracking • Once a user labels a target in the first frame, find similar objects and track all of them • Example comparisons of Ours, the P-N Tracker, and ground truth; first row: temporally stationary scenes (frames 1, 71, 141, 211), second row: temporally non-stationary scenes (frames 1, 31, 61, 91)
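The "find similar objects" step can be illustrated with plain normalized cross-correlation against the user-tagged patch, as a stand-in for the learned detector actually used; the threshold and the crude peak suppression are assumptions.

```python
import cv2
import numpy as np

def find_similar(frame_gray, tagged_patch, thresh=0.8):
    """Return boxes in the frame that look like the user-tagged patch.

    Uses normalized cross-correlation as a stand-in detector; scores above
    `thresh` are kept, and overlapping hits are suppressed crudely by
    zeroing a neighborhood around each accepted peak.
    """
    h, w = tagged_patch.shape
    response = cv2.matchTemplate(frame_gray, tagged_patch, cv2.TM_CCOEFF_NORMED)
    boxes = []
    while True:
        _, max_val, _, (x, y) = cv2.minMaxLoc(response)
        if max_val < thresh:
            break
        boxes.append((x, y, x + w, y + h))
        # Suppress this peak so the next-best match can be found.
        response[max(0, y - h // 2):y + h // 2 + 1,
                 max(0, x - w // 2):x + w // 2 + 1] = -1.0
    return boxes

# Usage sketch (hypothetical file and tag coordinates):
# frame = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)
# patch = frame[y0:y0 + 40, x0:x0 + 20]
# for box in find_similar(frame, patch):
#     pass  # seed one tracker per returned box
```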

  37. Expression Analysis • Understanding facial gestures – By analyzing facial motions – Facial motion induces detectable appearance changes • Two classes of facial motions – Global, rigid head motion: from head pose variation, indicates the subject's attention – Local, non-rigid facial deformations: from facial muscle activation, indicate the subject's expression

  38. Overview • Diagram: face sequences  facial deformations and head pose  recognition and interpretation (expressions, facial gestures), using a training database

  39. Results (rigid tracking, real-time) • Rotation, translation, & scale • Fast motion • Live webcam
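The "rotation, translation, & scale" output corresponds to fitting a similarity transform between matched face feature points in successive frames. A minimal sketch with OpenCV is below; the talk does not specify its estimator, so treat this as illustrative.

```python
import cv2
import numpy as np

def rigid_head_motion(prev_pts, curr_pts):
    """Estimate rotation (deg), translation (px) and scale between frames.

    prev_pts, curr_pts: (N, 2) float arrays of matched face feature points.
    Fits a 4-DOF similarity transform with RANSAC; the residual motion of
    individual points (not computed here) would reflect non-rigid deformation.
    """
    M, inliers = cv2.estimateAffinePartial2D(prev_pts, curr_pts,
                                             method=cv2.RANSAC)
    if M is None:
        return None
    scale = float(np.hypot(M[0, 0], M[0, 1]))
    angle = float(np.degrees(np.arctan2(M[1, 0], M[0, 0])))
    tx, ty = float(M[0, 2]), float(M[1, 2])
    return angle, (tx, ty), scale

# Example: points rotated by 10 degrees and shifted by (5, -2).
theta = np.radians(10.0)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
prev_pts = np.random.default_rng(0).uniform(0, 100, (20, 2)).astype(np.float32)
curr_pts = (prev_pts @ R.T + [5.0, -2.0]).astype(np.float32)
print(rigid_head_motion(prev_pts, curr_pts))   # ~ (10.0, (5.0, -2.0), 1.0)
```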

  40. Expression Analysis

  41. Summary • Tracking is a multi-faceted problem • Many axes of complexity – Resolution – Number of objects – Type of motion – … • Significant progress is being achieved
