Learning Perceptual Causality from Video
Amy Fire and Song-Chun Zhu


SLIDE 1

Learning Perceptual Causality from Video

Amy Fire and Song-Chun Zhu

Center for Vision, Cognition, Learning, and Art

UCLA

SLIDE 2

Ideally: Learn Causality from Raw Video

SLIDE 3

Inference Using Learned Causal Structure

  • Answer why events occurred
  • Joint STC: Infer misdetections and hidden objects/actions
  • Infer triggers, goals, and intents


[Figure: a) Input: Video; b) Event Parsing; c) STC-Parsing; d) Inference Over Time. Timelines show agent actions (Drink, Flip Switch), fluents (Light ON/OFF, agent THIRSTY/NOT), a hidden fluent, causal links, and an UNKNOWN segment]

SLIDE 4

But…

OBSERVATION ≠ CAUSALITY

(generally)

SLIDE 5

SO…WHERE ARE WE NOW?

SLIDE 6

Vision Research and Causal Knowledge

  • Use pre-specified causal relationships for action detection
    – E.g., PADS (Albanese, et al. 2010)
    – Model Newtonian mechanics (Mann, Jepson, and Siskind 1997)
  • Use causal measures to aid action detection
    – E.g., Prabhakar, et al. 2010
  • Use infant perceptions of motion to learn causality
    – Using cognitive science (Brand 1997)
  • Needed: Learn causality from video, integrating spatial-temporal (ST) learning strategies at the pixel level

SLIDE 7

Causality and Video Data: Often Disjoint

  • Learning Bayesian networks
    – Constraint satisfaction (Pearl 2009)
    – Bayesian formulations (Heckerman 1995)
    – Intractable on vision sensors
  • Commonsense reasoning (Mueller 2006): first-order logic
    – Does not allow for ambiguity/probabilistic solutions
  • MLNs (Richardson and Domingos 2006)
    – Intractable
    – Used for action detection (Tran and Davis 2008)
    – KB formulas not learned

SLIDE 8

MOVING FORWARD: OUR PROPOSED SOLUTION

SLIDE 9

Cognitive Science as a Gateway: Perceptual Causality

  • Causal induction from observation in infancy
    – Agentive actions are causes (Saxe, Tenenbaum, and Carey 2005)
    – Co-occurrence of events and effects (Griffiths and Tenenbaum 2005)
    – Temporal lag between the two is short (Carey 2009)
    – Cause precedes effect (Carey 2009)
  • Note: NOT the same as necessary and sufficient causes

Measure co-occurrence for a candidate causal relation cr, using pre-specified action detectors and fluent classifiers:

              Action    ¬Action
  Effect        c0         c1
  ¬Effect       c2         c3
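The counting behind the contingency table can be sketched as follows; a minimal sketch assuming detections have already been discretized into aligned time windows (the window pairing and the function name `contingency_table` are illustrative assumptions, not the paper's implementation):

```python
def contingency_table(actions, effects):
    """2x2 co-occurrence counts for a candidate relation cr.

    actions[t], effects[t]: binary flags saying whether the action /
    the fluent-change effect was detected in time window t.  Since a
    perceptual cause precedes its effect with a short lag, window t of
    the action stream is paired with window t+1 of the effect stream.
    Returns [c0, c1, c2, c3]:
      c0 = (Action, Effect)      c1 = (no Action, Effect)
      c2 = (Action, no Effect)   c3 = (no Action, no Effect)
    """
    c = [0, 0, 0, 0]
    for a, e in zip(actions, effects[1:]):
        if a and e:
            c[0] += 1
        elif e:
            c[1] += 1
        elif a:
            c[2] += 1
        else:
            c[3] += 1
    return c
```

A relation whose mass concentrates in c0 and c3 co-occurs strongly and is a candidate perceptual cause.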

SLIDE 10

MODIFIED GOAL: LEARN AND INFER PERCEPTUAL CAUSALITY

SLIDE 11

What are the effects? Fluent changes.

[Figure: timelines over t of the Light fluent (ON/OFF) and the Door fluent (OPEN/CLOSED), with labeled transitions (Door Opens, Door Closes, Light Turns On, Light Turns Off) and inertial stretches (Door Closed Inertially, Door Open Inertially, Light On Inertially, Light Off Inertially)]

  • Fluents are time-varying statuses of objects
    – Mueller, Commonsense Reasoning (2006)
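Fluent changes, as opposed to inertial stretches, are the effects to be explained. Extracting them from a per-frame status sequence can be sketched as below (the function name and tuple layout are illustrative assumptions):

```python
def fluent_changes(timeline):
    """Extract fluent-change events from a per-frame fluent sequence.

    timeline: list of (frame, value) pairs, e.g. the Light fluent taking
    values 'ON'/'OFF'.  Frames where the value persists are inertial and
    yield no event; each change yields a (frame, old, new) triple.
    """
    return [
        (f1, v0, v1)
        for (f0, v0), (f1, v1) in zip(timeline, timeline[1:])
        if v1 != v0
    ]
```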

SLIDE 12

What are the causes? Actions.

  • Probabilistic Graphical Representation for Causality

– And-Or Graph

[Figure: And-Or graph for causes of a door opening. And-nodes compose a single cause from multiple sub-actions (Unlock Door AND Push Door); Or-nodes give alternative causes (someone Opens from the Other Side)]
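The And-Or structure can be sketched as a small data structure. This is an illustrative toy, not the paper's implementation; the node names, the branch weights, and the helper `sample_cause` are assumptions:

```python
import random
from dataclasses import dataclass, field

@dataclass
class AOGNode:
    name: str
    kind: str                  # 'leaf', 'and', or 'or'
    children: list = field(default_factory=list)
    weights: list = field(default_factory=list)  # Or-branch probabilities

def sample_cause(node, rng):
    """Sample one complete cause (a list of leaf actions) from the graph."""
    if node.kind == 'leaf':
        return [node.name]
    if node.kind == 'and':     # compose all sub-actions into a single cause
        return [a for c in node.children for a in sample_cause(c, rng)]
    # 'or': choose one alternative cause
    branch = rng.choices(node.children, weights=node.weights)[0]
    return sample_cause(branch, rng)

# Door example: (Unlock Door AND Push Door) OR (Open from the other side)
door = AOGNode('DoorOpens', 'or',
               children=[
                   AOGNode('EnterFront', 'and',
                           children=[AOGNode('Unlock Door', 'leaf'),
                                     AOGNode('Push Door', 'leaf')]),
                   AOGNode('Open Other Side', 'leaf'),
               ],
               weights=[0.7, 0.3])
```

Each sample is one complete alternative explanation for the fluent change, which is what the causal pursuit scores against observed co-occurrence.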

SLIDE 13

Causal AOG

[Figure: Causal AOG for the Light fluent (on/off): Or-nodes select among alternative causes, And-nodes compose sub-actions a01, ...]

SLIDE 14

Connecting Temporal to Causal and Spatial

[Figure: an action node A ties the three graphs together: the T-AOG fragment gives parent(A) and children(A); the C-AOG fragment records that A terminates fluent1; the S-AOG fragment supplies the participating objects O1, O2 (object1, object2) with fluents f1, f2 via the relation R(A)]

SLIDE 15

Grounding on Pixels: Connecting S/T/C-AOG

[Figure: an object node O grounds the three graphs on pixels: the S-AOG fragment gives parent(O), children(O), fluents f1, f2, f3, and grounding templates(O) such as T1(o); the T-AOG fragment gives actions A with sub-actions A1, A2, A3; the C-AOG fragment records which actions terminate which fluents]
SLIDE 16

LEARNING PERCEPTUAL CAUSALITY

Preliminary Theory

SLIDE 17

Principled Approach: Information projection

(Della Pietra, Della Pietra, and Lafferty 1997; Zhu, Wu, and Mumford 1997)

Start from the true contingency-table distribution f and an initial model p_0. Repeat:

  1. Select the cause/effect relationship cr that makes the updated model p+ closest (in KL divergence) to f, preserving the learning history while maximizing information gain. Since

       KL(f || p) = KL(f || p+) + KL(p+ || p),

     the gain from adding cr is

       max_cr [ KL(f || p) - KL(f || p+) ] = max_cr KL(p+ || p).

  2. Match the statistics of the contingency table: E_{p+}[cr] = E_f[cr].
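The greedy selection step can be checked numerically. A minimal sketch, where the helpers `kl` and `select_relation` and the toy candidate tables are assumptions for illustration:

```python
import math

def kl(f, h):
    """KL divergence KL(f || h) between two discrete distributions."""
    return sum(fi * math.log(fi / hi) for fi, hi in zip(f, h) if fi > 0)

def select_relation(candidates):
    """Pick the candidate cr maximizing information gain.

    candidates: {name: (f, h)} where f is the observed contingency-table
    distribution for cr and h is the same table under the current model p.
    Adding cr gains KL(p+ || p) = KL(f || h), so we take the argmax.
    """
    return max(candidates, key=lambda cr: kl(*candidates[cr]))
```

A relation whose observed table already matches the model contributes zero gain and is never pursued.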

SLIDE 18

Learning Pursuit: Add Causal Relations

  • Model pursuit: p_0 → p_1 → … → p_k ≈ f
  • Proposition 1: Find parameters (on the ST-AOG)
    – The model is formed by min KL(p+ || p) while matching statistics: E_{p+}[cr+] = E_f[cr+]
    – Update: p+(pg) = (1/z) p(pg) exp{ λ+ · cr+(pg) }
    – λ_{+,i} = log( (f_i / h_i) · (h_0 / f_0) ), where h_i is the probability of cell c_i under p and f_i is its frequency under f
  • Proposition 2: Pursue cr+
    – cr+ = argmax_cr KL(p+ || p) = argmax_cr KL(f || h)
    – Cells of the contingency table for cr: (Action, Effect) = c0, (¬Action, Effect) = c1, (Action, ¬Effect) = c2, (¬Action, ¬Effect) = c3
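Proposition 1's closed-form weights can be verified numerically. A sketch assuming the four contingency cells act as indicator features (the function name is illustrative):

```python
import math

def pursue_parameters(f, h):
    """Fit lambda for a newly added relation and update the cell probabilities.

    With h_i the cell probability under the current model p and f_i the
    observed frequency, lambda_i = log((f_i / h_i) * (h_0 / f_0)).
    Reweighting h by exp(lambda) and normalizing then matches the
    statistics, E_{p+}[cr] = E_f[cr]: the updated cells reproduce f.
    """
    lam = [math.log((fi / hi) * (h[0] / f[0])) for fi, hi in zip(f, h)]
    unnorm = [hi * math.exp(li) for hi, li in zip(h, lam)]
    z = sum(unnorm)
    return lam, [u / z for u in unnorm]
```

Anchoring lambda_0 = 0 (cell c0 as reference) fixes the otherwise free additive constant; the normalizer z absorbs it.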

SLIDE 19

Selection from ST-AOG

[Figure: selecting candidate frequencies from the ST-AOG. An Or-node A with branches A1 (probability p) and A2 mixes their frequencies, f_A = p · f_{A1} + (1 - p) · f_{A2}; an And-node composes the frequencies of its sub-actions A1, A2 into a single f_A]

SLIDE 20

Performance vs. TE

SLIDE 21

Performance vs. Hellinger χ²

SLIDE 22

Increasing Misdetections

(Simulation)

[Figure: learned causal structures at 0%, 10%, and 20% misdetection rates]
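The simulation's noise model can be sketched as flipping each binary detection with a given probability (the function name and 0/1 encoding are assumptions for illustration):

```python
import random

def inject_misdetections(detections, rate, rng):
    """Simulate detector noise: flip each 0/1 detection with probability rate.

    At rate=0.0 the stream is untouched; the slides evaluate learning at
    0%, 10%, and 20% misdetection.
    """
    return [d ^ (rng.random() < rate) for d in detections]
```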

SLIDE 23

STC-Parsing Demo

SLIDE 24

Looking Forward:

  • Finish learning the C-AOG
  • Increase reasoning capacity of the C-AOG
  • Integrate experiment design
