the role of mobility and control in the inference of representations
stefano soatto, ucla
with t. lee, a. ayvaci, j. dong, d. davis, j. balzer, j. hernandez, l. valente
Saturday, November 30, 13
what is a “representation”? why do we need it? what does control have to do with it?
keywords: data processing inequality, information bottleneck, Lambert-ambient model, sufficient excitation, actionable information gap, active sensing/perception
given data $y^t \doteq \{y_0, \dots, y_t\}$ generated by an unknown scene $\xi$, what statistic $\hat{\xi} = \phi(y^t)$ should be computed?
for $y^t \doteq \{y_0, \dots, y_t\}$:

data processing inequality: $I(\xi; y^t) \ge I(\xi; \phi(y^t))$, hence $R(u^t \mid y^t) \le R(u^t \mid \phi(y^t))$

sufficiency: $I(\xi; y^t) = I(\xi; \underbrace{\phi(y^t)}_{\hat{\xi}})$

information bottleneck: $\min\, I(y^t; \hat{\xi}) - \beta I(\hat{\xi}; \xi)$, equivalently $\min\, H(y_t^\infty \mid \hat{\xi}) + \tfrac{1}{\beta} H(\hat{\xi})$
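the data processing inequality above can be checked numerically on a toy discrete channel; everything below (the joint table, the coarse-graining $\phi(y) = \lfloor y/2 \rfloor$) is an illustrative assumption, not the talk's setup:

```python
import numpy as np

def mutual_info(joint):
    """I(A;B) in nats from a joint probability table p(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

rng = np.random.default_rng(0)
p_xy = rng.random((4, 8))
p_xy /= p_xy.sum()                 # joint p(xi, y): 4 states of xi, 8 of y

# phi merges pairs of y-values: phi(y) = y // 2, a lossy statistic of y
p_xphi = p_xy.reshape(4, 4, 2).sum(axis=2)

I_full = mutual_info(p_xy)         # I(xi; y)
I_stat = mutual_info(p_xphi)       # I(xi; phi(y))
print(I_full >= I_stat - 1e-12)    # DPI: processing cannot create information
```

any deterministic $\phi$ satisfies the inequality; equality (sufficiency) holds only when the merged $y$-values carry no extra information about $\xi$.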
[figure: the estimate $\hat{\xi}$]
nuisances account for almost all uncertainty/variability in visual data. some can be removed from the data at the outset:
- lossless: canonization (e.g., contrast, planar isometries)* via co-variant detection/invariant description
- e.g., scale, class-specific deformation
- e.g., occlusion: requires “sufficient exploration”
instantiate for a specific data-formation model (LA model)
[block diagram: SENSING (sensors: IR, MS, IMU, LIDAR, ...) feeds CANONIZATION $\phi^\wedge(y^t)$; INFERENCE solves $\min_{p(\hat{\xi} \mid y^t)}$ (the information bottleneck); the REPRESENTATION supports prediction $h(\hat{g}\hat{\xi}, \hat{\nu})$ with estimated motion $\hat{g}(u)$ and scene nuisances $\hat{\nu}$; CONTROL chooses $u$ from the INNOVATION, $H(y_{t+1} \mid \hat{\xi}_t, u)$ vs. $H(y_{t+1} \mid \hat{\xi}_t)$ (the actionable information increment); task queries and unmodeled phenomena enter as inputs; ACTION closes the loop]
Lambert-ambient (LA) model: the scene is a surface $S \subset \mathbb{R}^3$ with albedo $\rho : S \to \mathbb{R}^+$, $p \mapsto \rho(p)$; images $I_t : D \to \mathbb{R}^+$ are defined on $D \subset \mathbb{R}^2$ and captured from poses $g_t \in SE(3)$:

$$\begin{cases} y_t(x) = \kappa_t(\rho(p)) + n_t(x) \in \mathbb{R}^+ \\ x = \pi(g_t p), \quad p \in S \subset \mathbb{R}^3 \end{cases}$$

with contrast change $\kappa_t : \mathbb{R}^+ \to \mathbb{R}^+$; the representation is $\xi = \{\rho, S\}$, the pose is $g_t \in SE(3)$, and the nuisances are $\nu_t = \{n_t, \pi\}$.
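a minimal numerical sketch of the LA measurement equation on a toy point cloud; the contrast change $\kappa_t$, the pose, and the noise level are all assumed choices, not the talk's:

```python
import numpy as np

rng = np.random.default_rng(1)

# scene: N points p in S subset R^3 with albedo rho(p) in R^+
N = 200
S = rng.uniform([-1, -1, 2], [1, 1, 4], size=(N, 3))
rho = rng.uniform(0.2, 1.0, size=N)

# pose g_t in SE(3): rotation R and translation T (illustrative values)
R, T = np.eye(3), np.array([0.1, 0.0, 0.0])

def pi(p):
    """canonical pinhole projection pi: R^3 -> R^2."""
    return p[:, :2] / p[:, 2:3]

kappa = lambda r: np.sqrt(r)       # assumed monotone contrast change kappa_t

p_cam = S @ R.T + T                # g_t p: points in camera coordinates
x = pi(p_cam)                      # image coordinates x = pi(g_t p)
y = kappa(rho) + 0.01 * rng.standard_normal(N)   # y_t(x) = kappa_t(rho(p)) + n_t(x)
```

the same scene $\{\rho, S\}$ re-rendered under a different $g_t$ or $\kappa_t$ yields new data without changing the representation, which is the point of the model.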
compactly, the measurements are $y_t = h(g_t, \xi, \nu_t) + n_t$.
$\hat{\xi}_t = \phi(y^t) \qquad \phi^\wedge(y^t) = \{h(e, \hat{\xi}_t, \nu_t),\ e \in G,\ \nu_t \in V\} \doteq L(\hat{\xi}_t)$

i.e., a statistic from which the (maximal invariant of the) images can be “hallucinated” up to an “uninformative” residual: $L(\hat{\xi}) = L(\xi)$, the “light field”.
actionable information: uncertainty of the maximal invariant (can be computed from finite data), $H(y) \doteq H(\phi^\wedge(y))$. complete information: uncertainty of a minimal sufficient statistic of a (complete) representation, $I \doteq H(\phi^\vee(\hat{\xi}))$. actionable information gap (AIG): $G(y) = I - H(y)$.
“visual recognition is difficult in part because of the large variability that images of a particular object exhibit depending on [...] conditions, occlusions and other visibility artifacts.”

theorem: for viewpoint and illumination (contrast) nuisances, the quotient (the “attributed reeb tree”) is supported on a thin set:

$$\{I\} / \mathcal{W}(\mathbb{R}^2 \to \mathbb{R}^2) \times \mathcal{H}(\mathbb{R}^+ \to \mathbb{R}^+) = \text{ART}, \qquad \hat{\xi}_t = \phi(y^t)$$

sundaramoorthi, petersen, varadarajan, soatto, “on the set of images modulo viewpoint and contrast changes”, cvpr 2009
1. how to build the best possible representation given past data
2. how to gather future data to make the representation (as close as possible to) complete
assumptions: (1) Lambertian reflection, (2) constant illumination, (3) co-visibility
small dense / large sparse; Nesterov/split-Bregman with weighted isotropic TV for $w$
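the Nesterov/split-Bregman solvers named above are not reproduced here; as a much simpler stand-in, a plain gradient descent on a smoothed weighted-TV denoising energy in 1d illustrates what the weighted isotropic TV term does (all parameters and the test signal are illustrative):

```python
import numpy as np

def weighted_tv_denoise(f, a, lam=0.5, eps=1e-2, step=0.04, iters=1000):
    """minimize 0.5*||w - f||^2 + lam * sum_i a_i * sqrt((w_{i+1}-w_i)^2 + eps)."""
    w = f.copy()
    for _ in range(iters):
        d = np.diff(w)                     # forward differences w_{i+1} - w_i
        g = a * d / np.sqrt(d ** 2 + eps)  # derivative of the smoothed |d_i|
        grad = w - f                       # data-fidelity gradient
        grad[:-1] -= lam * g               # chain rule through the differences
        grad[1:] += lam * g
        w = w - step * grad
    return w

# noisy step edge: TV keeps the jump while flattening the noise
rng = np.random.default_rng(2)
f = np.concatenate([np.zeros(20), np.ones(20)]) + 0.1 * rng.standard_normal(40)
a = np.ones(39)                            # uniform weights (would be data-driven)
w = weighted_tv_denoise(f, a)
```

the per-difference weights `a` are where a weighted scheme plugs in: small weights allow discontinuities (e.g., at suspected occlusion boundaries), large weights enforce smoothness.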
$$\hat{c} = \arg\min_c \int_D |c(x) - c(y)| \, d\mu(x, y), \qquad d\mu(x, y) = K(x, y)\, dx\, dy$$

with $v(x) \doteq w(x) - x$ and

$$K(x, y) = \begin{cases} e^{-(I_t(x) - I_t(y))^2} + \theta\, e^{-\|v_t(x) - v_t(y)\|^2}, & \|x - y\|^2 < \tau, \\ 0, & \text{otherwise.} \end{cases}$$
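the affinity $K$ can be sketched directly; $\theta$ and $\tau$ are assumed names for the mixing weight and the truncation radius recovered from the garbled slide, and the 1d signal below is illustrative:

```python
import numpy as np

def affinity(I, v, theta=1.0, tau=5.0):
    """K[i, j] for all pixel pairs of a 1d signal with intensities I and
    flow residuals v (v(x) = w(x) - x on the slide)."""
    x = np.arange(len(I), dtype=float)
    dist2 = (x[:, None] - x[None, :]) ** 2
    dI2 = (I[:, None] - I[None, :]) ** 2
    dv2 = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-dI2) + theta * np.exp(-dv2)
    K[dist2 >= tau] = 0.0          # truncation: K = 0 unless ||x - y||^2 < tau
    return K

I = np.array([0.0, 0.1, 0.9, 1.0, 1.0])
v = np.array([0.0, 0.0, 2.0, 2.0, 2.0])   # flow jump suggests an occlusion boundary
K = affinity(I, v)
```

pairs that agree in both intensity and flow get high affinity, so the labeling $c$ that minimizes $\int |c(x) - c(y)|\,d\mu$ changes value only across low-affinity (candidate occlusion) boundaries.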
[the earlier sensing / representation / control / action block diagram, now highlighting the control input $u$ and the innovation $H(y_{t+1} \mid \hat{\xi}_t, u)$]
$$\hat{u}_t = \arg\max_u \min_{\hat{\xi}_t = \phi(y^t)} H(y_{t+1} \mid \hat{\xi}_t, u) + \lambda H(\hat{\xi}_t)$$
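one reading of the control rule above, on a toy discrete example: among candidate actions, pick the one whose predicted next measurement carries the most innovation, i.e. has the largest entropy given the current estimate. the belief and the likelihood tables are illustrative assumptions, not the talk's model:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

belief = np.array([0.7, 0.2, 0.1])        # p(xi | y^t): three hypotheses

# for each candidate action u, a likelihood table p(y_{t+1} | xi, u)
channels = {
    "stay": np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]),   # uninformative
    "move": np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]),   # discriminative
}

def predicted_entropy(lik):
    p_y = belief @ lik                    # predictive p(y_{t+1} | u)
    return entropy(p_y)

u_hat = max(channels, key=lambda u: predicted_entropy(channels[u]))
print(u_hat)                              # -> "move"
```

“stay” yields the same likelihood under every hypothesis, so its next measurement is predictable (low entropy) and uninformative; “move” is the action an explorer should take.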
unknown environments: the representation depends on parameters, $\phi = \phi(\cdot, \gamma)$. in the rest, ...
$M$, $Q$, $\epsilon$: complexity $O(MQ) + O(1/\epsilon^2)$
1 Experiment
1.1 Risk and sensing parameters
... with 5, 1, 3 clutter objects (low, medium, high complexity).
csd 100028, september 13, 2010 (also video lecture, nips tutorial 2010; icvss lecture notes ’07-’09)
“... vision”, r. cipolla et al. (eds.), 2011