On the Limitations of Representing Functions on Sets, Edward Wagstaff et al. (PowerPoint PPT Presentation)



SLIDE 1

On the Limitations of Representing Functions on Sets

Edward Wagstaff*, Fabian Fuchs*, Martin Engelcke*, Ingmar Posner, Michael Osborne

*Equal contribution

Machine Learning Research Group

SLIDE 2

Examples for Permutation Invariant Problems:
Detecting Common Attributes

Smiling Blond Hair

CelebA Dataset, Liu et al.

SLIDES 3–10

The deep sets architecture (built up incrementally over these slides)

Input X ⊂ ℝ^M → ϕ applied to each element (Latent A ∈ ℝ^{N×M}) → + (sum over elements, Latent B: Y ∈ ℝ^N) → ρ → Output f(x₁, …, x_M) ∈ ℝ

Y = ϕ(x₁) + … + ϕ(x_M),  f(x₁, …, x_M) = ρ(Y)
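The sum-decomposition above can be sketched in a few lines of Python. This toy instantiation is my own illustration (not the paper's trained networks): hand-picked ϕ and ρ that recover the mean, a simple permutation invariant function.

```python
def deep_sets(xs, phi, rho):
    """f(X) = rho(sum of phi(x) over x in X).

    The elementwise sum is what makes f permutation invariant:
    reordering xs cannot change the summed latent vector.
    """
    latent = None
    for x in xs:
        fx = phi(x)
        latent = fx if latent is None else tuple(a + b for a, b in zip(latent, fx))
    return rho(latent)

# Toy instantiation (hand-picked, not learned): phi(x) = (x, 1) and
# rho((s, n)) = s / n together compute the mean of the set.
phi = lambda x: (x, 1.0)
rho = lambda z: z[0] / z[1]
print(deep_sets([1.0, 2.0, 3.0], phi, rho))  # 2.0
print(deep_sets([3.0, 1.0, 2.0], phi, rho))  # same set, same answer: 2.0
```

A learned model replaces the hand-picked ϕ and ρ with neural networks; the invariance comes entirely from the sum.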

SLIDES 11–17

[Architecture diagram as above: X ⊂ ℝ^M → ϕ → Latent A ∈ ℝ^{N×M} → Σ → Y ∈ ℝ^N → ρ → f(x₁, …, x_M) ∈ ℝ]

Theorem 1 (Zaheer et al.): This architecture can successfully model any permutation invariant function, even for latent dimension N = 1.

Proof (built up incrementally over these slides):

• Assume that the neural networks ϕ and ρ are universal function approximators.
• Find a ϕ such that the mapping from the input set X to the latent representation Y is injective.
• Together, these imply that everything can be modelled.
• Concretely: define an injective c(x): ℚ → ℕ, then define ϕ(x) = 2^(−c(x)).
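The countable-case construction can be checked numerically. In the sketch below (a toy universe of small integers standing in for an enumeration of ℚ, with the hypothetical choice c(x) = x), ϕ(x) = 2^(−c(x)) makes the summed encoding injective on sets: each element switches on a distinct binary digit of the sum.

```python
from fractions import Fraction
from itertools import combinations

def encode_set(xs, c):
    """Phi(X) = sum over x in X of 2^(-c(x)), computed exactly with Fraction."""
    return sum(Fraction(1, 2 ** c(x)) for x in xs)

# Toy universe standing in for an enumeration of the rationals;
# c(x) = x on the integers 1..7 is a hypothetical choice for illustration.
universe = range(1, 8)
codes = {encode_set(subset, c=lambda x: x)
         for r in range(len(universe) + 1)
         for subset in combinations(universe, r)}
print(len(codes))  # 128 distinct encodings for the 2^7 distinct subsets
```

Exact rational arithmetic (`Fraction`) sidesteps floating-point rounding, so the 128 encodings are provably distinct rather than merely numerically distinct.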

SLIDE 18

Role of Continuity

The countability argument above relies on c mapping ℚ into ℕ: we need to take real numbers into account!

SLIDES 19–24

[Architecture diagram as above: X ⊂ ℝ^M → ϕ → Latent A ∈ ℝ^{N×M} → Σ → Y ∈ ℝ^N → ρ → f(x₁, …, x_M) ∈ ℝ]

Theorem 2: If we want to model all permutation invariant functions, it is sufficient and necessary that the latent dimension N is at least as large as the maximum input set size M.

Sketch of Proof for Necessity (built up incrementally over these slides):

• To prove necessity, we only need one function which can't be decomposed with N < M. We pick max(X).
• We show that, in order to represent max(X), Φ(X) = Σ_{x ∈ X} ϕ(x) needs to be injective.
• This is not possible with N < M.
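The sufficiency direction for N = M can be made concrete with power sums, ϕ(x) = (x, x², …, x^M): the summed latent then determines the multiset, so a suitable ρ can recover max(X). The sketch below is my reconstruction (via Newton's identities and numpy root-finding, not the authors' code) of recovering the max from the summed embedding alone.

```python
import numpy as np

def recover_max_from_sum(power_sums):
    """Given p_k = sum_i x_i^k for k = 1..M, recover max(x_1, ..., x_M).

    With phi(x) = (x, x^2, ..., x^M), the summed latent Phi(X) is exactly
    the vector of power sums, which determines the multiset X; here rho is
    realised by Newton's identities plus polynomial root-finding.
    """
    M = len(power_sums)
    # Newton's identities: power sums p_k -> elementary symmetric polys e_k.
    e = [1.0]
    for k in range(1, M + 1):
        acc = 0.0
        for i in range(1, k + 1):
            acc += (-1) ** (i - 1) * e[k - i] * power_sums[i - 1]
        e.append(acc / k)
    # The x_i are the roots of x^M - e1 x^(M-1) + e2 x^(M-2) - ...
    coeffs = [(-1) ** k * e[k] for k in range(M + 1)]
    roots = np.roots(coeffs)
    return float(max(roots.real))

xs = [0.1, 0.6, -0.32]
p = [sum(x ** k for x in xs) for k in (1, 2, 3)]
print(recover_max_from_sum(p))  # approximately 0.6
```

Note this only works because the latent carries M numbers for M inputs; the theorem says no continuous sum-decomposition can do the same with fewer.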

SLIDES 25–26

Illustrative Example: Regressing to the Median

{0.1, 0.6, −0.32, 1.61, 0.5, 0.67, 0.3}
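For reference, the regression target on this slide's example set: the median is permutation invariant, so a deep sets model must produce the same value for every ordering of the inputs.

```python
import statistics

xs = [0.1, 0.6, -0.32, 1.61, 0.5, 0.67, 0.3]
print(statistics.median(xs))        # 0.5
print(statistics.median(xs[::-1]))  # order doesn't matter: 0.5
```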

SLIDE 27

Illustrative Example: Regressing to the Median

[Figure: left panel, RMSE vs. latent dimension N (log-log axes) for input set sizes M ∈ {15, 30, 60, 100, 200, 300, 400, 500}; right panel, critical latent dimension N_c (20 to 100) vs. input set size M (100 to 600).]

SLIDE 28

Thank You