On the Limitations of Representing Functions on Sets, Edward Wagstaff et al. (PowerPoint PPT Presentation)



SLIDE 1

On the Limitations of Representing Functions on Sets

Edward Wagstaff*, Fabian Fuchs*, Martin Engelcke*, Ingmar Posner, Michael Osborne

*Equal contribution

Machine Learning Research Group

SLIDE 2

Examples for Permutation Invariant Problems:
Detecting Common Attributes

Smiling Blond Hair

CelebA Dataset, Liu et al.

SLIDES 3–10

The deep sets architecture (built up incrementally over these slides)

Input X ⊂ ℝ^M → ϕ applied to each element (Latent A ∈ ℝ^{N×M}) → + (sum over elements, Latent B: Y ∈ ℝ^N) → ρ → Output f(x₁, …, x_M) ∈ ℝ

Y = ϕ(x₁) + … + ϕ(x_M),  f(x₁, …, x_M) = ρ(Y)
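The sum-decomposition above can be sketched in a few lines of Python. This toy instantiation is my own illustration (not the paper's trained networks): hand-picked ϕ and ρ that recover the mean, a simple permutation invariant function.

```python
def deep_sets(xs, phi, rho):
    """f(X) = rho(sum of phi(x) over x in X).

    The elementwise sum is what makes f permutation invariant:
    reordering xs cannot change the summed latent vector.
    """
    latent = None
    for x in xs:
        fx = phi(x)
        latent = fx if latent is None else tuple(a + b for a, b in zip(latent, fx))
    return rho(latent)

# Toy instantiation (hand-picked, not learned): phi(x) = (x, 1) and
# rho((s, n)) = s / n together compute the mean of the set.
phi = lambda x: (x, 1.0)
rho = lambda z: z[0] / z[1]
print(deep_sets([1.0, 2.0, 3.0], phi, rho))  # 2.0
print(deep_sets([3.0, 1.0, 2.0], phi, rho))  # same set, same answer: 2.0
```

A learned model replaces the hand-picked ϕ and ρ with neural networks; the invariance comes entirely from the sum.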

SLIDES 11–17

[Architecture diagram as above: X ⊂ ℝ^M → ϕ → Latent A ∈ ℝ^{N×M} → Σ → Y ∈ ℝ^N → ρ → f(x₁, …, x_M) ∈ ℝ]

Theorem 1 (Zaheer et al.): This architecture can successfully model any permutation invariant function, even for latent dimension N = 1.

Proof (built up incrementally over these slides):

• Assume that the neural networks ϕ and ρ are universal function approximators.
• Find a ϕ such that the mapping from the input set X to the latent representation Y is injective.
• Together, these imply that everything can be modelled.
• Concretely: define an injective c(x): ℚ → ℕ, then define ϕ(x) = 2^(−c(x)).
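The countable-case construction can be checked numerically. In the sketch below (a toy universe of small integers standing in for an enumeration of ℚ, with the hypothetical choice c(x) = x), ϕ(x) = 2^(−c(x)) makes the summed encoding injective on sets: each element switches on a distinct binary digit of the sum.

```python
from fractions import Fraction
from itertools import combinations

def encode_set(xs, c):
    """Phi(X) = sum over x in X of 2^(-c(x)), computed exactly with Fraction."""
    return sum(Fraction(1, 2 ** c(x)) for x in xs)

# Toy universe standing in for an enumeration of the rationals;
# c(x) = x on the integers 1..7 is a hypothetical choice for illustration.
universe = range(1, 8)
codes = {encode_set(subset, c=lambda x: x)
         for r in range(len(universe) + 1)
         for subset in combinations(universe, r)}
print(len(codes))  # 128 distinct encodings for the 2^7 distinct subsets
```

Exact rational arithmetic (`Fraction`) sidesteps floating-point rounding, so the 128 encodings are provably distinct rather than merely numerically distinct.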

SLIDE 18

Role of Continuity

The countability argument above relies on c mapping ℚ into ℕ: we need to take real numbers into account!

SLIDES 19–24

[Architecture diagram as above: X ⊂ ℝ^M → ϕ → Latent A ∈ ℝ^{N×M} → Σ → Y ∈ ℝ^N → ρ → f(x₁, …, x_M) ∈ ℝ]

Theorem 2: If we want to model all permutation invariant functions, it is sufficient and necessary that the latent dimension N is at least as large as the maximum input set size M.

Sketch of Proof for Necessity (built up incrementally over these slides):

• To prove necessity, we only need one function which can't be decomposed with N < M. We pick max(X).
• We show that, in order to represent max(X), Φ(X) = Σ_{x ∈ X} ϕ(x) needs to be injective.
• This is not possible with N < M.
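The sufficiency direction for N = M can be made concrete with power sums, ϕ(x) = (x, x², …, x^M): the summed latent then determines the multiset, so a suitable ρ can recover max(X). The sketch below is my reconstruction (via Newton's identities and numpy root-finding, not the authors' code) of recovering the max from the summed embedding alone.

```python
import numpy as np

def recover_max_from_sum(power_sums):
    """Given p_k = sum_i x_i^k for k = 1..M, recover max(x_1, ..., x_M).

    With phi(x) = (x, x^2, ..., x^M), the summed latent Phi(X) is exactly
    the vector of power sums, which determines the multiset X; here rho is
    realised by Newton's identities plus polynomial root-finding.
    """
    M = len(power_sums)
    # Newton's identities: power sums p_k -> elementary symmetric polys e_k.
    e = [1.0]
    for k in range(1, M + 1):
        acc = 0.0
        for i in range(1, k + 1):
            acc += (-1) ** (i - 1) * e[k - i] * power_sums[i - 1]
        e.append(acc / k)
    # The x_i are the roots of x^M - e1 x^(M-1) + e2 x^(M-2) - ...
    coeffs = [(-1) ** k * e[k] for k in range(M + 1)]
    roots = np.roots(coeffs)
    return float(max(roots.real))

xs = [0.1, 0.6, -0.32]
p = [sum(x ** k for x in xs) for k in (1, 2, 3)]
print(recover_max_from_sum(p))  # approximately 0.6
```

Note this only works because the latent carries M numbers for M inputs; the theorem says no continuous sum-decomposition can do the same with fewer.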

SLIDES 25–26

Illustrative Example: Regressing to the Median

{0.1, 0.6, −0.32, 1.61, 0.5, 0.67, 0.3}
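For reference, the regression target on this slide's example set: the median is permutation invariant, so a deep sets model must produce the same value for every ordering of the inputs.

```python
import statistics

xs = [0.1, 0.6, -0.32, 1.61, 0.5, 0.67, 0.3]
print(statistics.median(xs))        # 0.5
print(statistics.median(xs[::-1]))  # order doesn't matter: 0.5
```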

SLIDE 27

Illustrative Example: Regressing to the Median

[Figure: left panel, RMSE vs. latent dimension N (log-log axes) for input set sizes M ∈ {15, 30, 60, 100, 200, 300, 400, 500}; right panel, critical latent dimension N_c (20 to 100) vs. input set size M (100 to 600).]

SLIDE 28

Thank You