SLIDE 1

Anima Anandkumar

BEYOND BLACK BOXES: INFUSING STRUCTURE INTO MACHINE LEARNING

SLIDE 2

TRINITY OF AI: DATA, COMPUTE, ALGORITHMS

SLIDE 3

DEEP LEARNING IS DATA-HUNGRY

Data + Priors + Learning = STRUCTURE-INFUSED LEARNING

SLIDE 4

Examples of Priors

  • Graphs/Tensors
  • Symbolic rules
  • Physical laws
  • Simulations
  • Generative models

How to use structure and domain knowledge to design priors?

Data + Priors + Learning = STRUCTURE-INFUSED LEARNING


SLIDE 6

NEXT GENERATION AI

FROM PREDICTION TO GENERATION

[Figure: prediction maps an image x to a label y ("DOG"); generation maps a label y back to an image x]

SLIDE 7

Generative Adversarial Networks

SLIDE 8

TURING TEST FOR FACE GENERATION

http://www.whichfaceisreal.com/index.php

SLIDE 9

WHAT IS THE SOLUTION OF A GAN?

GAN objective for loss function $f$:

$$\min_{\mathcal{G}} \; \max_{\mathcal{D}} \; f(\mathcal{G}, \mathcal{D})$$

[Figure: latent code → Generator → fake images; fake and real images → Discriminator → loss]
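To make the objective concrete, here is a hedged, minimal PyTorch sketch of one alternating training step for this min-max game. The standard non-saturating binary cross-entropy loss is assumed; `G`, `D`, the optimizers, and the latent dimension are placeholders, not details from the talk.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, latent_dim=128):
    """One discriminator ascent step and one generator descent step."""
    batch = real.size(0)
    z = torch.randn(batch, latent_dim)          # latent code
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    opt_D.zero_grad()
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(G(z).detach()), zeros))
    d_loss.backward()
    opt_D.step()

    # Generator step: push D(G(z)) toward 1 (non-saturating loss).
    opt_G.zero_grad()
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```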

SLIDE 10

COMPETITION IN GANS

  • Training GANs is challenging: instability and mode collapse
  • Standard optimization: alternating gradient descent
  • It fails even for a simple bilinear objective (see the sketch below)

Generator vs Discriminator optimization
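A minimal sketch of that bilinear failure case: player $x$ minimizes $f(x, y) = xy$ while player $y$ maximizes it, so the unique equilibrium is $(0, 0)$. The simultaneous variant of gradient descent-ascent is used here for illustration; it spirals outward, while alternating updates cycle without ever converging.

```python
# Toy example: x minimizes f(x, y) = x*y, y maximizes it.
import math

x, y, eta = 1.0, 1.0, 0.1
for _ in range(200):
    dx, dy = y, x                       # grad_x f = y, grad_y f = x
    x, y = x - eta * dx, y + eta * dy   # simultaneous descent/ascent
print(math.hypot(x, y))                 # radius grows by sqrt(1 + eta^2) per step
```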

SLIDE 11

A VERY SIMPLE GAN

Current optimization methods: [Figure: oscillating or diverging optimization trajectories]

SLIDE 12

COMPETITIVE GRADIENT DESCENT

NeurIPS 2019

Florian Schäfer

SLIDE 13

INTUITIONS

Components in decision making:

1. Belief about the loss function
2. Uncertainty of the environment
3. Anticipation of the adversary's action

Opponent awareness in optimization

SLIDE 14

COMPETITIVE GRADIENT DESCENT

Linear approximation for one player → bilinear approximation for two players:

$$x_{k+1} - x_k = \operatorname*{argmin}_{x}\; x^{\top}\nabla_{x}f + x^{\top}D^{2}_{xy}f\,y + y^{\top}\nabla_{y}f + \frac{1}{2\eta}\,x^{\top}x$$

$$y_{k+1} - y_k = \operatorname*{argmin}_{y}\; y^{\top}\nabla_{y}g + y^{\top}D^{2}_{yx}g\,x + x^{\top}\nabla_{x}g + \frac{1}{2\eta}\,y^{\top}y$$

The local approximation is interactive: each update is the Nash equilibrium of this local bilinear game.

SLIDE 15

A VERY SIMPLE GAN

CGD converges for all step sizes. [Figure: convergent optimization trajectories]

SLIDE 16

RESULTS ON W-GAN

We use the architecture intended for WGAN-GP with no additional hyperparameter tuning. The best performance is achieved by the WGAN loss with Adaptive-CGD (no regularization).

SLIDE 17

TAKE-AWAYS

  • Competition between the generator and discriminator leads to instability and mode collapse
  • Competitive gradient descent stabilizes training through opponent awareness
  • Stabilization provides implicit competitive regularization
  • State-of-the-art performance with no tuning and no explicit penalties

Competitive optimization in GANs

SLIDE 18

DISENTANGLEMENT IN STYLEGANS

Animesh Garg, Ankit Patel, Tero Karras, Weili Nie

SLIDE 19

CONTROLLABLE STYLEGAN

  • Multi-resolution generator and discriminator
  • Generator conditions on a factor code
  • Mapping network: factor-conditioned styles modulate each block in the synthesis network (see the sketch below)
  • Encoder shares all layers with the discriminator except the last, which predicts the factor code
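As a concrete illustration of the factor-code conditioning described above, here is a hedged sketch; the class name, dimensions, and layer choices are my assumptions, not the paper's code.

```python
import torch

class ConditionalMapping(torch.nn.Module):
    """Mapping network that conditions styles on a factor code."""
    def __init__(self, z_dim=512, c_dim=10, w_dim=512):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(z_dim + c_dim, w_dim), torch.nn.LeakyReLU(0.2),
            torch.nn.Linear(w_dim, w_dim))

    def forward(self, z, c):
        w = self.net(torch.cat([z, c], dim=1))  # factor-conditioned style vector
        return w                                # broadcast to every synthesis block
```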

SLIDE 20

SEMI-SUPERVISED LEARNING

SLIDE 21

DISENTANGLED LEARNING

  • Loss on the encoder encourages disentanglement
  • Loss incorporates the factor code of real images when available (semi-supervised); a sketch follows below
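A hedged sketch of this semi-supervised disentanglement loss; the function names, squared-error choice, and weighting `lam` are illustrative assumptions, not the paper's exact code. The encoder `E` must recover the factor code `c` used by the generator `G`, and, when a real image comes with a labeled code, `E` is also supervised on it.

```python
import torch
import torch.nn.functional as F

def disentanglement_loss(G, E, z, c, x_real=None, c_real=None, lam=1.0):
    # Unsupervised term: reconstruct the factor code from a generated image.
    loss = F.mse_loss(E(G(z, c)), c)
    # Semi-supervised term: match the labeled codes of real images when available.
    if x_real is not None and c_real is not None:
        loss = loss + lam * F.mse_loss(E(x_real), c_real)
    return loss
```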

SLIDE 22

5% OF LABELLED DATA ON CELEB-A (256X256)

SLIDE 23

1% OF LABELLED DATA ON ISAAC SIM (512X512)

SLIDE 24

TAKE-AWAYS

  • Controllable photo-realistic generation in StyleGANs
  • Disentanglement through reconstruction of style codes
  • Semi-supervised learning with very little labeled data

Disentangled learning in StyleGAN

SLIDE 25

Flow-based Generative Models

SLIDE 26

CONTINUOUS NORMALIZING FLOWS

[Figure: invertible flow mapping the base density p(z) to the data density p(x)]

  • Exact likelihood
  • Invertibility
  • Use ODE solvers

SLIDE 27

CONTINUOUS NORMALIZING FLOWS

The flow transports $z = z_0 \to z_l \to z_L = x$, with densities $p_0(z_0), \dots, p_l(z_l), \dots, p_L(z_L)$:

$$z(t_0) = z, \qquad \frac{\partial z(t)}{\partial t} = f(z(t), t; \theta) \;\;\Rightarrow\;\; \frac{\partial \log p(z(t))}{\partial t} = -\operatorname{Tr}\!\left(\frac{\partial f(z(t); \theta)}{\partial z(t)}\right)
$$
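A minimal sketch of these dynamics in code, assuming the torchdiffeq package and using the exact Jacobian trace, which is fine in toy dimensions (practical CNFs such as FFJORD swap in a stochastic trace estimator); the network and dimensions are illustrative.

```python
import torch
from torchdiffeq import odeint  # assumed package for black-box ODE integration

# Toy dynamics f(z; theta) on a 2-D state.
f = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))

def dynamics(t, state):
    z, _ = state
    with torch.enable_grad():
        z = z.detach().requires_grad_(True)
        dz = f(z)
        # Exact Tr(df/dz), one Jacobian row at a time.
        trace = sum(torch.autograd.grad(dz[:, i].sum(), z, create_graph=True)[0][:, i]
                    for i in range(z.shape[1]))
    return dz, -trace  # d log p(z(t)) / dt = -Tr(df/dz)

z0 = torch.randn(16, 2)          # samples from the base density p(z)
logp_delta0 = torch.zeros(16)    # accumulator for the log-density change
zs, logp_deltas = odeint(dynamics, (z0, logp_delta0), torch.tensor([0.0, 1.0]))
x = zs[-1]                       # transformed samples z_L = x
# log p_L(x) = log p_0(z0) + logp_deltas[-1]
```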

SLIDE 28

CONTINUOUS NORMALIZING FLOWS

The same transport, written as an Ordinary Differential Equation and integrated with standard ODE solvers.

SLIDE 29

NEURAL ODE MODELS FOR TIME SERIES

SLIDE 30

Gavin Portwood, Peetak Mitra, Mateus Dias Ribeiro, Tan Minh Nguyen, Anima Anandkumar

AI4PHYSICS: TURBULENCE FORECASTING VIA NEURAL ODE

SLIDE 31

MOTIVATION

Fluid Turbulence is difficult to model:

  • Multi-scale: the dynamics at different scales are non-linear and coupled
  • Direct numerical simulation (DNS) resolves all scales and is hence expensive
  • Current reduced-order models are heuristic, not high fidelity
  • Can neural ODEs help? (see the sketch below)
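As flagged in the last bullet, here is a hedged sketch of the neural-ODE idea applied to this problem: learn the time derivative of a low-dimensional turbulence statistic (e.g. the dissipation rate) from DNS data, then forecast by integration. The module, state size, and horizon are illustrative assumptions, not the paper's setup.

```python
import torch
from torchdiffeq import odeint  # assumed package for black-box ODE integration

class TurbulenceDynamics(torch.nn.Module):
    """Learned right-hand side d(state)/dt for reduced turbulence statistics."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim))

    def forward(self, t, state):
        return self.net(state)

model = TurbulenceDynamics()
state0 = torch.randn(1, 8)             # initial reduced-order state
t = torch.linspace(0.0, 5.0, 50)       # forecast horizon
trajectory = odeint(model, state0, t)  # shape (50, 1, 8): predicted evolution
# Training would regress `trajectory` against DNS statistics sampled at times t.
```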
SLIDE 32

EXPERIMENTAL RESULTS

  • Neural ODE predictions of the evolution of the dissipation rate are better
  • The neural ODE generalizes well to unseen test data
SLIDE 33

TAKE-AWAYS

  • Good alternatives to GANs when likelihood estimates are needed
  • Ideal for scientific applications with underlying differential equations and a need for uncertainty estimates
  • Challenges in scaling

Flow-based generative models

SLIDE 34

FEEDBACK GENERATIVE MODELS

SLIDE 35

NEXT GENERATION AI

FROM PREDICTION TO GENERATION

[Figure: prediction maps an image x to a label y ("DOG"); generation maps a label y back to an image x]

One model to do both?

SLIDE 36

Taking inspiration from biological brains…

SLIDE 37

HUMAN VISION: FEEDFORWARD & FEEDBACK

[Figure: hierarchical predictive coding in human vision. Feedforward (excitatory) connections carry first- and second-order prediction errors from superficial pyramidal cells; feedback (inhibitory and modulatory backward) connections carry expectations from deep pyramidal cells, producing representational sharpening of the visual input]

The interaction between feedforward and feedback connections is crucial for core object recognition in human vision.

SLIDE 38

DECONVOLUTIONAL GENERATIVE MODEL

  • Object category → intermediate rendering → image, with latent variables at each level
  • Feedback network performs deconvolution
  • Latent variables overcome non-invertibility

Nhat Ho, Tan Nguyen, Ankit Patel, Michael Jordan, Rich Baraniuk

SLIDE 39

CONVOLUTIONAL NEURAL NETWORK WITH FEEDBACK

[Figure: CNN-F architecture, a feedforward CNN from input x to label y combined with a feedback generative model]

CNN-F performs approximate belief propagation through feedforward CNN and feedback generative model

Tan Nguyen, Rich Baraniuk, Doris Tsao, Yujia Huang, Sihui Dai, Pinglei Bao
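A hedged sketch of the iterative loop this describes; the function names and iteration count are placeholders, not the paper's code. The feedforward CNN proposes a label, the feedback generative model reconstructs the input from it, and the two passes alternate, approximating belief propagation.

```python
import torch

def cnn_f_inference(feedforward, feedback, x, num_iters=3):
    """Alternate bottom-up recognition and top-down reconstruction."""
    x_hat = x
    for _ in range(num_iters):
        logits, states = feedforward(x_hat)  # bottom-up pass; keep intermediate states
        x_hat = feedback(logits, states)     # top-down reconstruction of the input
    return logits, x_hat                     # robust prediction and a cleaned-up input
```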

SLIDE 40

CNN-F CAN RECOVER CLEAN DATA

[Figure: inputs corrupted by noise, blur, and occlusion, alongside CNN-F reconstructions]

SLIDE 41

CNN-F YIELDS ROBUST CLASSIFICATION

SLIDE 42

TAKE-AWAYS

  • Biological inspiration can lead to more robust architectures
  • Combining feedforward and feedback networks for iterative inference
  • Robust prediction on degraded images

Adding feedback to CNNs

SLIDE 43

CONCLUSION

  • Generative models are important in many applications
  • Photorealistic generation is now possible
  • Competitive optimization to stabilize GAN training
  • Controlled, disentangled generation in StyleGANs
  • Continuous flow-based models for physical applications
  • Brain-inspired CNNs with feedback
  • Outstanding challenge: how to combine generative models with simulations and downstream tasks

SLIDE 44

Thank you