GENERALIZABLE AI: A NEW FOUNDATION - ANIMA ANANDKUMAR - PowerPoint PPT Presentation
slide-1
SLIDE 1

GENERALIZABLE AI: A NEW FOUNDATION

ANIMA ANANDKUMAR

slide-2
SLIDE 2

TRINITY OF AI: DATA, COMPUTE, ALGORITHMS

slide-3
SLIDE 3

IMPRESSIVE GROWTH OF AI

Wide range of domains: NVIDIA GANs generate photorealistic images that pass a visual Turing test; deep reinforcement learning beats human champions.

slide-4
SLIDE 4

BUT NOTABLE FAILURES

AI is not living up to its hype

  • Safety-critical applications
  • Language understanding

How do we fix these gaps in deep learning?

slide-5
SLIDE 5


PATH TO GENERALIZABLE AI

slide-6
SLIDE 6

INGREDIENTS OF AN AI ALGORITHM

Data + Priors + AI Algorithm = Task + Action
(AI algorithm: learning and decision making)

slide-7
SLIDE 7

DEEP LEARNING STATUS QUO

Data + Priors + AI Algorithm = Task

Data hungry:
  • Massive datasets
  • Expensive human labeling

slide-8
SLIDE 8

DEEP LEARNING STATUS QUO

Data + Priors + AI Algorithm = Task

Not robust:
  • Easy to fool current models
  • Not domain specific

slide-9
SLIDE 9

DEEP LEARNING STATUS QUO

Data + Priors + AI Algorithm = Task

Simplistic:
  • Fixed tasks
  • Limited benchmarks

slide-10
SLIDE 10

NEXT FRONTIER IN AI

Data + Priors + AI Algorithm = Task

Unsupervised, robust, adaptive:
  • Recurrent feedback
  • Domain knowledge
  • Compositionality
  • Disentanglement learning
  • Domain adaptation
  • Multi-task & domains
  • Online and continual learning


slide-12
SLIDE 12


BRAIN-INSPIRED ARCHITECTURES WITH RECURRENT FEEDBACK

Yujia Huang, Sihui Dai, Tan Nguyen, Doris Tsao, James Gornet, Zhiding Yu

slide-13
SLIDE 13


THE HUMAN BRAIN IS HIERARCHICAL

Adapted from Journal of Vision (2013), 13, 10

slide-14
SLIDE 14


HUMAN VISION IS ROBUST

slide-15
SLIDE 15


THE BRAIN IS BAYESIAN

slide-16
SLIDE 16


COMBINING CLASSIFIER AND GENERATOR THROUGH FEEDBACK CONNECTIONS

slide-17
SLIDE 17

GENERATIVE VS DISCRIMINATIVE CLASSIFIER

Logistic Regression vs. Gaussian Mixture: q(y, z) β†’ q(z | y)
CNN vs. Deconvolutional Generative Model: q(y, z, A) β†’ q(z | y, A)
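The q(y, z) β†’ q(z | y) step above can be sketched concretely. Below is a minimal generative (Gaussian-mixture) classifier in NumPy that forms the joint q(y, z) and then the posterior q(z | y) via Bayes' rule; the class means, priors, and shared variance are illustrative, not from the talk.

```python
import numpy as np

def gaussian_mixture_posterior(y, means, priors, var=1.0):
    """Posterior q(z | y) for a spherical-Gaussian class-conditional model.

    y:      (d,) observation
    means:  (k, d) per-class means
    priors: (k,) class priors q(z)
    """
    # log q(y | z) for each class z (spherical Gaussians, shared variance)
    log_lik = -np.sum((y - means) ** 2, axis=1) / (2 * var)
    log_joint = log_lik + np.log(priors)   # log q(y, z)
    log_joint -= log_joint.max()           # numerical stability
    post = np.exp(log_joint)
    return post / post.sum()               # q(z | y)

# Two illustrative classes: one near the origin, one near (4, 4).
means = np.array([[0.0, 0.0], [4.0, 4.0]])
priors = np.array([0.5, 0.5])
post = gaussian_mixture_posterior(np.array([0.1, -0.2]), means, priors)
```

A discriminative model (logistic regression) would instead fit q(z | y) directly, without modeling how y is generated.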

slide-18
SLIDE 18

MESSAGE PASSING NETWORK

[Diagram: CNN vs. CNN-F with generative feedback. Feedforward layers map the image up to a soft label; feedback layers map latent variables back down toward the image.]

slide-19
SLIDE 19

SELF-CONSISTENCY THROUGH RECURRENT FEEDBACK

[Figure: CNN-F inference unrolled over time: initialization, iteration 1, iteration 2, …]
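The self-consistency loop can be sketched as follows. This is an assumed, linear-map stand-in for CNN-F: a feedforward map proposes a soft label, a tied feedback map generates an input from it, and the two are reconciled over iterations. The actual CNN-F architecture and update rule differ; the weights and blending factor here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 8))   # "feedforward" map: image -> logits
G = W.T                           # "feedback" map: label -> image (tied weights)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def cnnf_inference(x, n_iters=10, alpha=0.5):
    """Alternate bottom-up recognition and top-down generation to a fixed point."""
    x_hat = x.copy()
    for _ in range(n_iters):
        z = softmax(W @ x_hat)                   # bottom-up recognition
        recon = G @ z                            # top-down generation
        x_hat = alpha * x + (1 - alpha) * recon  # reconcile input with feedback
    return softmax(W @ x_hat), x_hat

x = rng.standard_normal(8)
probs, x_hat = cnnf_inference(x)
```

The blending step is what lets the model "repair" a corrupted input: the observed pixels are pulled toward what the generative pathway expects.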

slide-20
SLIDE 20

CNN-F CAN REPAIR DISTORTED IMAGES WITHOUT SUPERVISION

[Figure: corrupted inputs (shot noise, Gaussian noise, dotted lines) alongside ground truth.]

slide-21
SLIDE 21

CNN-F IMPROVES ADVERSARIAL ROBUSTNESS

  • Standard training on Fashion-MNIST.
  • Attack with PGD-40.
  • CNN-F has higher adversarial robustness than CNN.
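PGD, the attack used above, is simple to sketch: repeated signed-gradient ascent steps on the loss, projected back into an epsilon-ball around the input. This toy version attacks a binary logistic classifier as a stand-in for the CNN; the step size and model are assumptions.

```python
import numpy as np

def pgd_attack(x, y, w, eps=0.3, step=0.02, n_steps=40):
    """L-inf PGD against a binary logistic classifier p(y=1|x) = sigmoid(w.x)."""
    x_adv = x.copy()
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-w @ x_adv))      # model's probability of class 1
        grad = (p - y) * w                        # d(cross-entropy)/dx
        x_adv = x_adv + step * np.sign(grad)      # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
    return x_adv

rng = np.random.default_rng(1)
w = rng.standard_normal(10)
x = rng.standard_normal(10)
x_adv = pgd_attack(x, y=1.0, w=w)  # perturbation stays within eps = 0.3
```

With eps = 0.3 this mirrors the PGD-40 setting quoted on the next slide.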

slide-22
SLIDE 22

CNN-F COMBINED WITH ADVERSARIAL TRAINING

  • Adversarial training on Fashion-MNIST.
  • Trained with PGD-40 (eps = 0.3); attacked with PGD-40.
  • CNN-F augmented with adversarial images achieves high accuracy on both clean and adversarial data.

slide-23
SLIDE 23


CNN-F HAS HIGHER BRAIN SCORE

Feedback is biologically more plausible

slide-24
SLIDE 24


TAKE-AWAYS

  • The human brain has feedback pathways for top-down inference
  • Internal generative model of the world
  • Bayesian brain: bottom-up feedforward + top-down feedback
  • Robustness is inherent in CNN-F
  • Biological plausibility of CNN-F

Recurrent generative feedback for robust learning

slide-25
SLIDE 25

NEURO-SYMBOLIC SYSTEMS FOR COMPOSITIONAL REASONING

Forough Arabshahi, Sameer Singh

slide-26
SLIDE 26

SYMBOLISTS VS. CONNECTIONISTS

[Comparison: explainability, generalization & knowledge coverage, extrapolation, representation, extraction.]

slide-27
SLIDE 27

TYPES OF TRAINING EXAMPLES

  • Symbolic expressions: sinΒ²(ΞΈ) + cosΒ²(ΞΈ) = 1
  • Function evaluation: sin(βˆ’2.5) = βˆ’0.6
  • Number encoding: decimal tree for 2.5

slide-28
SLIDE 28

CONTINUOUS REPRESENTATIONS FOR REASONING

Representations of symbols, numbers, and functions in a common embedding space

[Figure: symbols (sin, cos, Γ—), numbers (2.45, 2, ΞΈ), and expressions (3 + 4, 7 Γ— 1.1, 1 βˆ’ sinΒ²(ΞΈ), cosΒ²(ΞΈ)) embedded in a common space.]

slide-29
SLIDE 29

TREE-LSTM FOR COMPOSITIONALITY
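The compositionality idea can be sketched without the full LSTM gating. Below is a simplified tree-structured recursive encoder (a Tree-RNN, not the Tree-LSTM itself): a node's vector is a learned function of its symbol and its children's vectors, so sinΒ²(ΞΈ) + cosΒ²(ΞΈ) is encoded bottom-up from its parts. Dimensions, weights, and the symbol vocabulary are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
SYMBOLS = ["+", "^", "sin", "cos", "theta", "2"]
embed = {s: rng.standard_normal(DIM) for s in SYMBOLS}  # leaf/symbol embeddings
W_h = rng.standard_normal((DIM, DIM)) * 0.1             # recurrent weight
W_c = rng.standard_normal((DIM, DIM)) * 0.1             # child weight

def encode(node):
    """node: a symbol string, or a tuple (symbol, *children)."""
    if isinstance(node, str):
        return embed[node]
    sym, *children = node
    h = embed[sym]
    for child in children:
        h = np.tanh(W_h @ h + W_c @ encode(child))  # fold each child into parent
    return h

# sin(theta)^2 + cos(theta)^2, written as an expression tree
expr = ("+",
        ("^", ("sin", "theta"), "2"),
        ("^", ("cos", "theta"), "2"))
vec = encode(expr)
```

Because the same weights are shared at every node, the encoder handles expressions of any depth, which is what enables the generalization results on the next slide.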

slide-30
SLIDE 30

EQUATION VERIFICATION

Accuracy on generalization vs. extrapolation (bar chart):

  Sympy      82% / 72%
  LSTM       82% / 76%
  Tree-LSTM  97% / 96%

slide-31
SLIDE 31

AUGMENTING WITH STACK MEMORY

Train: Depth 1-7 Test: Depth 8-13

Differentiable memory for extrapolation to harder examples

slide-32
SLIDE 32

TAKE-AWAYS

  • Math reasoning tasks
  • Combine symbolic expressions and numerical data
  • Generalizable and composable representations of functions
  • Differentiable memory stack for extrapolation to harder examples

Neuro-symbolic systems for compositional learning

slide-33
SLIDE 33

AI4SCIENCE

ROLE OF PRIORS

slide-34
SLIDE 34


Examples of Priors

  • Tensors and graphs
  • Laws of nature
  • Simulations

How to use structure and domain knowledge to design priors?

[Diagram: Data + Priors + Learning]

slide-35
SLIDE 35

AUTONOMOUS DYNAMIC ROBOTS AT CAST, CALTECH

[Video stills: flying and flapping-wing robots operating in configurable flow fields.]

slide-36
SLIDE 36

LEARNING RESIDUAL DYNAMICS FOR DRONE LANDING

s_{t+1} = g(s_t, u_t) + g̃(s_t, u_t) + Ξ΅

where s_{t+1} is the new state, s_t the current state, u_t the current action (i.e. the control input), Ξ΅ the unmodeled disturbance, g the nominal dynamics, and g̃ the learned dynamics.

Our method:
  • Provably robust and safe
  • Generalizes to higher landing speeds
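The residual model above can be sketched in a few lines: known physics plus a learned correction. The nominal dynamics here are a toy 1-D vertical model and the "learned" residual is a hand-written stand-in for the trained network; both are illustrative, not the talk's actual system.

```python
import numpy as np

def g_nominal(s, u, dt=0.1, gravity=9.8):
    """Nominal 1-D vertical dynamics: s = (height, velocity), u = thrust accel."""
    h, v = s
    return np.array([h + dt * v, v + dt * (u - gravity)])

def g_residual(s, u):
    """Hypothetical learned correction, e.g. ground effect growing near the floor.
    A stand-in for the neural network fitted to flight data."""
    return np.array([0.0, 0.002 * u / (1.0 + s[0])])

def step(s, u):
    # s_{t+1} = g(s_t, u_t) + g~(s_t, u_t)   (disturbance term omitted)
    return g_nominal(s, u) + g_residual(s, u)

s = np.array([1.0, 0.0])   # hovering 1 m above ground
s_next = step(s, u=9.8)    # thrust exactly cancels gravity in the nominal model
```

Splitting the dynamics this way keeps the controller's guarantees anchored to the nominal model while the residual only has to capture what physics misses.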
slide-37
SLIDE 37


CAST @ CALTECH LEARNING TO LAND

slide-38
SLIDE 38

QUANTUM FEATURES IN CHEMISTRY

MOB-ML features: universal mapping in chemical space

[Plot: pair correlation energy vs. MOB feature value.]

  • M. Welborn, Z. Qiao, F. R. Manby, A. Anandkumar, T. F. Miller III, arXiv:2007.08026.

slide-39
SLIDE 39


ORBNET: MOB + GRAPH NEURAL NETWORKS

  • M. Welborn, Z. Qiao, F. R. Manby, A. Anandkumar, T. F. Miller III, arXiv:2007.08026.

slide-40
SLIDE 40

ORBNET: 1000X SIMULATION SPEED-UP

[Chart: drug-molecule conformer stability ranking (RΒ²) vs. time-to-solution (s), for force fields (UFF, GAFF, MMFF94), semiempirical methods (GFN0, GFN1/GFN2, PM7), machine learning (ANI-1x, ANI-2x, ANI-1ccx, OrbNet), DFT (B97-3c, PBE-D3(BJ)/Def2-SVP, PBE-D3(BJ)/Def2-TZVP, B3LYP-D3(BJ)/Def2-SVP, PBEH-3c, B3LYP-D3(BJ)/Def2-TZVP, Ο‰B97X-D3/Def2-TZVP), MP2, and CC.]

Quantum-mechanical accuracy at semi-empirical cost.

Test data: drug-like molecules with 10-50 heavy atoms. Zero-shot generalization: tested on molecules ~10x larger.

  • M. Welborn, Z. Qiao, F. R. Manby, A. Anandkumar, T. F. Miller III, arXiv:2007.08026.
slide-41
SLIDE 41

STATE-OF-THE-ART DATA EFFICIENCY

MOB-ML works across solvents:

  Solvent                Training Data   MAE (kcal/mol)
  Benzene                50              0.57
  Carbon tetrachloride   50              0.40
  Chloroform             40              0.84
  Cyclohexane            30              0.44
  Diethyl ether          40              0.58
  Hexadecane             40              0.57
  Octanol                50              0.84

[Plot: MOB-ML vs. others for water; RMSE (kcal/mol) vs. number of training molecules.]

MoleculeNet: Chem. Sci., 2018, 9, 513; MPNN: J. Chem. Inf. Model. 2019, 59, 3370

slide-42
SLIDE 42

LEARNING FAMILIES OF PDEs

Many problems in science and engineering reduce to PDEs. Operator learning learns the mapping from a PDE's parameters to its solution output.
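Operator learning can be sketched with a discretized kernel integral operator, one common parameterization: (K a)(x) = ∫ k(x, y) a(y) dy. The kernel here is fixed and Gaussian purely for illustration; in a real model it would be learned. The key property is that the same operator applies unchanged at any grid resolution, which is what enables super-resolution within a PDE family.

```python
import numpy as np

def kernel(x, y, lengthscale=0.2):
    """A fixed Gaussian kernel; a learned network would replace this."""
    return np.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))

def apply_operator(a, grid):
    """Discretized kernel integral operator (K a)(x) = sum_y k(x, y) a(y) dy."""
    dx = grid[1] - grid[0]
    return kernel(grid[:, None], grid[None, :]) @ a * dx

# The same "operator" evaluated on two different discretizations
coarse = np.linspace(0.0, 1.0, 32)
fine = np.linspace(0.0, 1.0, 128)
u_coarse = apply_operator(np.sin(2 * np.pi * coarse), coarse)
u_fine = apply_operator(np.sin(2 * np.pi * fine), fine)
```

Because the operator is defined on functions rather than on a fixed pixel grid, a model trained at one resolution can be queried at another.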

slide-43
SLIDE 43

MULTIPOLE GRAPHS

  • Multi-scale graphs to capture different ranges of interaction
  • Linear complexity
slide-44
SLIDE 44

EXPERIMENTAL RESULTS

Burgers equation

Graph neural networks for operator learning; super-resolution and generalization within a family of PDEs.

slide-45
SLIDE 45

TAKE-AWAYS

  • Black-box deep learning is unsuitable for scientific domains
  • Lack of labeled data and robustness
  • Domain knowledge can tailor learning to the problem
  • What is the right mix of priors + deep learning?

Domain knowledge augments deep learning

slide-46
SLIDE 46


UNSUPERVISED LEARNING

slide-47
SLIDE 47


DISENTANGLEMENT LEARNING

Learning latent variables that disentangle data

slide-48
SLIDE 48

DISENTANGLED GENERATION

Semi-supervised learning with very little labeled data: 1% labeled, or even 0.5% labeled.

Both unsupervised and supervised auxiliary losses help in disentanglement.

https://sites.google.com/nvidia.com/semi-stylegan

slide-49
SLIDE 49

SELF-SUPERVISED LEARNING

  • Data invariances provide supervision
  • Self-training with pseudo-labels
  • Need a confidence measure to select pseudo-labels

Robust measures of confidence
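The pseudo-label selection step can be sketched as follows. Thresholding the softmax confidence, as done here, is the simple baseline; the threshold value and the toy predictions are illustrative.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only high-confidence predictions as pseudo-labels.

    probs: (n, k) predicted class probabilities on unlabeled data.
    Returns (kept indices, pseudo-labels for those indices).
    """
    conf = probs.max(axis=1)                    # per-sample confidence
    keep = conf >= threshold                    # confidence filter
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.08, 0.92]])
idx, labels = select_pseudo_labels(probs)       # keeps samples 0 and 2
```

Retraining on (idx, labels) and regenerating pseudo-labels closes the self-training loop; the weakness is that softmax confidence is poorly calibrated, motivating more robust measures.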

slide-50
SLIDE 50

DOMAIN ADAPTATION THROUGH SELF-TRAINING

[Diagram: source images and labels (GTA5) train a deep CNN; predictions on target images (Cityscapes) become pseudo-labels; pseudo-label generation and network re-training alternate over rounds.]

slide-51
SLIDE 51

ANGULAR MEASURE IMPROVES SELF-TRAINING

Angular measure of hardness for sample selection in self-training: https://sites.google.com/nvidia.com/avh
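An angle-based hardness score in this spirit can be sketched as below: a sample is "easy" when its feature vector points close to some class direction. The class directions here are illustrative stand-ins; the actual AVH measure (see the link above) is defined on the trained network's features and class weights.

```python
import numpy as np

def angular_hardness(feature, class_dirs):
    """Smallest angle (radians) between a feature and any class direction.

    Small angle = easy, confident sample; large angle = hard sample.
    """
    f = feature / np.linalg.norm(feature)
    dirs = class_dirs / np.linalg.norm(class_dirs, axis=1, keepdims=True)
    cos = np.clip(dirs @ f, -1.0, 1.0)   # cosine to each class direction
    return np.arccos(cos.max())

# Two illustrative class directions along the axes
class_dirs = np.array([[1.0, 0.0], [0.0, 1.0]])
easy = angular_hardness(np.array([5.0, 0.2]), class_dirs)  # near class 0's axis
hard = angular_hardness(np.array([1.0, 1.0]), class_dirs)  # between classes
```

Selecting pseudo-labels by angle rather than raw softmax confidence is less sensitive to the overconfident probabilities deep networks produce.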

slide-52
SLIDE 52


TASK ADAPTATION AND GENERALIZATION

Animesh Garg, Yuke Zhu, Hongyu Ren

slide-53
SLIDE 53

META-REINFORCEMENT LEARNING

  • Agents should be versatile! Given a new task, they should quickly adapt.
  • Each task is complex and requires finishing a sequence of sub-tasks.

[Diagram: tasks drawn from a distribution; the agent interacts with each task through state, action, and reward.]

slide-54
SLIDE 54

META-REINFORCEMENT LEARNING

Task inference is key: what is the current task? Task inference is unsupervised learning.

[Diagram: tasks drawn from a distribution; the agent interacts through state, action, and reward.]

slide-55
SLIDE 55

OCEAN: ONLINE CONTEXT ADAPTATION FOR TASK INFERENCE

OCEAN combines both global and local task inference.

http://snap.stanford.edu/ocean/

slide-56
SLIDE 56

DEMONSTRATION

  • Accurate task inference for meta-RL.
  • Real-world tasks are sequential and compositional.
  • Tasks should be inferred and updated online.

[Video: our algorithm finishes each subtask accurately; baseline shown for comparison.]

slide-57
SLIDE 57

CONCLUSION

  • Generalizable AI needs a rethinking of deep learning
  • Unsupervised learning is key
  • Brain-inspired NNs with recurrent feedback for robustness
  • Neuro-symbolic AI enables compositionality
  • Domain knowledge is critical for AI4Science