Some Success Stories in Bridging Theory and Practice (PowerPoint PPT Presentation)


SLIDE 1

Some Success Stories in Bridging Theory and Practice

Anima Anandkumar

Bren Professor at Caltech Director of ML Research at NVIDIA

SLIDE 2

SIGNSGD: COMPRESSED OPTIMIZATION FOR NON-CONVEX PROBLEMS

JEREMY BERNSTEIN, JIAWEI ZHAO, KAMYAR AZZIZADENESHELI, YU-XIANG WANG, ANIMA ANANDKUMAR

SLIDE 3

DISTRIBUTED TRAINING INVOLVES COMPUTATION & COMMUNICATION

Parameter server

GPU 1 GPU 2

With 1/2 data With 1/2 data

SLIDE 4

DISTRIBUTED TRAINING INVOLVES COMPUTATION & COMMUNICATION

Parameter server

GPU 1 GPU 2

With 1/2 data With 1/2 data

Compress? Compress? Compress?

SLIDE 5

DISTRIBUTED TRAINING BY MAJORITY VOTE

Parameter server

GPU 1 GPU 2 GPU 3

sign(g) sign(g) sign(g)

Parameter server

GPU 1 GPU 2 GPU 3

sign [sum(sign(g))]
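The two-round scheme above can be sketched in a few lines of NumPy. This is an illustrative toy (the function name and learning rate are ours, not the paper's): each worker sends only sign(g), and the server broadcasts the sign of the summed signs.

```python
import numpy as np

def majority_vote_step(grads, lr=0.1):
    """One distributed signSGD step: workers send sign(g) (1 bit per
    coordinate); the server takes an elementwise majority vote and
    broadcasts the resulting sign vector."""
    signs = np.sign(grads)               # shape (M, d): one sign vector per worker
    vote = np.sign(signs.sum(axis=0))    # elementwise majority over M workers
    return -lr * vote                    # parameter update to apply

# toy example: 3 workers, 4 parameters
grads = np.array([[ 0.5, -0.2,  0.1, -0.9],
                  [ 0.4,  0.3,  0.2, -0.8],
                  [ 0.6, -0.1, -0.3, -0.7]])
update = majority_vote_step(grads)
```

Each worker communicates 1 bit per coordinate in each direction, versus 32 bits per coordinate for full-precision gradients.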

SLIDE 6

SINGLE WORKER RESULTS

Assumptions

f* ➤ Objective function lower bound
σ⃗ ➤ Coordinate-wise variance bound
L⃗ ➤ Coordinate-wise gradient Lipschitz constants
K ➤ Number of iterations
N ➤ Number of backpropagations

SGD gets rate

𝔼[(1/K) Σ_{k=0}^{K−1} ‖g_k‖₂²] ≤ (1/√N) [2‖L⃗‖_∞(f₀ − f*) + ‖σ⃗‖₂²]

signSGD gets rate

𝔼[(1/K) Σ_{k=0}^{K−1} ‖g_k‖₁]² ≤ (1/√N) [√‖L⃗‖₁ (f₀ − f* + 1/2) + 2‖σ⃗‖₁]²

LARGE-BATCH ANALYSIS

Define the densities of the curvature, noise, and gradient vectors:

φ(L⃗) = ‖L⃗‖₁ / (d‖L⃗‖_∞)   φ(σ⃗) = ‖σ⃗‖₁² / (d‖σ⃗‖₂²)   φ(g⃗_k) = ‖g⃗_k‖₁² / (d‖g⃗_k‖₂²)

SLIDE 7

VECTOR DENSITY & ITS RELEVANCE IN DEEP LEARNING

A sparse vector, a dense vector, a fully dense vector (e.g. a sign vector). A natural measure of density: φ(v) = ‖v‖₁² / (d‖v‖₂²), which equals 1 for a fully dense v and is ≈ 0 for a fully sparse v.
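This density measure, φ(v) = ‖v‖₁² / (d‖v‖₂²) for v ∈ ℝᵈ, is easy to check numerically; the helper below is our own sketch:

```python
import numpy as np

def density(v):
    """phi(v) = ||v||_1^2 / (d * ||v||_2^2): equals 1 when all entries
    have equal magnitude (e.g. a sign vector), ~1/d for a 1-sparse vector."""
    d = v.size
    return np.linalg.norm(v, 1) ** 2 / (d * np.linalg.norm(v, 2) ** 2)

sign_vec = np.ones(1000)                  # fully dense: density = 1
one_hot = np.zeros(1000); one_hot[0] = 5  # fully sparse: density = 1/1000
```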

SLIDE 8

DISTRIBUTED SIGNSGD: MAJORITY VOTE THEORY

If gradients are unimodal and symmetric (reasonable by the central limit theorem), then majority vote with M workers converges at rate:

Same variance reduction as SGD

SLIDE 9

MINI-BATCH ANALYSIS

Under symmetric noise assumption:

SLIDE 10

CIFAR-10 SNR

SLIDE 11

SIGNSGD PROVIDES A “FREE LUNCH”

Throughput gain with only tiny accuracy loss

AWS p3.2xlarge machines, ResNet-50 on ImageNet

SLIDE 12

SIGNSGD: TIME PER EPOCH

SLIDE 13

SIGNSGD ACROSS DOMAINS AND ARCHITECTURES

Huge throughput gain!

SLIDE 14

BYZANTINE FAULT TOLERANCE

Under symmetric noise assumption:

SLIDE 15

SIGNSGD IS ALSO BYZANTINE FAULT TOLERANT
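A toy simulation (our own, with hypothetical names) shows the intuition: a Byzantine worker can at worst invert its own sign vector, contributing one wrong vote per coordinate, so an honest majority still recovers sign(g).

```python
import numpy as np

def vote_with_byzantine(true_grad, n_good, n_byz, noise=0.5, seed=0):
    """Majority vote where Byzantine workers send the inverted sign.
    While honest workers outnumber adversarial ones, the aggregate
    vote still tracks sign(g)."""
    rng = np.random.default_rng(seed)
    # honest workers: noisy estimates of the true gradient, sign-compressed
    good = np.sign(true_grad + noise * rng.standard_normal((n_good, true_grad.size)))
    # adversarial workers: worst-case sign inversion
    byz = -np.sign(np.tile(true_grad, (n_byz, 1)))
    return np.sign(np.concatenate([good, byz]).sum(axis=0))

g = np.array([1.0, -2.0, 3.0, -4.0])
vote = vote_with_byzantine(g, n_good=9, n_byz=2)
```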

SLIDE 16

TAKE-AWAYS FOR SIGN-SGD

  • Convergence even under biased gradients and noise.
  • Faster than SGD in theory and in practice.
  • For distributed training, similar variance reduction as SGD.
  • In practice, similar accuracy but with far less communication.
SLIDE 17

LEARNING FROM NOISY SINGLY-LABELED DATA

ASHISH KHETAN, ZACHARY C. LIPTON, ANIMA ANANDKUMAR

SLIDE 18

CROWDSOURCING: AGGREGATION OF CROWD ANNOTATIONS

Majority rule

  • Simple and common.
  • Wasteful: ignores the varying quality of different annotators.

SLIDE 19

CROWDSOURCING: AGGREGATION OF CROWD ANNOTATIONS

Majority rule

  • Simple and common.
  • Wasteful: ignores the varying quality of different annotators.

Annotator-quality models

  • Can improve accuracy.
  • Hard: quality must be estimated without ground truth.

SLIDE 20

SOME INTUITIONS

Majority rule to estimate annotator quality

  • Justification: majority rule approaches the ground truth when there are enough workers.
  • Downside: requires a large number of annotations per example for majority rule to be correct.

Annotator-quality model (probability of correctness)

SLIDE 21

PROPOSED CROWDSOURCING ALGORITHM

Repeat:

  • Compute the posterior of the ground-truth labels given the annotator-quality model and the noisy crowdsourced annotations.
  • MLE: update annotator quality using the labels inferred by the model.
  • Train the model with a weighted loss, using the posterior as weights.
  • Use the trained model to infer ground-truth labels.
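The alternating loop can be sketched as a small EM procedure under a simplified one-coin model (each worker j is correct with probability p[j]). This is our own illustration; it omits the network-training step, which would use the posterior as per-example weights in the loss.

```python
import numpy as np

def em_annotator_quality(labels, n_classes=2, n_iters=20):
    """EM sketch of the crowdsourcing loop (one-coin model).
    labels[i, j] is worker j's label for item i."""
    n_items, n_workers = labels.shape
    # init: majority vote gives an initial posterior over true labels
    post = np.zeros((n_items, n_classes))
    for c in range(n_classes):
        post[:, c] = (labels == c).sum(axis=1)
    post /= post.sum(axis=1, keepdims=True)
    p = np.full(n_workers, 0.8)  # initial worker quality
    for _ in range(n_iters):
        # M-step: MLE of each worker's quality from the soft labels
        for j in range(n_workers):
            p[j] = np.mean(post[np.arange(n_items), labels[:, j]])
        # E-step: posterior of the true label given worker qualities
        for i in range(n_items):
            for c in range(n_classes):
                lik = 1.0
                for j in range(n_workers):
                    lik *= p[j] if labels[i, j] == c else (1.0 - p[j]) / (n_classes - 1)
                post[i, c] = lik
            post[i] /= post[i].sum()
    return p, post  # post would serve as weights in the training loss

# toy data: workers 0 and 1 always correct, worker 2 always wrong
true = np.array([0, 1, 0, 1, 0, 1])
labels = np.stack([true, true, 1 - true], axis=1)
quality, posterior = em_annotator_quality(labels)
```

Even starting from majority vote, the loop quickly learns that worker 2 is adversarial and down-weights it.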

SLIDE 22

LABELING ONCE IS OPTIMAL: THEORY

Theorem: Under a fixed budget, generalization error is minimized with a single annotation per sample.

Assumptions:

  • The best predictor is accurate enough (under no label noise).
  • Simplified case: all workers have the same quality.
  • Probability of being correct > 83%.
SLIDE 23

LABELING ONCE IS OPTIMAL: PRACTICE

MS-COCO dataset, fixed budget of 35k annotations: about 5% improvement over majority rule as the number of workers varies.

ImageNet dataset: simulated workers under a fixed budget.

SLIDE 24

NEURAL RENDERING MODEL (NRM): JOINT GENERATION AND PREDICTION FOR SEMI-SUPERVISED LEARNING

Nhat Ho, Tan Nguyen, Ankit Patel, Anima Anandkumar, Michael Jordan, Richard Baraniuk

SLIDE 25

SEMI-SUPERVISED LEARNING WITH GENERATIVE MODELS?

Merits

  • Captures statistics of natural images.
  • Learnable.

Perils

  • Feedback is real vs. fake: different from prediction.
  • Introduces artifacts (e.g. GANs).

SLIDE 26

PREDICTIVE VS GENERATIVE MODELS

Predictive models learn P(y | x); generative models learn P(x | y). One model to do both?

  • SOTA prediction comes from CNN models.
  • Which class of p(x|y) yields CNN models for p(y|x)?
SLIDE 27

NEURAL DEEP RENDERING MODEL (NRM)

Object category → intermediate rendering → image, with latent variables at each level; y is the class label and x the image.

Design joint priors for the latent variables based on reverse-engineering CNN predictive architectures.

SLIDE 28

NEURAL RENDERING MODEL (NRM)

NRM generation runs top-down: choose a class (1.0 dog), choose whether to render, upsample and select locations, and render the image (class template → masked template → upsampled template → rendered image). CNN inference runs bottom-up over the same structure, producing class probabilities (0.5 dog, 0.2 cat, 0.1 horse, …) from the image via unpooled, pooled, and rectified feature maps.

SLIDE 29

MAX-MIN CROSS-ENTROPY ➡ MAX-MIN NETWORKS

Cross-entropy loss for training the CNNs with labeled data:

min_{θ∈A_γ} H_{p,q}(y | x, z_max) ≥ min_{(z_i)_{i=1}^n, θ} (1/n) Σ_{i=1}^n −log p(y_i | x_i, z_i; θ)

Max-Min loss for training the CNNs with labeled data:

α_max H_{p,q}(y | x, z_max) + α_min H_{p,q}(y | x, z_min)

Architecture: the input image feeds Max Xentropy and Min Xentropy branches with shared weights, combined into the Max-Min Xentropy.

  • Max cross-entropy maximizes the posteriors of correct labels; min cross-entropy minimizes the posteriors of incorrect labels.
  • Co-learning: the Max and Min networks try to learn from each other.
SLIDE 30

STATISTICAL GUARANTEES FOR THE NRM

Bound on the generalization error: the risk is controlled by a term that grows with the number of rendering paths and shrinks with the sample size n.

  • Rendering path normalization (RPN): a new form of regularization.

Training loss in the CNN is equivalent to the likelihood in the NRM.

Max-Min NRM with RPN achieves SOTA on benchmarks.

SLIDE 31

EMPIRICAL RESULTS

Max-Min NRM achieves SOTA on semi-supervised & supervised learning

Semi-Supervised Learning

  • NRM + Max-Min improves SOTA by 0.7%–1.8% on CIFAR10, CIFAR100, and SVHN.
  • Gains are largest in the low-labeled-data setting.

Supervised Learning

  • Max-Min improves SOTA on CIFAR10 by 0.26% and on ImageNet by 0.17% (top-5 error).

SLIDE 32

TENSOR METHODS

SLIDE 33

TENSORS FOR MULTI-DIMENSIONAL DATA AND HIGHER-ORDER MOMENTS

Images: 3 dimensions. Videos: 4 dimensions.

Pairwise correlations (matrices) vs. triplet correlations (third-order tensors).

SLIDE 34

OPERATIONS ON TENSORS: TENSOR CONTRACTION

Tensor contraction extends the notion of matrix product.

Matrix product: Mv = Σ_j v_j M_{:,j}

Tensor contraction: T(u, v, ·) = Σ_{i,j} u_i v_j T_{i,j,:}
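Both operations are plain index contractions, so they map directly onto `einsum`; a small check with ad-hoc example values:

```python
import numpy as np

# matrix product as a contraction: (Mv)_i = sum_j v_j M_{i,j}
M = np.arange(6.0).reshape(2, 3)
v = np.array([1.0, 0.0, 2.0])
Mv = np.einsum('ij,j->i', M, v)

# tensor contraction along two modes: T(u, v, .) = sum_{i,j} u_i v_j T_{i,j,:}
T = np.arange(24.0).reshape(2, 3, 4)
u = np.array([1.0, -1.0])
Tuv = np.einsum('i,j,ijk->k', u, v, T)   # a vector of length 4
```

Contracting a third-order tensor along two modes leaves a vector, just as contracting a matrix along one mode does.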

SLIDE 35

UNSUPERVISED LEARNING OF TOPIC MODELS THROUGH TENSOR METHODS

Topics: Justice, Education, Sports

SLIDE 36

TENSOR-BASED LDA TRAINING IS FASTER

Training time vs. number of topics (5 to 100) on NYTimes (300,000 documents) and PubMed (8 million documents): the spectral (tensor) method is 22x and 12x faster than Mallet on average on the two corpora.

  • Mallet is an open-source framework for topic modeling.
  • Benchmarks run on the AWS SageMaker platform.
  • Built into the AWS Comprehend NLP service.

SLIDE 37

TENSORLY: HIGH-LEVEL API FOR TENSOR ALGEBRA

  • Python programming
  • User-friendly API
  • Multiple backends: flexible + scalable
  • Example notebooks in the repository
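TensorLy exposes tensor algebra primitives such as mode-n unfolding (`tensorly.unfold`). As an illustrative sketch, the same operation in pure NumPy, following TensorLy's convention of moving the chosen mode to the front:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding (matricization): move the chosen axis to the
    front and flatten the remaining axes, as TensorLy does."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

X = np.arange(24).reshape(2, 3, 4)
X0 = unfold(X, 0)   # shape (2, 12)
X1 = unfold(X, 1)   # shape (3, 8)
```

Swapping the backend (NumPy, PyTorch, etc.) leaves such code unchanged, which is what makes the API flexible and scalable.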

SLIDE 38

A New Vision for Autonomy

Center for Autonomous Systems and Technologies

SLIDE 39

CHALLENGES IN LANDING A QUADROTOR DRONE

  • Unknown aerodynamic forces & moments.
  • Example 1: ground effect when the drone is close to the ground.
  • Example 2: air drag as velocity goes up.
  • Example 3: external wind conditions.

Wind generation in the Caltech CAST wind tunnel.

SLIDE 40

CHALLENGES IN USING DNNS TO LEARN UNKNOWN DYNAMICS

  • Our idea: use DNNs to learn the unknown aerodynamic forces and then design a nonlinear controller to cancel them (unknown moments are very limited in landing).
  • Challenge 1: DNNs are data-hungry.
  • Challenge 2: DNNs can be unstable and generate unpredictable outputs.
  • Challenge 3: DNNs are difficult to analyze, making it hard to design a provably stable controller around them.
  • Our approach: use spectral normalization to control the Lipschitz property of the DNN, then design a stable nonlinear controller (Neural-Lander).
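Spectral normalization itself is simple to sketch: estimate the largest singular value of each weight matrix with power iteration and divide it out, which bounds the layer's (and hence the network's) Lipschitz constant. The helper below is an illustrative NumPy version, not the Neural-Lander implementation:

```python
import numpy as np

def spectrally_normalize(W, n_power_iters=30):
    """Scale a weight matrix so its largest singular value is <= 1,
    estimating that value with power iteration (as in spectral
    normalization)."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_power_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v            # estimated spectral norm
    return W / max(sigma, 1.0)   # only shrink; leave contractive layers alone

W = np.random.default_rng(1).standard_normal((8, 5)) * 3.0
W_sn = spectrally_normalize(W)
```

With every layer 1-Lipschitz, the end-to-end network's output cannot change faster than its input, which is the property the stability analysis needs.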

SLIDE 41

FIRST SET OF RESULTS: LEARNING TO LAND

SLIDE 42

SLIDE 43

SOME RESEARCH LEADERS AT NVIDIA

  • Robotics: Dieter Fox
  • Learning & Perception: Jan Kautz
  • Graphics: Dave Luebke, Alex Keller, Aaron Lefohn
  • Chief Scientist: Bill Dally
  • Architecture: Steve Keckler, Dave Nellans, Mike O’Connor
  • Programming: Michael Garland
  • VLSI: Brucek Khailany
  • Circuits: Tom Gray
  • Networks: Larry Dennison
  • Computer vision: Sanja Fidler
  • Core ML: Me!
  • Applied research: Bryan Catanzaro

SLIDE 44

TRINITY FUELING ARTIFICIAL INTELLIGENCE

DATA

  • COLLECTION
  • AGGREGATION
  • AUGMENTATION

ALGORITHMS

  • OPTIMIZATION
  • SCALABILITY
  • MULTI-DIMENSIONALITY

INFRASTRUCTURE: FULL STACK FOR ML

  • APPLICATION SERVICES
  • ML PLATFORM
  • GPUS

SLIDE 45

COLLABORATORS (LIMITED LIST)

SLIDE 46

Thank you