
SLIDE 1

CSC 2547: Machine Learning for Vision as Inverse Graphics

Anthony Bonner www.cs.toronto.edu/~bonner

SLIDE 2

Paper Presentations

  • Each week will focus on one topic, as listed on the course web page (soon).

  • You can vote for your choice of topic/week (soon).

  • I will assign you to a week (soon).
  • Papers on each topic will be listed on the course web page.

  • If you have a particular paper you would like to add to the list, please let me know.

SLIDE 3

Paper Presentations

  • Goal: high quality, accessible tutorials.
  • 7 weeks and 44 students = 6 or 7 students per week and about 15 minutes per student.

  • 2-week planning cycle:

– 2 weeks before your presentation, meet me after class to discuss and assign papers.
– The following week, meet the TA for a practice presentation (required).
– Present in class under strict time constraints.

SLIDE 4

Team Presentations

  • Papers may be presented in teams of two or more with longer presentations (15 minutes per team member).

  • Unless a paper is particularly difficult or long, a team will be expected to cover more than one paper (one paper per team member).

  • A team may cover one of the listed papers and one or more of its references (but see me first).
SLIDE 5

Tentative Topics

  • Discriminative approaches.
  • Generative approaches.
  • Differentiable rendering.
  • Capsule networks
  • Group symmetries and equivariance
  • Visual attention mechanisms
  • Adversarial methods
SLIDE 6

Project Ideas

  • Improve upon the work in a paper

– Even a small improvement is OK

  • For example,

– Make a generative model conditional
– Disentangle (some) latent variables
– Adapt a method to new circumstances

  • Different kinds of data
  • Missing or noisy data

– Make a supervised method semi-supervised

SLIDE 7

Project Ideas

  • Examples (continued)

– Modify the cost function

  • Introduce learnable parameters into a cost function
  • Use an adversarial cost
  • Try a variation on KL divergence

– Modify the latent priors

  • Make the prior learnable
  • Do not assume Gaussianity

– Modify the variational assumptions

  • Do not assume complete independence
  • Do not assume Gaussianity
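To make the "variation on KL divergence" and "latent priors" ideas above concrete, here is the closed-form KL divergence between a diagonal-Gaussian posterior and a standard-normal prior, the regularizer in a standard VAE objective. This is a generic sketch with made-up example values, not code from any of the cited papers.

```python
import numpy as np

# Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), the term that appears
# in the standard VAE objective; mu and log_var are illustrative values.
def kl_diag_gaussian(mu, log_var):
    # 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2) over latent dimensions
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

mu = np.array([0.5, -0.3])
log_var = np.array([0.0, -1.0])   # i.e., sigma^2 = [1.0, 0.368]
print(kl_diag_gaussian(mu, log_var))  # ≈ 0.354
```

Replacing the standard-normal prior (or the KL itself) means replacing this closed form with a Monte Carlo estimate, which is one place learnable priors enter.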
SLIDE 8

Project Ideas

  • Implement and compare different methods for the same problem (e.g., different methods for inferring 3D structure)

– Clearly and succinctly describe each method
– Clearly articulate their differences
– Describe their strengths and weaknesses
– Ideally, include experiments highlighting the differences between the methods on realistic problems.

SLIDE 9

Project Considerations

  • Is your idea sensible?
  • Can you download all the necessary data?
  • Do you have the computational resources (GPUs)?

  • Do you have time to complete it?
  • Start by duplicating the results in the paper (if the paper gives enough details).

SLIDE 10

Project Dates

  • Proposal due February 18

– about 2 pages
– include preliminary literature search

  • Project presentations: March 24 and 31

– about 5 minutes per student (like “spotlight presentations” at a conference)

  • Project due: April 12

– project report (4-8 pages) and code

SLIDE 11

Generative Approaches

  • Given a scene, s, a graphics program, G, produces an image, G(s).

  • Given an image, x, find s such that G(s) ≈ x
  • More generally, find the posterior P(s|x).
  • P(s|x) is high when G(s) is close to x.
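The "vision as inverse graphics" idea in the bullets above can be sketched in a few lines: given a forward renderer G and an observed image x, search for the scene s whose rendering best matches x. The toy renderer and one-dimensional scene parameter here are invented for illustration only.

```python
import numpy as np

# Toy "graphics program" G: renders a 1-D scene parameter s (the position
# of a bright pixel) into an 8-pixel image. Purely illustrative.
def G(s, width=8):
    img = np.zeros(width)
    img[int(s) % width] = 1.0
    return img

x = G(5)  # observed image; the true scene parameter is unknown to the solver

# Inverse graphics by brute-force search: the s minimizing ||G(s) - x||^2
# is exactly the s where P(s|x) is high.
best_s = min(range(8), key=lambda s: np.sum((G(s) - x) ** 2))
print(best_s)  # recovers s = 5
```

For realistic scenes this search is intractable, which motivates the variational approximations on the next slide.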
SLIDE 12

Variational Approximations

  • Finding P(s|x) is intractable in general.
  • Use variational approximations.
  • Variational auto-encoders work very well.
  • G can be a neural net that we learn (unsupervised).

  • Computationally intensive.
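A minimal sketch of the variational approach: an encoder maps x to a Gaussian q(s|x), a sample is drawn with the reparameterization trick, and a decoder (playing the role of G) reconstructs x; the ELBO is reconstruction minus KL. The linear encoder/decoder and random weights below are placeholders, not a trained model from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass of a minimal VAE with linear encoder/decoder and a
# standard-normal prior; weights are random placeholders for illustration.
x_dim, z_dim = 6, 2
W_enc = rng.normal(size=(2 * z_dim, x_dim)) * 0.1   # outputs [mu, log_var]
W_dec = rng.normal(size=(x_dim, z_dim)) * 0.1       # the learned "renderer" G

def elbo(x):
    h = W_enc @ x
    mu, log_var = h[:z_dim], h[z_dim:]
    eps = rng.normal(size=z_dim)
    z = mu + np.exp(0.5 * log_var) * eps             # reparameterization trick
    x_hat = W_dec @ z                                # G(z): rendered image
    recon = -0.5 * np.sum((x - x_hat) ** 2)          # Gaussian log-lik. (up to const.)
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon - kl                                # evidence lower bound

x = rng.normal(size=x_dim)
print(elbo(x))
```

Training maximizes this bound over the encoder and decoder weights by gradient ascent, which is where the computational cost comes in.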
SLIDE 13

Variational Autoencoders

[Architecture diagram: a 64x64x3 input image is encoded by 5x5 conv layers (32x32x64, 16x16x128, 8x8x256) into a 512-unit latent code; a volume generator (4x4x4, 5x5x5, and 6x6x6 convs through 256x6x6x6 and 96x15x15x15 up to 1x32x32x32) decodes a volume, which a perspective transformer Τθ(G) with grid generator and sampler projects to a 1x32x32 target projection.]

From Yan et al, Perspective Transformer Nets, arXiv 2017

SLIDE 14

Disentangled Representations

[Figure: a disentangled representation separating z (handwriting style) from y (digit label).]

From Siddharth et al, Semi-supervised Deep Generative Models, NIPS 2017

SLIDE 15

Disentangled Representations

[Figure: input images are mapped to pose-manifold and identity-manifold coordinates; samples are generated with fixed ID (varying pose) and with fixed pose (varying ID).]

From Reed et al, Learning to Disentangle Factors of Variation, ICML 2014

SLIDE 16

Learning 3D Shape

From Yan et al, Perspective Transformer Nets, arXiv 2017

SLIDE 17

Learning 3D Structure

From Niu et al, Im2Struct: Recovering 3D Shape Structure, CVPR 2018

SLIDE 18

Scene Understanding

From Wu et al, Neural Scene De-rendering, CVPR 2017

SLIDE 19

Scene Understanding

From Huang et al, Occlusion Aware Generative Models, ICLR 2016

SLIDE 20

Conditional Image Generation

[Figure: ground truth images compared with NN and CVAE completions.]

From Sohn et al, Deep Conditional Generative Models, NIPS 2015

SLIDE 21

Conditional Image Generation

From Ivanov et al, Variational Autoencoder with Arbitrary Conditioning, ICLR 2019

SLIDE 22

Attribute Conditioned Image Generation

[Figure: observed attributes (age: young, gender: female, hair color: brown, expression: smile) with associated scores, alongside unobserved latent factors (viewpoint, background, lighting, …).]

From Yan et al, Attribute2Image: Conditional Image Generation, arXiv 2016

SLIDE 23

Making Visual Analogies

  • Given images A, B, C, generate image D so that D is to C as B is to A.

[Figure: infer the relationship between A and B, then transform the query C.]

From Reed et al, Deep Visual Analogy-Making, NIPS 2015
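The simplest variant of this idea works additively in a learned embedding space: emb(D) = emb(C) + emb(B) - emb(A), then decode. The hand-made 2-D embeddings below (one coordinate for shape, one for rotation) are invented for illustration and are not from the paper.

```python
import numpy as np

# Additive visual analogy in an embedding space: D is to C as B is to A.
# The 2-D embeddings [shape, rotation] are hypothetical stand-ins for the
# output of a learned encoder.
emb = {
    "A": np.array([0.0, 0.0]),   # square, upright
    "B": np.array([0.0, 1.0]),   # square, rotated
    "C": np.array([1.0, 0.0]),   # triangle, upright
}
d = emb["C"] + emb["B"] - emb["A"]   # infer relationship (B - A), apply to C
print(d)  # [1. 1.] -> triangle, rotated
```

Reed et al also consider multiplicative and deep (learned) variants of the transformation when simple vector addition is insufficient.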