ZOOM, ENHANCE, SYNTHESIZE! MAGIC UPSCALING AND MATERIAL SYNTHESIS



SLIDE 1

Tuesday, 9 May 2017 Andrew Edelsten - NVIDIA Developer Technologies

ZOOM, ENHANCE, SYNTHESIZE! MAGIC UPSCALING AND MATERIAL SYNTHESIS USING DEEP LEARNING

SLIDE 2

DEEP LEARNING FOR ART

Active R&D but ready now

▪ Style transfer
▪ Generative networks creating images and voxels

▪ Adversarial networks (DCGAN) – still early but promising

▪ DL & ML based tools from NVIDIA and partners

▪ NVIDIA ▪ Artomatix ▪ Allegorithmic ▪ Autodesk

SLIDE 3

STYLE TRANSFER

Something Fun

(Example images: content and style)

▪ Doodle a masterpiece!
▪ Uses CNN to take the “style” from one image and apply it to another

▪ Sept 2015: A Neural Algorithm of Artistic Style by Gatys et al.
▪ Dec 2015: neural-style (github)
▪ Mar 2016: neural-doodle (github)
▪ Mar 2016: texture-nets (github)
▪ Oct 2016: fast-neural-style (github)
▪ 2 May 2017 (last week!): Deep Image Analogy (arXiv)

▪ Also numerous services: Vinci, Prisma, Artisto, Ostagram

SLIDE 4

HTTP://OSTAGRAM.RU/STATIC_PAGES/LENTA

SLIDE 5

STYLE TRANSFER

Something Useful

▪ Game remaster & texture enhancement

▪ Try Neural Style and use a real-world photo for the “style”
▪ For stylized or anime up-rez try https://github.com/nagadomi/waifu2x

▪ Experiment with art styles
▪ Dream or power-up sequences

▪ “Come Swim” by Kristen Stewart - https://arxiv.org/pdf/1701.04928v1.pdf


SLIDE 6

GAMEWORKS: MATERIALS & TEXTURES

Using DL for Game Development & Content Creation

▪ Set of tools targeting the game industry using machine learning and deep learning
▪ Launched at the Game Developers Conference in March; the tools run as a web service
▪ Sign up for the Beta at: https://gwmt.nvidia.com
▪ Tools in this initial release:

▪ Photo to Material: 2shot
▪ Texture Multiplier
▪ Super-Resolution

SLIDE 7

PHOTO TO MATERIAL

The 2Shot Tool

▪ From two photos of a surface, generate a “material”
▪ Based on a SIGGRAPH 2015 paper by NVIDIA Research & Aalto University (Finland)

▪ “Two-Shot SVBRDF Capture for Stationary Materials”
▪ https://mediatech.aalto.fi/publications/graphics/TwoShotSVBRDF/

▪ Input is pixel aligned “flash” and “guide” photographs

▪ Use tripod and remote shutter or bracket
▪ Or align later

▪ Use for flat surfaces with repeating patterns


SLIDE 8

MATERIAL SYNTHESIS FROM TWO PHOTOS

Inputs: flash image, guide image → Outputs: diffuse albedo, specular, normals, glossiness, anisotropy

SLIDE 9

TEXTURE MULTIPLIER

Organic variations of textures

▪ Put simply: texture in, new texture out
▪ Inspired by Gatys, Ecker & Bethge

▪ Texture Synthesis Using Convolutional Neural Networks
▪ https://arxiv.org/pdf/1505.07376.pdf

▪ Artomatix

▪ Similar product “Texture Mutation”
▪ https://artomatix.com/


SLIDE 10

SUPER RESOLUTION

SLIDE 11

SUPER RESOLUTION

“Zoom in on the license plate.” “OK! Sure!” “Can you enhance that?” Zoom.. ENHANCE!

SLIDE 12

SUPER RESOLUTION

The task at hand

Upscale (magic?)

Given a low-resolution image of size W × H, construct a high-resolution image of size n·W × n·H.

SLIDE 13

UPSCALE: CREATE MORE PIXELS

An ill-posed task?

The pixels of the given image account for only a fraction of the pixels of the upscaled image; the remaining pixels are unknown.

SLIDE 14

TRADITIONAL APPROACH

▪ Interpolation (bicubic, Lanczos, etc.)
▪ Interpolation + sharpening (and other filtering); a sketch follows after these bullets


▪ Rough estimation of the data behavior → too general
▪ Too many possibilities (an 8x8 grayscale patch has 256^(8·8) ≈ 10^154 pixel combinations!)
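As a rough illustration of this traditional pipeline, here is a minimal Pillow-based sketch (file names and filter settings are placeholders, not values from the talk) that upscales with bicubic interpolation and then applies an unsharp mask:

```python
# Traditional upscaling sketch: bicubic interpolation followed by filter-based sharpening.
# Assumes Pillow is installed; "input.png" and the 4x factor are placeholders.
from PIL import Image, ImageFilter

def upscale_traditional(path, factor=4):
    img = Image.open(path)
    # Interpolation: create a (factor*W) x (factor*H) image from the W x H input.
    upscaled = img.resize((img.width * factor, img.height * factor), Image.BICUBIC)
    # Filter-based sharpening: boost the high frequencies that interpolation smears out.
    return upscaled.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))

upscale_traditional("input.png").save("output.png")
```

However the filters are tuned, this only redistributes the information already present in the W × H input, which is why the results look too general.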

SLIDE 15

A NEW APPROACH

First: narrow the possible set

Of all possible images, focus on the domain of “natural images”: photos, textures, and similar.

SLIDE 16

A NEW APPROACH

Second: place the image in that domain, then reconstruct

Data from natural images is sparse; it is compressible in some domain.

Then “reconstruct” images (rather than create new ones), using prior information and constraints.

Compress → reconstruct

SLIDE 17

PATCH-BASED MAPPING: TRAINING

Training images → LR/HR pairs of patches → training → model parameters

The trained mapping: low-resolution patch → high-resolution patch

SLIDE 18

PATCH-BASED MAPPING

LR patch x_L → encode → high-level information about the patch → decode → HR patch x_H

SLIDE 19

PATCH-BASED MAPPING: SPARSE CODING

LR patch x_L → encode → sparse code (“features”: high-level information about the patch) → decode → HR patch x_H

SLIDE 20

PATCH FEATURES & RECONSTRUCTION

x = D·α = α_1·d_1 + … + α_K·d_K

Example: x ≈ 0.8·d_47 + 0.3·d_53 + 0.5·d_74

An image patch can be reconstructed as a sparse linear combination of features; the features are learned from the dataset over time.

D - dictionary of features, x - patch, α - sparse code
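To make the dictionary notation concrete, here is a minimal NumPy sketch (not from the slides; the dictionary is random stand-in data) that reconstructs a patch from a sparse code:

```python
# Sparse-coding reconstruction sketch: x = D @ alpha, with alpha mostly zero.
# The dictionary here is random; in practice D is learned from training patches.
import numpy as np

patch_size, num_atoms = 8 * 8, 256          # 8x8 patches, 256 dictionary atoms (assumed sizes)
D = np.random.randn(patch_size, num_atoms)  # dictionary: one feature (atom) per column

alpha = np.zeros(num_atoms)                 # sparse code: only a few non-zero entries
alpha[[47, 53, 74]] = [0.8, 0.3, 0.5]       # the three atoms from the example above

x = D @ alpha                               # reconstructed patch (flattened 8x8)
print(x.reshape(8, 8).shape)                # (8, 8)
```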
SLIDE 21

GENERALIZED PATCH-BASED MAPPING

LR patch → mapping → high-level representation of the LR patch (“features”) → mapping in feature space → high-level representation of the HR patch → mapping → HR patch

SLIDE 22

GENERALIZED PATCH-BASED MAPPING

LR patch → mapping (W_1) → features → mapping in feature space (W_2) → features → mapping (W_3) → HR patch

W_1, W_2, W_3 are the trainable parameters

SLIDE 23

MAPPING OF THE WHOLE IMAGE

Using Convolutions

LR image → mapping → mapping in feature space → mapping → HR image

Each mapping is a convolutional operator, so the whole image is processed at once.
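This is essentially the three-stage convolutional mapping idea popularized by SRCNN: one convolution extracts features from a pre-upscaled LR image, one maps them in feature space, and one reconstructs HR pixels. A minimal PyTorch sketch follows; the layer sizes are illustrative rather than taken from any particular tool:

```python
# Three-stage convolutional mapping sketch (SRCNN-style), applied to a bicubically
# pre-upscaled image: feature extraction -> mapping in feature space -> reconstruction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvMapping(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Conv2d(3, 64, kernel_size=9, padding=4)  # LR pixels -> features
        self.map = nn.Conv2d(64, 32, kernel_size=1)                # mapping in feature space
        self.decode = nn.Conv2d(32, 3, kernel_size=5, padding=2)   # features -> HR pixels
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.encode(x))
        x = self.relu(self.map(x))
        return self.decode(x)

# Usage: upscale with bicubic first, then let the network restore detail.
lr = torch.rand(1, 3, 64, 64)
lr_up = F.interpolate(lr, scale_factor=4, mode="bicubic", align_corners=False)
hr = ConvMapping()(lr_up)  # same spatial size as lr_up
```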

SLIDE 24

AUTO-ENCODERS

input → auto-encoder → output ≈ input
SLIDE 25

AUTO-ENCODER

input → encode → features → decode → output ≈ input

SLIDE 26

AUTO-ENCODER

Input x → F_W → output y, where W are the trainable parameters

Training: W = argmin_W Σ_j Dist(x_j, F_W(x_j)), where {x_j} is the training set

Inference: y = F_W(x)
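A minimal PyTorch sketch of this training objective; the architecture, optimizer, and data below are illustrative stand-ins, not details from the talk:

```python
# Generic auto-encoder training sketch: minimize Dist(x, F_W(x)) over the training set.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 8, 3, padding=1))   # narrow, lossy bottleneck
        self.decode = nn.Sequential(nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x):
        return self.decode(self.encode(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dist = nn.MSELoss()                       # one possible choice for the Dist function

for x in [torch.rand(4, 3, 64, 64)]:      # stand-in for a real training data loader
    loss = dist(model(x), x)              # the output should approximate the input
    opt.zero_grad()
    loss.backward()
    opt.step()
```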

SLIDE 27

AUTO-ENCODER

input → encode → features (information loss)

▪ Our encoder is LOSSY by definition

SLIDE 28

SUPER-RESOLUTION AUTO-ENCODER

Input x → F_W → output y, where W are the trainable parameters

Training: W = argmin_W Σ_j Dist(x_j, F_W(x_j)), where {x_j} is the training set

Inference: y = F_W(x)

SLIDE 29

SUPER RESOLUTION AE: TRAINING

Training: W = argmin_W Σ_j Dist(x_j, F_W(D(x_j))), where {x_j} is the training set and D is the downscaling operator

Ground-truth HR image x → downscaling D → LR image D(x) → SR AE F_W → reconstructed HR image x̂
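A minimal sketch of this training loop; the bicubic downscaling, 4x factor, and optimizer are assumptions, and ConvMapping refers to the illustrative network sketched earlier:

```python
# Super-resolution training sketch: LR inputs are produced by downscaling the HR
# ground truth, and the network is trained to reconstruct the original HR image.
import torch
import torch.nn.functional as F

def downscale(hr, factor=4):
    # The downscaling operator D (assumed here to be plain bicubic resampling).
    return F.interpolate(hr, scale_factor=1.0 / factor, mode="bicubic", align_corners=False)

model = ConvMapping()                     # illustrative network from the earlier sketch
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for hr in [torch.rand(4, 3, 128, 128)]:   # stand-in for real HR training crops
    lr = downscale(hr)                                                         # D(x)
    lr_up = F.interpolate(lr, size=hr.shape[-2:], mode="bicubic", align_corners=False)
    sr = model(lr_up)                                                          # F_W(D(x))
    loss = F.mse_loss(sr, hr)                                                  # Dist(x, F_W(D(x)))
    opt.zero_grad()
    loss.backward()
    opt.step()
```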

SLIDE 30

SUPER RESOLUTION AE: INFERENCE

Given LR image x → SR AE F_W → constructed HR image y = F_W(x)
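Inference is then a single forward pass over the given LR image. A minimal sketch, where the model weights, file names, and 4x factor are placeholders:

```python
# Inference sketch: construct an HR image from a given LR image with the trained network.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

model = ConvMapping()                       # trained weights would be loaded here in practice
model.eval()

lr = to_tensor(Image.open("low_res.png").convert("RGB")).unsqueeze(0)
lr_up = F.interpolate(lr, scale_factor=4, mode="bicubic", align_corners=False)
with torch.no_grad():
    hr = model(lr_up).clamp(0.0, 1.0)       # y = F_W(x)
to_pil_image(hr.squeeze(0)).save("high_res.png")
```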

SLIDE 31

SUPER-RESOLUTION: ILL-POSED TASK?

SLIDE 32

THE LOSS FUNCTION

SLIDE 33

THE LOSS FUNCTION

The distance function is a key element in obtaining good results.

Measuring the “distance” from a good result:

W = argmin_W Σ_j Dist(x_j, F_W(x_j))

Choice of the loss function is an important decision

SLIDE 34

LOSS FUNCTION

MSE (Mean Squared Error): (1/N) · ||x − F(x)||²

SLIDE 35

LOSS FUNCTION: PSNR

MSE (Mean Squared Error): (1/N) · ||x − F(x)||²

PSNR (Peak Signal-to-Noise Ratio): 10 · log10(MAX² / MSE)

SLIDE 36

LOSS FUNCTION: HFEN

MSE (Mean Squared Error): (1/N) · ||x − F(x)||²

PSNR (Peak Signal-to-Noise Ratio): 10 · log10(MAX² / MSE)

HFEN (High Frequency Error Norm, see Ref A): ||HP(x − F(x))||_2, where HP is a high-pass filter

Perceptual loss

Ref A: http://ieeexplore.ieee.org/document/5617283/
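For reference, a small NumPy/SciPy sketch of these three measures; the Laplacian-of-Gaussian used as the high-pass filter for HFEN follows the cited paper, but the sigma value here is an assumption:

```python
# Image quality measures sketch: MSE, PSNR, and HFEN for two images scaled to [0, 1].
import numpy as np
from scipy.ndimage import gaussian_laplace

def mse(x, y):
    return np.mean((x - y) ** 2)

def psnr(x, y, max_val=1.0):
    return 10.0 * np.log10(max_val ** 2 / mse(x, y))

def hfen(x, y, sigma=1.5):
    # High Frequency Error Norm: l2 norm of the high-pass filtered error image,
    # with a Laplacian-of-Gaussian as the high-pass filter (sigma is assumed).
    return np.linalg.norm(gaussian_laplace(x - y, sigma=sigma))

gt = np.random.rand(64, 64)
rec = np.clip(gt + 0.05 * np.random.randn(64, 64), 0.0, 1.0)
print(mse(gt, rec), psnr(gt, rec), hfen(gt, rec))
```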

SLIDE 37

REGULAR LOSS

(Example results at 4x upscaling)

SLIDE 38

REGULAR LOSS + PERCEPTUAL LOSS

(Example results at 4x upscaling)

SLIDE 39

WARNING… THIS IS EXPERIMENTAL!

SLIDE 40

SUPER-RESOLUTION: GAN-BASED LOSS

Total loss = regular (MSE + PSNR + HFEN) loss + GAN loss

The generator F produces F(x) from the input x; the discriminator D classifies images as real or fake.

GAN loss = -ln D(F(x))
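A rough sketch of how such a combined generator loss might look in PyTorch; the discriminator architecture and the adversarial weight are assumptions, not values from the talk:

```python
# Combined loss sketch for GAN-based super-resolution training:
# total = regular reconstruction loss + adversarial term -log D(F(x)).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, img):
        return torch.sigmoid(self.net(img))      # probability that img is a real HR image

def generator_loss(sr, hr, disc, adv_weight=1e-3):
    recon = F.mse_loss(sr, hr)                   # regular (pixel-space) loss term
    adv = -torch.log(disc(sr) + 1e-8).mean()     # GAN loss: -log D(F(x))
    return recon + adv_weight * adv
```

The discriminator is trained in alternation to tell real HR images from generated ones, which pushes the generator toward outputs that look plausible rather than merely close in MSE.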

SLIDE 41

Extended presentation from Game Developers Conference 2017: https://developer.nvidia.com/deep-learning-games

GameWorks: Materials & Textures: https://gwmt.nvidia.com

QUESTIONS?