ZOOM, ENHANCE, SYNTHESIZE! MAGIC UPSCALING AND MATERIAL SYNTHESIS USING DEEP LEARNING
Tuesday, 9 May 2017
Andrew Edelsten - NVIDIA Developer Technologies
2
DEEP LEARNING FOR ART
Active R&D but ready now
▪ Style transfer
▪ Generative networks creating images and voxels
▪ Adversarial networks (DCGAN) - still early but promising
▪ DL & ML based tools from NVIDIA and partners: NVIDIA, Artomatix, Allegorithmic, Autodesk
3
STYLE TRANSFER
Something Fun
[Figure: content image and style image]
▪ Doodle a masterpiece! Uses a CNN to take the "style" from one image and apply it to another
▪ Sept 2015: A Neural Algorithm of Artistic Style by Gatys et al.
▪ Dec 2015: neural-style (GitHub)
▪ Mar 2016: neural-doodle (GitHub)
▪ Mar 2016: texture-nets (GitHub)
▪ Oct 2016: fast-neural-style (GitHub)
▪ 2 May 2017 (last week!): Deep Image Analogy (arXiv)
▪ Also numerous services: Vinci, Prisma, Artisto, Ostagram
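The "style" these methods transfer is usually captured by Gram matrices of CNN feature maps (Gatys et al.). A minimal numpy sketch of that style statistic, with made-up feature-map shapes; in the real methods the features come from a pretrained VGG network and the image is optimized by gradient descent:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) feature map:
    channel-to-channel correlations, which capture "style"."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)

def style_loss(features_a, features_b):
    """Mean squared difference between the Gram matrices of two
    feature maps - small when the two images share texture statistics."""
    ga, gb = gram_matrix(features_a), gram_matrix(features_b)
    return float(np.mean((ga - gb) ** 2))
```

The content loss is simpler: a direct mean-squared error between feature maps, so the optimized image keeps the content layout while matching the style statistics.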
4
HTTP://OSTAGRAM.RU/STATIC_PAGES/LENTA
5
STYLE TRANSFER
▪ Game remaster & texture enhancement
▪ Try Neural Style and use a real-world photo for the "style"
▪ For stylized or anime up-rez, try https://github.com/nagadomi/waifu2x
▪ Experiment with art styles for dream or power-up sequences
▪ "Come Swim" by Kristen Stewart - https://arxiv.org/pdf/1701.04928v1.pdf
Something Useful
6
GAMEWORKS: MATERIALS & TEXTURES
Using DL for Game Development & Content Creation
▪ Set of tools targeting the game industry using machine learning and deep learning
▪ Launched at the Game Developers Conference in March; tools run as a web service
▪ Sign up for the Beta at: https://gwmt.nvidia.com
▪ Tools in this initial release:
  ▪ Photo to Material: 2shot
  ▪ Texture Multiplier
  ▪ Super-Resolution
7
PHOTO TO MATERIAL
▪ From two photos of a surface, generate a "material"
▪ Based on a SIGGRAPH 2015 paper by NVIDIA Research & Aalto University (Finland):
  "Two-Shot SVBRDF Capture for Stationary Materials"
  https://mediatech.aalto.fi/publications/graphics/TwoShotSVBRDF/
▪ Input is pixel-aligned "flash" and "guide" photographs
  ▪ Use a tripod and remote shutter or bracket, or align later
▪ Use for flat surfaces with repeating patterns
The 2Shot Tool
8
MATERIAL SYNTHESIS FROM TWO PHOTOS
[Figure: flash image and guide image inputs; output maps: diffuse albedo, specular, normals, glossiness, anisotropy]
9
TEXTURE MULTIPLIER
▪ Put simply: texture in, new texture out
▪ Inspired by Gatys, Ecker & Bethge:
  "Texture Synthesis Using Convolutional Neural Networks"
  https://arxiv.org/pdf/1505.07376.pdf
▪ Artomatix offers a similar product, "Texture Mutation": https://artomatix.com/
Organic variations of textures
10
SUPER RESOLUTION
11
SUPER RESOLUTION
Zoom... ENHANCE!
"Zoom in on the license plate." "OK!"
"Can you enhance that?" "Sure!"
12
SUPER RESOLUTION
The task at hand
Upscale (magic?)
Given a low-resolution image of size W × H, construct a high-resolution image of size nW × nH.
13
UPSCALE: CREATE MORE PIXELS
An ill-posed task?
[Figure: pixels of the given image placed into the upscaled image; most upscaled pixels are unknown (?)]
14
TRADITIONAL APPROACH
▪ Interpolation (bicubic, Lanczos, etc.)
▪ Interpolation + sharpening (and other filtering)
[Figure: interpolation, then filter-based sharpening]
▪ Rough estimation of the data behavior - too general
▪ Too many possibilities: an 8×8 grayscale patch has 256^(8∗8) ≈ 10^154 pixel combinations!
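To make both points concrete, here is an illustrative pure-Python sketch of the combinatorics and of plain interpolation upscaling (not any of the tools in this talk):

```python
import math

# An 8x8 grayscale patch has 256**(8*8) possible pixel
# configurations - on the order of 10**154:
assert math.floor(math.log10(256 ** 64)) == 154

def upscale2x_bilinear(img):
    """Naive 2x bilinear upscaling of a grayscale image given as a
    list of rows. Each output pixel samples the source at half-pixel
    steps and blends the four nearest source pixels."""
    h, w = len(img), len(img[0])
    out = []
    for oy in range(2 * h):
        sy = min(oy / 2.0, h - 1)
        y0 = int(sy); y1 = min(y0 + 1, h - 1); fy = sy - y0
        row = []
        for ox in range(2 * w):
            sx = min(ox / 2.0, w - 1)
            x0 = int(sx); x1 = min(x0 + 1, w - 1); fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

Interpolation like this only blends existing pixels, which is exactly why it looks too general: it cannot invent the plausible detail a learned model can.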
15
A NEW APPROACH
First: narrow the possible set
Focus on the domain of "natural images" - photos and textures
[Figure: natural images as a small subset of all possible images]
16
A NEW APPROACH
Second: place the image in that domain, then reconstruct
▪ Data from natural images is sparse - it is compressible in some domain
▪ Then "reconstruct" images (rather than create new ones), using prior information and constraints
[Figure: compress, then reconstruct]
17
PATCH-BASED MAPPING: TRAINING
[Figure: training on LR/HR pairs of patches from training images produces model parameters for a mapping from low-resolution patch to high-resolution patch]
18
PATCH-BASED MAPPING
[Figure: encode the LR patch x_L into high-level information about the patch, then decode into the HR patch x_H]
19
PATCH-BASED MAPPING: SPARSE CODING
[Figure: encode the LR patch x_L into a sparse code - "features", high-level information about the patch - then decode into the HR patch x_H]
20
PATCH FEATURES & RECONSTRUCTION
x = Dα = d₁α₁ + ⋯ + d_K α_K

[Figure: example patch x reconstructed as 0.8 × (atom) + 0.3 × (atom) + 0.5 × (atom) from dictionary D]

An image patch can be reconstructed as a sparse linear combination of features.
Features are learned from the dataset over time.

D - dictionary
x - patch
α - sparse code
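The reconstruction x = Dα can be sketched in a few lines of numpy. The dictionary here is random and tiny purely for illustration; in the actual technique D is learned from training patches:

```python
import numpy as np

# Hypothetical dictionary D of K = 5 "feature" atoms for
# 16-pixel (4x4) patches; in practice D is learned from data.
rng = np.random.default_rng(42)
D = rng.normal(size=(16, 5))

# A sparse code alpha: only 3 of the 5 coefficients are non-zero.
alpha = np.array([0.8, 0.0, 0.3, 0.5, 0.0])

# Reconstruction: x = D @ alpha = sum over atoms of d_k * alpha_k
x = D @ alpha
manual = 0.8 * D[:, 0] + 0.3 * D[:, 2] + 0.5 * D[:, 3]
assert np.allclose(x, manual)
```

Finding the sparse code for a given patch is the hard part (an optimization problem); the decode step shown here is just a matrix-vector product.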
21
GENERALIZED PATCH-BASED MAPPING
[Figure: LR patch → (mapping) → high-level representation of the LR patch ("features") → (mapping in feature space) → high-level representation of the HR patch → (mapping) → HR patch]
22
GENERALIZED PATCH-BASED MAPPING
[Figure: the same pipeline with trainable parameters: W₁ maps the LR patch to features, W₂ is the mapping in feature space, W₃ maps features to the HR patch]
23
MAPPING OF THE WHOLE IMAGE
Using Convolutions
[Figure: the three mappings applied to the whole image as convolutional operators: LR image → features → mapping in feature space → HR image]
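Networks like SRCNN implement these mappings as stacks of convolutions. A minimal numpy sketch of the convolution operator itself (single channel, "valid" padding; real networks stack many such filters with nonlinearities between them):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2D convolution (really cross-correlation, as in
    most DL frameworks): slide the kernel over the image and take
    the weighted sum at each position."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out
```

Because the same kernel is applied everywhere, the patch-based mapping learned during training can be run on a whole image of any size in one pass.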
24
AUTO-ENCODERS
[Figure: input → auto-encoder → output ≈ input]
25
AUTO-ENCODER
[Figure: input → encode → features → decode → output ≈ input]
26
AUTO-ENCODER
[Figure: auto-encoder F_W with parameters W maps input x to output y]

Training: W = argmin_W Σᵢ Dist(xᵢ, F_W(xᵢ))

Inference: y = F_W(x)

{xᵢ} - training set
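The training and inference equations, W = argmin_W Σᵢ Dist(xᵢ, F_W(xᵢ)) and y = F_W(x), can be sketched with a toy linear auto-encoder. The data, sizes, and plain gradient descent here are all illustrative assumptions, not the talk's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 200 samples x_i in R^8 that actually live on a
# 3-dimensional subspace, so a 3-unit bottleneck can represent them.
basis = rng.normal(size=(3, 8))
X = rng.normal(size=(200, 3)) @ basis

# Linear auto-encoder F_W(x) = (x @ We) @ Wd with a 3-unit bottleneck.
We = 0.1 * rng.normal(size=(8, 3))   # encoder weights
Wd = 0.1 * rng.normal(size=(3, 8))   # decoder weights

def dist(X, We, Wd):
    """Dist(x, F_W(x)): mean squared reconstruction error."""
    return float(np.mean((X @ We @ Wd - X) ** 2))

loss_before = dist(X, We, Wd)
lr = 0.001
for _ in range(500):
    Z = X @ We                       # encode
    Y = Z @ Wd                       # decode
    G = 2.0 * (Y - X) / X.size       # gradient of the loss w.r.t. Y
    gWd = Z.T @ G                    # chain rule through the decoder
    gWe = X.T @ (G @ Wd.T)           # chain rule through the encoder
    Wd -= lr * gWd
    We -= lr * gWe
loss_after = dist(X, We, Wd)
assert loss_after < loss_before      # training reduces Dist
```

After training, inference is just `X_new @ We @ Wd`; the argmin is approximated by the gradient-descent loop.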
27
AUTO-ENCODER
[Figure: input → encode → information loss]
▪ Our encoder is LOSSY by definition
28
SUPER-RESOLUTION AUTO-ENCODER
Training: W = argmin_W Σᵢ Dist(xᵢ, F_W(xᵢ))

Inference: y = F_W(x)

{xᵢ} - training set

[Figure: auto-encoder F_W with parameters W maps input x to output y]
29
SUPER RESOLUTION AE: TRAINING

W = argmin_W Σᵢ Dist(xᵢ, F_W(D(xᵢ)))

{xᵢ} - training set

[Figure: ground-truth HR image → downscaling D → LR image → SR AE F_W → reconstructed HR image]
30
SUPER RESOLUTION AE: INFERENCE

Given an LR image x, construct the HR image y = F_W(x)

[Figure: LR image → SR AE F_W → constructed HR image]
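A sketch of how the training pairs are produced: the ground-truth HR image is run through the downscaler D from the training objective to create the network's input, so the HR image is its own target. The 2× box filter here is an assumption; any downscaler works:

```python
import numpy as np

def downscale2x(hr):
    """Box-filter 2x downscaling D: average each 2x2 block.
    Expects even height and width."""
    h, w = hr.shape
    return hr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def make_training_pair(hr):
    """Return (network input, target): the LR version and the
    original ground-truth HR image."""
    return downscale2x(hr), hr
```

This is why super-resolution needs no manual labeling: any collection of high-resolution images yields unlimited (LR, HR) supervision.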
31
SUPER-RESOLUTION: ILL-POSED TASK?
32
THE LOSS FUNCTION
33
THE LOSS FUNCTION
Measuring the "distance" from a good result

W = argmin_W Σᵢ Dist(xᵢ, F_W(xᵢ))

The distance function is a key element in obtaining good results; the choice of loss function is an important decision.
34
LOSS FUNCTION
MSE - Mean Squared Error: (1/N) · ‖x − F(x)‖²
35
LOSS FUNCTION: PSNR
MSE - Mean Squared Error: (1/N) · ‖x − F(x)‖²

PSNR - Peak Signal-to-Noise Ratio: 10 · log10(MAX² / MSE)
36
LOSS FUNCTION: HFEN
MSE - Mean Squared Error: (1/N) · ‖x − F(x)‖²

PSNR - Peak Signal-to-Noise Ratio: 10 · log10(MAX² / MSE)

HFEN - High Frequency Error Norm (see Ref A): ‖HP(x − F(x))‖₂, where HP is a high-pass filter - a perceptual loss

Ref A: http://ieeexplore.ieee.org/document/5617283/
37
REGULAR LOSS
[Figure: two 4× upscaling results with the regular loss]
38
REGULAR LOSS + PERCEPTUAL LOSS
[Figure: two 4× upscaling results with regular + perceptual loss]
39
WARNING… THIS IS EXPERIMENTAL!
40
SUPER-RESOLUTION: GAN-BASED LOSS
Total loss = regular (MSE + PSNR + HFEN) loss + GAN loss

[Figure: generator F produces F(x) from input x; discriminator D scores images as real or fake, output D(y)]

GAN loss = −ln D(F(x))
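A sketch of combining the losses: the −ln D(F(x)) term is the non-saturating generator loss, where D(F(x)) is the discriminator's probability that the generated image is real. The weighting factor here is a made-up example value, not a number from the talk:

```python
import math

def gan_generator_loss(d_of_fake):
    """Non-saturating GAN generator loss: -ln D(F(x)).
    Small when the discriminator is fooled (D(F(x)) near 1),
    large when the fake is easily detected (D(F(x)) near 0)."""
    return -math.log(d_of_fake)

def total_loss(regular_loss, d_of_fake, weight=0.001):
    """Total loss = regular (pixel/perceptual) loss plus a small
    weighted GAN term; the weight 0.001 is an illustrative choice."""
    return regular_loss + weight * gan_generator_loss(d_of_fake)
```

Keeping the GAN weight small lets the adversarial term add plausible high-frequency texture without letting it overpower the pixel-level reconstruction terms.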
Extended presentation from the Game Developers Conference 2017: https://developer.nvidia.com/deep-learning-games
GameWorks: Materials & Textures: https://gwmt.nvidia.com