New Perspectives for Processing and Synthesizing Images and Videos
Qifeng Chen, Assistant Professor, HKUST (PowerPoint presentation)


SLIDE 1

New Perspectives for Processing and Synthesizing Images and Videos

Qifeng Chen Assistant Professor, HKUST

SLIDE 2

Q&A

◼ Which company is the most valuable worldwide?
◼ Apple
◼ What is the most important product of Apple?
◼ iPhone
◼ What is the most differentiating functionality of a smartphone today?
◼ Photography (arguably)

SLIDE 3

Low-light Imaging

SLIDE 4

Powerful Zoom

SLIDE 5

Overview

◼Image and Video Processing

▪ Learning to See in the Dark
▪ Zoom to Learn, Learn to Zoom
▪ Fast Image and Video Processing
▪ Reflection Removal

◼Image and Video Synthesis

▪ Photographic Image Synthesis
▪ Semi-parametric Image Synthesis
▪ RGBD Future Video Prediction
▪ Fully Automatic Video Colorization

SLIDE 6

Image and Video Processing

SLIDE 7

Learning to See in the Dark

SLIDE 8

Low-light Imaging

A deep-learning-based image signal processor (ISP)

Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. Learning to See in the Dark, CVPR 2018
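The approach replaces the camera's hand-engineered ISP with a single network trained end-to-end on raw short-exposure data. A minimal sketch of the raw preprocessing used in this line of work, packing the Bayer mosaic into 4 channels and applying brightness amplification; the black/white levels and the ratio below are illustrative, camera-specific constants, not the paper's prescribed values:

```python
import numpy as np

def pack_raw(bayer, black_level=512, white_level=16383, ratio=100.0):
    """Pack a Bayer mosaic (H, W) into a 4-channel, half-resolution image
    and scale it by the amplification ratio before it enters the network.
    The constants here are illustrative and camera-specific."""
    x = (bayer.astype(np.float32) - black_level) / (white_level - black_level)
    x = np.clip(x, 0.0, 1.0)
    h, w = x.shape
    # Each 2x2 Bayer tile (e.g. RGGB) becomes 4 channels at half resolution.
    packed = np.stack([x[0:h:2, 0:w:2],   # R
                       x[0:h:2, 1:w:2],   # G1
                       x[1:h:2, 0:w:2],   # G2
                       x[1:h:2, 1:w:2]],  # B
                      axis=-1)
    return packed * ratio  # brightness amplification for the dark exposure
```

The packed tensor is what a fully-convolutional network would consume to produce the final RGB image.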

Learning to See in the Dark

SLIDE 9

Dataset

SLIDE 10

Amplification Ratio

SLIDE 11

Results

SLIDE 12

Demo

SLIDE 13

Results

SLIDE 14

Zoom to Learn, Learn to Zoom

SLIDE 15

Data Collection

SLIDE 16

Data Collection

SLIDE 17

Why not just super-resolution with GANs?

◼ Existing super-resolution methods are trained on downsampled RGB images that contain little noise
◼ But in 8X digital zoom, noise is prominent
◼ RGB images are the output of the ISP
  ▪ High frequency is removed by denoising
◼ We train our model to recover underlying high-frequency details from noisy input

SLIDE 18

Contextual Bilateral Loss

A novel loss (CoBi), building on the contextual loss, for measuring the similarity of slightly misaligned image pairs
SLIDE 19

Contextual Bilateral Loss
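In rough form, the idea behind CoBi can be sketched as a nearest-neighbor feature match penalized by spatial distance, which is what makes it tolerant of slight misalignment between training pairs. The squared-L2 distances and the weight `ws` below are illustrative simplifications (the paper uses normalized cosine feature distances), so treat this as a sketch of the structure, not the exact loss:

```python
import numpy as np

def cobi_loss(feat_p, feat_q, pos_p, pos_q, ws=0.1):
    """Contextual Bilateral (CoBi) loss sketch: each source feature is
    matched to its best target feature under a combined feature + spatial
    distance, then the best-match distances are averaged.
    feat_*: (N, C) feature vectors; pos_*: (N, 2) normalized coordinates."""
    # Pairwise feature distances, shape (N, M).
    d_feat = ((feat_p[:, None, :] - feat_q[None, :, :]) ** 2).sum(-1)
    # Pairwise spatial distances, shape (N, M).
    d_pos = ((pos_p[:, None, :] - pos_q[None, :, :]) ** 2).sum(-1)
    combined = d_feat + ws * d_pos
    # Best match per source feature, averaged over all source features.
    return combined.min(axis=1).mean()
```

Perfectly aligned identical inputs give zero loss; shifting the target positions raises it, but only by the (down-weighted) spatial term.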

SLIDE 20

Results

SLIDE 21

Results

SLIDE 22

Results

SLIDE 23

Results

https://youtu.be/xmCzET2GNk0

SLIDE 24

Going well

SLIDE 25

A hazy day

SLIDE 26

Dehazed image

Nonlocal Dehazing [Berman et al. 2016]

SLIDE 27

But not practical

Nonlocal Dehazing takes a few seconds

SLIDE 28

Alternative solutions?

◼Use another method

▪No state-of-the-art accuracy

◼Accelerate implementation

▪Time consuming

◼Nonlinear Function Approximator

▪Simple, general, accurate and fast

SLIDE 29

Real-time performance

Our approximator runs at 30 fps

SLIDE 30

Fast Image Processing

Qifeng Chen, Jia Xu, and Vladlen Koltun. Fast Image Processing with Fully-Convolutional Networks, ICCV 2017
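One way to see why the approximator is fast: it is a shallow fully-convolutional network whose dilation rates grow exponentially, so it aggregates context over hundreds of pixels at full resolution without any downsampling or upsampling. A small sketch of the receptive-field arithmetic; the specific depth used below is illustrative, not the paper's exact configuration:

```python
def receptive_field(dilations, kernel=3):
    """Receptive field (in pixels) of a stack of dilated kernel x kernel
    convolutions: each layer adds (kernel - 1) * dilation to the field."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# Doubling dilations, as in a context aggregation network.
print(receptive_field([1, 2, 4, 8, 16, 32, 64]))  # -> 255
```

Seven 3x3 layers thus cover a 255-pixel window, which is why a handful of full-resolution layers can approximate operators that depend on global image content.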

SLIDE 31

Results

SLIDE 32

Demo

SLIDE 33

Single Image Reflection Removal

SLIDE 34

Data Collection

SLIDE 35

Method

SLIDE 36

Results

SLIDE 37

Deep Image and Video Synthesis

SLIDE 38

Art by Human Creation

SLIDE 39

Art by Human Creation & AI

SLIDE 40

Photographic image synthesis

Input semantic layouts Synthesized images

Qifeng Chen and Vladlen Koltun. Photographic Image Synthesis with Cascaded Refinement Networks. ICCV 2017

SLIDE 41

Motivation

◼ Computer graphics

▪ Alternative route to photorealism
▪ Capture photographic appearance
▪ Fast image synthesis

SLIDE 42

Motivation

◼Artificial Intelligence

▪ Visual Imagination

SLIDE 43

Our approach

◼ Cascaded refinement networks
◼ Perceptual Loss
◼ Diversity

SLIDE 44

Cascaded refinement networks

[Diagram: refinement modules progress from low to high resolution]
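The cascaded structure can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the learned convolutions inside each refinement module are omitted, and only the structural idea is shown, namely that each module doubles the working resolution and re-injects the semantic layout (downsampled to that resolution) alongside the upsampled features from the previous module:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def refine(layout, feats):
    """One refinement step (learned convolutions omitted): upsample the
    coarser features and concatenate the layout at the current resolution."""
    feats = upsample2x(feats)
    return np.concatenate([layout, feats], axis=-1)

def crn_forward(layout_pyramid, base_feats):
    """Run refinement modules from coarse to fine; each doubles resolution."""
    feats = base_feats
    for layout in layout_pyramid:
        feats = refine(layout, feats)
    return feats
```

Starting from 4x4 base features and a two-level layout pyramid, the output lands at 16x16 with the layout channels stacked in at every scale.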

SLIDE 45

Perceptual Loss

SLIDE 46

Diversity

SLIDE 47

Comparisons on Cityscapes

SLIDE 48

Results on NYU dataset

Tseung Kwan O, Kowloon

SLIDE 49

User Study

SLIDE 50

User study

SLIDE 51

GTA5 and Demo Video

SLIDE 52

Semi-parametric Image Synthesis

Semantic layouts Our result

Xiaojuan Qi, Qifeng Chen, Jiaya Jia, and Vladlen Koltun. Semi-parametric Image Synthesis. CVPR 2018

SLIDE 53

Image Synthesis

NYU dataset [Silberman et al. ECCV 2012]
ADE20K dataset [Zhou et al. 2017]

Semantic layouts Our result

SLIDE 54

Prior Work: Parametric Models

CRN [Chen and Koltun 2017] Pix2pix [Isola et al. 2017]

SLIDE 55

Prior Work: Non-parametric Models

Scene Completion using Millions of Photographs [Hays and Efros 2007]

SLIDE 56

Our Approach

[Diagram: an external memory of photographic segments organized by class (sky, forest, grass, mountain)]

SLIDE 57

Our Approach

[Diagram: a semantic layout with sky, forest, grass, and mountain regions queries the external memory]

SLIDE 58

Our Approach

[Diagram: segments matching each region of the semantic layout are retrieved from the external memory]

SLIDE 59

Our Approach

Stage 1: Canvas Generation
[Diagram: retrieved segments are composited into a canvas]

SLIDE 60

Our Approach

Stage 2: Image Synthesis
[Diagram: the semantic layout and the canvas are synthesized into the final result]

SLIDE 61

SIMS: Canvas Generation

[Diagram: the semantic layout queries an external memory of segments organized by class (building, car, ...)]

SLIDE 62

SIMS: Canvas Generation

[Diagram: segments (building, car, ...) are retrieved from the external memory for the semantic layout]

SLIDE 63

SIMS: Canvas Generation

[Diagram: a transformation network aligns the retrieved segments to the semantic layout]

SLIDE 64

SIMS: Canvas Generation

[Diagram: an ordering network composites the transformed segments into the canvas]

SLIDE 65

SIMS: Image Synthesis

[Diagram: the semantic layout and the canvas, the inputs to image synthesis]

SLIDE 66

SIMS: Image Synthesis

[Diagram: a synthesis network f (convolution, pooling, and upsampling layers) takes the semantic layout and the canvas as input]

SLIDE 67

SIMS: Image Synthesis

[Diagram: the synthesis network f produces the output image from the semantic layout and the canvas]
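Putting the two SIMS stages together, the control flow can be sketched as below. This is a deliberately toy sketch: retrieval is a dictionary lookup standing in for nearest-neighbor segment search, and the transformation, ordering, and synthesis networks are collapsed into placeholders, so it conveys only the semi-parametric structure, not the learned components:

```python
def build_canvas(layout_regions, memory):
    """SIMS stage 1 sketch: for each class region in the semantic layout,
    retrieve a matching photographic segment from the external memory and
    place it on the canvas. (The real system retrieves by segment
    similarity, aligns segments with a transformation network, and resolves
    overlaps with an ordering network.)"""
    canvas = {}
    for region, cls in layout_regions.items():
        canvas[region] = memory.get(cls, "<missing>")
    return canvas

def synthesize(layout_regions, memory, refine=lambda canvas: canvas):
    """SIMS stage 2 sketch: a synthesis network f turns the canvas into the
    final image; `refine` is an identity placeholder for that network."""
    return refine(build_canvas(layout_regions, memory))
```

The split is the point: the non-parametric memory contributes photographic texture, while the parametric stage-2 network only has to fill gaps and harmonize the composite.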

SLIDE 68

Results

SLIDE 69

SLIDE 70

SLIDE 71

SLIDE 72

SLIDE 73

Semantic layout

SLIDE 74

Pix2pix [Isola et al. 2017]

SLIDE 75

CRN [Chen and Koltun 2017]

SLIDE 76

Our result

SLIDE 77

Diversified Synthesis

SLIDE 78

Image Statistics: Mean Power Spectrum

[Figure: mean power spectrum of Pix2pix [Isola et al. 2017] vs. real images]

SLIDE 79

Image Statistics

[Figure: mean power spectrum of CRN [Chen and Koltun 2017] vs. real images]

SLIDE 80

Image Statistics

[Figure: mean power spectrum of our approach vs. real images]

SLIDE 81

Perceptual Experiments

                  Cityscapes   Cityscapes   Cityscapes   NYU      ADE20K     Mean
                  (coarse)     (fine)       (GTA5)       (fine)   (coarse)
SIMS > Pix2pix    94.2%        98.1%        95.7%        94.9%    87.6%      94.1%
SIMS > CRN        93.9%        74.1%        84.5%        89.1%    88.9%      86.1%

SLIDE 82

Thank You


SLIDE 86

Future Prediction

SLIDE 87

Video Prediction

SLIDE 88

3D Motion Decomposition for RGBD Future Dynamic Scene Synthesis

SLIDE 89

3D Motion Decomposition for RGBD Future Dynamic Scene Synthesis

SLIDE 90

Results

SLIDE 91

Results

SLIDE 92

Results

SLIDE 93

Video Colorization

SLIDE 94

Fully Automatic Video Colorization with Self-Regularization and Diversity

SLIDE 95

Diversity

SLIDE 96

Results

SLIDE 97

Thank You

https://cqf.io