Synthesis Deep Learning Use cases at technicolor Louis Chevallier - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Image Analysis and Synthesis Deep Learning Use cases at technicolor

Louis Chevallier
Principal Scientist, Research and Innovation
technicolor.com
GTC Munich - October 2017

slide-2
SLIDE 2

PROVIDING THE BUSINESS A COMPETITIVE EDGE

  • PROPOSE NEW TOOLS TO ANALYSE, PROCESS, REPRESENT AND RENDER AUDIOVISUAL CONTENT
  • OFFER PROFESSIONALS AND END-USERS SOLUTIONS TO AMPLIFY THEIR IMMERSIVE EXPERIENCES
  • IMPROVE THE USER EXPERIENCE AT HOME WITH BETTER NETWORK, VIDEO SERVICES AND CONTEXTUAL SOLUTIONS
  • EXPLOIT DATA TO OPTIMIZE AND EXPAND OUR BUSINESS WITH AI-BASED INNOVATION

slide-3
SLIDE 3

POWERING PREMIUM CONTENT ACROSS MARKETS

► Dailies and Color pipeline management
► VFX
► Marketing Services
► Sound Finishing
► Color Finishing (including IMAX theatrical and HDR for home)
► DVD manufacturing and distribution

► Dailies and Color pipeline management
► VFX
► Marketing Services
► Sound Finishing
► Color Finishing
► DVD manufacturing and distribution

► Creative
► VFX
► Sound Finishing
► Color Finishing
► Immersive Experiences
► Original IP and production
► Asset creation
► Full servicing of film and television properties

► Full servicing including asset and level building
► Sound Finishing
► Packaged media manufacturing and distribution

► Immersive Experiences
► Packaged media manufacturing and distribution

slide-4
SLIDE 4

Impact of Deep Learning

Technicolor works in the media and entertainment sector, serving filmmakers and advertisers. It acknowledges the outstanding performance of deep-learning-based solutions in computer vision.

  • New functionalities emerge, requiring existing workflows to be revisited.
  • Higher performance calls for proper evaluation and metrics.
  • Deep-learning-specific requirements raise integration and deployment challenges.
slide-5
SLIDE 5

Use cases

  • Video Enhancement
      • Upscaling
      • Denoising
  • Video Editing, Augmentation
      • Style Transfer
      • Mono to Stereo
  • Video encoding
      • Compression
  • Asset Management
      • Indexing, Retrieval
      • Classification
  • CGI, Animation
      • Video 2 animation
slide-6
SLIDE 6

UC#1 Super Resolution

Converting 2K images into 4K

Degradation model (diagram): X (HR) → geometric distortion → blur, noise, lens PSF → subsampling → quantization → Y (LR)

Solution: invert the distortions using a deep network, i.e. a stack of convolutional layers

Baseline: Dong, Chao, et al. "Learning a deep convolutional network for image super-resolution." European Conference on Computer Vision. Springer International Publishing, 2014.
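
The degradation chain on this slide can be illustrated with a small forward model. The sketch below is a toy under stated assumptions, not the pipeline used in the talk: a 3x3 box blur stands in for the lens PSF, subsampling is plain decimation, quantization is uniform 8-bit, and geometric distortion and noise are left out. The function names (`box_blur`, `subsample`, `quantize`, `degrade`) are hypothetical.

```python
def box_blur(img):
    """3x3 box blur with edge replication (assumed stand-in for the lens PSF)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at borders
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at borders
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def subsample(img, factor=2):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in img[::factor]]


def quantize(img, levels=256):
    """Round values in [0, 1] to `levels` uniformly spaced steps."""
    top = levels - 1
    return [[round(v * top) / top for v in row] for row in img]


def degrade(hr, factor=2):
    """X (HR) -> blur -> subsampling -> quantization -> Y (LR)."""
    return quantize(subsample(box_blur(hr), factor))


if __name__ == "__main__":
    hr = [[(x + y) / 14.0 for x in range(8)] for y in range(8)]  # 8x8 gradient
    lr = degrade(hr)
    print(len(lr), "x", len(lr[0]))
```

A super-resolution network is trained to approximate the inverse of such a mapping, from Y (LR) back to X (HR).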

slide-7
SLIDE 7

Evaluation

  • Images captured at 2 different resolutions: LR, HR (x2)

Accuracy (2X scale factor):

PSNR (dB)    Bicubic    Deep
Set5         33.19      37.80

Speed: about 1 sec per HD (4K) image with a GPU
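
For reference, PSNR is derived from the mean squared error between the ground-truth image and the reconstruction. A minimal sketch, assuming 8-bit pixel values (peak 255) passed as flat sequences; the function name `psnr` is illustrative and the exact evaluation protocol behind the figures above (colour space, border cropping) is not stated in the slides.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [100, 100, 100, 100]
    upscaled = [101, 99, 100, 102]
    print(round(psnr(ground_truth, upscaled), 2), "dB")
```

Because PSNR is logarithmic in the MSE, a gain of a few dB corresponds to a large reduction in squared error.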

slide-8
SLIDE 8

Approach

  • Knowledge Transfer

Applying filters a, b conditionally: Y = a if K else b, implemented as Y = relu(a + K - 1) + relu(b - K) (this holds for K in {0, 1} and a, b in [0, 1])
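
The ReLU identity on this slide can be checked numerically. The sketch below is illustrative only: the names `relu` and `select` are hypothetical, and it assumes K is a binary switch and that a and b lie in [0, 1], which is what makes the two ReLU terms cancel correctly.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def select(a, b, k):
    """Branch-free 'a if k else b' for k in {0, 1} and a, b in [0, 1]."""
    return relu(a + k - 1) + relu(b - k)


if __name__ == "__main__":
    # k = 1: relu(a) + relu(b - 1) = a + 0
    print(select(0.25, 0.75, 1))
    # k = 0: relu(a - 1) + relu(b) = 0 + b
    print(select(0.25, 0.75, 0))
    # Outside [0, 1] the identity breaks: a = 1.5 leaks through when k = 0
    print(select(1.5, 0.25, 0))
```

Expressing the switch with additions and ReLUs keeps the selection inside ordinary network layers, so it can be applied per pixel without explicit branching.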

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2921-2929).


slide-9
SLIDE 9

Approach

  • Selective Sampling
  • And better loss, wide receptive field

(Diagram: training images → select patch → train → model)

slide-10
SLIDE 10

(Figure panels: deep approach | ground truth | standard approach (bicubic))

slide-11
SLIDE 11

High order loss, GAN

Visually pleasing, but PSNR is no longer applicable

Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." arXiv preprint arXiv:1609.04802 (2016).

Content loss and discriminator loss form a "perceptual loss", which is not the same as PSNR

(Figure panels: Bicubic | DNN (MSE) | GAN | Original)

slide-12
SLIDE 12

UC#2 Video Classification

Problem: predicting interestingness

  • Which one is more interesting?
slide-13
SLIDE 13

Application

  • Media file search & browse
  • Advertisement
  • Filtering and summarization
  • E-learning
slide-14
SLIDE 14

Approach

  • From image / frame: CNN feature
      • Output coefficients from a dense layer (fc7) of a CNN model (CaffeNet), size = 4096
  • From audio: MFCC feature
      • Classic audio spectrum feature: Mel-frequency cepstral coefficients + Delta + Delta2, size = 60 * 3 = 180
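
The 60 * 3 = 180 audio feature size follows from stacking the base coefficients with their first and second temporal derivatives. The sketch below shows that stacking under explicit assumptions: the deltas are simple central differences with replicated edge frames (audio toolkits commonly use a regression over a wider window), and the names `delta` and `stack_features` are hypothetical.

```python
def delta(frames):
    """Central difference along time; edge frames are replicated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_features(mfcc):
    """Concatenate coefficients, Delta and Delta2 for every frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [c + a + b for c, a, b in zip(mfcc, d1, d2)]


if __name__ == "__main__":
    # 5 frames of 60 placeholder coefficients (not real MFCCs)
    mfcc = [[float(t)] * 60 for t in range(5)]
    feats = stack_features(mfcc)
    print(len(feats), "frames of size", len(feats[0]))
```

How the per-frame audio vectors are then pooled and combined with the 4096-dimensional fc7 image feature is not specified in the slides.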

slide-15
SLIDE 15

Results

  • Predicted interestingness: 0.00040983; ground truth: not interesting
  • Predicted interestingness: 0.64466870; ground truth: interesting

slide-16
SLIDE 16

Evaluation

  • Datasets
      • Flickr (≈200000, balanced, patented API)
      • MediaEval (≈5000, unbalanced, human annotation)
  • The Mean Average Precision (MAP) metric
      • A ranking-based metric, used for MediaEval performance evaluation
  • Depth of the network
  • Adversarial content is possible
  • What about robustness against adversarial examples?
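
The MAP metric named in the list above can be sketched as follows. This is the common textbook definition of average precision over a ranked list, averaged over several lists; the exact MediaEval protocol (for example, what constitutes one ranked list) is an assumption here, and the function names are illustrative.

```python
def average_precision(scores, labels):
    """Average precision of one ranking: labels are 1 (interesting) or 0."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this relevant item
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of the average precision over several (scores, labels) pairs."""
    return sum(average_precision(s, l) for s, l in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
        ([0.1, 0.9], [1, 0]),
    ]
    print(round(mean_average_precision(runs), 4))
```

Because the metric depends only on the ordering of the predicted scores, it tolerates the class imbalance noted for the MediaEval dataset better than plain accuracy would.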
slide-17
SLIDE 17

UC#3 3D Animation

Speeding up the production of animated movies

slide-18
SLIDE 18

Video 2 Animation

  • Sketching animations starting from video

Joint coordinates are extracted from images using plausible motions

slide-19
SLIDE 19

Integration

  • Animation is carried out by highly skilled artists using specialized GUIs
  • Need to devise a new user/machine interface
  • Mixing rig controllers and learnt manifold: compatibility

slide-20
SLIDE 20

UC#4 Post-filters in future video codec

  • Fully convolutional neural network
  • Results

Post-filter         BD-rate
DBF + SAO + ALF     -3.2%
CNN                 -4.91%

(Diagram labels: encoder, decoder, boundaries)

slide-21
SLIDE 21

Results

DNN has a high computation cost

(Figure panels: no filters | CNN)

slide-22
SLIDE 22

UC#5 Style Transfer

A new image editing tool

slide-23
SLIDE 23

Approach

  • Baseline

Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual losses for real-time style transfer and super-resolution." arXiv preprint arXiv:1603.08155 (2016).

Problems:

  • Flickering, stability over time
  • Speed

slide-24
SLIDE 24

UC#5 Style Transfer

  • An interesting tool for artists
  • How to control the output, how to evaluate?

slide-25
SLIDE 25

Conclusion

  • DL allows substantial improvements in several applications
  • Many new deep-learning-based tools are emerging and more are still to be discovered
  • Existing workflows have to be adapted
  • Integration challenges need to be solved
slide-26
SLIDE 26

Thanks