 
Image Analysis and Synthesis: Deep Learning Use Cases at Technicolor. Louis Chevallier, Principal Scientist, Research and Innovation, technicolor.com. GTC Munich, October 2017
PROVIDING THE BUSINESS A COMPETITIVE EDGE • Exploit data to optimize and expand our business with AI-based innovation • Improve the user experience at home with better network, video services and contextual solutions • Offer professionals and end-users solutions to amplify their immersive experiences • Propose new tools to analyse, process, represent and render audiovisual content
POWERING PREMIUM CONTENT ACROSS MARKETS ► Original IP and production ► Asset creation ► Creative and level building ► Full servicing of film and television properties ► Full servicing including asset pipeline management ► Dailies and Color ► VFX ► Sound Finishing ► Color Finishing (including IMAX theatrical and HDR for home) ► Immersive Experiences ► Marketing Services ► Packaged media manufacturing and distribution ► DVD manufacturing and distribution
Impact of Deep Learning: Technicolor is a company working in the media and entertainment sector for filmmakers and advertisers, and acknowledges the outstanding performance of deep-learning-based solutions in computer vision. • New functionalities are emerging, which requires revisiting existing workflows. • Higher performance calls for proper evaluation and metrics. • Deep-learning-specific requirements raise integration and deployment challenges.
Use cases • Video Enhancement: upscaling, denoising • Asset Management: indexing, retrieval, classification • Video Editing, Augmentation: style transfer, mono to stereo • CGI, Animation: video 2 animation • Video Encoding: compression
UC#1 Super Resolution: converting 2K images into 4K • Degradation model: the low-resolution image Y (LR) is produced from the high-resolution image X (HR) by geometric distortion (lens), blur (PSF), subsampling, noise and quantization • Solution: invert these distortions using a deep network, i.e. a stack of convolutional layers • Baseline: Dong, Chao, et al. "Learning a deep convolutional network for image super-resolution." European Conference on Computer Vision. Springer International Publishing, 2014.
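The degradation chain on this slide can be simulated to build (LR, HR) training pairs. The sketch below is a minimal NumPy version under stated assumptions: a Gaussian kernel stands in for the lens PSF, noise is additive Gaussian, quantization is rounding to 8 bits, and the geometric lens distortion is omitted. The function names and parameter values are hypothetical, not Technicolor's actual pipeline.

```python
import numpy as np

def gaussian_psf(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Isotropic Gaussian kernel used as a stand-in for the lens PSF (assumption)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()  # normalise so flat regions keep their intensity

def filter2d_same(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size 2D filtering with edge padding (correlation, which equals
    convolution here because the Gaussian kernel is symmetric)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def degrade(x_hr: np.ndarray, scale: int = 2, sigma: float = 1.0,
            noise_std: float = 2.0, seed: int = 0) -> np.ndarray:
    """Hypothetical HR -> LR simulation: blur, subsample, add noise, quantize."""
    rng = np.random.default_rng(seed)
    blurred = filter2d_same(x_hr.astype(np.float64), gaussian_psf(5, sigma))
    subsampled = blurred[::scale, ::scale]           # keep every `scale`-th pixel
    noisy = subsampled + rng.normal(0.0, noise_std, subsampled.shape)
    return np.clip(np.round(noisy), 0, 255).astype(np.uint8)  # 8-bit quantization
```

A network trained on pairs `(degrade(x), x)` learns to invert this particular simulated chain, so how closely the simulation matches the real capture conditions matters.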
Evaluation
• Images captured at 2 different resolutions: LR and HR (x2)
• Accuracy, 2x scale factor (PSNR, dB):
  Dataset | Bicubic | Deep
  Set5    | 33.19   | 37.80
• Speed: about 1 s per HD (4K) image on a GPU
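The accuracy figures above are PSNR values. For reference, a minimal NumPy computation of PSNR for 8-bit images is sketched below; it assumes a peak value of 255. A common convention in the super-resolution literature is to evaluate on the luminance channel with a cropped border, and the slide does not say which convention produced its numbers.

```python
import numpy as np

def psnr(reference, estimate, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

For example, a uniform error of one grey level (MSE = 1) gives 20·log10(255) ≈ 48.13 dB; the 4.6 dB gap between bicubic and deep on Set5 corresponds to roughly a 2.9x reduction in MSE.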
Approach • Knowledge transfer: applying filters a and b conditionally on K, i.e. Y = a if K else b, using only ReLUs: $Y = \mathrm{relu}(a + K - 1) + \mathrm{relu}(b - K)$, which holds for $K \in \{0, 1\}$ and $a, b \in [0, 1]$ • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2921-2929).
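The ReLU identity above can be checked numerically. The sketch assumes filter outputs `a` and `b` lie in [0, 1] and that `k` is a binary map; the slide does not say where K comes from, so here it is just random. When K = 1 the second ReLU is clamped to zero because b ≤ 1, and when K = 0 the first is clamped because a ≤ 1.

```python
import numpy as np

def relu(v: np.ndarray) -> np.ndarray:
    return np.maximum(v, 0.0)

def select(a: np.ndarray, b: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Return a where k == 1 and b where k == 0, using only ReLUs and sums.
    Valid only for a, b in [0, 1] and k in {0, 1}."""
    return relu(a + k - 1.0) + relu(b - k)

rng = np.random.default_rng(0)
a = rng.random((4, 4))                              # hypothetical filter-a output
b = rng.random((4, 4))                              # hypothetical filter-b output
k = (rng.random((4, 4)) > 0.5).astype(np.float64)   # hypothetical binary mask
y = select(a, b, k)
```

Writing the switch this way keeps it inside an ordinary network graph (additions and ReLUs), so no special branching operator is needed.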
Approach • Selective sampling: patches are selected from the training images and used to train the model • A better loss and a wide receptive field
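The receptive field of a stack of convolutions grows with each layer. The helper below computes it from kernel sizes and strides, as a small illustration of why deeper stacks "see" more context. The 9-1-5 configuration is the basic SRCNN setting from the Dong et al. baseline; the 20-layer 3×3 stack is only an illustrative comparison, not the architecture used in this talk.

```python
def receptive_field(kernel_sizes, strides=None) -> int:
    """Receptive field (in input pixels) of stacked convolutions."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1          # jump = distance in input pixels between adjacent outputs
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([9, 1, 5]))   # SRCNN 9-1-5 -> 13
print(receptive_field([3] * 20))    # twenty 3x3 layers -> 41
```

With stride 1 everywhere, each k×k layer adds k − 1 pixels, so depth is a cheap way to widen context without large kernels.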
[Figure: visual comparison of the ground truth, the deep approach and the standard approach (bicubic)]
High-order loss, GAN • A « perceptual loss » is not the same as PSNR: the loss combines a content loss (MSE) with a discriminator (adversarial) loss from a GAN • [Figure: bicubic, DNN trained with MSE, GAN, original] • The GAN output looks nice, but PSNR is no longer an applicable metric • Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." arXiv preprint arXiv:1609.04802 (2016).
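The combined objective can be sketched as a weighted sum of a content term and an adversarial term. This NumPy version assumes an MSE content loss (as on the slide), the generator loss −log D(G(LR)), and the 10⁻³ adversarial weight used by Ledig et al.; it averages over the batch instead of summing. The discriminator probabilities are passed in directly because no network is defined here.

```python
import numpy as np

def content_loss_mse(sr, hr) -> float:
    """Pixel-wise MSE between the super-resolved and the original image."""
    return float(np.mean((np.asarray(sr, float) - np.asarray(hr, float)) ** 2))

def adversarial_loss(d_prob_sr, eps: float = 1e-12) -> float:
    """Generator loss -log D(G(LR)), where d_prob_sr are the discriminator's
    probabilities that the super-resolved images are real."""
    p = np.clip(np.asarray(d_prob_sr, float), eps, 1.0)
    return float(np.mean(-np.log(p)))

def perceptual_loss(sr, hr, d_prob_sr, adv_weight: float = 1e-3) -> float:
    """Content loss plus a small adversarial term (weight is an assumption)."""
    return content_loss_mse(sr, hr) + adv_weight * adversarial_loss(d_prob_sr)
```

The adversarial term rewards outputs the discriminator accepts as real rather than outputs close to the pixel average, which is why results look sharper while PSNR can drop.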
UC#2 Video Classification • Problem: predicting interestingness • Which one is more interesting?
Applications • Media file search and browsing • Advertisement • Filtering and summarization • E-learning
Approach • From the image / frame: CNN features, i.e. the output coefficients of a dense layer (fc7) of a CNN model (CaffeNet), size = 4096 • From the audio: MFCC features, a classic audio spectrum feature: Mel-frequency cepstral coefficients + delta + delta-delta, size = 60 * 3 = 180
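The 180-dimensional audio descriptor is 60 MFCCs stacked with their first and second time derivatives. The sketch below computes regression-based deltas in NumPy and, as a purely hypothetical fusion step, mean-pools both modalities over time and concatenates them (4096 + 180 = 4276 values). The slide does not say how the visual and audio features are combined; the MFCC and fc7 arrays are assumed to be precomputed.

```python
import numpy as np

def deltas(feats: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression deltas over time (rows = frames):
    d_t = sum_i i * (c_{t+i} - c_{t-i}) / (2 * sum_i i^2), with edge padding."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    denom = 2.0 * sum(i * i for i in range(1, n + 1))
    out = np.zeros(feats.shape, dtype=np.float64)
    for i in range(1, n + 1):
        out += i * (padded[n + i:n + i + num_frames] - padded[n - i:n - i + num_frames])
    return out / denom

def audio_descriptor(mfcc: np.ndarray) -> np.ndarray:
    """(T, 60) MFCCs -> (T, 180): MFCC + delta + delta-delta."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return np.hstack([mfcc, d1, d2])

def clip_descriptor(fc7_frames: np.ndarray, mfcc: np.ndarray) -> np.ndarray:
    """Hypothetical early fusion: mean-pool each modality, then concatenate."""
    visual = fc7_frames.mean(axis=0)              # (4096,)
    audio = audio_descriptor(mfcc).mean(axis=0)   # (180,)
    return np.concatenate([visual, audio])        # (4276,)
```

A classifier trained on `clip_descriptor` outputs would then produce the interestingness score; other fusion schemes (for example late fusion of two classifiers) are equally plausible.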
Results • Example 1: predicted interestingness 0.00040983; ground truth: not interesting • Example 2: predicted interestingness 0.64466870; ground truth: interesting
Evaluation • Datasets: Flickr (≈200,000, balanced, patented API); MediaEval (≈5,000, unbalanced, human annotation) • Metric: Mean Average Precision (MAP), a ranking-based metric used for MediaEval performance evaluation • Depth of the network • Adversarial content is possible • What about robustness against adversarial examples?
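MAP averages, over several ranked lists, the precision measured at each position where a relevant (interesting) item appears. The sketch below assumes binary relevance and normalises AP by the number of relevant items in the list; the exact MediaEval variant (for example a cut-off depth) may differ.

```python
import numpy as np

def average_precision(relevance) -> float:
    """AP of one ranked list of binary relevance labels (best-scored first)."""
    rel = np.asarray(relevance, dtype=np.float64)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(rankings) -> float:
    """Mean of the per-list average precisions."""
    return float(np.mean([average_precision(r) for r in rankings]))

def ranking_from_scores(scores, labels) -> np.ndarray:
    """Order ground-truth labels by decreasing predicted interestingness."""
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    return np.asarray(labels)[order]
```

For instance, the list [1, 0, 1] scores (1 + 2/3) / 2 ≈ 0.83: the second relevant item is penalised for being ranked behind a non-interesting one.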
UC#3 3D Animation: speeding up the production of animated movies
Video 2 Animation • Sketching animations starting from video: joint coordinates are extracted from the images, using plausible motions
Integration • Animations are carried out by highly skilled artists using specialized GUIs • New user/machine interfaces need to be devised • Rig controllers must be made compatible with the learnt manifold
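The slide does not say how the manifold of plausible motions is learnt. As an illustration only, the sketch below assumes the simplest case, a linear (PCA) subspace fitted to flattened joint coordinates of plausible poses: an extracted pose is projected onto it, and the low-dimensional coefficients are the kind of quantities an interface would have to reconcile with rig controllers. All names and the PCA choice are hypothetical.

```python
import numpy as np

def fit_pose_subspace(poses: np.ndarray, n_components: int):
    """poses: (N, D) flattened joint coordinates of plausible poses.
    Returns the mean pose and the top principal directions (PCA via SVD)."""
    mean = poses.mean(axis=0)
    _, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
    return mean, vt[:n_components]               # basis: (n_components, D)

def project_to_subspace(pose: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Snap an extracted (possibly noisy) pose onto the learnt subspace."""
    coeffs = basis @ (pose - mean)               # low-dimensional coordinates
    return mean + basis.T @ coeffs, coeffs

# Synthetic "plausible poses" living in a 2-D subspace of a 6-D pose space.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(2, 6))
offset = rng.normal(size=6)
poses = rng.normal(size=(100, 2)) @ true_basis + offset
mean, basis = fit_pose_subspace(poses, n_components=2)
```

A real system would more likely use a non-linear, temporal model of motion; the linear case only shows the projection idea.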
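UC#4 Post-filters in a future video codec
• Encoder → Decoder → Post-filter: a fully convolutional neural network + block boundaries
• Results (BD-rate):
  Filter                                                         | BD-rate
  DBF + SAO + ALF (deblocking, sample adaptive offset, adaptive loop filter) | -3.2%
  CNN                                                            | -4.91%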
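BD-rate (Bjøntegaard delta rate) summarises the average bitrate change at equal quality between two rate-distortion curves; negative values mean savings. A minimal NumPy sketch of the classic computation follows: fit log-rate as a cubic in PSNR for each curve, integrate both over the overlapping PSNR range, and convert the mean difference into a percentage. Which configuration serves as the anchor for the slide's figures is not stated, and the input numbers in the test are made up.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjontegaard delta rate in percent (negative = bitrate savings).
    Each curve needs at least 4 (rate, PSNR) points for the cubic fit."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=np.float64))
    log_rt = np.log(np.asarray(rate_test, dtype=np.float64))
    pa = np.polyfit(psnr_anchor, log_ra, 3)      # log-rate as a cubic in PSNR
    pt = np.polyfit(psnr_test, log_rt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    int_a = np.polyval(ia, hi) - np.polyval(ia, lo)
    int_t = np.polyval(it, hi) - np.polyval(it, lo)
    avg_diff = (int_t - int_a) / (hi - lo)       # mean log-rate difference
    return float((np.exp(avg_diff) - 1.0) * 100.0)
```

As a sanity check, a test codec that needs exactly 10% fewer bits at every PSNR yields a BD-rate of -10%.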
Results [Figure: CNN post-filter vs. no filters] • The DNN has a high computation cost
UC#5 Style Transfer: a new image editing tool
Approach • Baseline: Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual losses for real-time style transfer and super-resolution." arXiv preprint arXiv:1603.08155 (2016). • Problems: flickering (stability over time); speed
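In the Johnson et al. baseline, style is captured by Gram matrices of feature maps, normalised by C·H·W, and the style loss is the squared Frobenius distance between Gram matrices, summed over layers. The NumPy sketch below assumes the feature maps are already computed (the paper uses VGG-16 activations); here they are plain arrays.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """(C, H, W) feature map -> (C, C) Gram matrix normalised by C*H*W."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(feats_output, feats_style) -> float:
    """Sum over layers of the squared Frobenius norm of Gram differences."""
    return float(sum(np.sum((gram_matrix(a) - gram_matrix(b)) ** 2)
                     for a, b in zip(feats_output, feats_style)))
```

Because the Gram matrix discards spatial positions, the loss constrains textures and colours rather than layout. It also says nothing about consistency between consecutive frames, which is consistent with the flickering problem listed above when the method is applied frame by frame.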
UC#5 Style Transfer: an interesting tool for artists • How can the output be controlled? How can it be evaluated?
Conclusion • DL allows substantial improvements in several applications • Many new deep-learning-based tools are emerging, and more are still to be discovered • Existing workflows have to be adapted • Integration challenges need to be solved
Thanks