
23rd International Conference on MultiMedia Modeling (MMM 2017) On the Exploration of Convolutional Fusion Networks for Visual Recognition Yu Liu, Yanming Guo, and Michael S. Lew Leiden Institute of Advanced Computer Science, Leiden University


SLIDE 1

Discover the world at Leiden University

On the Exploration of Convolutional Fusion

Networks for Visual Recognition

Leiden Institute of Advanced Computer Science, Leiden University Presenter: Yu Liu Yu Liu, Yanming Guo, and Michael S. Lew

23rd International Conference on MultiMedia Modeling (MMM 2017)

SLIDE 2

Outline

  • Introduction
  • Convolutional neural networks (CNN)
  • The usage of intermediate layers
  • Multi-layer fusion
  • Motivation
  • How to develop an efficient multi-layer fusion network
  • Our approach
  • Convolutional fusion networks (CFN)
  • Results
  • Image-level and pixel-level classification
  • Conclusions
SLIDE 3

Outline

  • Introduction
  • Convolutional neural networks (CNN)
  • The usage of intermediate layers
  • Multi-layer fusion
  • Motivation
  • How to develop an efficient multi-layer fusion network
  • Our approach
  • Convolutional fusion networks (CFN)
  • Results
  • Image-level and pixel-level classification
  • Conclusions
SLIDE 4

Introduction: CNNs

  • A plain CNN

[Figure: plain CNN pipeline: Conv 1 → Pooling → Conv 2 → Pooling → … → Conv S-1 → Pooling → Conv S → 1×1 Conv → GAP → FC → Prediction]

Conv: convolutional layer. Pooling: max or average pooling layer. 1×1 Conv: convolution with a 1×1 kernel. GAP: global average pooling. FC: fully-connected layer. This CNN pipeline has become widely used in recent work because it greatly reduces the number of parameters.
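As a minimal sketch of the GAP stage (NumPy, with illustrative sizes that are not from the paper: 64 channels of 8×8 maps, 10 classes), global average pooling collapses each channel map to its mean, so the final FC layer needs far fewer weights than one attached to the flattened maps:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse each H x W channel map to its mean: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

# Illustrative sizes (not from the paper): 64 channels of 8x8 maps, 10 classes.
rng = np.random.default_rng(0)
fmap = rng.random((64, 8, 8))
vec = global_average_pool(fmap)         # shape (64,)

# Parameter count of the final FC layer:
fc_params_with_gap = 64 * 10            # FC on the 64-dim GAP vector
fc_params_on_flatten = 64 * 8 * 8 * 10  # FC on the flattened maps instead
print(vec.shape, fc_params_with_gap, fc_params_on_flatten)
```

This is why the GAP-then-FC design reduces parameters: the classifier sees one number per channel instead of one per pixel.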

SLIDE 5

Introduction: CNNs

  • A plain CNN

[Figure: the same plain CNN pipeline: Conv 1 … Conv S with pooling, then 1×1 Conv, GAP, FC, Prediction]

A plain CNN estimates the final prediction from the topmost layer alone. What if useful information in the intermediate layers is lost during forward propagation? Can we develop a fusion architecture that exploits the intermediate layers?

SLIDE 6

Introduction: intermediate layers

  • Apart from fully-connected layers, intermediate convolutional layers can also offer discriminative representations.

[Figure: input image → CNN (feature extractor) → feature encoder (BoW, VLAD, Fisher Vector, etc.) → output vector]

Encoder          Methods
BoW              DeepIndex (ICMR 2015), BLCF (ICMR 2016), MSCE (IJCV 2016)
VLAD             MOP-CNN (ECCV 2014), NetVLAD (CVPR 2016), CCS (MM 2016)
Fisher Vector    DSP (ICCV 2015), MPP (CVPR 2015), FV-CNN (CVPR 2015)
Other encoders   SCFVC (NIPS 2014), SPoC (ICCV 2015), SPLeaP (ECCV 2016)

SLIDE 7

Introduction: multi-layer fusion

  • To integrate the strengths of different layers, aggregate multi-layer activations to generate a richer representation.

References:

1. Lingqiao Liu, Chunhua Shen, Anton van den Hengel. “The treasure beneath convolutional layers: cross convolutional layer pooling for image classification”, CVPR 2015.
2. Ying Li, Xiangwei Kong, Liang Zheng, Qi Tian. “Exploiting Hierarchical Activations of Neural Network for Image Retrieval”, ACM Multimedia 2016.

SLIDE 8

Introduction: multi-layer fusion

  • To integrate the strengths of different layers, aggregate multi-layer activations to generate a richer representation.
  • However, these works use a pre-trained model without improving the training procedure.

SLIDE 9

  • Add new side branches and train them jointly with the full-depth main branch.

Figure from “Yang, S., Ramanan, D.: Multi-scale recognition with DAG-CNNs, ICCV 2015.”

DAG-CNNs

Introduction: multi-layer fusion

SLIDE 10

  • Add new side branches and train them jointly with the full-depth main branch.

Figure from “Yang, S., Ramanan, D.: Multi-scale recognition with DAG-CNNs, ICCV 2015.”

DAG-CNNs

Introduction: multi-layer fusion

  • This approach spends a large number of additional parameters on the side branches (i.e., fully-connected layers).
  • The summation operation ignores the differing importance of the side branches.
SLIDE 11

Outline

  • Introduction
  • Convolutional neural networks (CNN)
  • The usage of intermediate layers
  • Multi-layer fusion
  • Motivation
  • How to develop an efficient multi-layer fusion network
  • Our approach
  • Convolutional fusion networks (CFN)
  • Results
  • Image-level and pixel-level classification
  • Conclusions
SLIDE 12

Motivation

  • Question: how can we build an efficient multi-layer fusion network upon CNNs?
  • Three key issues:
  • Efficiency: add only a few parameters in the side branches.
  • Better fusion module: learn adaptive weights for the different side branches.
  • Accuracy: achieve considerable improvements over a plain CNN.
SLIDE 13

Outline

  • Introduction
  • Convolutional neural networks (CNN)
  • The usage of intermediate layers
  • Multi-layer fusion
  • Motivation
  • How to develop an efficient multi-layer fusion network
  • Our approach
  • Convolutional fusion networks (CFN)
  • Results
  • Image-level and pixel-level classification
  • Conclusion & Discussion
SLIDE 14

Our approach: CFN

Advantage I : Efficient side branches

  • Overall Architecture
SLIDE 15

Efficient side outputs

  • 1. Creating the side branches from the pooling layers.
  • 2. Employing the 1x1 convolution to “receive” the side-branch inputs.
  • 3. Performing an efficient global average pooling (GAP) to “send” the side-branch outputs.

[Figure: CFN side branches: branches 1 to S-1 are created from the pooling layers of the main branch (Conv 1 … Conv S), each passing through a 1×1 Conv and GAP; the topmost output serves as side branch S]

CFN has a minimal increase in parameters for the side branches.
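A single side branch can be sketched as follows (NumPy; the shapes are illustrative assumptions, not the paper's). A 1×1 convolution acts as a per-pixel linear map over the channel axis, and GAP then produces the branch's output vector, so the branch's only parameters are the 1×1 conv weights and biases:

```python
import numpy as np

def side_branch(feature_maps, w, b):
    """One CFN-style side branch: a 1x1 convolution followed by GAP.

    feature_maps: (C_in, H, W) activations taken from a pooling layer.
    w: (C_out, C_in) weights of the 1x1 convolution, b: (C_out,) biases.
    Returns the branch's (C_out,) output vector.
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    out = np.tensordot(w, feature_maps, axes=([1], [0])) + b[:, None, None]
    return out.mean(axis=(1, 2))  # GAP: (C_out, H, W) -> (C_out,)

# Illustrative sizes (not the paper's): 32 input channels, 16x16 maps,
# 64 output channels. The branch adds only 32*64 weights + 64 biases.
rng = np.random.default_rng(0)
fmap = rng.random((32, 16, 16))
w = rng.random((64, 32)) * 0.01
b = np.zeros(64)
vec = side_branch(fmap, w, b)
print(vec.shape)  # (64,)
```

Note how cheap the branch is: its parameter count depends only on the channel counts, not on the spatial size of the maps.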

SLIDE 16

Our approach: CFN

  • Overall Architecture

Advantage II : Early fusion and late prediction Advantage I : Efficient side branches

SLIDE 17

Early fusion and late prediction

[Figure: the CFN architecture: the side-branch outputs enter a fusion module, and the fused feature feeds a single FC layer for the final prediction]

  • 1. Using a fusion module to integrate the side-branch outputs.
  • 2. The fused feature is fed to a fully-connected layer to make a final prediction.
slide-18
SLIDE 18

Discover the world at Leiden University

Comparison

[Figure: early fusion and late prediction (EFLP) vs. early prediction and late fusion (EPLF)]

The advantages of EFLP:

  • performance competitive with EPLF.
  • fewer parameters than EPLF (i.e., it uses only one FC layer).
  • the fused feature can act as a richer image representation.
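A back-of-the-envelope count (sizes are assumptions, not from the paper) illustrates why EFLP needs fewer FC parameters than EPLF: fusing first leaves a single C-dimensional feature for one FC layer, whereas predicting first requires an FC layer per branch:

```python
# Illustrative sizes (assumptions, not the paper's):
# S side branches, each yielding a C-dim vector, K output classes.
S, C, K = 4, 192, 100

# EFLP: fuse the S vectors first, then one FC layer on the fused feature.
eflp_fc_params = C * K

# EPLF: one FC layer (one prediction) per branch, then fuse predictions.
eplf_fc_params = S * C * K

print(eflp_fc_params, eplf_fc_params)
```

Under these assumptions EPLF spends S times as many FC weights as EFLP, which is the parameter advantage the slide refers to.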
SLIDE 19

Our approach: CFN

  • Overall Architecture

Advantage II : Early fusion and late prediction Advantage I : Efficient side branches Advantage III: Locally-connected fusion

SLIDE 20

Locally-connected (LC) fusion

  • The side outputs are first stacked together.
  • A locally-connected layer with a 1×1 kernel is then applied over the stacked maps.

[Figure: fusion module: the side-branch outputs (after 1×1 Conv and GAP) from branches 1 to S are stacked, fused by a locally-connected layer, and passed to the FC layer for the prediction]

The LC layer can learn adaptive weights for the different side outputs.
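A minimal sketch of the LC fusion (NumPy; the sizes are illustrative assumptions, not the paper's). Each spatial location gets its own weight per side branch, and initializing all weights to 1/S, as on the hyper-parameter slide, starts the module out as plain equal-weight (sum-pooling-style) fusion:

```python
import numpy as np

def lc_fuse(stacked, weights):
    """Locally-connected 1x1 fusion over S stacked side outputs.

    stacked: (S, C, H, W) side-output maps.
    weights: (S, H, W), one weight per branch per spatial location,
             i.e. not shared across locations (unlike a 1x1 convolution).
    Returns the fused (C, H, W) map.
    """
    return (weights[:, None, :, :] * stacked).sum(axis=0)

# Illustrative sizes (assumptions): S=3 branches, 8 channels, 4x4 maps.
S, C, H, W = 3, 8, 4, 4
rng = np.random.default_rng(0)
stacked = rng.random((S, C, H, W))

# Initializing every weight to 1/S makes the fusion start as a plain
# average; training can then adapt the weights per branch and location.
weights = np.full((S, H, W), 1.0 / S)
fused = lc_fuse(stacked, weights)
print(fused.shape)  # (8, 4, 4)
```

The same structure explains the parameter counts quoted later (H×W×5 for the edge-detection model): one weight per branch per location.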

SLIDE 21

Comparison

  • Since a locally-connected layer does not share its weights over the spatial dimensions, it can learn a better fusion than other fusion modules.
  • To the best of our knowledge, this is the first attempt to apply a locally-connected layer in a fusion module.

SLIDE 22

Our approach: CFN

  • Overall Architecture

Advantage II : Early fusion and late prediction Advantage I : Efficient side branches Advantage III: Locally-connected fusion

SLIDE 23

Our approach: CFN

  • Overall Architecture
  • CFN can integrate the intermediate layers using additional side branches, and deliver their effects on the final prediction explicitly and directly.

SLIDE 24

Discussion

(1) Difference from DSN

  • Deeply-supervised nets (DSN) add extra supervision to guide the intermediate layers earlier.
  • In contrast, CFN aims to generate a fused, richer feature and uses only one supervision signal at the final prediction.

“Loss fusion” (DSN) vs. “feature fusion” (CFN)

References:

Lee, C., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: AISTATS 2015.

SLIDE 25

Discussion

(2) Difference from ResNet

  • ResNet makes use of “linear” shortcut connections to make much deeper neural networks work well.
  • In contrast, CFN exploits the existing intermediate layers to improve the discriminative capability of CNNs.

References:

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR 2016.

“Depth that matters” (ResNet) vs. “fusion that matters” (CFN)

SLIDE 26

Outline

  • Introduction
  • Convolutional neural networks (CNN)
  • The usage of intermediate layers
  • Multi-layer fusion
  • Motivation
  • How to develop an efficient multi-layer fusion network
  • Our approach
  • Convolutional fusion networks (CFN)
  • Results
  • Image-level and pixel-level classification
  • Conclusions
SLIDE 27

Results: Image-level classification

Dataset         # Categories   # Training images   # Testing images
CIFAR-10        10             50,000              10,000
CIFAR-100       100            50,000              10,000
ImageNet 2012   1,000          1.2 million         50,000 (validation)

Hyper-parameter                      CIFAR     ImageNet 2012
Initial learning rate                0.1       0.01
Weight decay                         0.0001    0.0001
Momentum                             0.9       0.9
Mini-batch size                      100       64
Max iterations                       120,000   200,000
Initialization of LC weights (1/S)   1/3       1/4
Dropout                              Yes       No
Batch norm                           No        Yes

* More details can be seen in the paper.

SLIDE 28

Results on CIFAR

[Figure: a CNN with 8 layers vs. a CFN with three side branches; the LC weights are initialized to 1/3]

* For the convolutional layers, the lower-right number indicates the kernel size and the upper-right number the number of channels.

SLIDE 29

Results on CIFAR

Table: test error (%) on the CIFAR-10/100 datasets (without data augmentation)

  • CFN outperforms a plain CNN.
  • The additional parameters for the extra side branches are fewer than the total number of basic parameters.
  • LC fusion is better than the other fusion modules (i.e., convolution and sum-pooling), yet has only a minimal increase in parameters.

SLIDE 30

Table: Comparison with other models (compared models appeared at ICML 2013, ICLR 2014, AISTATS 2015, ICLR 2015, CVPR 2015, and AAAI 2016). * indicates using data augmentation.

  • We compare CFN with other works that are based on a plain CNN similar to our 8-layer model.
  • CFN is competitive with these works.

Results on CIFAR

SLIDE 31

Visualization of feature maps

  • We extract the feature maps of the 1×1 convolutional layer and visualize the top-4 maps (for both CNN and CFN).

SLIDE 32

  • The side branches in CFN can learn clues complementary to the full-depth main branch. For example, side branch 1 mainly learns the boundaries or shapes around the objects, while side branch 2 focuses on some semantic “parts” near the objects.

SLIDE 33

Results on CIFAR

  • Comparing the weights learned in LC fusion for the side branches.
  • Side branch 3 (the main branch) plays the core role, but the other two side branches are still necessary.

SLIDE 34

Results on ImageNet

[Figure: a CNN with 11 layers vs. a CFN with four side branches; the LC weights are initialized to 1/4]

SLIDE 35

Results on ImageNet

  • Table. Error rates (%) on the ImageNet 2012 validation set (single crop).
SLIDE 36

Results on ImageNet

  • CNN-11 and CFN-11 achieve results competitive with AlexNet while using far fewer parameters (~6.3 million vs. ~60 million).
  • For such a not-overly-deep network, CFN-11 achieves better accuracy than DSN-11 and ResNet-11.
  • CFN can thus act as an alternative way of improving the discriminative capacity of CNNs, by exploiting the existing intermediate layers.

  • Table. Comparison with other models on ImageNet 2012 validation set.
SLIDE 37

Results on ImageNet

SLIDE 38

Results on other tasks

We transfer the trained ImageNet model to three tasks:

  • Scene recognition
  • The Scene-15 dataset (4,485 images)
  • The Indoor-67 dataset (15620 images)
  • Fine-grained recognition
  • The Flower-102 dataset (8189 images)
  • The Bird-200 dataset (11788 images)
  • Image retrieval
  • The Holiday dataset (1491 images)
  • The UKBench dataset (10200 images)
SLIDE 39

Results on other tasks

  • CFN-11 obtains consistent improvements on all datasets.
  • Interestingly, the improvements are even more remarkable than those on the ImageNet dataset itself.
  • Fusing multi-layer deep representations is beneficial for diverse visual tasks.

  • Table. Results on three target tasks (higher is better).
SLIDE 40

  • Fully convolutional networks (FCN)
  • Copy weights from CNN
  • From images to images
  • End-to-end training
  • For example:
  • Edge detection
  • Semantic segmentation

Results: Pixel-level classification

SLIDE 41

Results on Edge detection

HED

Figure from S. Xie and Z. Tu. Holistically-nested edge detection. ICCV, 2015.

SLIDE 42

Results on Edge detection

[Figure: from HED to CFN: Conv1 to Conv5 nets with max-pooling in between; side outputs 1 to 5 are fused by a locally-connected layer into the fusion output]

  • From HED to CFN
  • Use a locally-connected layer in the fusion module.
  • Parameters in the LC fusion: H×W×5

Method   ODS     OIS     AP
HED      0.780   0.802   0.786
CFN      0.784   0.806   0.788

ODS: fixed contour threshold. OIS: per-image best threshold. AP: average precision.

  • Table. Results on the BSDS 500 dataset.
SLIDE 43

Results on Edge detection

[Figure: qualitative edge-detection results: image, ground truth, HED, CFN]

SLIDE 44

Results on Edge detection

SLIDE 45

Results on Semantic segmentation

FCN-8s

Figures from Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. CVPR, 2015.

SLIDE 46

Results on Semantic segmentation

CFN-8s

[Figure: CFN-8s: FCN-8s with locally-connected fusion at both fusion stages, followed by 8× upsampling]

  • From FCN-8s to CFN-8s
  • Use a locally-connected layer in the two-stage fusion.
  • Parameters in the LC fusion: H×W×21×2

Method   Mean IoU
FCN-8s   58.69
CFN-8s   60.33

* Note that we fine-tune FCN-8s directly from the VGG-16 model, without pre-training FCN-32s and FCN-16s; CFN-8s follows the same training procedure.

  • Table. Results on Pascal VOC.

SLIDE 47

Results on Semantic segmentation

[Figure: qualitative segmentation results: image, ground truth, FCN-8s, CFN-8s]

SLIDE 48

Outline

  • Introduction
  • Convolutional neural networks (CNN)
  • The usage of intermediate layers
  • Multi-layer fusion
  • Motivation
  • How to develop an efficient multi-layer fusion network
  • Our approach
  • Convolutional fusion networks (CFN)
  • Results
  • Image-level and pixel-level classification
  • Conclusions
SLIDE 49

Conclusions

  • Why CFN works well?
  • Integrate the strengths of intermediate layers.
  • Add only a small number of extra parameters.
  • Learn adaptive weights for side branches.
  • CFN can be applied to many visual tasks:
  • Image-level
  • Pixel-level
  • Much deeper models? Hundreds of layers?

Method   Top-1   Top-5
CNN-19   36.99   14.74
CFN-19   35.47   13.93

  • Table. Results on the ImageNet 2012 validation set.

We wish to raise awareness of the potential of multi-layer fusion networks!

SLIDE 50

Code: easy to follow

Github: https://github.com/yuLiu24/CFN

Live Demo: http://goliath.liacs.nl/

SLIDE 51

Thanks for your attention! Questions please?