Multimodal Information Fusion

Ling Guan

Ryerson Multimedia Laboratory & Centre for Interactive Multimedia Information Mining
Ryerson University, Toronto, Ontario, Canada
lguan@ee.ryerson.ca
https://www.ryerson.ca/multimedia-research-laboratory/


Acknowledgement

• The presenter would like to thank his former and current students, P. Muneesawang, Y. Wang, R. Zhang, M.T. Ibrahim, L. Gao, C. Liang and N. El Madany for their contributions to this research.

• Slides on fusion fundamentals provided by Prof. S.-Y. Kung of Princeton University are greatly appreciated.

• This presentation is supported by:
  • The Canada Research Chair (CRC) Program,
  • The Canada Foundation for Innovation (CFI),
  • The Ontario Research Fund (ORF), and
  • Ryerson University.

Ryerson University – Multimedia Research Laboratory
  • L. Guan, P. Muneesawang, Y. Wang, R. Zhang, Y. Tie, A. Bulzacki, M.T. Ibrahim, N. Joshi, Z. Xie, L. Gao, N. El Madany


Relevant Publications

1. Y. Wang and L. Guan, "Combining speech and facial expression for recognition of human emotional state," IEEE Trans. on Multimedia, vol. 10, no. 5, pp. 936-946, Aug. 2008.
2. C. Liang, E. Chen, L. Qi and L. Guan, "Heterogeneous features fusion with collaborative representation learning for 3D action recognition," Proc. IEEE Int. Symposium on Multimedia, pp. 162-168, Taichung, Taiwan, Dec. 2017 (Top Six (5%) Paper Honor).
3. L. Gao, L. Qi, E. Chen and L. Guan, "Discriminative multiple canonical correlation analysis for information fusion," IEEE Trans. on Image Processing, vol. 27, no. 4, pp. 1951-1965, Apr. 2018.
4. P. Muneesawang, T. Amin and L. Guan, "A new learning algorithm for the fusion of adaptive audio-visual features for the retrieval and classification of movie clips," J. Signal Processing Systems, vol. 59, no. 2, pp. 177-188, May 2010.
5. T. Amin, M. Zeytinoglu and L. Guan, "Application of Laplacian mixture model for image and video retrieval," IEEE Trans. on Multimedia, vol. 9, no. 7, pp. 1416-1429, Nov. 2007.
6. P. Muneesawang and L. Guan, "Adaptive video indexing and automatic/semi-automatic relevance feedback," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 8, pp. 1032-1046, Aug. 2005.
7. L. Guan, P. Muneesawang, Y. Wang, R. Zhang, Y. Tie, A. Bulzacki and M.T. Ibrahim, "Multimedia multimodal technologies," Proc. IEEE Workshop on Multimedia Signal Processing and Novel Parallel Computing (in conjunction with ICME 2009), pp. 1600-1603, NYC, USA, Jul. 2009 (Overview Paper).
8. R. Zhang and L. Guan, "Multimodal image retrieval via Bayesian information fusion," Proc. IEEE Int. Conf. on Multimedia and Expo, pp. 830-833, NYC, USA, Jun./Jul. 2009.

Why Multimedia Multimodal Methodology? (revisit)

• Multimedia is a domain of multiple facets, e.g., audio, visual, text, graphics, etc.
• A central aspect of multimedia processing is the coherent integration of media from different sources or modalities.
• It is easy to define each facet individually, but difficult to consider them as a combined identity.
• Humans are natural and generic multimedia processing machines. Can we teach computers/machines to do the same (via fusion technologies)?

Potential Applications

• Human–Computer Interaction
• Learning Environments
• Consumer Relations
• Entertainment
• Digital Home, Domestic Helper
• Security/Surveillance
• Educational Software
• Computer Animation
• Call Centers

Source of Fusion for Classification

[Diagram: each modality (Data/Feature #1, Data/Feature #2) passes through a Representation stage to a Score (Decision) stage; fusion can take place at the data/feature, representation, or score/decision level.]

Direct Data (Feature) Level Fusion

Prior knowledge can be incorporated into the fusion models by modifying the visual features.


Representation Level Fusion

[Diagram: an audio HMM and a face-model HMM are combined into a fused HMM.]

Decision (Score) Level Fusion

Modular Networks (Decision Level)

[Diagram: a hierarchical modular network; expert sub-networks E1, E2, ..., Er each produce outputs y1, y2, ..., yj, which a decision module combines into the network output Ynet, the recognized emotion.]

• Hierarchical structure.
• Each sub-network Er is an expert system.
• The decision module classifies the input vector as a particular class when Ynet = arg max_j y_j.

A Different Interpretation (human emotion recognition)
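To make the decision rule concrete, here is a minimal sketch (not the authors' implementation) of decision-level fusion: each expert emits a class-score vector, and the decision module picks the class maximizing the combined scores, i.e. Ynet = arg max_j y_j. The averaging step and the toy score values are assumptions.

```python
import numpy as np

# Decision-level fusion sketch: each expert E_r maps its modality's input
# to a vector of class scores; the decision module takes the arg max.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def fuse_decisions(expert_scores):
    """expert_scores: list of arrays, each of shape (n_classes,).
    Returns the index of the winning class, Ynet = arg max_j y_j."""
    y = np.mean(expert_scores, axis=0)   # combine expert outputs y_1..y_j
    return int(np.argmax(y))

# Example: two experts (audio, visual) voting over six emotions
audio_scores = np.array([0.10, 0.05, 0.20, 0.40, 0.15, 0.10])
visual_scores = np.array([0.20, 0.10, 0.15, 0.35, 0.10, 0.10])
print(EMOTIONS[fuse_decisions([audio_scores, visual_scores])])  # happiness
```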


Score Fusion Architecture (Audio-Visual)

• The lower layer contains local experts, each producing a local score based on a single modality.
• The upper layer combines the scores.

The scores are obtained independently and then combined.

Linear Fusion

[Plot: a non-uniformly weighted linear decision boundary in the (Score 1, Score 2) plane separating Accept from Reject.]

Linear SVM (supervised) fusion is an appealing alternative. The most prevalent unsupervised approaches estimate the confidence weights from prior knowledge or training data.
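A minimal sketch of unsupervised, non-uniformly weighted linear score fusion; the weights here are hard-coded placeholders standing in for confidences estimated from prior knowledge or training data.

```python
def linear_fusion(score_audio, score_visual, w=(0.6, 0.4), threshold=0.5):
    """Non-uniformly weighted linear fusion of two match scores in [0, 1].
    Accept if the fused score clears the threshold, otherwise reject."""
    fused = w[0] * score_audio + w[1] * score_visual
    return "accept" if fused >= threshold else "reject"

print(linear_fusion(0.7, 0.3))  # accept: 0.6*0.7 + 0.4*0.3 = 0.54
```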

Nonlinear Adaptive Fusion (via supervision; kernel, SVM)

[Plot: a nonlinear decision boundary in the (Score 1, Score 2) plane learned by a kernel SVM.]
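A hedged sketch of supervised nonlinear fusion: a kernel SVM is trained on (Score 1, Score 2) pairs with accept/reject labels and learns a nonlinear decision boundary. The training data below is illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

# Rows are (score_1, score_2) pairs from the two local experts;
# labels are 1 = accept, 0 = reject (toy values for illustration).
scores = np.array([[0.9, 0.8], [0.7, 0.9], [0.8, 0.2],
                   [0.2, 0.3], [0.1, 0.6], [0.3, 0.1]])
labels = np.array([1, 1, 1, 0, 0, 0])

svm_fusion = SVC(kernel="rbf", gamma="scale").fit(scores, labels)
print(svm_fusion.predict([[0.75, 0.65]]))  # fused decision for a new score pair
```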

Data/Feature Fusion

• Simple and straightforward (good)
• Curse of dimensionality (bad)
• Normalization issue
• Case studies (see the sketch after this list):
1. Bimodal human emotion recognition (also with a score-level-fusion flavor)
2. 3D human action recognition
3. Fusion by feature mapping
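A minimal sketch of direct feature-level fusion: z-score normalize each modality per dimension (addressing the normalization issue above), then concatenate, at the cost of a higher-dimensional space (the curse of dimensionality).

```python
import numpy as np

def fuse_features(feat_a, feat_v, eps=1e-12):
    """Normalize each modality per dimension, then concatenate.
    feat_a: (n_samples, d_a) audio features; feat_v: (n_samples, d_v) visual features."""
    za = (feat_a - feat_a.mean(axis=0)) / (feat_a.std(axis=0) + eps)
    zv = (feat_v - feat_v.mean(axis=0)) / (feat_v.std(axis=0) + eps)
    return np.hstack([za, zv])   # dimensionality grows to d_a + d_v
```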


Bimodal Human Emotion Recognition
— Also with a score-level-fusion flavor

1. Y. Wang and L. Guan, "Recognizing human emotional state from audiovisual signals," IEEE Transactions on Multimedia, vol. 10, no. 5, pp. 936-946, August 2008.

Indicators of Emotion

Major indicators of emotion:
• Speech
• Facial expression
• Body language: highly dependent on personality, gender, age, etc.
• Semantic meaning: two sentences could have the same lexical meaning but carry different emotional information
• ...

Objective

To develop a generic, language- and cultural-background-independent system for recognition of human emotional state from audiovisual signals.

Audio Feature Extraction

Input speech → Preprocessing (noise reduction via wavelet thresholding; leading- and trailing-edge elimination) → Windowing (Hamming window, 512 points, 50% overlap) → Prosodic, MFCC, and formant features → Audio feature set
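A sketch of the windowing/MFCC stage of this pipeline, assuming librosa; the 512-point Hamming window with 50% overlap follows the slide, while the file name and the number of coefficients are illustrative assumptions (the prosodic and formant features are omitted here).

```python
import librosa

y, sr = librosa.load("utterance.wav", sr=None)   # placeholder file name
# 512-point Hamming window with 50% overlap (hop of 256 samples), as on the slide
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=512, hop_length=256, window="hamming")
print(mfcc.shape)   # (13, n_frames)
```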

Visual Feature Extraction

Input image sequence → Key frame extraction (at maximum speech amplitude) → Face detection → Gabor filter bank → Feature mapping → Visual features
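A sketch of the Gabor filter bank stage, assuming scikit-image and SciPy; the frequencies and orientations are illustrative assumptions. With 4 frequencies × 6 orientations the bank has 24 filters, matching the flavor of the 24-dimensional mean and standard deviation Gabor statistics used in the hand-crafted experiments later in the deck.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import gabor_kernel

def gabor_stats(img, freqs=(0.05, 0.1, 0.2, 0.3), n_orient=6):
    """Mean and standard deviation of Gabor filter-bank responses (sketch).
    4 frequencies x 6 orientations = 24 filters -> 24 means + 24 stds."""
    means, stds = [], []
    for f in freqs:
        for k in range(n_orient):
            kern = np.real(gabor_kernel(f, theta=k * np.pi / n_orient))
            resp = ndimage.convolve(img.astype(float), kern, mode="wrap")
            means.append(resp.mean())
            stds.append(resp.std())
    return np.array(means), np.array(stds)
```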

The Recognition System (with Decision Fusion)

Audio path: input speech → preprocessing → windowing → prosodic, MFCC, and formant features.
Visual path: input video → key frame extraction → face detection → Gabor wavelet features.
Both paths: feature selection → corresponding classifiers (AN, DI, FE, HA, SA, SU) → decision module → recognized emotion.

Modular Networks (Decision Level Fusion)

The same hierarchical expert/decision-module architecture shown earlier, here applied to human emotion recognition: each sub-network Er is an expert system, and the decision module classifies the input as the class with Ynet = arg max_j y_j.

Experimental Results

Accuracy (%):

Method            Anger   Disgust  Fear   Happiness  Sadness  Surprise  Overall
Audio             88.46   61.90    59.09  48.00      57.14    80.00     66.43
Visual            34.62   47.62    68.18  52.00      47.62    48.00     49.29
Audiovisual       76.92   76.19    77.27  56.00      61.90    72.00     70.00
Stepwise          80.77   61.90    77.27  80.00      76.19    76.00     75.71
Multi-classifier  88.46   80.95    77.27  80.00      80.95    84.00     82.14

• Experiments were performed on 500 video samples from 8 subjects speaking 6 languages.
• Six emotion labels: anger, disgust, fear, happiness, sadness, and surprise.
• 360 samples (from six subjects) were used for training and the remaining 140 (from the other two subjects) for testing; there is no overlap between training and testing subjects.

3D Human Action Recognition

2. C. Liang, E. Chen, L. Qi and L. Guan, "Heterogeneous features fusion with collaborative representation learning for 3D action recognition," Proc. IEEE Int. Symposium on Multimedia, pp. 162-168, Taichung, Taiwan, December 2017 (Top Six (5%) Paper Honor).

3D action recognition representations:
• RGB (color) based approaches
• Depth based approaches
• Skeleton based approaches
• Fusion based approaches

The Recognition Framework

[Diagram: the proposed recognition framework.]

Experiment on the SBU Interaction Dataset

• A collection of two-person interactive activities.
• Composed of 8 interactions performed by 21 subjects, 265 sequences in total (6,822 frames).
• Following 5-fold cross-validation, the dataset is randomly split into 5 folds of 4-5 two-actor sets each.

[Figure: sample frames, Subject01–Subject07.]

Results on the SBU Interaction Dataset

• Energy-based segmentation is better than time-guided segmentation.
• Heterogeneous feature fusion based on the CCA-serial method is better than the CCA-sum method.
• The best recognition accuracy of the proposed method is 95.39%, better than the six deep-learning-based methods:
  • Structured Model [21]
  • ST-LSTM + Trust Gates [22]
  • Hierarchical RNN [23]
  • LSTM + co-occurrence [23]
  • SkeletonNet (Skeleton + CNN) [27]
  • Global Context-Aware + LSTM [26]

Fusion by Feature Mapping

3. L. Gao, L. Qi, E. Chen and L. Guan, "Discriminative multiple canonical correlation analysis for information fusion," IEEE Trans. on Image Processing, vol. 27, no. 4, pp. 1951-1965, Apr 2018.

Knowledge Discovery (revisit)

[Diagram: Data → feature generation (Features 1 to N) → feature mapping (generating an effective representation) → classification/recognition → result. Feature generation and feature mapping together constitute knowledge discovery.]

Knowledge Discovery - 2 (revisit)

In the schematic diagram on the previous slide, knowledge discovery includes:

• Feature generation block: generates features/descriptors by
  • identifying the key points
  • generating features at the key points
    - Classical methods incorporating prior knowledge (hand-crafted features)
    - Deep learning structures such as CNN (hand-crafted architecture)

• Feature mapping block: maps the features into a more effective representation by
  • feature selection,
  • explicit mapping by statistical machine learning (SML), or
  • implicit mapping by FFNN, normally including pooling, a statistical processing step.

Feature generation and mapping are two different but complementary processing steps. Both are critically important in information discovery. SML methods are solidly rooted in mathematics, and the analysis procedure can be clearly and convincingly presented.

Feature Mapping (1)

• The purpose: explicitly or implicitly reorganize the information for the best possible analysis and recognition performance.
• Advantages: provides a more complete and discriminatory description of the intrinsic characteristics of different patterns.
• Major challenges:
  — How to extract the discriminatory description of the intrinsic characteristics from the data?
  — How to design a mapping strategy that can effectively utilize the complementary information presented in different datasets?

Feature Mapping (2)

• When multiple features/modalities are involved, it is in essence multimodal information fusion [L1].
• When only one feature set is involved, it simplifies to information transformation.
• The following presentation focuses on mapping by LMCCA and DMCCA for the purpose of multimodal information fusion.
• Their common characteristic: generic, i.e., feature independent.

Discriminative Multiple Canonical Correlation Analysis (DMCCA) (1)

• In DMCCA,
  • the correlation among feature sets derived from multiple modalities, multiple features, or multiple channels is taken as the metric of similarity;
  • the within-class correlation and the between-class correlation are considered jointly, leading to a more discriminant space.
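DMCCA itself jointly models within-class and between-class correlation; as a simplified, hedged stand-in, the sketch below fuses two feature sets with plain two-view CCA (scikit-learn), projecting both views into a correlated subspace and concatenating the projections.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_fusion(X1, X2, n_components=10):
    """Plain two-view CCA fusion (a simplified stand-in for DMCCA).
    X1: (n_samples, d1), X2: (n_samples, d2) -> (n_samples, 2*n_components).
    n_components must not exceed min(n_samples, d1, d2)."""
    cca = CCA(n_components=n_components).fit(X1, X2)
    Z1, Z2 = cca.transform(X1, X2)   # maximally correlated projections
    return np.hstack([Z1, Z2])       # fused representation fed to a classifier
```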

DMCCA (2)

A key characteristic of DMCCA (analytically verified):

• Capable of extracting the discriminatory representation.
• The number of projected dimensions (d) corresponding to the optimal recognition accuracy is smaller than or equal to the number of classes (c) being studied; mathematically, $d \le c$.
• The property can be graphically illustrated accurately.

Experimental Results (1)

• We conduct experiments on different applications:
  • Handwritten digit recognition (MNIST database) — http://yann.lecun.com/exdb/mnist/
  • Face recognition (ORL database) — http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
  • Image recognition (Caltech 101) — http://www.vision.caltech.edu/Image_Datasets/Caltech101/

Experimental Results (2)

Table 1. Experimental settings on the different databases

Database   Total samples            Training samples  Testing samples  Number of classes
MNIST      3000                     1500              1500             10
ORL        400 (all samples)        200               200              40
Caltech    31 to 800 images/class   3030              6084             101

Experimental Results (3)

• Hand-crafted feature extraction
  • Handwritten digit recognition:
    • 24-dimensional: the mean of the digit images transformed by the Gabor filters.
    • 24-dimensional: the standard deviation of the digit images transformed by the Gabor filters.
    • 36-dimensional: Zernike moment features.
  • Face recognition:
    • 36-dimensional: histogram of oriented gradients (HOG) features.
    • 33-dimensional: local binary pattern (LBP) features.
    • 48-dimensional: Gabor transformation features, the mean and standard deviation of the face images transformed by each filter.

Experimental Results (4)

Table 2. Performance of different methods on MNIST (1500 training samples; 1500 testing samples)

Methods             Performance
Serial Fusion [L6]  70.33%
CCA [L7]            74.60%
GCCA [L8]           75.53%
MCCA [L9]           72.73%
CNN [L10]           76.40%
PCANet [L11]        79.20%
DMCCA               82.60%

Graphically Selecting the Optimal Projection in DMCCA (MNIST)

[Plot: J(η) versus projected dimension for handwritten digit recognition on the MNIST database; the peak occurs at dimension 9 < 10 classes.]

Experimental Results (5)

Table 3. Performance of different methods on ORL

Methods                                                  Performance
Serial Fusion [L6]                                       77.5%
CCA [L7]                                                 94.5%
GCCA [L8]                                                95.5%
MCCA [L9]                                                94.5%
Discriminative Sparse Representation (DSR) [L12]         94.5%
Collaborative Representation Classification (CRC) [L13]  88.5%
l1-regularized Least Squares (L1LS) [L14]                92.5%
Fast L1-Minimization Algorithms (FLMA) [L15]             90.0%
CNN [L10]                                                76.0%
PCANet [L11]                                             92.0%
DMCCA                                                    98.5%

Graphically Selecting the Optimal Projection in DMCCA (ORL)

[Plot: J(η) versus projected dimension for face recognition on the ORL database; the peak occurs at dimension 27 < 40 classes.]

Experimental Results (6)

• Deep-NN-based feature extraction
• Caltech 101 dataset with AlexNet: features extracted from the fc6, fc7 and fc8 layers.

Experimental Results (7)

• The parameters of fc6, fc7 and fc8 and the recognition results:
  • fc6: 4096-unit fully connected layer; recognition rate 77.84%
  • fc7: 4096-unit fully connected layer; recognition rate 77.65%
  • fc8: 1000-unit fully connected layer; recognition rate 73.31%

Experimental Results (8)

Table 4. Comparison with AlexNet on Caltech 101

Methods                   Performance
AlexNet fc6               77.80%
AlexNet fc7               77.65%
AlexNet fc8               77.31%
LMCCA [2018]              83.68%
LMCCA (new)               87.21%
DMCCA [2018]              89.38%
DMCCA+KECA [unpublished]  90.61%

Experimental Results (9)

Table 5. Comparison with different methods on Caltech 101

Methods                     Year  Performance
B. Du et al. [L16]          2017  78.60%
L. Mansourian et al. [L17]  2017  75.37%
P. Tang et al. [L18]        2017  82.45%
G. Lin et al. [L19]         2017  78.83%
W. Xiong et al. [L20]       2017  75.90%
S. Kim et al. [L21]         2017  83.00%
W. Yu et al. [L22]          2018  77.90%
L. Sheng et al. [L23]       2018  74.78%
DMCCA [L2]                  2018  89.38%

Summary

1. Transformation-based feature coding can improve the quality of visual features, both hand-crafted features and those obtained by deep NNs.
2. Optimal coding by LMCCA, DMCCA and DMCCA+KECA has been analytically derived and experimentally verified.

Score Level Fusion

• Could be straightforward or involve more analysis.
• Rigid, due to the limited information left at this level.

Case study:
1. Video retrieval based on audiovisual cues

Video Retrieval by Audiovisual Cues

4. P. Muneesawang, T. Amin and L. Guan, "A new learning algorithm for the fusion of adaptive audio-visual features for the retrieval and classification of movie clips," Journal of Signal Processing Systems, vol. 59, no. 2, pp. 177-188, May 2010.
5. T. Amin, M. Zeytinoglu and L. Guan, "Application of Laplacian mixture model for image and video retrieval," IEEE Trans. on Multimedia, vol. 9, no. 7, pp. 1416-1429, November 2007.
6. P. Muneesawang and L. Guan, "Adaptive video indexing and automatic/semi-automatic relevance feedback," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 8, pp. 1032-1046, August 2005.

Video Retrieval

The scores are obtained independently and then combined: a classifier for the audio channel produces an audio score $X^{(A)}$, a classifier for the video channel produces a visual score $X^{(V)}$, and an SVM combines them into a fused score.

SVM Fusion

[Figure: the SVM fusion architecture for the audio and visual scores.]

Visual Feature Representation

• Visual: Adaptive Video Indexing (AVI)
  • using visual templates,
  • a TF×IDF weighting model, and
  • cosine distance for similarity matching.

TF×IDF template weight:

$$f_v[j] = \frac{fr[j]}{\max_j\{fr[j]\}}\,\log\frac{N}{n[j]}$$

where $fr[j]$ is the frequency of visual template $j$ in the clip, $n[j]$ is the number of clips containing template $j$, and $N$ is the total number of clips. Frame feature vectors are assigned to templates by arg-min nearest-template matching of the form

$$c^{(1)} = \arg\min_{i\in\{1,\dots,N_l\}} \big\|\hat{h}_j - h_i\big\|^2 .$$
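A minimal sketch of the weighting and matching steps under the reading above: TF×IDF-style weights over visual templates, compared with cosine similarity. The function and variable names are assumptions.

```python
import numpy as np

def template_weights(fr, n, N):
    """f_v[j] = (fr[j] / max_j fr[j]) * log(N / n[j]).
    fr: per-template frequencies in the clip; n: number of clips
    containing each template; N: total number of clips."""
    return (fr / fr.max()) * np.log(N / n)

def cosine_sim(a, b):
    """Cosine similarity used for clip-to-clip matching."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```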


Audio Feature Representation

• Laplacian mixture model (LMM) of the wavelet coefficients of the audio signal:

$$p(w) = \alpha_1\,p_1(w \mid b_1) + \alpha_2\,p_2(w \mid b_2), \qquad \alpha_1 + \alpha_2 = 1$$

• Audio feature vector built from the model parameters (using an EM estimator):

$$f_a = \{\, m,\ \alpha_{1,i},\ b_{1,i},\ b_{2,i} \,\}, \qquad i = 1, 2, \dots, L-1$$

where $\alpha_{1,i}, b_{1,i}, b_{2,i}$ are the model parameters obtained from the $i$-th high-frequency subband.
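A hedged sketch of the EM estimator for a two-component, zero-mean Laplacian mixture on one wavelet subband; the initialization and iteration count are illustrative choices, not the paper's settings.

```python
import numpy as np

def laplace_pdf(w, b):
    """Zero-mean Laplacian density with scale b."""
    return np.exp(-np.abs(w) / b) / (2.0 * b)

def lmm_em(w, n_iter=50):
    """EM for p(w) = a*p1(w|b1) + (1-a)*p2(w|b2); returns (a, b1, b2)."""
    a = 0.5
    b1, b2 = 0.5 * np.abs(w).mean(), 2.0 * np.abs(w).mean()
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each coefficient
        p1 = a * laplace_pdf(w, b1)
        p2 = (1.0 - a) * laplace_pdf(w, b2)
        r = p1 / (p1 + p2 + 1e-300)
        # M-step: mixing weight, and weighted mean absolute deviations
        a = r.mean()
        b1 = (r * np.abs(w)).sum() / (r.sum() + 1e-12)
        b2 = ((1.0 - r) * np.abs(w)).sum() / ((1.0 - r).sum() + 1e-12)
    return a, b1, b2
```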

LMM vs. GMM (2 components)

[Figure: fit of a two-component LMM versus a two-component GMM to the wavelet-coefficient histogram.]

Video Classification Results

• Recognition rates obtained by the SVM-based fusion model on a video database of 6,000 clips.
• Five semantic concepts.

Multimodal Human Authentication with Signature, Iris and Fingerprint

Signature Recognition

[Figure: signature recognition examples.]

Fingerprint Image Enhancement System Overview

[Diagram: filter-bank-based and minutiae-based matchers each produce a match decision; decision-level fusion yields the final decision.]

Iris Segmentation System Overview

[Diagram: two feature extraction paths produce iris codes; feature-level fusion precedes matching.]

Fingerprint/Signature/Iris Fusion

[Diagram: three feature extraction modules produce match scores; score-level fusion yields the final decision.]

Robot Applications

[Diagram: component modules — body tracking, camera tracking, movement recognition, face tracking, emotion recognition, hand gesture recognition, and stereo vision / depth calculation.]

Robot Application: Domestic Helper (via emotion/intention recognition)

1. Target people group: elderly and disabled people at home or in community houses.
2. Capable of simple gestures and body language.
3. Capable of simple and, sometimes, incomplete verbal communications.

Robot Application: Domestic Helper (via emotion/intention recognition)

1. Help the elderly and the disabled with their daily life.
2. Entertain the people they look after.
3. Call the nurse or emergency services when in need.

[Figure: gesture/action recognition, emotion recognition, and head tracking.]

Multimodal Fusion for Human Intention Recognition

[Diagram: audio cues feed human-emotion analysis; visual cues feed human-emotion analysis and human-action analysis (hand gesture, body movement); the results combine into the possible human intention.]

Challenges

Will fusion help in the problem at hand?

What is the best fusion model for the problem at hand?
• Data/feature level,
• Representation level,
• Score/decision level,
• or multilevel.

New data analysis and information mining tools need to be developed to address these issues, or the existing tools may be revisited.

Challenge 1: Does fusion help?

[Diagram: Sensor 1 (audio) and Sensor 2 (visual) feed both unimodal recognizers and a fusion module; compare recognition by audio, by video, and by fusion.]

Challenge 2: Fusion at which level?

[Diagram: each modality passes from data/feature through representation to score (decision); fusion may occur at any of these levels.]

Information Entropy

• Entropy is a measure of the uncertainty associated with a random variable; uncertainty is useful information.
• Entropy: $H(X) = -\sum_x p(x) \log p(x)$
• Conditional entropy: $H(X \mid Y) = -\sum_{x,y} p(x,y) \log p(x \mid y)$

Multimodal Source Type

• Conflict: H(x), H(y) < H(x,y) — the joint uncertainty is larger; negative relevance.
• Redundancy: H(x,y) < H(x), H(y) — the joint uncertainty is smaller; positive relevance.
• Complementary: H(x) ≈ H(x,y) ≈ H(y) — the joint uncertainty is roughly unchanged; weak relevance.
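A sketch of the quantities behind this taxonomy: marginal entropies H(x) and H(y), joint entropy H(x,y), and mutual information, estimated from a joint histogram of two quantized feature streams. The bin count is an illustrative assumption.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability table, ignoring zero cells."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def relevance_stats(x, y, bins=16):
    """Returns H(x), H(y), H(x,y) and I(x;y) = H(x)+H(y)-H(x,y).
    Large mutual information suggests redundant sources; near-zero
    mutual information suggests complementary (weakly relevant) ones."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    hx, hy = entropy(pxy.sum(axis=1)), entropy(pxy.sum(axis=0))
    hxy = entropy(pxy)
    return hx, hy, hxy, hx + hy - hxy
```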

Multimodal Fusion Method

• When a feature has a conflict with other features:
  • if the conflict is beyond a threshold, eliminate the conflicting feature;
  • keep the total uncertainty at an acceptable level.
• Redundancy fusion (A ∩ B): according to the ranking of uncertainty values.
• Complementary fusion (A ∪ B).

Mapping from Entropy to Weight

• Weights are inversely proportional to entropies:
  • high entropy → low confidence → low weight;
  • corrupted feature → maximum entropy → zero weight;
  • ∑w = 1.

Boundary cases: H = 0 → w = 1; H = Hmax → w = 0; Hi = Hj → wi = wj = 0.5.
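One simple mapping consistent with the boundary cases above (an assumption, not necessarily the authors' formula): make each weight proportional to Hmax − Hi and normalize so the weights sum to one.

```python
import numpy as np

def entropy_weights(h, h_max):
    """w_i proportional to (h_max - h_i), normalized so sum(w) = 1.
    H = 0 against a fully uncertain partner -> w = 1; H = Hmax -> w = 0;
    equal entropies -> equal weights (0.5 each for two modalities)."""
    conf = h_max - np.asarray(h, dtype=float)   # per-modality confidence
    return conf / conf.sum()

print(entropy_weights([0.0, 2.0], h_max=2.0))  # [1. 0.]
print(entropy_weights([1.2, 1.2], h_max=2.0))  # [0.5 0.5]
```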


Entropy vs. Correlation in Fusion

• Developed an entropy-based method, kernel entropy component analysis (KECA).
• Fused the entropy-based method with discriminative analysis (KECA-DMCCA).
• Compared with correlation-based methods:
  • KPCA – kernel principal component analysis
  • KCCA – kernel canonical correlation analysis

1. Z. Xie and L. Guan, "Multimodal information fusion of audio emotion recognition based on kernel entropy component analysis," Int. J. of Semantic Computing, 7(1), 25-42, Aug 2013.
2. L. Gao, L. Qi and L. Guan, "A novel discriminative framework integrating kernel entropy component analysis and discriminative multiple canonical correlation for information fusion," IEEE Int. Symposium on Multimedia, San Jose, USA, December 2016.

Experimental Results

[Figure: recognition performance of KECA, KPCA and KCCA on the RML and eNTERFACE databases.]

Summary

• Fusion is the coherent integration of multimedia multimodal information.
• It is a natural process for human beings, but not straightforward for machines.
• It may be carried out at different information levels, but how do we choose the right model?
• Several case studies were used to demonstrate the power of information fusion.
• Multiple challenges remain to be addressed.

Thank You
