SLIDE 1

Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

HENGGUAN HUANG, FUZHAO XUE, HAO WANG, YE WANG

SLIDE 2

Conversational Speech Recognition
Neurobiology: Relational Thinking
Bayesian Deep Learning

“How many infected cases today?”

SLIDE 3

Motivation: relational thinking

SLIDE 4

Motivation: relational thinking

A type of human learning process in which people spontaneously perceive meaningful patterns from the surrounding world.

A relevant concept: percept

  • Unconscious mental impressions while hearing, seeing…
  • Relations between current sensory signals and prior knowledge

Patricia A. Alexander. Relational thinking and relational reasoning: harnessing the power of patterning. NPJ Science of Learning, 1:16004, 2016.

SLIDE 5

Motivation: relational thinking

A type of human learning process in which people spontaneously perceive meaningful patterns from the surrounding world.

Two-step procedure:

  • Step 1: the generation of an infinite number of percepts
  • Step 2: these percepts are then combined and transformed into a concept or idea

Largely unexplored in AI (the focus of this project).

Patricia A. Alexander. Relational thinking and relational reasoning: harnessing the power of patterning. NPJ Science of Learning, 1:16004, 2016.

SLIDE 6

Overview

  • Our goal: relational thinking modeling and its application in acoustic modeling

  • Challenges (if percepts are modelled as graphs):
  • Edges in the graph are not annotated/available (no relational labels)
  • Hard to optimize over an infinite number of graphs
  • Existing works:
  • GNNs (e.g., GVAE) require the input/output to have a graph structure
  • Cannot handle an infinite number of graphs
  • Current acoustic models (e.g., the RNN-HMM we work on) are limited in representing complex relationships

SLIDE 7

Overview

  • Our solution:
  • Build a type of random process, called the deep graph random process (DGP), that simulates the generation of an infinite number of percepts (graphs)
  • Provide a closed-form solution for combining an infinite number of graphs (coupling of percepts)
  • Apply the DGP to acoustic modelling (transformation of percepts)
  • Obtain an analytical ELBO for joint training
  • Advantages:
  • Relational labels are not required during training
  • Easy to apply to downstream tasks, e.g., ASR
  • Computationally efficient, with better performance

SLIDE 8

Machine speech recognition

Speech-to-text transcription

  • Transform audio into words
  • Relational thinking process is ignored

An utterance: “We’ll get through this”

SLIDE 9

Relational thinking as human speech recognition

“How many new infected cases today?”

SLIDE 10

Relational thinking as human speech recognition

“How many new infected cases today?”

SLIDE 11

Relational thinking as human speech recognition

“How many new infected cases today?”

“Voice too low, but it should be a number.”

SLIDE 12

Problem formulation

  • Given the current utterance and its histories (of fixed size, for simplicity)
  • We aim to simulate the relational thinking process and embed it into ASR:
  • Construct an infinite number of graphs, where the k-th graph represents the k-th percept over multiple utterances
  • Then, these percept graphs are combined and further transformed via a graph transform
  • Our ultimate goal: a closed-form solution for this combine-and-transform step
  • So that perception and transformation can be decoupled from speech (graph learning); see the notation sketch below
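
In symbols (a sketch with assumed notation, since the slide’s own formulas were lost in extraction):

```latex
% Each percept is a graph over the current utterance and its
% histories: nodes are utterances, edges are binary relations.
\[
  \{\mathcal{G}^{(k)}\}_{k=1}^{\infty}, \qquad
  \mathcal{G}^{(k)} = (\mathcal{V}, A^{(k)}),
\]
% and the ultimate goal is a closed-form distribution over the
% combined, transformed graph, decoupled from the speech signal.
```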

SLIDE 13

Percept simulator: Deep Graph Random Process

Deep graph random process (DGP)

  • A stochastic process that describes percept generation
  • It contains a few nodes, each representing an utterance

[Figure: a DGP whose nodes are utterances such as “How many infected cases today?”]

SLIDE 14

Percept simulator: Deep Graph Random Process

Deep graph random process (DGP)

  • A stochastic process that describes percept generation
  • It contains a few nodes, each representing an utterance
  • Each edge is attached with a deep Bernoulli process (DBP)
  • A special Bernoulli process we propose
  • Its Bernoulli parameter is assumed to be close to 0

[Figure: the DGP over utterances, with a DBP attached to each edge]
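
To make the DBP idea concrete, here is a minimal sketch (not the authors’ code; all names are assumptions): a small network maps a pair of utterance embeddings to a Bernoulli parameter squashed toward 0, and a binary edge is sampled from it.

```python
import torch
import torch.nn as nn

class DeepBernoulliEdge(nn.Module):
    """One DBP edge: a Bernoulli whose parameter is predicted
    from the two utterance embeddings it connects (a sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, h_i, h_j, scale=1e-3):
        # Squash the Bernoulli parameter toward 0, matching the
        # slide's assumption that it is close to 0.
        lam = scale * torch.sigmoid(self.score(torch.cat([h_i, h_j], dim=-1)))
        edge = torch.bernoulli(lam)  # one percept's binary edge
        return edge, lam
```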

SLIDE 15

Sampling from DGP

[Figure: sampling each edge from its DBP yields a sequence of percept graphs from the DGP]

SLIDE 16

Coupling of innumerable percept graphs

Coupling in DGP

  • The goal is to extract a representation of an infinite number of percept graphs

SLIDE 17

Coupling of innumerable percept graphs

Coupling in DGP

  • The goal is to extract a representation of an infinite number of percept graphs
  • It is computationally intractable to sum over their adjacency matrices

SLIDE 18

Coupling of innumerable percept graphs

Coupling in DGP

  • Construct an equivalent graph
  • Summing over the original Bernoulli variables gives a Binomial distribution
  • Can we do inference and sampling with such a distribution?

[Figure: Bernoulli variables 1…n summed into a single Binomial variable]
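
Stated as a formula (assumed notation, since the slide’s equation did not survive extraction): summing the i.i.d. Bernoulli edge variables across the n percept graphs yields a Binomial edge variable.

```latex
\[
  a_{ij}^{(k)} \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(\lambda_{ij}),
  \qquad
  \tilde{a}_{ij} = \sum_{k=1}^{n} a_{ij}^{(k)} \sim \mathrm{Binomial}(n, \lambda_{ij}).
\]
```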

SLIDE 19

Inference and sampling of the Binomial distribution

  • Approximate the two Binomial distributions with bounded approximation errors (Theorem 1)

[Figure: Gaussian proxy of the Binomial and a Gaussian estimated from the inputs, linked by two KL terms]
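
One standard reading of the Gaussian proxy (a sketch; the paper’s exact proxy and the Theorem 1 bound are not reproduced here) is the classical Gaussian approximation of the Binomial, whose KL divergence from the Binomial the slide asserts to be bounded.

```latex
\[
  \mathrm{Binomial}(n, \lambda) \;\approx\;
  \mathcal{N}\!\bigl(n\lambda,\; n\lambda(1 - \lambda)\bigr).
\]
```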

SLIDE 20

Inference and sampling of the Binomial distribution

  • Direct parameterization of the two Binomial distributions is avoided
  • Sampling: this allows the reparameterization trick to be used
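
A minimal sketch of the reparameterization trick mentioned above, assuming mu and sigma come from the Gaussian proxy (names are assumptions):

```python
import torch

def sample_summary_edge(mu, sigma):
    # Reparameterization: the noise is parameter-free, so gradients
    # flow through mu and sigma during training.
    eps = torch.randn_like(mu)
    return mu + sigma * eps
```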

SLIDE 21

Transforming the general summary graph to be task-specific

Gaussian graph transform

  • Each entry of its transform matrix follows a conditional Gaussian distribution
  • Conditioned on the edges of the summary graph
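
A sketch of what the Gaussian graph transform could look like (all names are assumptions, not the authors’ code): each transform-matrix entry is Gaussian, with mean and variance conditioned on the corresponding summary-graph edge, and sampled with the reparameterization trick.

```python
import torch
import torch.nn as nn

class GaussianGraphTransform(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.log_sigma = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, summary_edges):
        # summary_edges: (num_edges, 1) values from the summary graph
        mu = self.mu(summary_edges)
        sigma = self.log_sigma(summary_edges).exp()
        eps = torch.randn_like(mu)
        return mu + sigma * eps  # reparameterized task-specific edges
```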

SLIDE 22

Relational thinking network (RTN)

Application of the DGP to acoustic modeling

SLIDE 23

Learning

Variational inference is applied to jointly optimise the DGP, the Gaussian graph transform, and the RNN-HMM acoustic model

  • Challenge #1: the DGP contains too many latent variables
  • The Bernoullis and Binomials are equivalent; specifying one determines the whole DGP

SLIDE 24

Learning

Variational inference is applied to jointly optimise the DGP, the Gaussian graph transform, and the RNN-HMM acoustic model

  • Challenge #1: the DGP contains too many latent variables
  • The Bernoullis and Binomials are equivalent; specifying one determines the whole DGP
  • Challenge #2: one of the KL terms in our ELBO is computationally intractable as n approaches infinity

SLIDE 25

The analytical evidence lower bound (ELBO)

  • This theorem allows us to obtain a closed-form solution for the ELBO.
  • In particular, the solution does not depend on the infinite number of percept graphs.
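
For orientation, the generic variational objective being maximized (a sketch; the paper’s exact ELBO terms, including the graph KL, are not reproduced here):

```latex
% z stands for the latent graph variables; the slide's claim is
% that each term admits a closed form that does not depend on n.
\[
  \log p(y \mid x) \;\ge\;
  \mathbb{E}_{q(z \mid x)}\!\left[\log p(y \mid z, x)\right]
  - \mathrm{KL}\!\left(q(z \mid x)\,\|\,p(z)\right).
\]
```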

SLIDE 26

Experiments: data sets

We evaluated the proposed method on several datasets.

ASR tasks

  • CHiME-2 (preliminary study, not a conversational ASR task):
  • Noisy version of WSJ0
  • CHiME-5 (conversational ASR task):
  • First large-scale corpus of real multi-speaker conversational speech
  • Train: ~40 hours; Eval: ~5 hours

Quantitative/qualitative study of the generated graphs

  • Synthetic Relational SWB
  • SWB: telephony conversational speech
  • SwDA: extends SWB with graph annotations for utterances
  • Train: 30K utterances (without graphs); Test: graphs involving 110K utterances

SLIDE 27

Experiments: model configurations

L: number of layers; N: number of hidden states per layer; P: number of model parameters; T: training time per epoch (hours)

Hengguan Huang, Hao Wang, Brian Mak. Recurrent Poisson process unit for speech recognition. AAAI, 2019.

SLIDE 28

Robustness to input noise

Detailed WER (%) on the test set of CHiME-2

SLIDE 29

ASR results on a conversational task

Our model outperforms the other baselines.

WER (%) on the Eval set of CHiME-5

SLIDE 30

Quantitative study: can we infer utterance relationships with the generated graphs?

Error rate (%) of relation prediction on Synthetic Relational SWB

SLIDE 31

We can capture relationships without relational data!

SLIDE 32

We can capture relationships without relational data!

SLIDE 33

We can capture relationships without relational data!

SLIDE 34

Recognition results for utterance 10

Ground truth: so so where do you go do you go to Berkeley
SRU:          so so what do you go do you go to Berkeley
RTN (ours):   so so where do you go do you go to Berkeley

SLIDE 35

We can capture relationships without relational data!

SLIDE 36

Take-away

Expand the variational family with a deep graph random process

  • Enables relational thinking modelling
  • Graph learning without any relational labelling
  • Easy to apply to downstream tasks such as ASR
  • Improvements on several speech recognition datasets
  • Code (coming soon): https://github.com/GlenHGHUANG/Deep_graph_random_process
