8-Speech Recognition Speech Recognition Concepts Speech Recognition - PowerPoint PPT Presentation

8-Speech Recognition
- Speech Recognition Concepts
- Speech Recognition Approaches
- Recognition Theories
- Bayes Rule
- Simple Language Model
- P(A|W) Network Types

8-Speech Recognition (Cont'd)
- HMM Calculating Approaches
- Neural Components
- Three Basic HMM Problems
- Viterbi Algorithm
- State Duration Modeling
- Training In HMM

Recognition Tasks
- Isolated Word Recognition (IWR), Connected Word (CW), and Continuous Speech Recognition (CSR)
- Speaker Dependent, Multiple Speaker, and Speaker Independent
- Vocabulary Size
  - Small: <20
  - Medium: >100, <1000
  - Large: >1000, <10000
  - Very Large: >10000

Speech Recognition Concepts
Speech recognition is the inverse of speech synthesis.
[Figure: two block diagrams. Speech Synthesis: Text, NLP, Phone Sequence, Speech Processing, Speech. Speech Recognition: Speech, Speech Processing, Phone Sequence, NLP, Speech Understanding, Text.]

Speech Recognition Approaches
- Bottom-Up Approach
- Top-Down Approach
- Blackboard Approach

Bottom-Up Approach
[Figure: block diagram. Stages: Signal Processing, Feature Extraction, Segmentation, Recognized Utterance. Knowledge Sources: Voiced/Unvoiced/Silence, Sound Classification Rules, Phonotactic Rules, Lexical Access, Language Model.]

Top-Down Approach
[Figure: block diagram. Blocks: Feature Analysis, Unit Matching System, Lexical Hypothesis, Syntactic Hypothesis, Semantic Hypothesis, Utterance Verifier/Matcher, Recognized Utterance. Knowledge sources: Inventory of speech recognition units, Word Dictionary, Grammar, Task Model.]

Blackboard Approach
[Figure: a central Blackboard connected to Acoustic Processes, Lexical Processes, Environmental Processes, Semantic Processes, and Syntactic Processes.]

Recognition Theories
- Articulatory Based Recognition
  - Use articulatory system modeling for recognition
  - This theory is the most successful so far
- Auditory Based Recognition
  - Use the auditory system for recognition
- Hybrid Based Recognition
  - Is a combination of the above theories
- Motor Theory
  - Model the intended gesture of the speaker

Recognition Problem
- We have the sequence of acoustic symbols and we want to find the words uttered by the speaker
- Solution: find the most probable word sequence given the acoustic symbols

Recognition Problem
- A: Acoustic Symbols
- W: Word Sequence
- We should find $\hat{W}$ so that
$$P(\hat{W} \mid A) = \max_{W} P(W \mid A)$$
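The outline lists Bayes Rule and P(A|W) right after this problem statement, but the corresponding slides are not in this transcript. As a hedged sketch of the standard step that connects them (not recovered from the slides), Bayes' rule rewrites the search so that the acoustic term $P(A \mid W)$ and the language model $P(W)$ appear separately; $P(A)$ does not depend on $W$ and drops out of the maximization:

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
```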

Simple Language Model
$$W = w_1 w_2 w_3 \ldots w_n$$
$$P(W) = \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2} \ldots w_1)$$
$$P(W) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2, w_1)\, P(w_4 \mid w_3, w_2, w_1) \cdots P(w_n \mid w_{n-1}, w_{n-2}, \ldots, w_1) = P(w_n, w_{n-1}, w_{n-2}, \ldots, w_1)$$
Computing this probability directly is very difficult and requires a very large database, so we use trigram and bigram models.

Simple Language Model (Cont'd)
Trigram:
$$P(W) = \prod_{i=1}^{n} P(w_i \mid w_{i-1}, w_{i-2})$$
Bigram:
$$P(W) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$
Monogram (unigram):
$$P(W) = \prod_{i=1}^{n} P(w_i)$$

Simple Language Model (Cont'd)
Computing Method:
$$P(w_3 \mid w_2, w_1) = \frac{\text{number of occurrences of } w_3 \text{ after } w_1 w_2}{\text{total number of occurrences of } w_1 w_2}$$
Ad hoc Method:
$$P(w_3 \mid w_2, w_1) = \alpha_1 f(w_3 \mid w_2, w_1) + \alpha_2 f(w_3 \mid w_2) + \alpha_3 f(w_3)$$
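The two estimates on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the slides' implementation: the function names, the toy corpus, and the weight values (0.6, 0.3, 0.1) are assumptions chosen only for the example.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every run of n consecutive words in the corpus."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def trigram_relative_freq(words, w1, w2, w3):
    """Counting estimate: occurrences of 'w1 w2 w3' over occurrences of 'w1 w2'."""
    tri = ngram_counts(words, 3)
    bi = ngram_counts(words, 2)
    if bi[(w1, w2)] == 0:
        return 0.0
    return tri[(w1, w2, w3)] / bi[(w1, w2)]


def interpolated_trigram(words, w1, w2, w3, weights=(0.6, 0.3, 0.1)):
    """Ad hoc estimate: weighted sum of trigram, bigram and unigram frequencies.

    The weights are hypothetical; in practice they are tuned on held-out data.
    """
    a1, a2, a3 = weights
    uni = ngram_counts(words, 1)
    bi = ngram_counts(words, 2)
    f3 = trigram_relative_freq(words, w1, w2, w3)
    f2 = bi[(w2, w3)] / uni[(w2,)] if uni[(w2,)] else 0.0
    f1 = uni[(w3,)] / len(words) if words else 0.0
    return a1 * f3 + a2 * f2 + a3 * f1


corpus = "the cat sat on the mat the cat ran".split()
p_count = trigram_relative_freq(corpus, "the", "cat", "sat")
p_smooth = interpolated_trigram(corpus, "the", "cat", "sat")
```

The interpolated form stays non-zero for a trigram that never occurred in the corpus, as long as the bigram or the word itself did, which is the practical reason for mixing the three frequencies.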

Error Production Factors
- Prosody (recognition should be prosody independent)
- Noise (noise should be prevented)
- Spontaneous Speech

P(A|W) Computing Approaches
- Dynamic Time Warping (DTW)
- Hidden Markov Model (HMM)
- Artificial Neural Network (ANN)
- Hybrid Systems

Dynamic Time Warping
[Figure not recovered]

Dynamic Time Warping
Search Limitation:
- First & End Interval
- Global Limitation
- Local Limitation

Dynamic Time Warping
Global Limitation: [figure not recovered]

Dynamic Time Warping
Local Limitation: [figure not recovered]
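The three search limitations can be illustrated with a small dynamic-programming sketch. The figures for the global and local limitations are not recoverable from the transcript, so the specific choices below are assumptions: a band of width `band` around the diagonal for the global limitation, and the three-predecessor step pattern for the local one. The function name and the absolute-difference frame distance are also illustrative.

```python
import math


def dtw_distance(a, b, band=None):
    """Accumulated DTW cost between two sequences of numbers.

    First & end interval: the path starts at the first pair of frames and
    ends at the last pair.
    Global limitation (assumed form): only cells with |i - j| <= band are visited.
    Local limitation (assumed form): a cell is reached from the left,
    lower, or lower-left neighbour.
    """
    n, m = len(a), len(b)
    if band is None:
        band = max(n, m)            # no global limitation
    band = max(band, abs(n - m))    # keep the end point reachable
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# The same pattern spoken at two different speeds aligns with zero cost.
slow = [1, 2, 2, 3]
fast = [1, 2, 3]
cost = dtw_distance(fast, slow)
```

Tightening `band` reduces the number of cells searched, at the risk of excluding the best alignment when the two utterances differ strongly in timing.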

Artificial Neural Network
Inputs $x_0, x_1, \ldots, x_{N-1}$, weights $w_0, w_1, \ldots, w_{N-1}$, output $y$:
$$y = \varphi\left(\sum_{i=0}^{N-1} w_i x_i\right)$$
Simple Computation Element of a Neural Network
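The computation element above is a weighted sum of the inputs passed through a nonlinearity. A minimal sketch follows; the function names are hypothetical, and the choice of `tanh` and of a hard limiter as example nonlinearities is an assumption, since the slide does not say which one is used.

```python
import math


def neuron(x, w, activation=math.tanh):
    """Weighted sum of inputs x with weights w, passed through a nonlinearity."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return activation(sum(wi * xi for wi, xi in zip(w, x)))


def hard_limiter(s):
    """Step nonlinearity: 1 for a non-negative sum, otherwise 0."""
    return 1 if s >= 0 else 0


y_soft = neuron([1.0, 2.0], [0.5, -0.25])                 # tanh of the sum
y_hard = neuron([1.0, 1.0], [1.0, 1.0], hard_limiter)     # thresholded sum
```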

Artificial Neural Network (Cont'd)
- Neural Network Types
  - Perceptron
  - Time Delay
  - Time Delay Neural Network Computational Element (TDNN)

Artificial Neural Network (Cont'd)
Single Layer Perceptron
[Figure: inputs $x_0, \ldots, x_{N-1}$ connected to outputs $y_0, \ldots, y_{M-1}$]

Artificial Neural Network (Cont'd)
Three Layer Perceptron
[Figure not recovered]

2.5.4.2 Neural Network Topologies

TDNN

2.5.4.6 Neural Network Structures for Speech Recognition

2.5.4.6 Neural Network Structures for Speech Recognition (Cont'd)

Hybrid Methods
- Hybrid Neural Network and Matched Filter For Recognition
[Figure labels: Acoustic Speech Features, Delays, Pattern Classifier, Output Units]

Neural Network Properties
- The system is simple, but training requires many iterations
- It does not prescribe a specific structure
- Despite its simplicity, the results are good
- The training set is large, so training should be done offline

Pre-processing
- Different preprocessing techniques are employed as the front end for speech recognition systems
- The choice of preprocessing method is based on the task, the noise level, the modeling tool, etc.

The MFCC Method
- The MFCC method is based on how the human ear perceives sounds.
- The MFCC method performs better than other features in noisy environments.
- MFCC was originally proposed for speech recognition applications, but it also performs well in speaker recognition.
- The auditory unit of the human ear is the Mel, which is obtained from the following relation: [equation not recovered]
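The relation for the Mel unit did not survive in this transcript. As a hedged illustration, one widely used form of the Hz-to-mel mapping is sketched below; the constants 2595 and 700 are an assumption here and may differ from the relation shown on the original slide.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (common 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


mel_1k = hz_to_mel(1000.0)   # close to 1000 mel by construction of the scale
```

The mapping is roughly linear at low frequencies and logarithmic at high frequencies, which matches the ear's decreasing frequency resolution.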

Steps of the MFCC Method
Step 1: Map the signal from the time domain to the frequency domain using the short-time FFT. [equation not recovered]
- z(n): speech signal
- w(n): window function, such as the Hamming window
- $W_F = e^{-j 2\pi / F}$
- m = 0, …, F − 1
- F: speech frame length

Steps of the MFCC Method
Step 2: Find the energy of each filter-bank channel. [equation not recovered]
- M is the number of filter banks based on the mel scale.
- $W_k(j)$, $k = 0, 1, \ldots, M-1$, is the function of the filters in the filter bank.

Filter distribution based on the mel scale
[Figure not recovered]
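Step 2 and the mel-spaced filter distribution can be sketched as follows. Everything specific here is an assumption made for illustration: triangular filter shapes, the 2595/700 mel formula, and the names `mel_filterbank` and `channel_energies`. The slide only states that M mel-based filters $W_k(j)$ are applied to obtain one energy per channel.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_size, sample_rate):
    """Triangular weights W_k(j) for FFT bins j = 0 .. fft_size // 2.

    Filter edges are equally spaced on the mel axis, so the filters get
    wider in Hz as frequency increases.
    """
    n_bins = fft_size // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = sample_rate / fft_size
    bank = []
    for k in range(num_filters):
        lo, mid, hi = edges_hz[k:k + 3]
        weights = []
        for j in range(n_bins):
            f = j * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def channel_energies(power_spectrum, bank):
    """Energy of each channel: power spectrum weighted by that channel's filter."""
    return [sum(w * p for w, p in zip(weights, power_spectrum))
            for weights in bank]


bank = mel_filterbank(num_filters=4, fft_size=64, sample_rate=8000)
flat_energies = channel_energies([1.0] * 33, bank)
```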

Steps of the MFCC Method
Step 4: Compress the spectrum and apply the DCT to obtain the MFCC coefficients. [equation not recovered]
- In the relation above, n = 0, …, L is the order of the MFCC coefficients.
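The relation for this step is missing from the transcript. A minimal sketch under stated assumptions: the spectrum is compressed with a natural logarithm and transformed with an unnormalized DCT-II, which is a common choice but not confirmed by the slide. The function name and the small floor that guards against log(0) are illustrative.

```python
import math


def mfcc_from_energies(energies, max_order):
    """Cepstral coefficients c_0 .. c_L from M filter-bank energies.

    c_n = sum_k log(E_k) * cos(pi * n * (k + 0.5) / M), for n = 0 .. L.
    """
    M = len(energies)
    log_e = [math.log(max(e, 1e-12)) for e in energies]   # floor avoids log(0)
    return [sum(log_e[k] * math.cos(math.pi * n * (k + 0.5) / M)
                for k in range(M))
            for n in range(max_order + 1)]


# A flat set of energies has no spectral shape, so only c_0 is non-zero.
coeffs = mfcc_from_energies([math.e] * 8, max_order=4)
```

Keeping only the low-order coefficients retains the smooth spectral envelope and discards fine detail.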

[Figure: MFCC block diagram. Blocks: Framing, $|FFT|^2$, Mel-scaling, Logarithm, IDCT, Low-order coefficients, Cepstra, Differentiator, Delta & Delta-Delta Cepstra.]

Mel Cepstrum Coefficients (MFCC)
[Figure not recovered]

Properties of the Mel Cepstrum (MFCC)
- Maps the mel filter-bank energies onto the directions in which their variance is maximal (using the DCT)
- The speech features become partially, not completely, independent of one another (an effect of the DCT)
- Good response in clean environments
- Reduced performance in noisy environments

Time-Frequency analysis
- Short-term Fourier Transform
  - Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
  - W(n): windowing function
  - N: frame length
  - p: step size
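The roles of the window W(n), the frame length N, and the step size p can be shown with a direct (slow) DFT. This is a sketch under assumptions: the transform's equation is not in the transcript, the Hamming window is used as a default, and the function names are hypothetical. A real front end would use an FFT routine instead of the O(N²) sum.

```python
import cmath
import math


def hamming(N):
    """Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def stft(signal, N, p, window=None):
    """Short-term Fourier transform.

    Frames of length N are taken every p samples, multiplied by the window,
    and transformed with an N-point DFT. Returns one list of N complex
    values per frame.
    """
    if window is None:
        window = hamming(N)
    frames = []
    for start in range(0, len(signal) - N + 1, p):
        seg = [signal[start + n] * window[n] for n in range(N)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)
        ])
    return frames


# A cosine with exactly 2 cycles per 16-sample frame peaks in DFT bin 2.
tone = [math.cos(2.0 * math.pi * 2.0 * n / 16.0) for n in range(64)]
spec = stft(tone, N=16, p=8, window=[1.0] * 16)
magnitudes = [abs(v) for v in spec[0]]
```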

Critical band integration
- Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise
- Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole

Bark scale
[Figure not recovered]
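The slide's content for the Bark scale is not recoverable. For orientation only, the sketch below uses one commonly cited analytic approximation of the Hz-to-Bark mapping (the arctangent form attributed to Zwicker and Terhardt); this formula is an assumption and may not be the one the slide showed.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


bark_1k = hz_to_bark(1000.0)   # roughly 8.5 Bark
```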

Feature orthogonalization
- Spectral values in adjacent frequency channels are highly correlated
- The correlation results in a Gaussian model with lots of parameters: all the elements of the covariance matrix have to be estimated
- Decorrelation is useful to improve the parameter estimation.
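The parameter-count argument can be made concrete with a short calculation. The function name and the 13-dimensional example are illustrative assumptions; the counts themselves follow from a covariance matrix being symmetric.

```python
def gaussian_param_count(dim, diagonal):
    """Free parameters of one Gaussian: a mean vector plus a covariance matrix.

    A full symmetric covariance has dim * (dim + 1) / 2 free elements;
    a diagonal one (decorrelated features) has only dim.
    """
    mean_params = dim
    cov_params = dim if diagonal else dim * (dim + 1) // 2
    return mean_params + cov_params


full_13 = gaussian_param_count(13, diagonal=False)
diag_13 = gaussian_param_count(13, diagonal=True)
```

With decorrelated features the diagonal model becomes a reasonable fit, so far fewer parameters have to be estimated from the same amount of training data.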
