
SPOC lab: signal processing and oral communication - PowerPoint PPT Presentation

SPOC lab: signal processing and oral communication. Observing how closed systems fail can be a valuable method in discovering how those systems work. Paul Broca (left)


  1. SPOC lab: signal processing and oral communication

  2. Introduction: • Observing how closed systems fail can be a valuable method in discovering how those systems work. • Paul Broca (left) discovered, in 1861, that a lesion in the left ventro-posterior frontal lobe caused expressive aphasia. • This was the first direct evidence that language function was localized. • It hinted at a mechanistic view of speech production. [Figure: Broca's area]

  3. Neuro-motor articulatory disorders resulting in unintelligible speech. 7.5 million Americans have dysarthria: • Cerebral palsy, • Parkinson's, • Amyotrophic lateral sclerosis. (National Institutes of Health)

  4. Dysarthria: • Types of dysarthria are related to specific sites in the subcortical nervous system.

     | Type | Primary lesion site |
     | --- | --- |
     | Ataxic | Cerebellum or its outflow pathways |
     | Flaccid | Lower motor neuron (≥ 1 cranial nerves) |
     | Hypokinetic | Basal ganglia (esp. substantia nigra) |
     | Hyperkinetic | Basal ganglia (esp. putamen or caudate) |
     | Spastic | Upper motor neuron |
     | Spastic-flaccid | Both upper and lower motor neurons |

     (After Darley et al., 1969)

  5. [Chart: perceptual features by dysarthria type. Columns: Ataxic; Flaccid; Hypokinetic; Hyperkinetic, chorea; Hyperkinetic, dystonia; Spastic; Spastic-flaccid (ALS). Rows: Monopitch; Harshness; Imprecise consonants; Mono-loud; Distorted vowels; Slow rate; Short phrases; Hypernasal; Prolonged intervals; Low pitch; Inappropriate silences; Variable rate; Breathy voice; Strain-strangled voice; … Example words: fear, fair.] (After Darley et al., 1969)

  6. The broader neuro-motor deficits associated with dysarthria can make traditional human-computer interaction difficult. Can we use ASR for dysarthria?

  7. • Ergodic HMMs can be robust against recurring pauses and non-speech events. • Polur and Miller (2005) replaced GMM densities with neural networks (after Jayaram and Abdelhamied, 1995), further increasing accuracy. [Figure labels: b0, b1, b2] (From Polur and Miller, 2005)
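To make the point about ergodic HMMs on slide 7 concrete, the following is a minimal sketch, not the model from Polur and Miller (2005). It assumes a toy discrete-emission HMM (the slide describes GMM and neural-network densities instead), and every name and number in it (`PI`, `ERGODIC_A`, `LEFT_RIGHT_A`, `B`, `OBS`) is hypothetical. It compares an ergodic (fully connected) transition matrix with a left-to-right one on a sequence in which a pause symbol recurs.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm; returns log P(obs | model).

    pi: initial state probabilities, A: state transition matrix,
    B: B[state][symbol] emission probabilities, obs: list of symbol indices.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

# Hypothetical toy parameters, chosen only for illustration.
PI = [1.0, 0.0, 0.0]
# Ergodic: every state can be reached from every other state.
ERGODIC_A = [[0.6, 0.2, 0.2],
             [0.2, 0.6, 0.2],
             [0.2, 0.2, 0.6]]
# Left-to-right: no transitions back to earlier states.
LEFT_RIGHT_A = [[0.6, 0.4, 0.0],
                [0.0, 0.6, 0.4],
                [0.0, 0.0, 1.0]]
# Symbols: 0 and 1 are speech-like, 2 is a pause or non-speech event.
B = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
# The pause symbol (2) recurs between speech-like symbols.
OBS = [0, 2, 0, 1, 2, 0]

ll_ergodic = forward_log_likelihood(PI, ERGODIC_A, B, OBS)
ll_left_right = forward_log_likelihood(PI, LEFT_RIGHT_A, B, OBS)
print(ll_ergodic > ll_left_right)
```

In this toy setting the ergodic topology assigns the sequence a higher likelihood, because it can return to the pause-like state each time the pause recurs, whereas the left-to-right topology cannot revisit earlier states.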

  8. [Plot: word recognition accuracy (%) against number of Gaussians (2 to 16), with curves for non-dysarthric and dysarthric speech.]

  9. [Figure panels: Non-dysarthric; Dysarthric] (From Kain et al., 2007) This acoustic behaviour is indicative of underlying articulatory behaviour.

  10. TORGO: • TORGO was built to train augmented ASR systems. • 9 subjects with cerebral palsy, 9 matched controls. • Each reads 500 to 1000 prompts over 3 hours that cover phonemes and articulatory contrasts (e.g., meat vs. beat). • Electromagnetic articulography (and video) track points to <1 mm error.

  11. [Figure panels: Dysarthric; Non-dysarthric]

  12. | Group | Speaker | $H(\mathrm{Acous})$ | $H(\mathrm{Artic})$ | $H(\mathrm{Ac} \mid \mathrm{Ar})$ |
      | --- | --- | --- | --- | --- |
      | Dysarthric | M01 | 66.37 | 17.16 | 50.30 |
      | Dysarthric | M04 | 33.36 | 11.31 | 26.25 |
      | Dysarthric | F03 | 42.38 | 19.33 | 39.47 |
      | Dysarthric | Average | 47.34 | 15.93 | 38.68 |
      | Control | MC01 | 24.40 | 21.49 | 1.14 |
      | Control | MC03 | 18.63 | 18.34 | 3.93 |
      | Control | FC02 | 16.12 | 15.97 | 3.11 |
      | Control | Average | 19.72 | 18.60 | 2.73 |

      • Dysarthric acoustics are far more statistically disordered than the control data. • Dysarthric articulation is just as statistically ordered as the control data. • Dysarthric acoustics are far less predictable from articulation.
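The three quantities compared on slide 12 (disorder of acoustics, disorder of articulation, and how predictable acoustics are from articulation) correspond to entropy and conditional entropy. The sketch below is a toy illustration over discrete labels, using the identity H(X | Y) = H(X, Y) − H(Y). It is not the procedure used to produce the values on the slide, which is not described here, and the function names and example sequences are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y), from paired discrete observations."""
    return entropy(list(zip(x, y))) - entropy(y)

# Hypothetical discretised data: one articulatory label per frame.
articulation = [0, 0, 1, 1]
# Acoustics fully determined by articulation.
acoustics_predictable = [0, 0, 1, 1]
# Acoustics that vary independently of articulation.
acoustics_unpredictable = [0, 1, 0, 1]

print(conditional_entropy(acoustics_predictable, articulation))
print(conditional_entropy(acoustics_unpredictable, articulation))
```

Both acoustic sequences have the same marginal entropy (1 bit), but only the second retains that uncertainty once articulation is known. That is the sense in which a larger conditional entropy means acoustics are less predictable from articulation.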

  13. [Model diagrams with nodes l1, l2, l3; q1, q2, q3; and observations o] • Dynamic Bayes nets (DBN-F) • Conditional random fields (LDCRF) • Neural networks • Support vector machines

  14. [Diagrams of three models, DBN-A, DBN-A2, and DBN-A3, with nodes O, O′, O″, Ph, Q, A, A′, and A″.]

  15. [Diagrams of three models, DBN-A, DBN-A2, and DBN-A3, with nodes O, O′, O″, Ph, Q, A, A′, and A″.]

  16. | Severity of dysarthria | HMM | LDCRF | DBN-F | DBN-A | MLP (NN) | Elman (NN) |
      | --- | --- | --- | --- | --- | --- | --- |
      | Severe | 14.1 | 15.2 | 15.0 | 16.4 | 15.5 | 15.6 |
      | Moderate | 27.8 | 28.0 | 28.0 | 31.1 | 28.6 | 30.5 |
      | Mild | 51.6 | 51.8 | 51.6 | 54.2 | 51.4 | 51.2 |
      | Control | 72.8 | 73.5 | 73.3 | 73.6 | 72.6 | 72.7 |

      Average % phoneme accuracy (frame-level) with speaker-dependent training.


  18. Task dynamics: We wish to classify dysarthric speech in a low-dimensional and informative space that incorporates goal-based and long-term dynamics. We require a theoretical framework to represent relevant and continuous articulatory motion. Task-dynamics: Represents speech as goal-based reconfigurations of the vocal tract. $M z'' + B z' + K (z - z_0)$ [Figure: 'pub' plotted over time for tongue body constriction degree, lip aperture, and glottis]
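The second-order expression on slide 18 has the form of a damped mass-spring system: an inertia term, a damping term, and a stiffness term acting on the distance between a tract variable and its target. The sketch below simulates one such variable moving toward a goal. It rests on assumptions not stated on the slide: the expression is set equal to zero, damping is chosen to be critical, and the parameter values and the function name `simulate_gesture` are hypothetical.

```python
import math

def simulate_gesture(z_start, z_target, m=1.0, k=100.0, b=None,
                     dt=0.001, steps=2000):
    """Integrate m*z'' + b*z' + k*(z - z_target) = 0 (semi-implicit Euler).

    Assumption: when b is not given, critical damping b = 2*sqrt(m*k)
    is used, so the variable approaches its target without oscillating.
    """
    if b is None:
        b = 2.0 * math.sqrt(m * k)
    z, v = z_start, 0.0
    trajectory = [z]
    for _ in range(steps):
        accel = -(b * v + k * (z - z_target)) / m
        v += accel * dt   # update velocity first
        z += v * dt       # then position, using the new velocity
        trajectory.append(z)
    return trajectory

# Hypothetical example: a tract variable moving from 0.0 toward a goal of 1.0.
trajectory = simulate_gesture(z_start=0.0, z_target=1.0)
print(round(trajectory[-1], 4))
```

The target plays the role of the articulatory goal: changing `z_target` reconfigures where the variable settles, while `k` and `b` govern how quickly and how smoothly it gets there.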

  19. [Chart: perceptual features (Monopitch; Harshness; Imprecise consonants; Mono-loud; Distorted vowels; Slow rate; Short phrases; Hypernasal; Prolonged intervals; Low pitch; Inappropriate silences; Variable rate; Breathy voice; Strain-strangled voice; …) by dysarthria type (Ataxic; Flaccid; Hypokinetic; Hyperkinetic, chorea; Hyperkinetic, dystonia; Spastic; Spastic-flaccid (ALS)).] Task-dynamics: $M z'' + B z' + K (z - z_0)$

  20. • As we develop an extension or alternative to task dynamics, we have to consider:
      1. Timing. a) Inter-articulator co-ordination. b) Rhythm.
      2. Feedback. a) Acoustic, proprioceptive, and tactile.
      3. Higher-level features. a) Syntax and meaning.

  21. Timing: • In TD, pairs of goals are dynamically coupled in time. • Articulators are phase-locked (0° or 180°; Goldstein et al., 2005). • (C)CV pairs stabilize in-phase. • V(C)C pairs stabilize anti-phase. • Kinematic errors occur when competing gestures are repeated and tend to stabilize incorrectly. • e.g., repeat koptop (Nam et al., 2010). [Figure: syllable σ for 'pʌb' divided into ONS and RIME, with TBCD, LA, and GLO tracks over time]
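Phase-locking of the kind described on slide 21 can be illustrated with a single relative-phase variable that is attracted to a target of 0° (in-phase) or 180° (anti-phase). The sketch below assumes a generic sinusoidal coupling, dψ/dt = −a·sin(ψ − ψ_target). This form is chosen purely for illustration and is not taken from the slides or from the cited papers, and the function name and parameter values are hypothetical.

```python
import math

def settle_relative_phase(psi_start, psi_target, coupling=5.0,
                          dt=0.001, steps=5000):
    """Euler-integrate d(psi)/dt = -coupling * sin(psi - psi_target).

    psi is the relative phase (radians) between two coupled gestures.
    Assumption: sinusoidal coupling is used here only as an illustration.
    """
    psi = psi_start
    for _ in range(steps):
        psi += -coupling * math.sin(psi - psi_target) * dt
    return psi

# Same starting relative phase, two different coupling targets.
in_phase = settle_relative_phase(1.0, 0.0)        # e.g. a (C)CV pair
anti_phase = settle_relative_phase(1.0, math.pi)  # e.g. a V(C)C pair

print(round(math.degrees(in_phase)), round(math.degrees(anti_phase)))
```

Starting from the same relative phase, the pair settles at whichever target the coupling specifies, which is one simple way to picture gestures stabilizing in-phase or anti-phase.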
