SPOClab: signal processing and oral communication



SLIDE 1

SPOClab: signal processing and oral communication

SLIDE 2

Introduction

  • Observing how closed systems fail can be a valuable method in discovering how those systems work.
  • Paul Broca (left) discovered, in 1861, that a lesion in the left ventro-posterior frontal lobe caused expressive aphasia.
  • This was the first direct evidence that language function was localized.
  • It hinted at a mechanistic view of speech production.

[Figure: Broca’s area]

SLIDE 3

Introduction

Dysarthria: neuro-motor articulatory disorders resulting in unintelligible speech.

7.5 million Americans have dysarthria (National Institute of Health). Associated conditions:

  • Cerebral palsy,
  • Parkinson’s,
  • Amyotrophic lateral sclerosis.

SLIDE 4

Dysarthria

  • Types of dysarthria are related to specific sites in the subcortical nervous system.

Type              Primary lesion site
Ataxic            Cerebellum or its outflow pathways
Flaccid           Lower motor neuron (≥1 cranial nerves)
Hypokinetic       Basal ganglia (esp. substantia nigra)
Hyperkinetic      Basal ganglia (esp. putamen or caudate)
Spastic           Upper motor neuron
Spastic-flaccid   Both upper and lower motor neurons

(After Darley et al., 1969)

SLIDE 5

Dysarthria

[Chart: dysarthria types against perceptual features (after Darley et al., 1969)]

Types: Ataxic; Flaccid; Hypokinetic; Hyperkinetic, chorea; Hyperkinetic, dystonia; Spastic; Spastic-flaccid (ALS)

Features: Monopitch; Harshness; Imprecise consonants; Mono-loud; Distorted vowels; Slow rate; Short phrases; Hypernasal; Prolonged intervals; Low pitch; Inappropriate silences; Variable rate; Breathy voice; Strain-strangled voice; …

Example words: fear, fair

SLIDE 6

Dysarthria

  • The broader neuro-motor deficits associated with dysarthria can make traditional human-computer interaction difficult.
  • Can we use ASR for dysarthria?

SLIDE 7

Dysarthria

  • Ergodic HMMs can be robust against recurring pauses and non-speech events.
  • Polur and Miller (2005) replaced GMM densities with neural networks (after Jayaram and Abdelhamied, 1995), further increasing accuracy.

[Figure: ergodic HMM with emission densities b0, b1, b2 (from Polur and Miller, 2005)]
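To make the point about ergodic topologies concrete, here is a minimal sketch, not the model of Polur and Miller. It assumes discrete emissions instead of GMM or neural-network densities, and the symbol labels (0 = speech-like, 1 = pause, 2 = other) and all probabilities are invented for illustration. It scores a sequence in which a pause keeps recurring under an ergodic HMM and under a left-to-right HMM with the same emissions.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_like

# Hypothetical emissions: state s mostly emits symbol s.
EMIT = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]

# Ergodic topology: every state can follow every other state.
ERGODIC_START = [1 / 3, 1 / 3, 1 / 3]
ERGODIC_TRANS = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

# Left-to-right topology: states can only stay put or move forward.
LTR_START = [1.0, 0.0, 0.0]
LTR_TRANS = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    recurring_pause = [0, 1, 0, 1, 0]  # speech, pause, speech, pause, speech
    print("ergodic      :", forward_log_likelihood(recurring_pause, ERGODIC_START, ERGODIC_TRANS, EMIT))
    print("left-to-right:", forward_log_likelihood(recurring_pause, LTR_START, LTR_TRANS, EMIT))
```

In this toy setting the ergodic model assigns the higher likelihood, because it can return to an earlier state each time the pause recurs, whereas the left-to-right model must explain the repeats as emission mismatches.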

SLIDE 8

Dysarthria

[Figure: word recognition accuracy (%) against number of Gaussians (2 to 16), for non-dysarthric and dysarthric speech]

SLIDE 9

Dysarthria

[Figure: non-dysarthric and dysarthric panels (from Kain et al., 2007)]

  • This acoustic behaviour is indicative of underlying articulatory behaviour.

SLIDE 10

TORGO

  • TORGO was built to train augmented ASR systems.
  • 9 subjects with cerebral palsy, 9 matched controls.
  • Each reads 500 to 1000 prompts over 3 hours that cover phonemes and articulatory contrasts (e.g., meat vs. beat).
  • Electromagnetic articulography (and video) track points to <1 mm error.

SLIDE 11

TORGO

[Figure: non-dysarthric and dysarthric panels]

SLIDE 12

TORGO

Speaker      H(Acoustics)   H(Articulation)   H(Acoustics | Articulation)
Dysarthric
  M01        66.37          17.16             50.30
  M04        33.36          11.31             26.25
  F03        42.38          19.33             39.47
  Average    47.34          15.93             38.68
Control
  MC01       24.40          21.49             1.14
  MC03       18.63          18.34             3.93
  FC02       16.12          15.97             3.11
  Average    19.72          18.60             2.73

  • Dysarthric acoustics are far more statistically disordered than the control data.
  • Dysarthric articulation is just as statistically ordered as the control data.
  • Dysarthric acoustics are far less predictable from articulation.
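The three columns above appear to be an entropy of the acoustics, an entropy of the articulation, and a conditional entropy of acoustics given articulation. The sketch below shows how such quantities are defined for a small discrete joint distribution. How the slide's figures were actually estimated is not stated here, so the discrete toy distributions and function names are assumptions for illustration only.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(Y | X) in bits, where joint[x][y] = P(X = x, Y = y)."""
    h = 0.0
    for row in joint:
        p_x = sum(row)
        for p_xy in row:
            if p_xy > 0:
                h -= p_xy * math.log2(p_xy / p_x)
    return h

def marginal_y(joint):
    """Marginal distribution of Y from joint[x][y]."""
    return [sum(row[y] for row in joint) for y in range(len(joint[0]))]

if __name__ == "__main__":
    # X = articulatory class, Y = acoustic class (both hypothetical).
    predictable = [[0.5, 0.0], [0.0, 0.5]]      # acoustics fully determined by articulation
    unpredictable = [[0.25, 0.25], [0.25, 0.25]]  # acoustics independent of articulation
    for name, joint in (("predictable", predictable), ("unpredictable", unpredictable)):
        print(name, entropy(marginal_y(joint)), conditional_entropy(joint))
```

Both toy cases have the same acoustic entropy (1 bit), but the conditional entropy is 0 bits when articulation determines the acoustics and 1 bit when it does not, which is the sense in which a large H(Acoustics | Articulation) means acoustics are hard to predict from articulation.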

SLIDE 13

TORGO

  • Conditional random fields (LDCRF)
  • Neural networks
  • Support vector machines
  • Dynamic Bayes nets (DBN-F)

[Figure: sequence model diagram with states q1, q2, q3, observation nodes, and labels l1, l2, l3]

SLIDE 14

TORGO

[Figure: DBN-A, DBN-A2, and DBN-A3 structures over nodes Q, Ph, A, A’, A’’, O, O’, O’’]

SLIDE 15

TORGO

[Figure: DBN-A, DBN-A2, and DBN-A3 structures, repeated from the previous slide]

SLIDE 16

TORGO

Average % phoneme accuracy (frame-level) with speaker-dependent training

Severity of dysarthria   HMM    LDCRF   DBN-F   DBN-A   MLP    Elman
Severe                   14.1   15.2    15.0    16.4    15.5   15.6
Moderate                 27.8   28.0    28.0    31.1    28.6   30.5
Mild                     51.6   51.8    51.6    54.2    51.4   51.2
Control                  72.8   73.5    73.3    73.6    72.6   72.7

DBN-F and DBN-A are the DBN variants; MLP and Elman are the NN variants.

SLIDE 17

SLIDE 18

Task dynamics

  • We wish to classify dysarthric speech in a low-dimensional and informative space that incorporates goal-based and long-term dynamics.
  • We require a theoretical framework to represent relevant and continuous articulatory motion.

[Figure: gestural score for ‘pub’ over time, with tracks for tongue body constriction degree, glottis, and lip aperture]

Task-dynamics: Represents speech as goal-based reconfigurations of the vocal tract. $M z'' + B z' + K(z - z_0) = 0$
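The task-dynamic equation above describes a damped mass-spring system that pulls a tract variable toward a goal. The following is a rough numerical sketch of a single tract variable z moving toward its target z_0. The parameter values, the choice of critical damping (B = 2√(MK)), and the semi-implicit Euler integrator are all assumptions made for illustration, not values from the slides.

```python
import math

def simulate_gesture(z_start, z_target, mass=1.0, stiffness=100.0,
                     dt=0.001, duration=2.0):
    """Integrate M z'' + B z' + K (z - z_target) = 0 from rest.

    Damping is set to the critical value B = 2 * sqrt(M * K), an assumption
    that makes the tract variable approach its goal without oscillating.
    Returns the list of z values, one per time step (including the start).
    """
    damping = 2.0 * math.sqrt(mass * stiffness)
    z, velocity = z_start, 0.0
    trajectory = [z]
    for _ in range(int(round(duration / dt))):
        acceleration = -(damping * velocity + stiffness * (z - z_target)) / mass
        velocity += dt * acceleration  # semi-implicit Euler: update velocity first
        z += dt * velocity             # then position, using the new velocity
        trajectory.append(z)
    return trajectory

if __name__ == "__main__":
    # Hypothetical lip-aperture-like variable moving from 0.0 to a goal of 1.0.
    traj = simulate_gesture(z_start=0.0, z_target=1.0)
    print("final value:", traj[-1])
    print("largest value reached:", max(traj))
```

With these settings the variable rises smoothly to the goal and stays there. Changing `stiffness` changes how quickly the goal is reached, which is one way such a model can represent faster or slower gestures.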

SLIDE 19

Task dynamics

[Chart: dysarthria types against perceptual features]

Types: Ataxic; Flaccid; Hypokinetic; Hyperkinetic, chorea; Hyperkinetic, dystonia; Spastic; Spastic-flaccid (ALS)

Features: Monopitch; Harshness; Imprecise consonants; Mono-loud; Distorted vowels; Slow rate; Short phrases; Hypernasal; Prolonged intervals; Low pitch; Inappropriate silences; Variable rate; Breathy voice; Strain-strangled voice; …

Task-dynamics: $M z'' + B z' + K(z - z_0) = 0$

SLIDE 20

Task dynamics

  • As we develop an extension or alternative to task dynamics, we have to consider:

1. Timing.
   a) Inter-articulator co-ordination.
   b) Rhythm.
2. Feedback.
   a) Acoustic, proprioceptive, and tactile.
3. Higher-level features.
   a) Syntax and meaning.

SLIDE 21

1. Timing

[Figure: gestural score for ‘pʌb’ over time, with tracks TBCD, GLO, and LA; the syllable (σ) is divided into onset (ONS) and rime (RIME)]

  • In TD, pairs of goals are dynamically coupled in time.
  • Articulators are phase-locked (0° or 180°; Goldstein et al., 2005).
  • (C)CV pairs stabilize in-phase.
  • V(C)C pairs stabilize anti-phase.
  • Kinematic errors occur when competing gestures are repeated and tend to stabilize incorrectly.
  • e.g., repeat koptop (Nam et al., 2010).
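One simple way to picture phase-locking is as a relative phase that is attracted to a target of 0° (in-phase) or 180° (anti-phase). The sketch below assumes a first-order attractor, dφ/dt = −a·sin(φ − φ_target), which is a deliberately simplified stand-in and not the coupled-oscillator model used in the cited work. The function name, coupling strength, and step size are hypothetical.

```python
import math

def settle_relative_phase(phi_start, phi_target, coupling=5.0, dt=0.01, steps=2000):
    """Euler-integrate d(phi)/dt = -coupling * sin(phi - phi_target).

    phi is the relative phase (radians) between two gestures; phi_target is
    the phase at which the pair is assumed to stabilize.
    """
    phi = phi_start
    for _ in range(steps):
        phi += dt * (-coupling * math.sin(phi - phi_target))
    return phi

if __name__ == "__main__":
    in_phase = settle_relative_phase(phi_start=0.5, phi_target=0.0)        # CV-like pair
    anti_phase = settle_relative_phase(phi_start=2.0, phi_target=math.pi)  # VC-like pair
    print("in-phase pair settles near (deg):", math.degrees(in_phase))
    print("anti-phase pair settles near (deg):", math.degrees(anti_phase))
```

Under these assumptions a pair started away from its target drifts back to it, so the timing relation is stable against small perturbations.
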
SLIDE 22

1. Timing

  • Cerebellar ataxia often prohibits control over more than one articulator at a time.
  • Apraxia generates incorrect motor plans, wholly distorting gestural goals, hence timing.
  • Dysarthric speech nearly equally consists of steady-states (49.95%) and transitions (50.05%) (Vollmer, 1997).
  • Typical speech consists of ~82.14% steady-states.

SLIDE 23

1. Timing

  • Rhythm (the distribution of emphasis) is not part of TD.
  • Tremor behaves as oscillations about an equilibrium.
  • There is evidence that people with Parkinson’s coordinate voluntary movement with involuntary tremors (Kent et al., 2000).
  • Rhythm in ataxic dysarthria is formalized by aberrations in a ‘scanning index’, $SI$, consisting of syllable lengths $S_i$ (Ackermann and Hertrich, 1994):

$$SI = \frac{\prod_{i=1}^{n} S_i}{\left(\frac{1}{n}\sum_{i=1}^{n} S_i\right)^{n}}$$
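As a worked illustration of the scanning index, the sketch below reads it as the product of the n syllable lengths divided by the n-th power of their mean. That reading of the formula, the function name, and the syllable durations are assumptions for illustration. By the inequality of arithmetic and geometric means, the index under this reading is 1 when all syllables have equal length and falls below 1 as the lengths become more uneven.

```python
import math

def scanning_index(syllable_lengths):
    """Product of syllable lengths divided by (mean syllable length) ** n."""
    n = len(syllable_lengths)
    if n == 0:
        raise ValueError("at least one syllable length is required")
    mean_length = sum(syllable_lengths) / n
    return math.prod(syllable_lengths) / mean_length ** n

if __name__ == "__main__":
    even = [0.2, 0.2, 0.2, 0.2]    # hypothetical durations in seconds
    uneven = [0.1, 0.3, 0.1, 0.3]  # same mean, alternating short and long
    print("even syllables  :", scanning_index(even))
    print("uneven syllables:", scanning_index(uneven))
```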

SLIDE 24

2. Feedback

  • Dysarthria can affect sensory cranial nerves.
  • Parkinson’s disease reduces temporal discrimination in tactile, auditory, and visual stimuli.
  • A likely explanation is that damage to the basal ganglia prohibits the formation of sensory targets (Kent et al., 2000).
  • The result is underestimated movement.
  • Cerebellar disease results in dysmetria since the internal model of the skeletomuscular system is dysfunctional.
  • The cerebellum is apparently used in the preparation and revision of movements.

SLIDE 25

2. Feedback

  • The DIVA model is supposed to model feedback, but is largely speculative on neurological aspects.
  • Here, sound targets and somatosensory targets are learned during ‘babbling’ and modify articulatory goals.

[Figure: DIVA model. Components: Speech Sound Map (Premotor Cortex); Articulator Velocity and Position Cells (Motor Cortex); Auditory Error (Auditory Cortex); Somatosensory Error (Somatosensory Cortex); Auditory Goal Region; Somatosensory Goal Region; Somatosensory State; Auditory State; Feedforward Command; To Muscles; Auditory Feedback-Based Command; Somatosensory Feedback-Based Command; Maeda model]

  • This is meant to imitate the cerebellum (or basal ganglia).

SLIDE 26

Aphasia

Broca’s aphasia

  • Reduced hierarchical syntax.
  • Anomia.
  • Reduced “mirroring” between observation and execution of gestures (Rizzolatti & Arbib, 1998).

Wernicke’s aphasia

  • Normal intonation/rhythm.
  • Meaningless words.
  • ‘Jumbled’ syntax.
  • Reduced comprehension.

SLIDE 27

  • Dysarthria is a prevalent disorder that would be mitigated to some extent by improved speech technology.
  • Some benefit can be derived by building explicit articulatory-acoustic statistics into simple acoustic models for dysarthria.
  • About 3.3% improvement in phoneme error rate for moderately dysarthric speakers, given models trained with EMA data.
  • Dysarthria presents with complex long-term effects that are difficult to capture in short-time models.
  • Extensions to task-dynamics, e.g., should take into account some of these phenomena.