Understanding Dyadic Human Spoken Interactions Using Speech Processing Techniques: Case Studies in Autism Spectrum Disorder (ASD) and Behavioral Couple Therapy
Jeremy
*Materials in this presentation come in part from Daniel Bone
Picture credit to the USC SAIL lab: http://sail.usc.edu
What is BSP?
- Employ and advance signal processing and machine learning to sense human behaviors
- Aid in, and transform, traditional observational methods
- Focus on mental health research and practice
- Many benefits: speedup, parallel observation capabilities, large-scale trends, etc.
- Significance: in the USA, ~10 million people receive psychotherapy every year, and the number is increasing!
- The state of the art has not changed for decades
Mental health: traditional observational study
*Picture credit:
- S. Narayanan and P. G. Georgiou, "Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language," Proc. IEEE, vol. 101, pp. 1203-1233, 2013.
Mental health: putting BSP in the loop
*Picture credit:
- S. Narayanan and P. G. Georgiou, "Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language," Proc. IEEE, vol. 101, pp. 1203-1233, 2013.
Case Study I
Domain: behavioral couple therapy
Specifics: problem-solving interactions as part of IBCT
Engineering task: interaction modeling (vocal synchrony quantification)
Chi-Chun Lee, Athanasios Katsamanis, Matthew Black, Brian Baucom, Andrew Christensen, Panayiotis G. Georgiou, and Shrikanth S. Narayanan, "Computing Vocal Entrainment: A Signal-Derived PCA-Based Quantification Scheme with Application to Affect Analysis in Married Couple Interactions," Computer Speech and Language, 28(2): 518-539, 2014. doi:10.1016/j.csl.2012.06.006
Couple therapy: Integrative Behavioral Couple Therapy (IBCT)
Couple therapy database
- Collaborative work between UCLA and UW
- 134 seriously and chronically distressed real couples
- 10-minute problem-solving spoken interactions
- Audio-video recordings (far-field microphones, varying noise conditions)
- 33 global ratings of behavioral codes for each spouse (SSIRS, CIRS)
- 372 sessions, ~90 hours of data
- Manual transcripts available
Goal: study this large amount of spontaneous interaction data
Automatic pre-processing: automatic speaker segmentation
- Segment the sessions into meaningful regions
– Recursive automatic speech-text alignment technique [Moreno 1998]
– Session split into regions: wife / husband / unknown
– Segmented >60% of sessions’ words into wife/husband regions for 293/574 sessions
Example: aligned text
“… that she’s known for five months and didn’t tell me …”
AM = Acoustic Model; LM = Language Model; Dict = Dictionary; MFCC = Mel-Frequency Cepstral Coefficients; ASR = Automatic Speech Recognition; HYP = ASR Hypothesized Transcript
*Slide content credit: Dr. Matthew P. Black
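The recursive speech-text alignment idea [Moreno 1998] can be sketched in simplified form: anchor on the longest island where the reference transcript and the ASR hypothesis agree, then recurse independently on the remaining left and right segments. This is an illustrative reconstruction with hypothetical function names, not the published implementation.

```python
def longest_common_run(ref, hyp):
    # Longest contiguous run of identical words (simple O(n*m) DP).
    # Returns (length, start index in ref, start index in hyp).
    best_len, best_i, best_j = 0, 0, 0
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len = dp[i][j]
                    best_i, best_j = i - best_len, j - best_len
    return best_len, best_i, best_j

def anchor_align(ref, hyp, ref_off=0, hyp_off=0):
    # Recursively anchor on reliable matching islands, then align
    # the remaining left/right segments independently.
    n, i, j = longest_common_run(ref, hyp)
    if n == 0:
        return []
    left = anchor_align(ref[:i], hyp[:j], ref_off, hyp_off)
    mid = [(ref_off + i + k, hyp_off + j + k) for k in range(n)]
    right = anchor_align(ref[i + n:], hyp[j + n:],
                         ref_off + i + n, hyp_off + j + n)
    return left + mid + right

# Reference transcript vs. a noisy ASR hypothesis:
pairs = anchor_align("a b c d e".split(), "a x c d y".split())
```

Words inside anchored islands inherit the ASR word timings; unmatched stretches are what end up as "unknown" regions.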
Automatic acoustic feature extraction: LLDs computation
- Acoustic features shown to be relevant (e.g., [Gottman 1977, Yildirim et al. 2010])
- 11 low-level descriptors (LLDs) extracted every 10ms with 25ms window
– Voice Activity Detector (VAD), speaking rate, pitch, energy, harmonics-to-noise ratio, voice quality, 13 MFCCs, 26 Mel filterbank energies (MFBs), magnitude of spectral centroid, spectral flux
- Each session split into 3 “domains”: wife, husband, speaker-independent
- 13 statistics (mean, std. dev., etc.) across each domain for each LLD
– ~2000 features capturing the global acoustic properties of each spouse
*slide content credit to Dr. Matthew P. Black
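The functional-statistics step above (13 statistics per LLD per domain, yielding ~2000 global features) follows a simple pattern. The sketch below uses seven illustrative statistics and a hypothetical function name, not the authors' exact feature set.

```python
import numpy as np

def global_functionals(lld):
    """lld: (n_frames, n_lld) array of frame-level descriptors
    (pitch, energy, MFCCs, ...) extracted every 10 ms with a 25 ms window.
    Returns one flat vector of per-LLD statistics for a domain."""
    stats = [
        lld.mean(axis=0),
        lld.std(axis=0),
        np.median(lld, axis=0),
        lld.min(axis=0),
        lld.max(axis=0),
        np.percentile(lld, 25, axis=0),
        np.percentile(lld, 75, axis=0),
    ]
    return np.concatenate(stats)

# 100 frames x 11 LLDs -> 7 statistics per LLD = 77 global features
feats = global_functionals(np.random.randn(100, 11))
```

Computing such a vector separately for the wife, husband, and speaker-independent domains, over all LLDs and 13 statistics, gives feature counts on the order the slide reports.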
What is vocal synchrony?
- Definition: naturally spontaneous behavioral matching between partners in dyadic social interactions
- Purpose in human interactions:
  – Achieving communication efficiency* (unintentional effort)
  – Communicating interest and engagement* (conscious effort)
- Psychological significance in theory and practice:
  – Learning and memory in child-parent interactions
  – Regulating emotion processes*
  – Precursor to empathy
  – Mirror neurons
No quantification method exists yet! Can we quantify it even when it is not possible for human perception to provide ground truth?
Unsupervised signal-derived method
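Following the general idea of the cited Lee et al. paper (a signal-derived, PCA-based quantification), one crude sketch of vocal entrainment is the fraction of one speaker's feature variance captured by the other speaker's principal subspace. The variance-ratio measure and all names below are simplifying assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def pca_basis(X, k):
    # Top-k principal directions (as columns) of a (segments x features)
    # matrix, obtained from the SVD of the mean-centered data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def entrainment(X_a, X_b, k=3):
    # Fraction of speaker B's feature variance that survives projection
    # onto speaker A's top-k principal subspace; a similarity in [0, 1].
    B = pca_basis(X_a, k)
    Xb = X_b - X_b.mean(axis=0)
    proj = Xb @ B  # coordinates of B's segments in A's subspace
    return float(np.sum(proj ** 2) / np.sum(Xb ** 2))
```

Because the measure is directional (A's subspace applied to B's data, or vice versa), it naturally yields the wife-to-husband and husband-to-wife entrainment values analyzed later.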
Verification
Study of behavioral codes and vocal synchrony
Utilization as features in affect recognition applications
04/11/2013
Utilization as quantitative metrics for clinical analysis via multilevel modeling (MLM)
[Figure: MLM results relating wife-to-husband and husband-to-wife entrainment to wife-demander/husband-withdrawer polarization. Within-partner effects: p < 0.001; between-partner effects: p < 0.001, p < 0.01, or not significant, depending on entrainment direction.]
Clinical Implications
Behavioral Informatics
Case Study II
Domain: Autism Spectrum Disorder
Specifics: ADOS Module 3 interview session
Engineering task: interaction modeling (atypical prosody quantification)
*slide content credit to Daniel Bone
Daniel Bone, Chi-Chun Lee, Matthew Black, Marian Williams, Sungbok Lee, Pat Levitt, and Shrikanth S. Narayanan, "The Psychologist as an Interlocutor in ASD Assessment: Insights from a Study of Spontaneous Prosody," Journal of Speech, Language, and Hearing Research, 2014. doi:10.1044/2014_JSLHR-S-13-0062
Autism Spectrum Disorder: ADOS session
ADOS – Module 3: behavioral codes
- ADOS: a semi-structured assessment framework
  – Used to help psychologists diagnose autism (one popular tool)
  – Subject interacts with a psychologist for ~30-45 minutes
  – Constrained, developmentally-appropriate tasks
- 4 modules, chosen by expressive language level and age
  – Module 1 (less than phrase speech): free play, response to joint attention
  – Module 2 (some phrase speech): joint interactive play, bubble play
  – Module 3 (verbally fluent): make-believe play, telling a story from a book
  – Module 4 (verbally fluent adolescents/adults): more interview-style
- The psychologist rates the child’s socio-communicative skills
  – e.g., speech abnormalities (intonation/volume/rhythm/rate)
  – e.g., reciprocal social interaction (unusual eye contact)
- Scores on sub-assessments are summed, and the total score is used to diagnose ASD
- Psychologists are trained to administer the ADOS using a stringent training protocol
Atypical Prosody
- Prosody refers to the way in which something is said (rhythm)
- Intonation, Volume, Rate, and Voice Quality
- Critical role in expressivity and social-affective reciprocity
- Variety of abnormalities
- Monotonous
- Atypical lexical stress and pragmatic prosody
- Speaking Rate
- “Bizarre” quality to speech
- Qualitative descriptions are general and contradictory:
"slow, rapid, jerky and irregular in rhythm, odd intonation or inappropriate pitch and stress, markedly flat and toneless, or consistently abnormal volume” -[Lord et al. 2003]
USC CARE Corpus
- Child-psychologist ADOS interactions
- ADOS- Autism Diagnostic Observation Schedule. [Lord et al., 2000]
- Multimodal: 2 HD video cameras and 2 far-field microphones (ecological validity)
Experimental Setup: Subject Sample
- Analysis focused on subjects administered the ADOS Module 3
- Verbally fluent children and young adults
- 30 sessions total, 28 appropriate for analysis
- Manual transcription and segmentation
- Transcription: spoken words, non-verbal communication, and vocalizations
- Segmentation: single speaker utterances, temporal markings
- Psychologists
- Three trained clinical psychologists conducted the ADOS sessions
- Each psychologist administered ~9 sessions
- Coding
- 60-minute session / 14 subtasks
- 28 codes scored by the psychologist interacting with the child
- Not all codes used
- Code of Interest–Speech Abnormalities Associated with Autism
- Scored on an integer scale from ‘0’ (appropriate) to ‘2’ (clearly abnormal)
- Code of Interest-ADOS Totals
- ADOS totals relate to ‘Severity’ of autism spectrum disorder
- Three total codes: Communication, Social Interaction, and C.+S.I.
- Higher resolution (min. 0, max. 8-22)
- Spearman’s ρ = 0.74 (p < 10^-6) between Speech Abnormality and the C.+S.I. Total
Experimental Setup: Labels
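The Spearman correlation quoted above (ρ = 0.74 between Speech Abnormality and the C.+S.I. Total) is just the Pearson correlation of ranks. The minimal sketch below assumes no tied values; real ADOS codes have ties, for which tie-corrected implementations such as scipy.stats.spearmanr are used in practice.

```python
import numpy as np

def spearman_rho(x, y):
    # Spearman's rank correlation: Pearson correlation of the ranks.
    # Simplified sketch -- assumes no tied values (ties need average ranks).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])
```

Any monotonically increasing relation between the two codes gives ρ = 1, which is why rank correlation suits ordinal clinical scores better than plain Pearson correlation.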
- ASD literature: intonation, volume, rate, and voice quality
- 25 acoustic-prosodic features per speaker
- Intonation and volume (12 functionals)
- Mean (μ) and Stdv (σ) of 2nd-order coefficients
- Speaking rate and rhythm (9 functionals)
- Mean (μ) and 90% quantile (q90) of both turn-end and non-turn-end syllabic speaking rate
- Mean (μ) and Stdv (σ) of vowel and consonant durations
- Proportion of vowel speech to total speech
- Voice quality (4 functionals)
- Median and inter-quartile range (IQR) of jitter and shimmer
- Jitter and shimmer are extracted on extended vowels (at least 3T0)
Experimental Setup: Acoustic features
- Word-level features:
- Phrase-boundary prosody is most perceptually salient, so we extract word-level features on turn-end words.
- We utilize lexical transcriptions and turn-level alignments.
- HTK alignment, Colorado Corpus children’s models, WSJ adult models (also used for phonetic features)
- Intonation (pitch) and volume (intensity) contours:
- Extracted using Praat. Both in log-domain.
- Normalized per speaker by subtracting mean.
- Contours are bounded to the range [-1, 1], then fit with a 2nd-order polynomial
Experimental Setup: Acoustic features
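The contour parameterization above can be sketched as follows. The slide is ambiguous about what exactly is bounded to [-1, 1]; this sketch scales the time axis to [-1, 1] before the 2nd-order polynomial fit (an assumption), and the function name is hypothetical.

```python
import numpy as np

def contour_coeffs(contour):
    """Mean-normalize a (log-domain) pitch or intensity contour and
    return the coefficients [a2, a1, a0] of a parabolic fit."""
    c = np.asarray(contour, dtype=float)
    c = c - c.mean()                    # per-speaker mean subtraction
    t = np.linspace(-1.0, 1.0, len(c))  # bounded time axis
    return np.polyfit(t, c, deg=2)      # highest-order coefficient first
```

The resulting [a2, a1, a0] coefficients can then be summarized (mean, std. dev.) across turn-end words, matching the "2nd-order coefficient" functionals listed earlier.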
Speech processing technology in behavioral science
Engineering pros:
- Large scale, consistency, parallel (third-party) observation, interaction modeling
Scientific pros:
- Quantitative insights, data-driven discovery, objective methods
Challenges:
- Large heterogeneity in behavioral production and perception, and the coupling of their effects during interaction (while interaction is essential!)
Human-centered Behavioral Signal Processing
Behavioral Interface · Behavioral Informatics · Behavioral Signal Processing

Psychiatry / Psychology
- Behavioral couple therapy: automating manual observational coding
- Autism Spectrum Disorder (ASD): ADOS diagnosis (prosody modeling); RapidABC (speech engagement modeling); virtual game (physiological signal processing)
- Rapid Automatized Naming: quantification of eye-voice coordination

Human-Machine Interface / Education
- Affective computing: multimodal emotion recognition; multimedia theater acting analysis; cross-corpora recognition
- Interaction synchrony: quantification of vocal behavior synchronization
- Automatic performance scoring: impromptu speech scoring