


  1. Understanding Dyadic Human Spoken Interactions Using Speech Processing Techniques: Case Studies in Autism Spectrum Disorder (ASD) and Behavioral Couple Therapy. Jeremy 李祈均. *Materials in this presentation come in part from Daniel Bone, Dr. Matt Black, Prof. Panos Georgiou, and Prof. Shri Narayanan

  2. Picture credit: the USC SAIL lab, http://sail.usc.edu

  3. What is BSP? Employ and advance signal processing and machine learning to sense human behaviors
  • Aid in, and transform, traditional observational methods
  • Focus on mental health research and practice
  • Many benefits: speedup, parallel observation capability, large-scale trend analysis, etc.
  • Significance: in the USA, ~10 million people receive psychotherapy every year, and the number is increasing
  • The state of the art has not changed for decades

  4. Mental health: traditional observational study. *Picture credit: S. Narayanan and P. G. Georgiou, "Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language: Computational techniques are presented to analyze and model expressed and perceived human behavior-variedly characterized as typical, atypical, distressed, and disordered-from speech and language cues and their applications in health, commerce, education, and beyond," Proc. IEEE, vol. 101, pp. 1203-1233, 2013.

  5. Mental health: putting BSP in the loop. *Picture credit: Narayanan and Georgiou, Proc. IEEE, vol. 101, pp. 1203-1233, 2013.

  6.

  7. Case Study I
  Domain: behavioral couple therapy
  Specifics: problem-solving interactions as part of IBCT
  Engineering task: interaction modeling (vocal synchrony quantification)
  Chi-Chun Lee, Athanasios Katsamanis, Matthew Black, Brian Baucom, Andrew Christensen, Panayiotis G. Georgiou, and Shrikanth S. Narayanan, "Computing Vocal Entrainment: A Signal-Derived PCA-Based Quantification Scheme with Application to Affect Analysis in Married Couple Interactions," Computer Speech and Language, 28(2): 518-539, doi:10.1016/j.csl.2012.06.006

  8. Couple therapy: Integrative Behavioral Couple Therapy (IBCT)

  9. Couple therapy database
  • Collaborative work between UCLA and UW
  • 134 seriously and chronically distressed real couples
  • 10-minute problem-solving spoken interactions
  • Audio-video recordings (far-field microphone, varying noise conditions)
  • 33 global ratings of behavioral codes for each spouse (SSIRS, CIRS)
  • 372 sessions, about 90 hours of data
  • Manual transcripts available
  Goal: study this large amount of spontaneous interaction data

  10. Automatic pre-processing: automatic speaker segmentation
  • Segment the sessions into meaningful regions
    – Recursive automatic speech-text alignment technique [Moreno 1998]
    – Session split into regions: wife/husband/unknown
    – Segmented >60% of the sessions' words into wife/husband regions for 293/574 sessions
  Example aligned text: "... that she's known for five months and didn't tell me ..."
  Abbreviations: MFCC = Mel-Frequency Cepstral Coefficients, AM = Acoustic Model, ASR = Automatic Speech Recognition, LM = Language Model, HYP = ASR Hypothesized Transcript, Dict = Dictionary
  *Slide content credit: Dr. Matthew P. Black
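The region-building step above can be sketched in a few lines. This is a toy illustration of the bookkeeping only, not the recursive Moreno-style aligner itself: it assumes word-level alignments already tagged with a transcript speaker label ("wife", "husband", or "unknown" for low-confidence stretches), merges adjacent same-speaker words into regions, and reports the fraction of words attributed to a spouse.

```python
def words_to_regions(aligned_words):
    """aligned_words: list of (start_s, end_s, speaker), where speaker is
    'wife', 'husband', or 'unknown' (low-confidence alignment).
    Returns merged single-speaker regions plus the word coverage rate."""
    regions = []
    for start, end, spk in aligned_words:
        if regions and regions[-1][2] == spk:
            regions[-1][1] = end            # extend the current region
        else:
            regions.append([start, end, spk])
    known = sum(1 for *_, s in aligned_words if s != "unknown")
    coverage = known / len(aligned_words)   # fraction attributed to a spouse
    return [tuple(r) for r in regions], coverage

# toy session: two wife words, one unknown word, one husband word
regions, cov = words_to_regions([
    (0.0, 0.4, "wife"), (0.4, 0.9, "wife"),
    (0.9, 1.2, "unknown"), (1.2, 1.6, "husband"),
])
print(regions, cov)  # 3 regions, coverage 0.75
```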

  11. Automatic acoustic feature extraction: LLD computation
  • Acoustic features shown to be relevant (e.g., [Gottman 1977, Yildirim et al. 2010])
  • Low-level descriptors (LLDs) extracted every 10 ms with a 25 ms window
    – Voice Activity Detection (VAD), speaking rate, pitch, energy, harmonics-to-noise ratio, voice quality, 13 MFCCs, 26 MFBs, magnitude of spectral centroid, spectral flux
  • Each session split into 3 "domains": wife, husband, speaker-independent
  • 13 statistics (mean, std. dev., ...) across each domain for each LLD
    – ~2000 features capture the global acoustic properties of each spouse
  *Slide content credit: Dr. Matthew P. Black
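The statistics-over-LLDs step can be sketched in a few lines of NumPy. The seven functionals below are illustrative placeholders: the slide does not enumerate all 13 statistics, so treat the exact set as an assumption.

```python
import numpy as np

def session_functionals(lld_frames):
    """Collapse a frame-level LLD matrix (n_frames x n_llds, one row per
    10 ms hop) into session-level statistics. Seven illustrative
    functionals stand in for the 13 used in the study."""
    stats = [
        np.mean, np.std, np.median, np.min, np.max,
        lambda x, axis: np.percentile(x, 25, axis=axis),
        lambda x, axis: np.percentile(x, 75, axis=axis),
    ]
    return np.concatenate([f(lld_frames, axis=0) for f in stats])

# toy example: 1000 frames (10 s at a 10 ms hop) of 11 LLD channels
rng = np.random.default_rng(0)
feats = session_functionals(rng.standard_normal((1000, 11)))
print(feats.shape)  # 7 statistics x 11 LLDs = 77 values
```

Applied per domain (wife, husband, speaker-independent) over the full LLD set, this functional-extraction pattern is how a frame stream becomes a fixed-length session vector.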

  12. What is vocal synchrony?
  • Definition: naturally occurring, spontaneous behavioral matching between partners in dyadic social interactions
  • Purpose in human interactions
    – Achieving communication efficiency (unintentional)
    – Communicating interest and engagement (conscious effort)
  • Psychological significance in theory and practice
    – Learning and memory in child-parent interactions
    – Regulating emotion processes
    – Precursor to empathy
    – Mirror neurons
  No quantification method exists! Can we quantify it even where human perception cannot, i.e., with no ground truth?

  13. Unsupervised signal-derived method
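A rough sketch of the signal-derived PCA-based idea follows. The paper's exact scheme (turn pairing, the specific similarity measures) differs; this hedged illustration only shows the core intuition: learn a low-dimensional subspace from one speaker's frame-level vocal features and ask how much of the other speaker's feature variance that subspace explains.

```python
import numpy as np

def subspace_overlap(feats_a, feats_b, n_pc=3):
    """Fraction of speaker B's vocal-feature variance captured by the top
    n_pc principal directions of speaker A's features. Values near 1
    suggest B's vocal behavior occupies the same subspace as A's.
    Illustrative only; not the exact entrainment measure of Lee et al."""
    a = feats_a - feats_a.mean(axis=0)
    b = feats_b - feats_b.mean(axis=0)
    _, _, vt = np.linalg.svd(a, full_matrices=False)   # PCA via SVD
    proj = b @ vt[:n_pc].T                             # B in A's subspace
    return proj.var(axis=0).sum() / b.var(axis=0).sum()

# sanity check: a speaker perfectly "entrained" with itself scores 1
rng = np.random.default_rng(1)
a = rng.standard_normal((200, 5))
print(round(subspace_overlap(a, a, n_pc=5), 3))  # 1.0
```

Because the measure needs no labels, it fits the unsupervised setting of the slide: synchrony is quantified directly from the two feature streams.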

  14. Verification

  15. Study of behavioral codes and vocal synchrony

  16. Utilization as features for affect recognition application

  17. Utilization as quantitative metrics for clinical analysis via MLM
  Multilevel-model (MLM) analysis relating entrainment (husband-to-wife and wife-to-husband) to wife-demander/husband-withdrawer polarization: within-partner entrainment is not significant, while between-partner entrainment is significant (p < 0.01 and p < 0.001). Clinical implications for behavioral informatics. 04/11/2013

  18. Case Study II
  Domain: autism spectrum disorder
  Specifics: ADOS Module 3 interview session
  Engineering task: interaction modeling (atypical prosody quantification)
  Daniel Bone, Chi-Chun Lee, Matthew Black, Marian Williams, Sungbok Lee, Pat Levitt, and Shrikanth S. Narayanan, "The Psychologist as an Interlocutor in ASD Assessment: Insights from a Study of Spontaneous Prosody," Journal of Speech, Language, and Hearing Research, 2014. doi:10.1044/2014_JSLHR-S-13-0062
  *Slide content credit: Daniel Bone

  19. Autism Spectrum Disorder: the ADOS session

  20. ADOS Module 3: behavioral codes

  21. ADOS: a semi-structured assessment framework
  • Used to help psychologists diagnose autism (one popular tool)
  • Subject interacts with a psychologist for ~30-45 minutes
  • Constrained, developmentally appropriate tasks
  • 4 modules, chosen by expressive language level and age
    – Module 1 (less than phrase speech): free play, response to joint attention
    – Module 2 (some phrase speech): joint interactive play, bubble play
    – Module 3 (verbally fluent): make-believe play, telling a story from a book
    – Module 4 (verbally fluent adolescents/adults): more interview style
  • The psychologist rates the child's socio-communicative skills
    – e.g., speech abnormalities (intonation/volume/rhythm/rate)
    – e.g., reciprocal social interaction (unusual eye contact)
  • Scores on sub-assessments are summed, and the total score is used to diagnose ASD
  • Psychologists are trained to administer the ADOS under a stringent training protocol

  22. Atypical prosody
  • Prosody refers to the way in which something is said: rhythm, intonation, volume, rate, and voice quality
  • Plays a critical role in expressivity and social-affective reciprocity
  • Variety of abnormalities
    – Monotonous speech
    – Atypical lexical stress and pragmatic prosody
    – Atypical speaking rate
    – A "bizarre" quality to speech
  • Qualitative descriptions are general and even contrasting: "slow, rapid, jerky and irregular in rhythm, odd intonation or inappropriate pitch and stress, markedly flat and toneless, or consistently abnormal volume" [Lord et al. 2003]
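One simple way to make descriptors like "monotonous" measurable is to summarize an utterance's f0 contour. The log-pitch interquartile range below is an illustrative descriptor under that idea, not the feature set of the study above.

```python
import numpy as np

def intonation_summary(f0_hz):
    """Toy intonation descriptors for a frame-level f0 contour in Hz,
    with unvoiced frames marked as 0. A small log2-pitch IQR points to
    flatter, more monotone intonation."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]                      # keep voiced frames only
    log_f0 = np.log2(voiced)                 # octave (perceptual) scale
    q1, q3 = np.percentile(log_f0, [25, 75])
    return {"median_hz": float(np.median(voiced)),
            "log2_iqr": float(q3 - q1)}

# synthetic monotone contour vs. a contour with pitch movement
flat = intonation_summary([120.0] * 50 + [0.0] * 10)
lively = intonation_summary(200 + 60 * np.sin(np.linspace(0, 6, 60)))
print(flat["log2_iqr"], lively["log2_iqr"])
```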

  23. USC CARE Corpus
  • Child-psychologist ADOS interactions
  • ADOS: Autism Diagnostic Observation Schedule [Lord et al., 2000]
  • Multimodal: 2 HD video cameras and 2 far-field microphones (ecological validity)

  24. Experimental setup: subject sample
  • Analysis focused on subjects administered ADOS Module 3 (verbally fluent children and young adults)
  • 30 sessions total, 28 appropriate for analysis
  • Manual transcription and segmentation
    – Transcription: spoken words, non-verbal communication, and vocalizations
    – Segmentation: single-speaker utterances with temporal markings
  • Psychologists
    – Three trained clinical psychologists conducted the ADOS sessions
    – Each psychologist administered ~9 sessions

  25. Experimental setup: labels
  • Coding
    – ~60-minute session with 14 subtasks
    – 28 codes scored by the psychologist who interacts with the child; not all codes were used
  • Code of interest: Speech Abnormalities Associated with Autism
    – Scored on an integer scale from 0 (appropriate) to 2 (clearly abnormal)
  • Codes of interest: ADOS totals
    – ADOS totals relate to the severity of autism spectrum disorder
    – Three total codes: Communication, Social Interaction, and C.+S.I.
    – Higher resolution (min. 0, max. 8-22)
  • Spearman's ρ = 0.74 (p < 10⁻⁶) between Speech Abnormality and the C.+S.I. total
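The reported ρ = 0.74 is a rank correlation, which can be reproduced in principle as Pearson correlation computed on average ranks. The scores below are made up for illustration; they are not the study's data.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks
    (tied values share the mean of their rank positions)."""
    def avg_ranks(v):
        v = np.asarray(v, dtype=float)
        order = np.argsort(v, kind="stable")
        ranks = np.empty(len(v))
        ranks[order] = np.arange(1, len(v) + 1)
        for val in np.unique(v):             # average ranks over ties
            ranks[v == val] = ranks[v == val].mean()
        return ranks
    rx, ry = avg_ranks(x), avg_ranks(y)
    rx, ry = rx - rx.mean(), ry - ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# hypothetical 0-2 abnormality codes vs. hypothetical C.+S.I. totals
rho = spearman_rho([0, 1, 2, 2, 0, 1], [3, 9, 15, 18, 4, 11])
print(round(rho, 2))  # 0.96
```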
