SLIDE 1

Speaker Movement Correlates with Prosodic Indicators of Engagement

Rob Voigt, Robert J. Podesva, and Dan Jurafsky

Linguistics Department, Stanford University

SLIDE 2

Links Between Acoustic and Visual Prosody

  • Gestural apices align with pitch accents
    Jannedy and Mendoza-Denton (2006)
  • Production of “visual beats” increases the prominence of the co-occurring speech
    Krahmer and Swerts (2007)
  • Speakers move their head and eyebrows more during prosodically focused words
    Cvejic et al. (2010)

SLIDE 3

Question 1: Is the Relationship Between Acoustic and Visual Prosody Continuous?

  • Previous research
    • Identified discrete relationships
  • Our proposal
    • Examine scalar relationships
    • Particularly between movement and affective measures of engagement

Yu et al. (2004), Mairesse et al. (2007), Gravano et al. (2011), Oertel et al. (2011), MacFarland et al. (2013), etc.

SLIDE 4

Question 2: Methodological Barriers to Studying Visual Prosody

  • Prior studies generally employ
    • Time-intensive annotation schemes
    • Expensive or invasive experimental hardware
  • Thus face limitations
    • Small amounts of data
    • Prohibitive expense
SLIDE 5

Our Solution: New Data Source

Automatically extract visual and acoustic data from YouTube (see the sketch after this list)

  • Potentially huge amounts of data
  • Ecologically valid (“in the wild”)
  • Allows replicability
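
A minimal sketch of this download-and-split step, assuming the yt-dlp and ffmpeg command-line tools are installed; the URL argument and output paths are placeholders, not the authors' actual pipeline:

import subprocess
from pathlib import Path

def fetch_and_split(url, workdir="vlog"):
    out = Path(workdir)
    (out / "frames").mkdir(parents=True, exist_ok=True)
    video = out / "video.mp4"
    # Download the vlog as MP4.
    subprocess.run(["yt-dlp", "-f", "mp4", "-o", str(video), url], check=True)
    # Dump every frame as a numbered PNG.
    subprocess.run(["ffmpeg", "-i", str(video),
                    str(out / "frames" / "%06d.png")], check=True)
    # Extract mono audio for the acoustic analysis.
    subprocess.run(["ffmpeg", "-i", str(video), "-ac", "1",
                    str(out / "audio.wav")], check=True)
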
SLIDE 6

Our Solution: New Data Source

  • We chose “first day of school” video blogs (“vlogs”)
  • 14 videos, 95 minutes of data
  • Static backgrounds and stable cameras
  • Generally engaged, animated speakers
SLIDE 7

Our Solution: Automatic Phrasal Units

Approximate pause-bounded units (PBUs)

  • Our unit of prosodic analysis
  • Calculated with a simple iterative algorithm (sketched below)
    • Find silences (Praat) with a threshold of -30.0 dB; sounding portions are approximate PBUs
    • If average phrase length > 2 seconds, raise the threshold by 3.0 dB and re-extract
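
A minimal sketch of the iterative algorithm, reimplemented in NumPy rather than Praat; the 10 ms frame size and the dB reference (relative to the loudest frame) are my assumptions, while the -30.0 dB start, 3.0 dB step, and 2-second cap come from the slide:

import numpy as np

def extract_pbus(samples, sr, frame=0.01, start_db=-30.0, step_db=3.0, max_mean_len=2.0):
    samples = np.asarray(samples, dtype=float)
    hop = int(frame * sr)
    n_frames = len(samples) // hop
    rms = np.array([np.sqrt(np.mean(samples[i*hop:(i+1)*hop] ** 2))
                    for i in range(n_frames)])
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)  # dB re: loudest frame
    thresh = start_db
    while True:
        sounding = db > thresh
        pbus, start = [], None
        for i, s in enumerate(sounding):          # collect runs of sounding frames
            if s and start is None:
                start = i
            elif not s and start is not None:
                pbus.append((start * frame, i * frame))
                start = None
        if start is not None:
            pbus.append((start * frame, n_frames * frame))
        mean_len = np.mean([e - s for s, e in pbus]) if pbus else 0.0
        if mean_len <= max_mean_len or thresh >= 0.0:
            return pbus                           # (start, end) times in seconds
        thresh += step_db  # phrases too long on average: count more audio as silence
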

SLIDE 8

Our Solution: New Visual Feature

Movement Amplitude

  • Assumes the speaker is talking in front of a static background
  • Quantifies speaker movement as the pixel-by-pixel difference between frames (see the sketch below)
  • Calculated in log space, z-scored per speaker
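
A rough sketch of the feature; grayscale frames and a mean-absolute-difference summary per frame pair are my assumptions, while the log space and per-speaker z-scoring come from the slide:

import numpy as np

def movement_amplitude(frames):
    """frames: sequence of HxW grayscale arrays from one speaker's video."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    # Pixel-by-pixel difference between consecutive frames, one value per pair.
    diffs = np.array([np.abs(b - a).mean() for a, b in zip(frames, frames[1:])])
    log_ma = np.log(diffs + 1e-9)                    # calculated in log space
    return (log_ma - log_ma.mean()) / log_ma.std()   # z-scored per speaker
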
SLIDE 9

Visualization: Continuous Measurements

  • Video at 30 FPS allows observations at 30 Hz
SLIDE 10

Visualization: Movement-Only Video

  • Coarse, but reasonable overall estimation
SLIDE 11

Acoustic Features

  • Following prior work on prosodic engagement
  • Pitch (fundamental frequency) and intensity (loudness)
  • Eight features per phrase (see the sketch below)
    • max, min, mean, standard deviation (std) for both pitch and intensity
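
A small sketch of the per-phrase feature computation; it assumes pitch and intensity contours have already been tracked (e.g., with Praat), and dropping unvoiced (zero) pitch samples is my assumption:

import numpy as np

def phrase_features(pitch, intensity):
    """pitch, intensity: 1-D contours sampled over one pause-bounded unit."""
    feats = {}
    for name, contour in (("pitch", pitch), ("intensity", intensity)):
        vals = np.asarray(contour, dtype=float)
        vals = vals[vals > 0]          # drop unvoiced / silent samples
        feats[f"{name}_max"] = vals.max()
        feats[f"{name}_min"] = vals.min()
        feats[f"{name}_mean"] = vals.mean()
        feats[f"{name}_std"] = vals.std()
    return feats                        # the eight features for this phrase
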

SLIDE 12

Statistical Analysis

  • Movement amplitude measures (max, min, mean, std) are highly collinear
  • PCA for dimensionality reduction (sketched below)
  • Two components explain 96% of variance in MA
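
A sketch of this step with scikit-learn; the input layout (one row per phrase, columns for the four MA summaries) is an assumption:

from sklearn.decomposition import PCA

def movement_components(ma_stats):
    """ma_stats: (n_phrases, 4) array of per-phrase MA max, min, mean, std."""
    pca = PCA(n_components=2)
    components = pca.fit_transform(ma_stats)
    # On the authors' data, two components explain ~96% of the variance.
    print("variance explained:", pca.explained_variance_ratio_.sum())
    return components
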
SLIDE 13

Statistical Analysis

  • Series of linear regressions (see the sketch below)
    • Predicting acoustic variables from OVERALL MOVEMENT and MOVEMENT VARIANCE
  • Controlling for speaker-specific variation by including speakers as random effects
  • Controlling for log(phrase length)
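
One such regression might be fit as below with statsmodels' mixed linear model; the column names are placeholders, and treating the two principal components as the OVERALL MOVEMENT and MOVEMENT VARIANCE predictors follows the slide:

import statsmodels.formula.api as smf

def fit_model(df, outcome="pitch_mean"):
    """df: one row per phrase, with movement components, log length, speaker id."""
    model = smf.mixedlm(
        f"{outcome} ~ overall_movement + movement_variance + log_phrase_len",
        data=df,
        groups=df["speaker"],   # speaker-specific variation as random intercepts
    )
    return model.fit()
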
SLIDE 14

Experimental Pipeline

  • Download videos, extract frames and audio
  • Calculate approximate pause-bounded units (PBUs)
  • Compute movement amplitude for each frame
  • Calculate MA principal components
  • Extract acoustic features
  • Run statistical models
SLIDE 15

Results

[Results table: ** is p < 0.01, *** is p < 0.001, — is no significant relationship]

SLIDE 16

Results

  • During phrases with more OVERALL MOVEMENT, speakers use
    • higher and more variable pitch
    • louder and more variable intensity
  • MOVEMENT VARIANCE was not predictive of any of our acoustic features

SLIDE 17

Visualization: Across Phrases

  • Notice light and dark vertical banding
  • Suggests sequence modeling as future work
SLIDE 18

Moving Forward

  • More advanced vision-based features
  • Face tracking
  • Gesture recognition
  • Expanding the data
  • Genre effects
  • Sociolinguistic variables
  • Movement in interaction
SLIDE 19

Discussion

  • Further empirical evidence for a rich link between acoustic and visual prosody
  • Adds a dimension of quantity / continuous association, in addition to previously demonstrated temporal synchrony
  • Methodological contributions suggest new avenues for multi-modal analysis of prosody
  • Code and Corpus: nlp.stanford.edu/robvoigt/speechprosody