Automatic Detection and Classification of Prosodic Events
Thesis Proposal
Andrew Rosenberg
December 12, 2007
- A. Rosenberg - Thesis Proposal - 12/12/07
Introduction
Intonation
- “What” v. “How”
- Dimensions of Prosodic Variation
- Speaking Rate
- Pitch Range
- Voice Quality
- Loudness
- Accenting*
- Phrasing*
Introduction
Prosodic Events
- Categorical Phenomena
- Accenting
- Acoustic excursion which makes a word
“prominent” or “stand out” from its surroundings
- Phrasing
- “Perceived disjuncture” between words
Introduction
Accenting
- Directs the listener's attention to a concept
- Contrast
- Topic
- Information Status
- Example: Eileen is pro-English.
- Expected accenting goes unnoticed
- Unexpected accenting leads to unexpected
meaning
- A: Is Eileen pro-French?
B: Eileen is pro-English.
Introduction
Phrasing
- Phrasing defines an acoustic unit
- Physiologically necessary
- Communicatively useful
- Attachment Example:
Anna will win Manny.
- Phrase final tones indicate:
- How phrases are composed
- Example:
I need some oregano (H-) and marjoram (H-) and some fresh basil (L-) okay?
- Pragmatic and discourse effect
- Example:
Mariana made the marmalade.
Introduction
Why Prosodic Events
- Consensus
- Understanding
- Availability
Introduction
Goals
- Provide prosodic information to SLP
systems
- Develop novel techniques for classification
and detection
- Increase understanding of the acoustic and
lexical influences on the use of prosodic events
Outline
- Detection of Prosodic Events
- Classification of Prosodic Events
- Applications
Outline
- Detection of Prosodic Events
- Pitch Accent
- Phrase Boundary
- Integrated Prosodic Event Detection
- Classification of Prosodic Events
- Applications
Pitch Accent Detection
- Recognition of Acoustic Excursion
- Acoustic Correlates
- Pitch
- Energy*
- Duration
- Previous Approaches
- [Wightman & Ostendorf 1994, Conkie et al. 1999, Sun 2002,
Marsi et al. 2003, Gregory 2004, Ananthakrishnan et al. 2005, Tamburini 2006, Chaolei 2007, Levow 2008, inter alia]
Pitch Accent Detection
Basic Assumptions
- Unit of Analysis: Syllable vs. Word
- Use of Lexical or Syntactic Information
- Supervised vs. Unsupervised Learning
Pitch Accent Detection
Experiments
- Feature Representation
- Pitch - min, max, stdev, mean, rms
- Energy - min, max, stdev, mean, rms
- Duration
- Context Normalization of max and mean
- Range and z-score normalization over nine static
context windows
- Speaker Normalization (z-score)
- Naïve Bayes, J48, SVM, Boosting, Bagging, Dagging*
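The aggregate features and normalizations listed above can be sketched as follows. This is a minimal illustration rather than the experimental code: the function names `aggregate`, `zscore`, and `range_norm` and the toy pitch values are assumptions, and a single context window stands in for the nine windows used in the experiments.

```python
import math
import statistics


def aggregate(values):
    """Return min, max, stdev, mean, and rms of a frame-level contour."""
    return {
        "min": min(values),
        "max": max(values),
        "stdev": statistics.pstdev(values),
        "mean": statistics.fmean(values),
        "rms": math.sqrt(sum(v * v for v in values) / len(values)),
    }


def zscore(value, context):
    """Z-score of a value relative to a context (a speaker or a window)."""
    sd = statistics.pstdev(context)
    if sd == 0:
        return 0.0
    return (value - statistics.fmean(context)) / sd


def range_norm(value, context):
    """Position of a value within the range of its context, in [0, 1]."""
    lo, hi = min(context), max(context)
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


# Hypothetical pitch values (Hz) for one word and its surrounding window.
word_pitch = [180.0, 200.0, 220.0]
window_pitch = [150.0, 160.0, 180.0, 200.0, 220.0, 170.0]

feats = aggregate(word_pitch)
feats["max_range"] = range_norm(feats["max"], window_pitch)
feats["mean_z"] = zscore(feats["mean"], window_pitch)
```

Speaker normalization would use the same `zscore` helper with all of a speaker's values as the context.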
Pitch Accent Detection
Results
[Figure: results (axis 75.0 to 90.0) on BDC-spon, BDC-read, BU-RNC, and TDT-4 for Naïve Bayes, J48, Boosting, Bagging, and SVM, with Human Agreement for reference]
Pitch Accent Detection
Spectral Analysis
- Spectral Balance
- [Sluijter & Van Heuven 1996 1997, Fant 2000,
Heldner 1999]
[My name is Randy Keller]
Pitch Accent Detection
Spectral Analysis
- Examined the predictive power of 210
frequency regions [Rosenberg & Hirschberg 2006]
Pitch Accent Detection
Spectral Analysis Findings
- There is a significant difference in the
predictive power of energy information across frequency regions (14.8%)
- >99.9% of data points are correctly
classified by at least one classifier
- Majority voting leads to ~81.8% correct
classification using only energy features
- Worse than SVM, but better than J48 and
Boosting detection
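Majority voting over the per-region energy classifiers can be sketched as below. The function names and the toy predictions are assumptions for illustration, and ties are broken toward "not accented", a choice the slides do not specify.

```python
def majority_vote(votes):
    """Return 1 (accented) if more than half of the binary votes are 1."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def vote_all(per_classifier_predictions):
    """Combine predictions indexed [classifier][word] into one label per word."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]


# Three hypothetical frequency-region classifiers, four words.
band_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]
voted = vote_all(band_predictions)
```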
Pitch Accent Detection
Correcting Classifier
- Can pitch and duration information be
combined with these results to improve pitch accent detection accuracy? [Rosenberg & Hirschberg 2007]
- For each of 210 energy-based classifiers,
train a second pitch and duration based classifier to correct the predictions of the energy classifiers
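One way to read the correcting-classifier idea is sketched here. It assumes each corrector outputs a binary judgment of whether its paired energy classifier is right, and that a rejected prediction is flipped before voting; both details are assumptions for illustration, and the names `apply_corrector` and `corrected_vote` are hypothetical.

```python
def apply_corrector(energy_pred, predicted_correct):
    """Keep the energy prediction if the corrector trusts it, else flip it."""
    return energy_pred if predicted_correct else 1 - energy_pred


def corrected_vote(energy_preds, corrector_outputs):
    """Majority vote over the corrected predictions for a single word."""
    corrected = [
        apply_corrector(pred, ok)
        for pred, ok in zip(energy_preds, corrector_outputs)
    ]
    return 1 if 2 * sum(corrected) > len(corrected) else 0


# Hypothetical outputs of three energy classifiers for one word, and
# whether each paired pitch/duration corrector judges that output correct.
energy_preds = [0, 0, 1]
corrector_outputs = [False, True, True]
label = corrected_vote(energy_preds, corrector_outputs)
```

In this toy case the uncorrected vote would be "not accented", while the corrected vote is "accented" because one energy prediction is flipped.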
Pitch Accent Detection
Correcting Classifier Diagram
[Diagram: Filters, Energy Classifiers, Correctors, and Aggregator (∑)]
Pitch Accent Detection
Correcting Classifier Results
[Figure: results (axis 75.0 to 90.0) on BDC-spon, BDC-read, and TDT-4 for Boosting, Bagging, SVM, Energy Voting, and Corrected Voting, with Human Agreement for reference]
Pitch Accent Detection
Proposed Work
- Define Word Boundaries using ASR
Transcripts
- Inclusion of Syntactic Features:
- 1. Extend the Feature Vector
- 2. Syntactic-Class-Dependent Modeling
- Penn Treebank, Collapsed Classes, Function v. Content
- 3. Model Combination
Outline
- Detection of Prosodic Events
- Pitch Accent
- Phrase Boundary
- Integrated Prosodic Event Detection
- Classification of Prosodic Events
- Applications
Phrase Boundary Detection
- “Perceived Disjuncture”
- Intermediate v. Intonational phrases
- Acoustic Features
- Silence
- Pre-boundary Lengthening*
- The final syllable in a phrase has increased duration
- Declination Line Reset
- Pitch and intensity decrease over the duration of a
phrase
Phrase Boundary Detection
Experiments
- Reuse the feature vector from pitch accent
detection experiments
- Include Pitch and Energy Reset Features
- Classify word boundaries as intonational
and intermediate phrase boundaries
- Naïve Bayes, J48, SVM*
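A reset feature can be illustrated as the change in a contour across a candidate boundary. The slides do not define the exact computation, so the definition below (mean after the boundary minus mean before it) and the name `reset` are assumptions for illustration only.

```python
from statistics import fmean


def reset(before, after):
    """Mean of the contour after a candidate boundary minus the mean before it."""
    return fmean(after) - fmean(before)


# Hypothetical pitch values (Hz) on either side of a word boundary: pitch
# declines toward the end of a phrase and jumps back up at the next one.
pitch_before = [140.0, 130.0, 120.0]
pitch_after = [190.0, 185.0, 180.0]
pitch_reset = reset(pitch_before, pitch_after)
```

The same helper applied to energy values would give an energy reset feature.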
Phrase Boundary Detection
SVM Results - Full Intonational Phrases
[Figure: chart labeled "Baseline", "Accuracy", "Difference" (axis 10 to 100) for BDC-read, BDC-spon, BU-RNC, TDT-4, Communicator, IBM TTS, and Trains]
Phrase Boundary Detection
SVM Results - Full Intonational Phrases
[Figure: F-Measure (axis 0.1 to 1.0) for Baseline*, BDC-read, BDC-spon, BU-RNC, TDT-4, Communicator, IBM TTS, and Trains]
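For reference, the F-measure reported for boundary detection can be computed from binary gold and predicted labels as in this sketch. The function name and the toy labels are assumptions; the formula is the standard weighted harmonic mean of precision and recall.

```python
def f_measure(gold, predicted, beta=1.0):
    """F-measure of binary predictions, where 1 marks a phrase boundary."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels for eight word boundaries.
gold = [0, 0, 1, 0, 1, 0, 0, 1]
predicted = [0, 0, 1, 0, 0, 0, 1, 1]
score = f_measure(gold, predicted)
```

A classifier that never predicts a boundary can reach high accuracy when boundaries are rare, yet it scores 0.0 on this measure.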
Phrase Boundary Detection
Proposed Work
- Inclusion of Lexical Features
- Similar to pitch accent inclusion approaches
- Pre-boundary lengthening
- Requires syllable information
- Forced aligned from manual word boundaries
- ASR phone hypothesis
Outline
- Detection of Prosodic Events
- Pitch Accent
- Phrase Boundary
- Integrated Prosodic Event Detection
- Classification of Prosodic Events
- Applications
Integrated Prosodic Event Detection
- Pitch accents can improve phrase boundary
detection
[Wang & Hirschberg 1992]
- Hypothesis: Phrase boundaries can improve
pitch accent detection
- Accents “stand out” from context.
- Phrase boundaries define acoustic context.
Integrated Prosodic Event Detection
Proposed Approaches
- Simultaneous Detection
- 4-way classification {acc, non}x{phrase, non}
- Preliminary results show improved performance
on pitch accent and phrase boundary detection on some
corpora
- Iterative Detection
- Detect pitch accents. Use these to detect phrase
boundaries. Use these to detect accents. Repeat.
- Classifier Fusion
- Dynamic Bayesian Model
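The 4-way label space for simultaneous detection can be written out as the cross product of the two binary decisions. The helper names `encode` and `decode` are hypothetical; only the {acc, non}x{phrase, non} label set comes from the slide.

```python
from itertools import product

ACCENT = ("acc", "non")
PHRASE = ("phrase", "non")

# The four joint classes, e.g. ("acc", "phrase") for an accented word
# that is followed by a phrase boundary.
JOINT_CLASSES = list(product(ACCENT, PHRASE))


def encode(accented, boundary):
    """Map the two binary decisions to a single class index."""
    key = ("acc" if accented else "non", "phrase" if boundary else "non")
    return JOINT_CLASSES.index(key)


def decode(class_id):
    """Recover (accented, boundary) from a joint class index."""
    accent, phrase = JOINT_CLASSES[class_id]
    return accent == "acc", phrase == "phrase"
```

A single multi-class classifier trained on these indices yields both an accent and a boundary decision per word through `decode`.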
Outline
- Detection of Prosodic Events
- Classification of Prosodic Events
- Pitch Accent Type
- Phrase-final Tone
- Applications
Prosodic Event Categorization
Accent Types and Phrase-final Tones
- Intonation can be described by a sequence of low (L) and
high (H) tones [Pierrehumbert 1980, Silverman 1992]
- Accents: L*, H*
- Complex tones: L+H*, L*+H, H+!H*
- Intermediate Phrase-final tones
(Phrase Accents):
- Phrase Accents: L-, H-
- Intonational Phrase-final tones
(Phrase Accent + Boundary Tone):
- L-L%, L-H%, H-L%, H-H%
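The phrase-final label inventory above composes from two smaller sets, which a short sketch makes explicit. The constant and function names are assumptions for illustration.

```python
PHRASE_ACCENTS = ("L-", "H-")
BOUNDARY_TONES = ("L%", "H%")

# Intonational phrase-final labels are a phrase accent plus a boundary tone.
INTONATIONAL_FINAL = [pa + bt for pa in PHRASE_ACCENTS for bt in BOUNDARY_TONES]


def split_phrase_final(label):
    """Split a phrase-final label into (phrase accent, boundary tone or None)."""
    phrase_accent, boundary_tone = label[:2], label[2:] or None
    if phrase_accent not in PHRASE_ACCENTS:
        raise ValueError(f"unknown phrase accent in {label!r}")
    if boundary_tone is not None and boundary_tone not in BOUNDARY_TONES:
        raise ValueError(f"unknown boundary tone in {label!r}")
    return phrase_accent, boundary_tone
```

A label with no boundary tone, such as `H-`, marks an intermediate phrase; a label with one, such as `L-H%`, marks a full intonational phrase.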
Prosodic Event Classification
Communicative Uses
- Spoken Dialog Systems
- Cue-Phrases
- Backchannels are more likely to have H-H% than L-L%
- Turn Taking
- H-H% and L-L% more likely to cede the turn
- Discourse Analysis
- Segmentation
- H- “forward looking”
- Speech Act Identification
- H-H% interrogative, L-L% declarative
- Information Status
- H* “new”, L* “given”, !H* “inferable”
- Contrast detection
- L+H* indicative of contrast
Prosodic Event Classification
Preliminary Experiments
- Superset of features used previously, including
extrema location features to capture contour shape
- Naïve Bayes < J48 < SVM
- Pitch accent type experiments did not improve
over baseline
- Phrase-final tone classification shows some
genre bias
- Higher performance on spontaneous speech
- Error and feature analysis is necessary
Prosodic Event Classification
Proposed Work
- Contour Shape Representation
- TILT parameters
- Piecewise fit coefficients
- Syllable-level analysis
- Pitch accents are realized on lexically stressed
syllables. Word-level feature extraction
introduces noise.
- Modeling Speaker Differences
- Speaker clustering, and model selection
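As an illustration of the contour shape representation, the tilt value of Taylor's Tilt model combines the relative sizes of the rise and fall of an event. The sketch below follows the commonly cited definition; the function name and argument names are assumptions, and it is not a statement of how the proposed work will compute the features.

```python
def tilt(rise_amp, fall_amp, rise_dur, fall_dur):
    """Tilt in [-1, 1]: +1 is a pure rise, -1 a pure fall, 0 a symmetric peak."""
    a_rise, a_fall = abs(rise_amp), abs(fall_amp)
    amp_total = a_rise + a_fall
    dur_total = rise_dur + fall_dur
    tilt_amp = (a_rise - a_fall) / amp_total if amp_total else 0.0
    tilt_dur = (rise_dur - fall_dur) / dur_total if dur_total else 0.0
    return (tilt_amp + tilt_dur) / 2


# Hypothetical event: a 30 Hz rise over 0.12 s, then a 10 Hz fall over 0.04 s.
mostly_rising = tilt(30.0, -10.0, 0.12, 0.04)
```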
Outline
- Detection of Prosodic Events
- Classification of Prosodic Events
- Applications
- Non-native Intonation
- Prosody Tutoring
- Accent Identification
- Speech Synthesis
- Extractive Speech Summarization
- Story Segmentation
Applications
Non-native Intonation
- Foreign accented speech is harder for
machines and humans to comprehend
- Due to both segmental and prosodic
differences
- Use prosodic event detection and
classification:
- Help machines be more robust to accented
speech
- Tutor human non-native speakers
Applications: Non-native Intonation
Goals
- Distinguish non-native from native prosodic
event production
- Distinguish non-native from native prosodic
event placement
Applications: Non-native Intonation
Data Collection
- Native Mandarin speakers read English transcripts
- Collect > 30 mins with > 4 speakers
- Train accent and phrase detectors and classifiers
on non-native speech
- Production - compare fit with native v. non-native
detection models
- Placement - control for lexical influences and
compare “appropriate” event locations
- BU-RNC contains four native English speakers reading
identical transcripts
Applications: Non-native Intonation
Prosody Tutoring System
[Diagram: Read prompts, Event Detection, Placement Feedback, Production Feedback; example prompt "The quick brown fox jumped..." annotated with * and | marks]
Applications: Non-native Intonation
Accent Identification
- ASR performance is significantly worse on
foreign accented speech
- Detect a speaker’s accent
- Select an acoustic model trained on accented
speech
- Reuse of placement and production
assessment components
- Evaluate a full utterance as “native” or
“non-native”
Outline
- Detection of Prosodic Events
- Classification of Prosodic Events
- Applications
- Non-native Intonation
- Speech Synthesis
- Story Segmentation
- Extractive Speech Summarization
Applications: Speech Synthesis
Goals
- Add accent placement control to the IBM
speech synthesizer
- Unit selection synthesis
- Stitch sub-phone units together to generate
speech
- Technique:
- Detect accent bearing units in the selection
corpus
- Select accented units when accent is requested
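The selection step can be pictured as a filter on candidate units before the usual cost minimization. Everything in this sketch is hypothetical: the `Unit` fields, the single scalar `cost`, and the fallback to unmatched units are assumptions and do not describe the IBM synthesizer.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    label: str       # hypothetical unit identity, e.g. a sub-phone label
    accented: bool   # output of the accent detector on the selection corpus
    cost: float      # hypothetical combined target and join cost


def select_unit(candidates, label, want_accent):
    """Pick the cheapest unit with the requested accent status, if any exist."""
    pool = [u for u in candidates if u.label == label]
    preferred = [u for u in pool if u.accented == want_accent]
    # Fall back to any matching unit when none has the requested accent status.
    return min(preferred or pool, key=lambda u: u.cost)


units = [
    Unit("ae", False, 0.2),
    Unit("ae", True, 0.5),
    Unit("ae", True, 0.4),
    Unit("iy", False, 0.1),
]
chosen = select_unit(units, "ae", want_accent=True)
```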
Applications: Speech Synthesis
Evaluation
- Is the requested accenting synthesized?
- Does this give the synthesizer the ability to
produce unconventional intonation?
Outline
- Detection of Prosodic Events
- Classification of Prosodic Events
- Applications
- Non-native Intonation
- Speech Synthesis
- Story Segmentation
- Extractive Speech Summarization
Applications
Story Segmentation
- NLP systems (IE, IR, summarization,
sentiment analysis) expect semantically homogeneous input.
- Broadcast news shows typically
comprise many unrelated stories
- Task: Identify boundaries between stories
- [Rosenberg, Hirschberg, Sharifi 2007]
Applications: Story Segmentation
Example
The United States finished at the top with a total of ninety seven medals thirty nine of them gold. Russian China and Australia rounded up its four. Andrea run economy seemed to champion style welcome home even though she was stripped of her individual gold medal at the sydney olympics. In Armenian gymnast tested positive for a banned stimulant that was in a nonprescription cold medicine she took. From any as government is honoring her with its own gold medal inscribed everlasting olympic champion. The international olympic committee did allow run a con to keep her team gold medal and the silver medal she won in the vote compass. A spokeswoman says Republican Senator Strom Thurmond is going very well after falling ill saturday. He spent the night it will to read army medical center in washington
Applications: Story Segmentation
Example
the united states finished at the top with a total of ninety seven medals thirty nine of them gold russian china and australia rounded up its four andrea run economy seemed to champion style welcome home even though she was stripped of her individual gold medal at the sydney olympics in armenian gymnast tested positive for a banned stimulant that was in a nonprescription cold medicine she took from any as government is honoring her with its own gold medal inscribed everlasting olympic champion the international olympic committee did allow run a con to keep her team gold medal and the silver medal she won in the vote compass senator strom thurmond is going very well after falling ill saturday he spent the night it will to read army medical center in washington
Applications: Story Segmentation
Experiments and Results
- Varied the candidate boundaries
- Hypothesized Sentence Boundaries
- ASR Word Boundaries
- Hypothesized Intonational Phrases
- Better candidates than hypothesized sentences
- Note: the model was trained on English
- 250ms* and 500ms pause-based chunks
- Ran segmentation experiments on Arabic,
English and Mandarin BN
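Pause-based chunking, one of the candidate boundary types above, can be sketched as splitting a time-aligned word sequence wherever the silence between words reaches a threshold. The function name, the tuple layout, and the toy timings are assumptions; whether the original experiments used an inclusive threshold is not stated.

```python
def pause_chunks(words, threshold=0.25):
    """Split (token, start, end) tuples, in seconds, at pauses >= threshold."""
    chunks, current = [], []
    prev_end = None
    for token, start, end in words:
        if current and start - prev_end >= threshold:
            chunks.append(current)
            current = []
        current.append(token)
        prev_end = end
    if current:
        chunks.append(current)
    return chunks


# Hypothetical ASR word timings with pauses of 0.02 s, 0.30 s, and 0.55 s.
words = [
    ("the", 0.00, 0.10),
    ("fed", 0.12, 0.40),
    ("sees", 0.70, 0.95),
    ("weakness", 1.50, 2.00),
]
chunks_250 = pause_chunks(words, threshold=0.25)
chunks_500 = pause_chunks(words, threshold=0.50)
```

The end of each chunk is then a candidate story boundary; a larger threshold gives fewer, longer chunks.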
Applications: Story Segmentation
Proposed Work
- Inclusion of accent rate, type and phrase final
tone hypotheses
- Story initial segments should show increased
accenting, and “less final” phrase-final tones
- Evaluate the impact of collapsing types
- Compare the contributions of continuous and
categorical features
- Incorporation of accent location into lexical
features
- Topic and discourse-new words are often accented
Outline
- Detection of Prosodic Events
- Classification of Prosodic Events
- Applications
- Non-native Intonation
- Speech Synthesis
- Story Segmentation
- Extractive Speech Summarization
Applications
Extractive Speech Summarization
- Summarize Broadcast News Stories
- Identify salient units.
GOOD EVENING EVERYONE THE FEDERAL RESERVE HAS DONE IT AGAIN LOWER INTEREST RATES BY HALF A PERCENT FOR THE SECOND TIME IN A MONTH DONE AGAIN WHAT IT THINKS IT CAN DO AT THE MOMENT TO GIVE THE ECONOMY A JOE THE FED SEES A WEAKNESS IN THE ECONOMY NOW AND IT WORRIES ABOUT IT CONTINUING
Applications: Extractive Speech Summarization
Experiments
- Evaluated the extraction of sentences,
intonational phrases, and pause-based chunks
- Automatic sentence boundary hypotheses
- Decision-tree based IP hypotheses
- 250ms and 500ms pause chunking
- Bayesian Network summarizer
- Only acoustic and structural features
[Maskey and Hirschberg 2006]
Applications: Extractive Speech Summarization
Results
[Figure: F-Measure (axis 0.1 to 0.6) on ROUGE-1, ROUGE-2, and ROUGE-L for Sentences, 250ms Pause, 500ms Pause, and Intonational Phrases]
Applications: Extractive Speech Summarization
Proposed Work
- Include event location and type in the
acoustic feature vector
- Compare contributions of categorical and
continuous prosodic information
- Evaluate impact of collapsing type categories
Summary
- Detection
- High pitch accent detection accuracy
- Phrase boundary detection can be improved
- Classification
- Modest phrase-final tone accuracy
- Poor pitch accent type performance
- Applications
- Use of phrase boundary detection as a tool to
segment speech
Contributions
- Techniques to extract prosodic event
information from speech.
- Applications to spoken language processing
and understanding tasks.
- Feature analysis to advance understanding
of the acoustic correlates of prosodic
events
Timeline
- Jan 2008: Collect Non-native Data
- Feb 2008: Annotate Non-native Data; Lexical Experiments
- Mar 2008: Non-native Analysis
- Apr 2008: Accent Identification
- May 2008: Develop Prosody Tutoring System
- June 2008: Phrase Boundary Detection
- July 2008: Summarization and Segmentation
- Aug 2008: Integrated Prosodic Event Detection
- Sep 2008: Event Classification
- Oct 2008: Event Classification
- Nov 2008: Write Thesis
- Jan 2009: Prepare Defense
- Feb 2009: Thesis Defense
Thank you
and...
Julia Hirschberg, Dan Ellis, Kathy McKeown, Fadi Biadsy, Frank Enos, Agustín Gravano, Sameer Maskey, Stefan Benus, Martin Jansche, Mehrbod Sharifi, Rachelle Bergstein
Integrated Prosodic Event Detection
Classifier Fusion Diagram
- Coupled HMM
[Diagram: Pitch Accent Predictions, Phrase Boundary Predictions, Phrase Boundary Outputs, Pitch Accent Outputs]
Corpora
Corpus        Genre                  Number of Speakers  Length (mins)  Length (words)
BDC-spon      Spontaneous            4                   60             11,627
BDC-read      Non-professional Read  4                   50             10,822
BU-RNC        Professional Read      6                   141            23,830
IBM TTS       Professional Read      1                   131            21,196
Communicator  Professional Read      unk.                67             12,183
Trains        Spontaneous            12                  18.5           2,581
Games         Spontaneous            13                  362            73,837
TDT-4         Professional Read      30*                 30             3,326
ETS           Spontaneous            34                  168            32,316