Analysis by Synthesis of Speech Prosody: from Data to Models

Daniel Hirst, Laboratoire Parole et Langage, CNRS & Université de Provence, Aix-en-Provence, France. With the past, present and future collaboration of: Caroline Bouzon


  1. Prosodic structure
     [Figure: "They expected his election" divided into four words (They | expected | his | election) and two feet, beginning at the stressed syllables -pec- and -lec-]
     Scuola Normale Superiore, Pisa 2009 March 13 06/03/10 ATILF Nancy Daniel Hirst

  2. Prosodic structure
     ● Narrow rhythm unit (Jassem): sequence of syllables beginning with a stressed syllable and ending at the following word boundary
     ● Anacrusis (Jassem): sequence of unstressed syllables not included in a narrow rhythm unit

  3. Prosodic structure
     [Figure: "They predicted his election" divided into four words, two feet, and the sequence anacrusis (They pre-), NRU (-dicted), anacrusis (his e-), NRU (-lection)]
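The two definitions from Jassem can be sketched in code. This is a minimal illustration, not the procedure used to annotate the corpus. It assumes each word arrives as a list of (syllable, stressed) pairs; the function name `segment_rhythm_units` and the labels "NRU" and "ANA" are chosen here for illustration only.

```python
def segment_rhythm_units(words):
    """Split an utterance into narrow rhythm units (NRU) and anacruses (ANA).

    words: list of words, each a list of (syllable, stressed) pairs.
    Returns a list of (label, [syllables]) tuples in utterance order.
    """
    units = []
    label, current = None, []

    def close():
        # Store the unit being built (if any) and start afresh.
        nonlocal label, current
        if current:
            units.append((label, current))
        label, current = None, []

    for word in words:
        for syllable, stressed in word:
            if stressed:
                close()                 # a stressed syllable opens a new NRU
                label = "NRU"
            elif label != "NRU":
                label = "ANA"           # unstressed and outside any NRU
            current.append(syllable)
        if label == "NRU":
            close()                     # an NRU ends at the word boundary
        # an anacrusis may continue across the word boundary
    close()
    return units


utterance = [
    [("They", False)],
    [("pre", False), ("dic", True), ("ted", False)],
    [("his", False)],
    [("e", False), ("lec", True), ("tion", False)],
]
for unit in segment_rhythm_units(utterance):
    print(unit)
```

For the example sentence this yields the sequence anacrusis, NRU, anacrusis, NRU, with "They pre-" and "his e-" as the two anacruses.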

  4. Aix-Marsec database
     • SEC (Spoken English Corpus): Knowles et al. 1996
     • Marsec (Machine Readable SEC): Roach et al. 1993
     • Aix-Marsec: Auran, Bouzon & Hirst 2004

  5. SEC
     ● 5.5 hours of “authentic” speech
     ● 53 speakers, c. 55000 words

  6. SEC
     ● 5.5 hours of “authentic” speech
     ● c. 55000 words, 53 speakers
     ● Prosodic markup: tonetic stress marks (Knowles & Williams)

  7. Marsec
     ● Tonetic stress markup converted to ASCII (Roach et al.)
     ● Words aligned with signal

  8. Aix-Marsec database
     ● Phonetic transcription
     ● Phonemes aligned with signal
     ● Prosodic structure (Praat TextGrids)
     ● Automatic analysis of intonation (Momel & INTSINT)
     ● Freely available from the authors

  9. TextGrid from Aix-Marsec

  10. Hypothesis
     ● Size of whole :: compression of parts
     If a prosodic constituent is involved in the planning of speech rhythm, we should expect the size of the constituent to have a negative effect on the duration of the phonemes which make it up.

  11. Method
     ● Linear correlation and regression
       – Independent variable: size of constituent (number of phonemes)
       – Dependent variable: mean lengthening/compression of phonemes (Z score)
     $z_{i/p} = \frac{d_{i/p} - m_p}{s_p}$
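A rough sketch of this method follows. It assumes the usual reading of the z-score formula: d is the duration of token i of phoneme p, and m_p and s_p are the mean and standard deviation of all durations of that phoneme. The function names and the input layout (a flat list of phoneme tokens plus lists of token indices per constituent) are hypothetical and do not reflect the Aix-Marsec file format.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_scores(tokens):
    """tokens: list of (phoneme_label, duration). Returns one z-score per token."""
    durations = defaultdict(list)
    for label, duration in tokens:
        durations[label].append(duration)
    # Per-phoneme mean and sample standard deviation
    # (each phoneme needs at least two tokens with differing durations).
    stats = {p: (mean(d), stdev(d)) for p, d in durations.items()}
    return [(d - stats[p][0]) / stats[p][1] for p, d in tokens]


def size_vs_mean_z(constituents, z):
    """constituents: list of lists of token indices (one list per word, foot, NRU...).

    Returns (sizes, mean_z): the independent and dependent variables.
    """
    sizes = [len(c) for c in constituents]
    mean_z = [mean(z[i] for i in c) for c in constituents]
    return sizes, mean_z


def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Under the hypothesis, `pearson_r(*size_vs_mean_z(constituents, z))` would come out negative for a constituent that is involved in rhythm planning: the more phonemes it contains, the more each one is compressed.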

  12. Results - 1
     ● Very significant negative correlation of lengthening of phonemes (Z-score) with number of phonemes in:
       – Word
       – Foot
       – Narrow Rhythm Unit

  13. Results - 2
     ● Little or no correlation of lengthening/compression of phonemes (Z-score) with number of phonemes in:
       – Syllable
       – Anacrusis

  14. Interpretation
     ● Syllable and anacrusis have little effect on the lengthening of English phonemes
     ● Word, foot and narrow rhythm unit play a significant role (in that order)

  15. Prosodic structure
     [Figure: "They expected his election" divided into four words, two feet, and the sequence anacrusis (They ex-), NRU (-pected), anacrusis (his e-), NRU (-lection)]

  16. Results - 3
     ● No simple effect of stress!!!

  17. Final lengthening

  18. Excluding last two phonemes of intonation unit

  19. Word-final lengthening?

  20. Conclusions
     ● No compression at level of syllable (cf. Jassem et al. 1978)
     ● Phonemes in stressed syllable have NO specific lengthening (cf. Jassem 1952!)
     ● The solution to Klatt’s unsolved problem is the Narrow Rhythm Unit (for English) (cf. Jassem 1952!!!)
     ● No evidence for specific word-final lengthening

  21. Duration of NRU / number of phonemes in NRU

  22. Mean z-score of phoneme / position in NRU

  23. Modelling speech melody
     ● Perception models
     ● Production models
     ● Acoustic models

  24. Raw f0

  25. Raw f0

  26. Raw f0

  27. Raw f0

  28. Finnish

  29. Kloker 1975

  30. Gamma function: $y = a t^{b} e^{ct}$

  31. Hirst's law: An acoustic model should not depend on which end of the table you are talking about.

  32. f0 transition

  33. First derivative of raw f0
     But who stole Jane's bicycle? (ma'ma'ma...)

  34. Quadratic spline function
     • Spline function
       ● Sequence of functions of degree n, derivatives of which up to n-1 are everywhere continuous
     • Quadratic spline
       ● Sequence of targets linked by two quadratic functions ($y = ax^2 + bx + c$)

  35. Quadratic spline function
     $y = h_1 + \frac{(h_2 - h_1)(x - t_1)^2}{(t_k - t_1)(t_2 - t_1)}$ for $t_1 \le x \le t_k$
     $y = h_2 + \frac{(h_1 - h_2)(x - t_2)^2}{(t_k - t_2)(t_1 - t_2)}$ for $t_k \le x \le t_2$
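The two parabolas can be evaluated directly. The sketch below assumes that (t1, h1) and (t2, h2) are successive targets and that t_k is the knot where the two parabolas meet; placing the knot at the midpoint by default is an assumption made here, not something stated on the slide. The function names are illustrative.

```python
def quadratic_transition(x, t1, h1, t2, h2, tk=None):
    """Value at x of the two-parabola transition between targets (t1, h1) and (t2, h2).

    tk is the knot joining the two parabolas; it must satisfy t1 < tk < t2.
    Assumption: when tk is not given, the midpoint of the interval is used.
    """
    if tk is None:
        tk = (t1 + t2) / 2
    if x <= tk:
        # First parabola: zero slope at the first target.
        return h1 + (h2 - h1) * (x - t1) ** 2 / ((tk - t1) * (t2 - t1))
    # Second parabola: zero slope at the second target.
    return h2 + (h1 - h2) * (x - t2) ** 2 / ((tk - t2) * (t1 - t2))


def spline_through_targets(targets, x):
    """Evaluate the quadratic spline through a time-ordered list of (t, h) targets."""
    for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
        if t1 <= x <= t2:
            return quadratic_transition(x, t1, h1, t2, h2)
    raise ValueError("x lies outside the span of the targets")
```

Both parabolas give the same value and the same slope at the knot, and the slope is zero at each target, so the resulting curve is continuous and smooth, with the targets as its turning points.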

  36. Quadratic spline function
     Il faut que je sois à Grenoble, samedi vers quinze heures (I have to be in Grenoble on Saturday at around 3 p.m.)

  37. Curves vs. straight lines
     • 't Hart 1991
     [Figure: axis ticks from 150 to 200; plotted content not recoverable]

  38. Automatic Momel
     ● Hirst & Espesser 1993
     Asymmetric quadratic modal regression
     • Modal
     • Quadratic
     • Asymmetric

  39. Mean and Mode
     [Figure labels: mode, mean]

  40. Mean and Mode
     • Mean: value minimising the sum of squares of differences from the data
     • Mode: value minimising the number of cases more than Δ from the data
     Generalise to functions:
     • Linear regression: function minimising the sum of squares of differences from the data
     • Modal regression: function minimising the number of cases more than Δ from the data
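The contrast between the two criteria shows up clearly on data containing one outlier. The sketch below is only an illustration of the definition of the mode given above: the values are invented, the function names are hypothetical, and the search is simplified by trying only the observed values as candidate centres.

```python
from statistics import mean


def count_far(centre, data, delta):
    """Number of values lying more than delta away from centre."""
    return sum(1 for v in data if abs(v - centre) > delta)


def modal_value(data, delta):
    """Value minimising the number of cases more than delta from the data.

    Simplification: only the observed values are tried as candidates.
    """
    return min(data, key=lambda c: count_far(c, data, delta))


f0 = [100, 104, 108, 150]      # invented values with a single outlier
print(mean(f0))                # 115.5
print(modal_value(f0, 5))      # 104
```

The mean is pulled towards the outlier, while the modal value stays with the bulk of the data.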

  41. Asymmetric regression
     • No values more than Δ above the function
     • Minimise number of values more than Δ below it
     • Here, function is $f = at^2 + bt + c$
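The two criteria can be written as a scoring function for a candidate parabola. This sketch only evaluates the criteria stated on the slide; it is not the Momel fitting procedure itself. How candidate coefficients are generated is left open, and the names `asymmetric_cost` and `best_candidate` are hypothetical.

```python
def asymmetric_cost(coeffs, points, delta):
    """Score the parabola f = a*t**2 + b*t + c against (t, f0) points.

    Returns None when any point lies more than delta above the curve
    (the candidate is rejected); otherwise returns the number of points
    lying more than delta below the curve.
    """
    a, b, c = coeffs
    below = 0
    for t, f0 in points:
        residual = f0 - (a * t * t + b * t + c)
        if residual > delta:
            return None
        if residual < -delta:
            below += 1
    return below


def best_candidate(candidates, points, delta):
    """Pick the admissible candidate with the fewest points far below it."""
    scored = [(asymmetric_cost(c, points, delta), c) for c in candidates]
    admissible = [(cost, c) for cost, c in scored if cost is not None]
    if not admissible:
        return None
    return min(admissible, key=lambda pair: pair[0])[1]


# Invented points: three on a parabola, one dip far below it.
points = [(0.0, 100.0), (1.0, 110.0), (2.0, 100.0), (1.5, 60.0)]
candidates = [(-10.0, 20.0, 100.0), (0.0, 0.0, 100.0), (0.0, 0.0, 110.0)]
print(best_candidate(candidates, points, 5.0))
```

The asymmetry means that a curve is never allowed to run well below the data, whereas isolated dips below the curve are tolerated and simply counted.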

  42. Momel
     ● Hirst & Espesser 1993

  43. Evaluation of Momel
     ● Estelle Campione, 2001

  44. Improved algorithm

  45. Improved algorithm

  46. Momel – theory neutral?
     ● Theory friendly
     ● Used for
       – Fujisaki model (Mixdorff)
       – ToBI (Maghbouleh; Wightman & Campbell; Cho (K-ToBI))
       – INTSINT

  47. INTSINT
     ● An INternational Transcription System for INTonation
     ● Based on minimal pitch contrasts in descriptions of intonation patterns
     ● Used in Hirst & Di Cristo 1998 for 9 different languages
       – British English, Spanish, European Portuguese, Brazilian Portuguese, French, Romanian, Russian, Moroccan Arabic and Japanese
     ● Extension for duration and rhythm

  48. Basic INTSINT
     ● Absolute tones: T(op), M(id), B(ottom)
     ● Relative tones: H(igher), S(ame), L(ower)
     ● Iterative relative tones: U(pstepped), D(ownstepped)

  49. 2 speaker parameters: key and range (Hirst 2005)
     [Figure: the INTSINT tones T, H, U, S, M, D, L, B positioned relative to the speaker's key and range]
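One way to picture how a tone sequence plus the two speaker parameters could yield f0 targets is sketched below. The slides do not give the mapping, so every formula here is an assumption made for illustration: range is taken to be in octaves centred on the key, absolute tones sit at the top, middle and bottom of that range, and relative tones move part of the way from the preceding target towards the top or bottom. The function name `intsint_to_f0` is hypothetical.

```python
from math import sqrt


def intsint_to_f0(tones, key, range_octaves):
    """Convert INTSINT tone labels into f0 targets in the same unit as key.

    All interpolation rules are illustrative assumptions, not taken from
    the slides.
    """
    top = key * sqrt(2 ** range_octaves)       # assumed: half the range above key
    bottom = key / sqrt(2 ** range_octaves)    # assumed: half the range below key
    previous = key   # assumed starting point if the first tone is relative
    targets = []
    for tone in tones:
        if tone == "T":
            f0 = top
        elif tone == "M":
            f0 = key
        elif tone == "B":
            f0 = bottom
        elif tone == "S":
            f0 = previous                              # same as preceding target
        elif tone == "H":
            f0 = sqrt(previous * top)                  # halfway (log scale) to top
        elif tone == "L":
            f0 = sqrt(previous * bottom)               # halfway (log scale) to bottom
        elif tone == "U":
            f0 = sqrt(previous * sqrt(previous * top))     # smaller step up
        elif tone == "D":
            f0 = sqrt(previous * sqrt(previous * bottom))  # smaller step down
        else:
            raise ValueError(f"unknown INTSINT tone: {tone!r}")
        targets.append(f0)
        previous = f0
    return targets


print(intsint_to_f0(["M", "T", "L", "H", "L", "H", "L", "H", "B"], 120.0, 1.0))
```

With rules of this kind, a repeated L H pattern produces successively lower peaks without any extra parameter, because each relative tone is computed from the target that precedes it.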

  50. Downdrift
     [Figure: vertical axis from 0 to 200; tone sequence M T L H L H L H B]
