The Future of Prosody
It’s about Time
Dafydd Gibbon
Bielefeld University Jinan University Speech Prosody 9, Poznań, 13 June 2018
SP9, Poznań, 13 June 2018
The Future of Prosody – relevant topics?
Ethics of research
– big data analytics, deep learning, natural speech
– mass collection of personal communication habits, surveillance
Time
– more structure / pattern oriented, less function oriented
Time as a core concept
Time functions, trajectories
YARD (Yet Another Rhythm Discussion)
Metatheoretical framework for time discussion
Discussion of different paradigms
Challenges for the future
Evolution
Strawson’s acoustic Gedankenexperiment:
spaceless world with a time dimension only?
Conclusion:
Following this basic ontology:
(though in practice it is often convenient to forget this)
Strawson, Peter F. Individuals. An Essay in Descriptive Metaphysics. London: Methuen. 1959.
Time Types Major Time Domains Process Time Time Patterns
In memoriam
This work is dedicated to the memory of Wiktor Jassem, emeritus prosodist, pioneer in spectral analysis and speech synthesis, and authority on Polish phonetics and phonology, mentor and long-time friend, with whom I discussed the seeds of many ideas reflected in this presentation over some 30 years, in particular the use of the difference spectra discussed in the present talk.
I would especially like to remember Grzegorz Dogil, formerly of Lublin, Poznań, Bielefeld and Stuttgart, whom many of you have known personally. Greg passed away much too soon six months
including points related to the content of this address, was a source of inspiration for us, and has inspired many more phoneticians since that time.
Acknowledgments
to the strong tradition of logical and computational phonetic and phonological research in Poland, particularly in Poznań:
and rhythm
In many publications
Batóg, Tadeusz. The Axiomatic Method in Phonology. London: Routledge and Kegan Paul, 1967.
Steffen-Batóg, Maria. The problem of automatic phonemic transcription of written Polish. Biuletyn Fonograficzny. 14, pp. 75–86, 1973.
And to Batóg & Steffen-Batóg on formalizing phonetic distance
Steffen-Batóg, Maria and Tadeusz Batóg. A distance function in phonetics. Lingua Posnaniensis, XXIII, 47–58. 1980.
Special thanks for valuable hints, comments, suggestions and data:
Petra Wagner, Plinio Barbosa, Rosemarie Tracy, Alexandra Gibbon
Yu Jue, Liang Jie, Liu Huangmei, Chen Wenjun (Shanghai)
Lin Xuewei, Li Peng, He Linfang, Feng Baoyin, Bi Dan (Guangzhou)
Thinking outside the box:
– different methods and method combinations
– cooperation with other disciplines
– computational perspectives
– ((exploration* confirmation*)* standardization*)* cycles
– exploratory rather than confirmatory research
analogy, ...
Alternative Discourse Prosody
Time: Types, Domains, Processes, Patterns, Stamps, Modulations
The background:
– Time Types
– Major Time Domains
– Processing Time
– Time Patterns: static & dynamic
Heuristic methods of prosodic pattern analysis:
– Time Stamps: Annotation Mining
– Time Stamps: 1D Isochrony
– Time Stamps: 2D Relations
– Time Stamps: 3D Time Trees
Explanatory methods of prosodic pattern analysis:
– AM: Multiple Oscillators, Production emulation
– Rhythm is AM & FM Spectral Zones
– AM: Spectral Zones, Perception emulation
– FM: Discourse Modulation
Time and Prosody: overview
– Events vs. objects
– Five Major Time Epochs
– Four Time Types:
syntagmatic
time stamps
real time
– Two kinds of processing time:
Categorial Time (paradigmatic relations)
‘Rubber’ Time (syntagmatic relations)
Clock Time
Cloud Time
Various phonologies, but most explicitly Event Phonology
Time Map Phonology
Speech Technology ‘front ends’: word and sentence recognition systems, Text-to-Speech systems
Thanks to Andras Kornai for the concepts ‘Rubber Time’ and ‘Clock Time’
Categorial Time and ‘Rubber’ Time are, strictly speaking, metaphorical terms, and actually refer to abstract paradigmatic and syntagmatic structural relations. Time in the strict senses of Clock Time and Cloud Time is not within the domain of phonology (cf. Zhang’s critique in the Proceedings).
1. Utterance in discourse:
– Milliseconds: micromotor activity (speech sounds)
– Seconds, minutes: prosody
2. Individual language development:
– Years: acquisition and learning
3. Social language change:
– Pragmatic effects of language and speech ‘influencers’
4. Historical language & culture change – ‘dreamtime’:
– Millennia: typological change, loss of inter-comprehensibility
5. Evolution:
– Multimillennia: differentiation of species communication
Processing time: a remark on recursion from a computational linguistic point of view
In the many discussions of recursion over the past 20 years or so, a crucial distinction which affects processing time has been neglected:
– linear recursion: left & right branching (computationally equivalent to iteration), with finite working memory and linear processing time (a function of the length of the input)
– non-linear recursion: centre-embedding, cross-serial dependencies, with unrestricted memory and at least quadratic processing time
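The memory side of this distinction can be illustrated with a minimal sketch. It assumes two toy languages that do not appear in the talk: (ab)^n as a stand-in for linear recursion and a^n b^n as a stand-in for centre-embedding. The function names are hypothetical, and the sketch illustrates only the memory contrast, not the processing-time claim.

```python
def accepts_iterative(s: str) -> bool:
    """Recognise (ab)* with two states: finite memory, one pass over the input."""
    state = 0  # 0 = expecting 'a', 1 = expecting 'b'
    for ch in s:
        if state == 0 and ch == "a":
            state = 1
        elif state == 1 and ch == "b":
            state = 0
        else:
            return False
    return state == 0


def accepts_centre_embedded(s: str) -> tuple[bool, int]:
    """Recognise a^n b^n and report the largest depth that had to be stored."""
    depth = 0
    max_depth = 0
    seen_b = False
    for ch in s:
        if ch == "a" and not seen_b:
            depth += 1  # open one more embedding level
            max_depth = max(max_depth, depth)
        elif ch == "b" and depth > 0:
            seen_b = True
            depth -= 1  # close the most recent level
        else:
            return False, max_depth
    return depth == 0, max_depth


if __name__ == "__main__":
    print(accepts_iterative("ababab"))
    print(accepts_centre_embedded("aaabbb"))
```

The first recogniser never stores more than one of two states, whatever the input length. The second must keep a count that grows with the embedding depth, which is why a finite depth condition matters for speech.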
Processing time: a remark on recursion from a computational linguistic point of view
Food for thought:
– Arbitrary finite depth hierarchies (cf. Phonological Hierarchy, syllable phonotactics) also have linear processing time: realistic for speech
– Linear (right or left) recursion conditions: realistic for speech
– Non-linear (centre-embedding) recursion: unrealistic for speech without a finite depth condition – though extra depth may be made available through time and memory enhancement by means
Time and Prosody: summary
syntagmatic
time stamps
real time
Time Stamps: Annotation Mining, 1D Isochrony, 2D Relations, 3D Time Trees
Annotation with time stamps: overview
similarity + isochrony + alternation
Signal Annotation:
Data Repository, DSP, Hardware, Software
Manual calculation, LOcalc, Excel, SPSS, Stata, MatLab, R, Python, Praat, speech engineering software development
Time Stamps analysis:
– Annotation Mining
– 1D Duration Dispersion: Isochrony
– 2D Duration Dispersion: Scatter Plots
– 3D Duration Dispersion: Time Trees
1-dimensional time-stamp duration analysis:
– sequences (Var, PIM, PFD): no compensation for tempo change
– pairs (PVI): abstracts away from tempo change
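The difference between a sequence measure and a pair measure can be sketched as follows. The example assumes the normalised variant of the PVI (nPVI) and uses invented syllable durations; `npvi` is an illustrative helper name, not a function from any toolkit used in the talk.

```python
from statistics import pvariance


def npvi(durations):
    """Normalised Pairwise Variability Index over adjacent duration pairs."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    # Each difference is divided by the mean of its own pair,
    # so a uniform change of tempo cancels out.
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100 * total / len(pairs)


slow = [100, 200, 100, 200]      # hypothetical syllable durations in ms
fast = [d / 2 for d in slow]     # the same pattern at double tempo

if __name__ == "__main__":
    print("variance:", pvariance(slow), pvariance(fast))
    print("nPVI:    ", npvi(slow), npvi(fast))
```

The variance changes when the whole sequence is sped up, while the nPVI stays the same, because each pair is normalised by its own mean duration.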
Wagner, Petra (2007). “Visualizing levels of rhythmic organisation.” Proc. International Congress of Phonetic Sciences, Saarbrücken, pp. 1113–1116, 2007.
2-dimensional time-stamp duration analysis:
Mandarin: means scattered relatively evenly around the centre
English: e.g. count(short-short) > count(long-long)
Scatter plot quadrants: LONG-LONG, LONG-SHORT, SHORT-SHORT, SHORT-LONG
Mandarin: even clustering around the mean
English: highly skewed, with a majority of short-short syllable relations, thus NOT BINARY
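The quadrant counts behind such a scatter plot can be sketched as below. This assumes that "long" simply means longer than the mean duration of the sequence, which may differ from the normalisation used in the cited work. The durations are invented and `quadrant_counts` is a hypothetical helper.

```python
from collections import Counter


def quadrant_counts(durations):
    """Count adjacent duration pairs by whether each member is long or short."""
    mean = sum(durations) / len(durations)

    def label(d):
        # Assumption: 'long' means above the mean of the whole sequence.
        return "long" if d > mean else "short"

    return Counter(
        f"{label(a)}-{label(b)}" for a, b in zip(durations, durations[1:])
    )


durations = [80, 90, 250, 85, 95, 70, 240, 90]  # hypothetical values in ms
counts = quadrant_counts(durations)

if __name__ == "__main__":
    for quadrant in ("long-long", "long-short", "short-short", "short-long"):
        print(quadrant, counts[quadrant])
```

A skewed result, with many short-short pairs and few long-long pairs, is the kind of pattern described for English above.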
Gibbon, Dafydd. 2006. “Time types and time trees: Prosodic mining and alignment of temporally annotated data”. In: Stefan Sudhoff, et al., eds. Methods in Empirical Prosody Research. Berlin: Walter de Gruyter, pp. 281–209, 2006.
3-dimensional time-stamp duration analysis: time-tree induction:
Duration value upward percolation
Can be thought of as inverses of metrical generation algorithms (Compound and Nuclear Stress Rules)
Inductive input-output relation (examples)
Iambic (weak-strong) directionality, iNSR:
((miss . 3) (jones . 2) (came . 3) (home . 1)) → (r (w (w miss) (s jones)) (s (w came) (s home)))
Trochaic (strong-weak) directionality, iCSR:
((light . 1) (house . 3) (keep . 2) (er . 3)) → (r (s (s light) (w house)) (w (s keep) (w er)))
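One possible bottom-up procedure that reproduces the two input-output examples above is sketched here. It is an illustrative assumption, not the induction algorithm of the cited paper. The numbers are treated as ranks in which a smaller value is stronger (as the examples suggest), adjacent constituents are paired when their order fits the chosen directionality, and the stronger rank percolates upward. All function names are hypothetical.

```python
def _fits(left_rank, right_rank, direction):
    """Iambic pairs are weak-strong; trochaic pairs are strong-weak."""
    if direction == "iambic":
        return left_rank > right_rank
    return left_rank < right_rank


def _merge(left, right):
    """Join two constituents, label them s/w, and percolate the stronger rank."""
    (ltree, lrank), (rtree, rrank) = left, right
    llabel, rlabel = ("s", "w") if lrank < rrank else ("w", "s")
    return (((llabel, ltree), (rlabel, rtree)), min(lrank, rrank))


def induce_tree(items, direction):
    """Build a binary tree from (word, rank) pairs; rank 1 is the strongest."""
    if direction not in ("iambic", "trochaic"):
        raise ValueError("direction must be 'iambic' or 'trochaic'")
    if not items:
        raise ValueError("need at least one item")
    nodes = list(items)
    while len(nodes) > 1:
        merged, i, changed = [], 0, False
        while i < len(nodes):
            if i + 1 < len(nodes) and _fits(nodes[i][1], nodes[i + 1][1], direction):
                merged.append(_merge(nodes[i], nodes[i + 1]))
                i += 2
                changed = True
            else:
                merged.append(nodes[i])
                i += 1
        if not changed:
            # No adjacent pair fits the direction: force-merge the first two.
            merged = [_merge(nodes[0], nodes[1])] + nodes[2:]
        nodes = merged
    return nodes[0][0]


def to_sexpr(tree, label="r"):
    """Render the tree in the bracketed notation used on the slide."""
    if isinstance(tree, str):
        return f"({label} {tree})"
    inner = " ".join(to_sexpr(sub, lab) for lab, sub in tree)
    return f"({label} {inner})"


if __name__ == "__main__":
    insr = [("miss", 3), ("jones", 2), ("came", 3), ("home", 1)]
    icsr = [("light", 1), ("house", 3), ("keep", 2), ("er", 3)]
    print(to_sexpr(induce_tree(insr, "iambic")))
    print(to_sexpr(induce_tree(icsr, "trochaic")))
```

The forced merge is a fallback chosen for this sketch so that the loop always terminates. Other tie-breaking choices are equally conceivable.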
parse trees, root at bottom
Annotation with time stamps: summary
similarity + isochrony + alternation
Help! I can’t see the wood for the trees!
So what is THE rhythm of a language? … Is this the right question?
Rhythm is AM (multiple oscillators, production emulation; spectral zones, perception emulation), FM (discourse, turns, emotion) and iteration
Rhythm as iteration
So what is THE rhythm of a language? … Is this the right question?
There are many rhythms in speech, in many frequency ranges.
Speech rhythms are unstable, ‘fuzzy’ hierarchies. Definitely not ‘quartz timing’.
Because of this unstable property, rhythm types are spread
20Hz … 10Hz (50ms … 100ms)
10Hz … 4Hz (100ms … 250ms)
4Hz … 1Hz (250ms … 1000ms)
< 1Hz
Can we measure – or at least visualise – these? Yes, we can. But we need to think in Hz, not ms.
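Moving between the two views is a one-line conversion, since the period of an oscillation in milliseconds is 1000 divided by its frequency in Hz. A minimal sketch, with the hypothetical helper name `period_ms`:

```python
def period_ms(frequency_hz: float) -> float:
    """Period in milliseconds of an oscillation with the given frequency in Hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


if __name__ == "__main__":
    for hz in (20, 10, 4, 1):
        print(f"{hz} Hz corresponds to a period of {period_ms(hz):.0f} ms")
```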
Phonological and Phonetic Oscillators: overview
Phonological ‘oscillators’: overview
A phonological view of rhythm as iteration
Pierrehumbert’s regular grammar as a finite state transition network
Or as an equivalent regular expression:
(( (%H|%L) (H*|L*|H*+L|H+L*|L*+H|L+H*)+ (H-|L-) )+ (H%|L%) )+
Or as an equivalent right branching regular (type 3) grammar:
IP → initb PiA
PiA → pa PiA
PiA → pa IntP
IntP → interb PiA
IntP → interb IPend
IPend → intonb
IPend → intonb IP
and vocabulary:
initb : { H%, L% }
interb : { H-, L- }
intonb : { H%, L% }
pa : { H*, L*, L*+H-, L-+H*, H*+L-, H-+L*, H*+H- }
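The regular expression view can be tried out with a small acceptor. This is a sketch under stated assumptions: tone labels are whitespace-separated, each label is mapped to a one-letter class, and the expression is read as "one or more intonation phrases, each consisting of one or more intermediate phrases (initial boundary, one or more pitch accents, phrase accent) followed by a final boundary tone". The class letters and the name `is_wellformed` are illustrative, and the tone inventory follows the regular expression rather than the grammar vocabulary.

```python
import re

# Map each tone label to a one-letter class (an assumption of this sketch).
TOKEN_CLASS = {
    "%H": "b", "%L": "b",                    # initial boundary tones
    "H*": "a", "L*": "a", "H*+L": "a",
    "H+L*": "a", "L*+H": "a", "L+H*": "a",   # pitch accents
    "H-": "p", "L-": "p",                    # phrase accents
    "H%": "e", "L%": "e",                    # final boundary tones
}

# ((initial boundary, pitch accents, phrase accent)+ final boundary)+
CONTOUR = re.compile(r"((ba+p)+e)+")


def is_wellformed(tones: str) -> bool:
    """Check a whitespace-separated tone sequence against the pattern."""
    try:
        classes = "".join(TOKEN_CLASS[t] for t in tones.split())
    except KeyError:
        return False  # unknown tone label
    return CONTOUR.fullmatch(classes) is not None


if __name__ == "__main__":
    print(is_wellformed("%H H* L- L%"))
    print(is_wellformed("%L L* L* H- %H H* L- H%"))
    print(is_wellformed("H* L- L%"))
```

Because the pattern is regular, recognition needs only finite memory and a single pass over the tone sequence, which is the sense in which iteration stands in for rhythm here.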
Empirical overgeneration
1) Accents in a sequence tend to be all H* or all L*
2) Global contours tend to be rising with L* accents, falling with H* accents
3) Global contours may span more than 1 turn
Empirical undergeneration
1) Paratone hierarchy not included
2) No time constraints
1-tape (1-level) transition network
2-tape (2-level) transition network
3-tape (3-level) transition network
Martin Jansche 1998 Tianjin Mandarin tone sandhi
Phonological ‘oscillators’: summary
Phonetic Oscillators: overview
Modulation:
– INFORMATION: AMPLITUDE MODULATION, FREQUENCY MODULATION
– CARRIER FREQUENCY, NOISE FREQUENCIES (+, ✕), FILTER COEFFICIENTS
Demodulation:
– AMPLITUDE DEMODULATION in different time zones: rectification, LP filtering, envelope detection
– FREQUENCY DEMODULATION in different frequency zones: pitch tracking, formant tracking
– SPECTRAL ANALYSES in different frequency zones, COORDINATION
Diode rectifier in a crystal set
Selected Work on Amplitude Envelope Demodulation Spectra
[1] Cummins, Fred, Felix Gers and Jürgen Schmidhuber. “Language identification from prosody without explicit features.” Proc. Eurospeech, 1999.
[2] He, Lei and Volker Dellwo. “A Praat-Based Algorithm to Extract the Amplitude Envelope and Temporal Fine Structure Using the Hilbert Transform.” Proc. Interspeech 2016, San Francisco, pp. 530–534, 2016.
[3] Hermansky, Hynek. “History of modulation spectrum in ASR.” Proc. ICASSP, 2010.
[4] Leong, Victoria and Usha Goswami. “Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.” PLoS One 10(12), 2015.
[5] Leong, Victoria, Michael A. Stone, Richard E. Turner and Usha Goswami. “A role for amplitude modulation phase relationships in speech rhythm perception.” Journal of the Acoustical Society of America, 2014.
[6] Liss, Julie M., Sue LeGendre and Andrew J. Lotto. “Discriminating Dysarthria Type From Envelope Modulation Spectra.” Journal of Speech, Language and Hearing Research 53(5), pp. 1246–1255, 2010.
[7] Ludusan, Bogdan, Antonio Origlia and Francesco Cutugno. “On the use of the rhythmogram for automatic syllabic prominence detection.” Proc. Interspeech, pp. 2413–2416, 2011.
[8] Ojeda, Ariana, Ratree Wayland and Andrew Lotto. “Speech rhythm classification using modulation spectra (EMS).” Poster presentation at the 3rd Annual Florida Psycholinguistics Meeting, 21.10.2017, U Florida, 2017.
[9] Tilsen, Samuel and Keith Johnson. “Low-frequency Fourier analysis of speech rhythm.” Journal of the Acoustical Society of America 124(2), EL34–EL39, 2008. [PubMed: 18681499]
[10] Tilsen, Samuel and Amalia Arvaniti. “Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages.” Journal of the Acoustical Society of America 134, p. 628, 2013.
[11] Todd, Neil P. McAngus and Guy J. Brown. “A computational model of prosody perception.” Proc. ICSLP 94, pp. 127–130, 1994.
[12] Varnet, Léo, Maria Clemencia Ortiz-Barajas, Ramón Guevara Erra, Judit Gervain and Christian Lorenzi. “A cross-linguistic study of speech modulation spectra.” Journal of the Acoustical Society of America 142(4), pp. 1976–1989, 2017.
Amplitude Envelope Modulation Spectrum (AEMS, AMS, EMS) Frequency Zones
Amplitude Envelope Modulation
Amplitude Envelope Demodulation
absolute value of Hilbert transform (or rectification & peak-picking / LP filtering)
Spectral slice (FFT)
Spectral Zone Edge Detection
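The demodulation and spectral steps listed above can be sketched in plain Python. This is a minimal illustration rather than the implementation behind the talk: it uses the rectification and low-pass filtering route (a moving average stands in for the LP filter) instead of the Hilbert transform, computes a direct DFT for the low-frequency bins only, and omits the zone edge detection step. The synthetic test signal, the 4000 Hz sampling rate and all function names are assumptions made for the example.

```python
import math


def amplitude_envelope(signal, fs, smooth_ms=20.0):
    """Full-wave rectify, then low-pass with a centred moving average."""
    rectified = [abs(x) for x in signal]
    half = max(1, int(fs * smooth_ms / 1000) // 2)
    prefix = [0.0]  # running sums make each window average O(1)
    for x in rectified:
        prefix.append(prefix[-1] + x)
    n = len(rectified)
    envelope = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        envelope.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return envelope


def envelope_spectrum(envelope, fs, max_hz=20.0):
    """Magnitudes of the mean-removed envelope spectrum up to max_hz."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [x - mean for x in envelope]
    spectrum = []
    for k in range(1, int(max_hz * n / fs) + 1):
        w = 2 * math.pi * k / n
        real = sum(x * math.cos(w * i) for i, x in enumerate(centred))
        imag = sum(x * math.sin(w * i) for i, x in enumerate(centred))
        spectrum.append((k * fs / n, math.hypot(real, imag)))
    return spectrum  # list of (frequency in Hz, magnitude)


# Synthetic check: 2 s of a 200 Hz carrier, amplitude-modulated at 5 Hz.
fs = 4000
signal = [
    (1 + 0.5 * math.sin(2 * math.pi * 5 * i / fs))
    * math.sin(2 * math.pi * 200 * i / fs)
    for i in range(2 * fs)
]
spectrum = envelope_spectrum(amplitude_envelope(signal, fs), fs)
peak_hz = max(spectrum, key=lambda pair: pair[1])[0]

if __name__ == "__main__":
    print("strongest modulation frequency below 20 Hz:", peak_hz, "Hz")
```

With this synthetic signal the strongest component of the envelope spectrum lies at the modulation frequency, not at the carrier frequency, which is the point of demodulating before taking the spectrum.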
Signal: 2s, 200×5 Hz AM carrier (light & dark green)
Rectified modulated signal (light green, top)
Demodulated AM envelope (red)
Demodulated FM (‘pitch’) track (red)
AM and FM spectra
AM and FM spectra as heatmaps
Frequency Zone Edge Detection
English (RP), Edinburgh corpus: “The North Wind and the Sun”
Beijing Mandarin, Yu corpus: “bei3 feng1 gen1 tai4 yang2”
Figure annotations: Short phrases, Short IPUs, Paratone IPUs, IPU hierarchy, Phrases, IPUs, 1 Hz
Spectral Frequency Zone Boundaries: English Newsreading, English Story
Spectral Frequency Zone Boundaries
Frequency Trees: Spectral Zone Hierarchies (English Newsreading)
Panels: L-strong, >; L-strong, <; R-strong, >; R-strong, <
AEMS Frequency Tree, English Newsreading (L-strong, <)
Frequency Trees: Spectral Zone Hierarchies (English North Wind & Sun)
Panels: L-strong, >; L-strong, <; R-strong, >; R-strong, <
AEMS Frequency Tree, English North Wind & Sun (L-strong, <)
Frequency Trees: Spectral Zone Hierarchies (Mandarin North Wind & Sun)
Panels: L-strong, >; L-strong, <; R-strong, >; R-strong, <
AEMS Frequency Tree, Mandarin North Wind & Sun (L-strong, <)
Next step: Distance analysis of AEMS of 5s adjacent audio clips, English & Mandarin
Next but one step: Conventional analysis of AEMS edges in 5 second audio clips, English & Mandarin
Data:
“The North Wind and the Sun”
– Male, English: 40s
– Female, Mandarin: 40s
Method:
Comparison of non-overlapping adjacent 5s audio chunks
– offsets into recording: 0, 5, 10, 15, 20, 25, 30, 35
– AEMS for each chunk
– Inter-speaker comparison (AEMS pointwise means, r=0.82)
– Comparison by hierarchical similarity / distance
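The chunking and the distance-based comparison can be sketched as follows. This is an assumption-laden illustration: the spectra are tiny invented vectors rather than real AEMS, the distance is Euclidean, and the clustering is a plain single-linkage (nearest point) procedure written out by hand. The names `chunk_offsets`, `euclidean` and `single_linkage` are hypothetical.

```python
import math


def chunk_offsets(total_s, chunk_s=5):
    """Start offsets (in seconds) of non-overlapping adjacent chunks."""
    return list(range(0, total_s - chunk_s + 1, chunk_s))


def euclidean(u, v):
    """Euclidean distance between two equally long spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters
    whose closest members are nearest to each other."""
    clusters = [[name] for name in vectors]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented three-bin "spectra" for two chunks per speaker.
toy_spectra = {
    "en_0": [1.0, 0.2, 0.1],
    "en_5": [0.9, 0.3, 0.1],
    "zh_0": [0.2, 1.0, 0.4],
    "zh_5": [0.3, 0.9, 0.5],
}

if __name__ == "__main__":
    print(chunk_offsets(40))
    print(single_linkage(toy_spectra, 2))
```

In a real run each vector would be the envelope spectrum of one 5 s chunk, and the question is whether chunks from the same speaker end up in the same cluster.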
Phonetic distance between consecutive chunks of AEM spectra
Task: compare 7 hierarchical clustering algorithms
– related to methods used in stylometry, dialectometry, typological language classification
Similarity criterion: >3 adjacent same-speaker settings
Largest per speaker score: (4+3)/16
Largest cluster: 4/16
Highest speaker-specific total:
1 Nrst.Pt. (4+3+3)/10
Largest cluster:
5 UPGMC 5/10
Phonetic Oscillators: summary
Phonological and Phonetic Oscillators: summary
FM and Discourse Modulation: overview
Ternary semiotic basis for signs at all ranks
Diagram labels: People and Signs; Denotation, Reference; Cloud Time; semiotic relation; Categorial Time; simple and structured forms; Modality Interpretation; hierarchical patterning; linear patterns; Focus, Contrast, Emphasis
Rank Interpretation Architecture
– Discourse: Monologue, Dialogue
– Utterance: turn, IPU, ...
– Sentence, clause, phrase
– Word: simple, inflected, compound, derived
Discourse prosody, Case 1: AM vs. FM spectra
If a spectrum can be derived from the AM envelope, why not derive a spectrum from the FM track and see whether they correlate?
Preliminary answer: Yes, they do correlate, but not overwhelmingly strongly, and the strength depends on which subspectra are measured.
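One way to quantify such a comparison is a Pearson correlation between the two magnitude spectra, restricted to a chosen band. The sketch below assumes that both spectra are sampled at the same frequencies. The values are invented placeholders and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def band(freqs, values, max_hz):
    """Keep only the spectral values at or below max_hz (a subspectrum)."""
    return [v for f, v in zip(freqs, values) if f <= max_hz]


# Invented placeholder spectra on a shared frequency axis (Hz).
freqs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
am_spectrum = [0.9, 0.7, 0.5, 0.6, 0.2, 0.1]
fm_spectrum = [0.8, 0.9, 0.4, 0.3, 0.3, 0.1]

if __name__ == "__main__":
    for limit in (20, 5, 1):
        r = pearson(band(freqs, am_spectrum, limit), band(freqs, fm_spectrum, limit))
        print(f"below {limit} Hz: r = {r:.2f}")
```

Restricting the band changes which bins enter the correlation, which is one reason why the result can depend on the subspectrum that is measured.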
Mandarin, female 30 sec, < 20Hz
Mandarin, female 30 sec, < 5Hz
Mandarin, female 30 sec, < 1Hz
English, male 30 sec, < 20Hz
English, male 30 sec, < 5Hz
English, male 30 sec, < 1Hz
Discourse prosody, Case 2: Accent constraints
Constraint 1:
Pitch accents in the same sequence tend to be of the same type and collocate with specific global contours
Constraint 2:
Pitch accent sequences tend to match the final phrasal accent:
– low rising types tend to be followed by a rising final accent
– high rising types tend to be followed by a rising final accent
Constraint 3:
Pitch accent sequence types tend to match information structure and
– low pitch accent sequences tend to be introductory or questioning
– high pitch accent sequences tend to be closing or stating
with typologically relevant constraint violations in different languages and dialects
Cantonese area (Guangzhou), contour annotations:
– Answer: falling utterance contour
– L* sequence, global rise, final rise
– H* sequence, global fall
– H* sequence, global fall
– L* sequence, global fall, final rise
Discourse labels: Interview start, Question, Questi, Response, Continuati
Discourse prosody, Case 3: Long FM contours
Thesis: in evolution,
– frequency modulation and rhythm came first
Levinson, “Turn-taking in Human Communication – Origins and Implications for Language Processing”, 2015
Note: in infant speech,
– frequency modulation and rhythm also come first
Wermke, Sebastian-Galles
the infant ‘twin-talk’ videos on YouTube ☺
Question: rising utterance contour
Answer: falling utterance contour
Question+Answer: rising-falling adjacency pair contour
syntagmatic entrainment
Discourse Prosody, Case 4: emotive FM contours
Thesis 1:
In the evolutionary time domain:
emotive modulations came before structural modulations
Thesis 2:
In the beginning was “Wow!” (Or “Aaah!”)
Thesis 3:
Or the wolf whistle (it’s not simply ‘cat-calling’)
Thesis 4:
In any case, other primates wowed, aahed and whistled first – we continued the custom
Is this why in some societies whistling is tabooed?
EMOTIVE EXCLAMATIONS: 哇 (‘Wow!’)
Cantonese region (Guangzhou), Wu region (Shanghai)
‘Tone 6’ ☺, Tone 4
Twin peaks: 2nd formant + pitch
EMOTIVE EXCLAMATIONS: 啊 (‘Aaah!’)
Cantonese region (Shenzhen)
TELEGLOSSIA
Street whistle: Cantonese schoolboy
Primate coloratura soprano
In fact, it’s a black-handed gibbon
Cantonese street whistles (child, middle school)
Black-handed gibbon calls
FM and Discourse Modulation: summary
… thinking outside the box
Summary:
Conclusion: