Perceiving Prosody in Sinewave Speech: A Sine of the Times

Feb 23, 2024


SLIDE 1

Perceiving Prosody in Sinewave Speech

A Sine of the Times

Yasmine Sukola and Lissette Vizcarrondo
Mentor: J. Nissenbaum Ph.D.
NSF Grant No. 1659607

SLIDE 2

Formants. What are they and why are they so important?

Formants are vocal tract resonances that determine the phonetic quality of a vowel. Each formant is identified by a number, counting up from the lowest frequency. We will use the three lowest formants: F1, F2, and F3.

Image: www.pomaspace.com

SLIDE 3

Harmonics

Harmonics come from the vocal folds.
The lowest harmonic (the fundamental) is what we usually perceive as pitch.
Almost every sound in nature contains multiple harmonics; only computers can create a single harmonic (a sine wave).
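To make the contrast between a single harmonic and a complex tone concrete, here is a minimal Python sketch. The function names, the 16 kHz sample rate, and the 1/k amplitude rolloff are illustrative assumptions, not values from the slides.

```python
import math

def pure_tone(freq_hz, duration_s=0.01, sample_rate=16000):
    """A single sine wave: one harmonic and nothing else."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def complex_tone(f0_hz, n_harmonics=5, duration_s=0.01, sample_rate=16000):
    """Sum of harmonics at integer multiples of f0_hz.

    The 1/k amplitude for harmonic k is an assumed rolloff, chosen only
    so that higher harmonics are weaker than the fundamental.
    """
    n = round(duration_s * sample_rate)
    return [
        sum(
            math.sin(2 * math.pi * k * f0_hz * t / sample_rate) / k
            for k in range(1, n_harmonics + 1)
        )
        for t in range(n)
    ]

# Harmonics of a 100 Hz fundamental sit at 100, 200, 300, 400, 500 Hz.
harmonic_freqs = [k * 100 for k in range(1, 6)]
sine = pure_tone(100)
voice_like = complex_tone(100)
```

With `n_harmonics=1` the complex tone reduces to the pure tone, which is the case the slide describes as a sine wave.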

SLIDE 4

Fundamental Frequency

The human voice is a complex tone, which means it is composed of many frequencies.

Fundamental frequency is the main acoustic cue to perceived pitch.

○ e.g., a male’s voice is generally perceived as lower than a female’s voice because it typically has a lower fundamental frequency.

We change our fundamental frequency for many reasons:

○ When singing a musical scale
○ When asking a question
○ When a word is given stress for emphasis
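As a rough illustration of how a fundamental frequency can be measured from a complex tone, the sketch below uses a simple autocorrelation estimate. This is one common textbook approach, not necessarily the method used in the project; the function names, search range (75 to 400 Hz), and test tones are all assumptions.

```python
import math

def estimate_f0(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate F0 as the lag with the highest mean autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n_terms = len(samples) - lag
        score = sum(samples[i] * samples[i + lag] for i in range(n_terms)) / n_terms
        # The small margin keeps the shortest lag when multiples of the
        # period score the same, which avoids octave errors.
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def complex_tone(f0_hz, n_samples, sample_rate, n_harmonics=3):
    """Sum of the first few harmonics of f0_hz (assumed 1/k amplitudes)."""
    return [
        sum(
            math.sin(2 * math.pi * k * f0_hz * t / sample_rate) / k
            for k in range(1, n_harmonics + 1)
        )
        for t in range(n_samples)
    ]

SAMPLE_RATE = 8000
lower_voice = complex_tone(125.0, 640, SAMPLE_RATE)   # hypothetical lower-pitched voice
higher_voice = complex_tone(250.0, 640, SAMPLE_RATE)  # hypothetical higher-pitched voice
f0_low = estimate_f0(lower_voice, SAMPLE_RATE)
f0_high = estimate_f0(higher_voice, SAMPLE_RATE)
```

Both tones contain several harmonics, yet the estimate tracks the lowest one, which is the frequency listeners usually hear as the pitch.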

SLIDE 5

Sine Wave Speech

Sine wave speech (SWS) is a form of computer-generated sound designed to be a highly abstract representation of speech.

SWS can be described as sounding like “whistles” or “sci-fi” in nature.
SWS can be perceived as speech, even in the absence of ordinary, natural acoustic cues such as broadband formants.
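SWS is typically built by replacing each of the lowest formants with one sine wave that follows that formant's frequency over time. The sketch below shows the idea under simplifying assumptions: the formant values are hypothetical, frequencies are held constant within each 10 ms frame rather than interpolated, and `sine_wave_speech` is an illustrative name rather than a function from any real toolkit.

```python
import math

def sine_wave_speech(formant_tracks, frame_s=0.01, sample_rate=8000):
    """Replace each formant with one sinusoid that follows its track.

    formant_tracks: list of frames; each frame is a list of
    (frequency_hz, amplitude) pairs, one pair per formant.
    """
    samples_per_frame = round(frame_s * sample_rate)
    phases = [0.0] * len(formant_tracks[0])
    out = []
    for frame in formant_tracks:
        for _ in range(samples_per_frame):
            value = 0.0
            for j, (freq_hz, amp) in enumerate(frame):
                # Accumulating phase keeps each tone continuous when
                # its frequency changes from one frame to the next.
                phases[j] += 2 * math.pi * freq_hz / sample_rate
                value += amp * math.sin(phases[j])
            out.append(value)
    return out

# Hypothetical F1, F2, F3 values for three frames, not measured data.
tracks = [
    [(300, 1.0), (2200, 0.5), (3000, 0.25)],
    [(500, 1.0), (1800, 0.5), (2800, 0.25)],
    [(700, 1.0), (1200, 0.5), (2600, 0.25)],
]
signal = sine_wave_speech(tracks)
```

Because the three tones follow formants rather than multiples of a common fundamental, the result carries vowel information but no harmonic structure.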

SLIDE 6

Why study Sine Wave speech?

Because listeners are able to understand SWS as speech, it has proven useful as a tool for investigating perceptual primitives of speech.

SWS is important because it shows the bare minimum the human brain needs in order to detect an acoustic signal as a speech utterance.

SLIDE 7

SWS and Pitch Perception

While sometimes intelligible, SWS contains none of the information relevant for pitch perception, making it unsuitable for investigating prosody in English.

So, in SWS you may have no trouble distinguishing what was said, but you cannot tell how it was said.

SLIDE 8

Intonation and Focus

In English, intonation patterns are how speakers adjust the pitch of their voice to convey meaning. Remember, the sine waves in SWS are harmonically independent: there is no fundamental frequency, so prosodic features are not heard in SWS!

Let’s take the phrase: “After I told you not to”

○ Same sentence, but different meanings because of the intonation pattern
○ Intonation can distinguish between a statement and a question.

SLIDE 9

Broad Research Aim

Is it possible to overcome the limitation of SWS?

○ Specifically, we want to be able to minimally change the way SWS is produced, in a way that both
■ Preserves the highly abstract character of SWS (useful for studying speech perception), but also
■ Provides a perceptual cue for pitch

SLIDE 10

Modification: Shepard-Risset Tone

A psychoacoustic illusion made from multiple sine waves that rise or fall in pitch simultaneously. Each sine wave (in turn) drops an octave, and then continues to rise. When played on a continuous loop, listeners perceive an infinitely ascending or descending harmonic tone.
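The description above can be sketched as a small synthesis routine. This is a generic Shepard-Risset glide, not the project's actual stimulus code: the number of tones, the 55 Hz base, the two seconds per octave, and the raised-cosine loudness curve are all assumptions chosen for illustration.

```python
import math

def component(i, frac, n_tones=6, base_hz=55.0):
    """Frequency and amplitude of sine wave i, `frac` of an octave into the glide."""
    pos = i + frac                      # octaves above base_hz
    freq_hz = base_hz * 2 ** pos
    # Raised-cosine loudness curve: silent at the bottom and top of the
    # range, so the octave drop at the end of each cycle is masked.
    amp = 0.5 * (1 - math.cos(2 * math.pi * pos / n_tones))
    return freq_hz, amp

def shepard_risset(duration_s, octave_s=2.0, n_tones=6, base_hz=55.0, sample_rate=8000):
    """Rising glide: every tone climbs one octave, drops back, and climbs again."""
    phases = [0.0] * n_tones
    out = []
    for t in range(round(duration_s * sample_rate)):
        frac = (t / sample_rate / octave_s) % 1.0   # wraps once per octave
        value = 0.0
        for i in range(n_tones):
            freq_hz, amp = component(i, frac, n_tones, base_hz)
            phases[i] += 2 * math.pi * freq_hz / sample_rate
            value += amp * math.sin(phases[i])
        out.append(value / n_tones)
    return out

glide = shepard_risset(duration_s=0.5)
```

The tones are spaced an octave apart, so when each one drops an octave the set of sounding frequencies is the same as before the drop. That is why the loop has no audible seam. Negating the direction of `frac` would give the descending version.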

SLIDE 11

Procedure: Step One

SLIDE 12

Procedure: Step Two

SLIDE 13

Procedure: Step Three

SLIDE 14

Procedure: Step Four

SLIDE 15

Our Experiment

What is Question-Answer Congruence (QAC)?

In response to a Wh-question, an appropriate answer will have focus on the corresponding constituent.

SLIDE 16

Question-Answer Congruence Example

Question: Who is doing their homework?
Answer 1: Eric is doing his [homework]F. (Incongruent: answer is inappropriate to the question)
Answer 2: [Eric]F is doing his homework. (Congruent: answer is appropriate to the question)
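The congruence rule in this example can be written as a one-line check. The sketch below is only an illustration of the definition; the `Answer` class and `is_congruent` function are hypothetical names, and identifying constituents by word position is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    words: list        # the answer sentence, one word per item
    focus_index: int   # position of the word that carries the pitch peak

def is_congruent(answer, questioned_index):
    """An answer is congruent when its focus falls on the constituent
    that the Wh-question asks about."""
    return answer.focus_index == questioned_index

words = ["Eric", "is", "doing", "his", "homework"]
subject_focus = Answer(words, focus_index=0)   # focus on "Eric"
object_focus = Answer(words, focus_index=4)    # focus on "homework"

# "Who is doing their homework?" asks about the subject, word 0.
congruent = is_congruent(subject_focus, questioned_index=0)
incongruent = is_congruent(object_focus, questioned_index=0)
```

The two answers contain the same words; only the position of the focus decides whether the answer fits the question.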

SLIDE 17

Eric is doing his homework.

Figure panels: Step 1, Step 2, Step 3, Step 4

SLIDE 18

Modifying Pitch Contours

Figure panels: “Eric” modified; “homework” modified

SLIDE 19

Who is doing their homework? cont.

Eric is doing his [homework]F
[Eric]F is doing his homework.

Eric is doing his homework. modified

SLIDE 20

Our Experiment: Question-Answer Congruence

Participants will be presented with three different types of stimulus blocks: natural speech, modified SWS, and unmodified SWS.

Test questions have three possible answers. In one answer, the focus word is the appropriate one for the question; in another, the focus word makes the answer sound unnatural and inappropriate; the third is a completely unrelated answer to the question.

SLIDE 21

Our Experiment: Question-Answer Congruence Continued

After the test question and one of the possible answers are presented to the participant as an auditory stimulus, the participant is shown a written question: “Is the answer appropriate?”

The available responses will be “Appropriate” or “Inappropriate”.
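One way to lay out the trials described on these two slides is sketched below. The slides do not say how trials are ordered or whether every participant hears every answer, so crossing each question with all three answer types and shuffling within each block are assumptions, as are all of the names.

```python
import itertools
import random

STIMULUS_TYPES = ["natural speech", "modified SWS", "unmodified SWS"]
ANSWER_TYPES = ["congruent focus", "incongruent focus", "unrelated"]
RESPONSES = ("Appropriate", "Inappropriate")

def build_blocks(questions, seed=0):
    """Return one shuffled trial list per stimulus type.

    Assumption: within a block, every question is paired with each of
    its three possible answers exactly once.
    """
    rng = random.Random(seed)   # fixed seed so the order is reproducible
    blocks = {}
    for stimulus in STIMULUS_TYPES:
        trials = [
            {"stimulus": stimulus, "question": q, "answer_type": a}
            for q, a in itertools.product(questions, ANSWER_TYPES)
        ]
        rng.shuffle(trials)
        blocks[stimulus] = trials
    return blocks

blocks = build_blocks(["Who is doing their homework?"])
```

Each trial would then be followed by the written prompt “Is the answer appropriate?” with the two options in `RESPONSES`.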

SLIDE 22

Predicted Results: Question-Answer Congruence

For the recordings of natural speech, it is expected that listeners will consistently choose “appropriate” or “inappropriate” according to the word that is focused in the answer sentence, signaled by a pitch peak.

For the unmodified SWS condition, listeners are expected to answer “appropriate” 100% of the time, since no pitch cue marks the focused word.

If the method for creating modified SWS is successful, then participants’ perception of focus will be sensitive to the location of intended prosodic prominence.
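These predictions could be checked with a simple summary of the responses. The sketch below is one possible measure, not the authors' analysis plan: it computes the proportion of “Appropriate” responses per condition and the difference between congruent and incongruent answers. The record format and function names are assumptions, and the example records are invented placeholders that mirror the predictions, not data from the study.

```python
from collections import defaultdict

def proportion_appropriate(records):
    """records: iterable of (stimulus_type, answer_type, response) tuples."""
    counts = defaultdict(lambda: [0, 0])   # key -> [n_appropriate, n_total]
    for stimulus_type, answer_type, response in records:
        cell = counts[(stimulus_type, answer_type)]
        cell[0] += response == "Appropriate"
        cell[1] += 1
    return {key: n_app / n_total for key, (n_app, n_total) in counts.items()}

def focus_sensitivity(props, stimulus_type):
    """Difference in 'Appropriate' rate between congruent and incongruent answers.

    A value near 1 means listeners follow the focus; a value near 0 means
    focus placement makes no difference to their judgments.
    """
    return props[(stimulus_type, "congruent")] - props[(stimulus_type, "incongruent")]

# Invented placeholder responses that mirror the predictions above.
# These are NOT experimental data.
records = [
    ("natural speech", "congruent", "Appropriate"),
    ("natural speech", "incongruent", "Inappropriate"),
    ("unmodified SWS", "congruent", "Appropriate"),
    ("unmodified SWS", "incongruent", "Appropriate"),
]
props = proportion_appropriate(records)
natural = focus_sensitivity(props, "natural speech")
unmodified = focus_sensitivity(props, "unmodified SWS")
```

Under this measure, a successful modified SWS condition would be expected to score closer to natural speech than to unmodified SWS.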

SLIDE 23

Broader Impact and Future studies

Noise-vocoded speech, which is commonly used to simulate cochlear implant hearing, can be described as whispered but extremely distorted speech. While SWS is also an abstract form of speech, we hope our research can be applied to cochlear implants to improve pitch perception.
