Voice Capture and Analysis, Cody Narber (PowerPoint presentation)


Feb 12, 2023


SLIDE 1

Voice Capture and Analysis

Cody Narber

Computer and Information Science Department Kansas State University

SLIDE 2

Frequency

Frequency is a measure of repeating events per unit time. In audio it is the number of air-pressure pulses (cycles) per second. It is measured in hertz (Hz), cycles per second, and equals 1/t, where t is the period of the wave in seconds (shown below). Any periodic signal (satisfying mild mathematical conditions) can be expressed as a sum of sine and cosine terms. This is known as the Fourier Theorem and is the basis for the Fourier Transform, which decomposes a signal into these parts. For sampled (digital) signals, efficient algorithms exist to compute this decomposition, most notably the Fast Fourier Transform (FFT). Thus we can apply the FFT to an audio signal to extract the frequency terms that comprise the signal.
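As a minimal sketch of that idea (not code from the slides), the following pure-Python radix-2 Cooley-Tukey FFT recovers the component frequencies of a synthetic two-tone signal. In practice a library routine such as NumPy's `numpy.fft.rfft` would be used instead; the helper names, test signal, and sample rate here are hypothetical, chosen so that each FFT bin is exactly 1 Hz wide.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequencies(signal, sample_rate, count):
    """Return the `count` strongest frequencies (Hz) in a real-valued signal."""
    n = len(signal)
    spectrum = fft(signal)
    # For a real input only the first n/2 bins are unique.
    magnitudes = [abs(c) for c in spectrum[: n // 2]]
    strongest = sorted(range(len(magnitudes)),
                       key=lambda k: magnitudes[k], reverse=True)[:count]
    # Bin k corresponds to k * sample_rate / n Hz.
    return sorted(k * sample_rate / n for k in strongest)


# Hypothetical test signal: a 50 Hz tone plus a weaker 120 Hz tone.
SAMPLE_RATE = 1024  # samples per second, so 1024 samples give 1 Hz bins
N = 1024
signal = [
    math.sin(2 * math.pi * 50 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
    for t in range(N)
]
print(dominant_frequencies(signal, SAMPLE_RATE, 2))  # [50.0, 120.0]
```

The peak bins land exactly on 50 Hz and 120 Hz because both tones complete a whole number of cycles in the 1024-sample window; with arbitrary audio, energy spreads into neighbouring bins.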

SLIDE 3

Spectrum

The frequency spectrum is a plot of the frequencies present in a signal against their corresponding amplitudes. The amplitude is the height of the peaks of a sinusoidal wave that composes the signal, i.e. the strength of that frequency component. A spectrogram plots the frequency spectrum at each moment in time: the x-axis is time, the y-axis is frequency, and darker areas indicate higher amplitudes.
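A spectrogram is usually computed as a short-time Fourier transform: the signal is cut into short frames, each frame is windowed, and the magnitude spectrum of each frame becomes one time column. The sketch below assumes a Hann window, non-overlapping frames, and a naive DFT for brevity; the function names and the two-tone test signal are hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n; tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len, hop):
    """Short-time Fourier transform magnitudes.

    Returns one list per frame (time), each holding |X[k]| for the
    frequency bins k = 0 .. frame_len // 2.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        # Naive O(n^2) DFT to keep the sketch short; an FFT would be used in practice.
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * i / frame_len)
                    for i, x in enumerate(chunk)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


SAMPLE_RATE = 1000  # Hz (hypothetical)
FRAME = 100         # samples per frame, so each bin is SAMPLE_RATE / FRAME = 10 Hz wide
# Hypothetical test signal: a 100 Hz tone for 0.5 s, then a 300 Hz tone for 0.5 s.
signal = [math.sin(2 * math.pi * (100 if t < 500 else 300) * t / SAMPLE_RATE)
          for t in range(1000)]

for index, frame in enumerate(spectrogram(signal, FRAME, FRAME)):
    peak_bin = max(range(len(frame)), key=frame.__getitem__)
    print(f"t = {index * FRAME / SAMPLE_RATE:.1f} s: "
          f"strongest bin at {peak_bin * SAMPLE_RATE / FRAME:.0f} Hz")
```

The first five frames should report 100 Hz and the last five 300 Hz. Plotting the frames as columns (time on the x-axis, bins on the y-axis, magnitude as shading) gives the spectrogram picture described above. The frame length trades time resolution against frequency resolution.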

SLIDE 4

Formants

Formants are peaks in the frequency spectrum of a voice, i.e. the frequency regions that are most prominent in the signal. These peaks correspond to resonances in sound sources such as musical instruments, or anything with sound chambers (for humans, the oral and nasal cavities). A spoken sample contains several formants, and they are used in speech recognition, particularly to identify vowels (the table below shows the average formant frequencies associated with each vowel). The fundamental frequency (F0), the rate at which the vocal cords vibrate and the pitch that humans perceive, is not itself a formant; formants are numbered upward from the first formant (F1). Vowel formant data from Peterson and Barney, 1952
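Because formants are defined here as spectral peaks, the core step of locating them can be sketched as simple peak picking. This sketch assumes the input is an already-smoothed spectral envelope: on the raw spectrum of voiced speech it would mostly find the harmonics of F0, which is why real formant trackers usually derive the envelope first (commonly with linear predictive coding, not shown). The envelope values and bin width below are made up for illustration and are not Peterson and Barney data; `spectral_peaks` is a hypothetical helper.

```python
def spectral_peaks(magnitudes, bin_hz, min_height):
    """Return the frequencies (Hz) of local maxima in a magnitude spectrum.

    Bin k is a peak if its magnitude is at least min_height, strictly greater
    than the bin to its left, and at least as large as the bin to its right
    (ties to the right are allowed so a flat top is reported at its first bin).
    """
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        m = magnitudes[k]
        if m >= min_height and m > magnitudes[k - 1] and m >= magnitudes[k + 1]:
            peaks.append(k * bin_hz)
    return peaks


# Hypothetical smoothed spectral envelope, one value per 100 Hz bin (made-up numbers).
envelope = [1, 3, 8, 5, 2, 2, 4, 9, 6, 3, 2, 5, 7, 4, 1]
print(spectral_peaks(envelope, bin_hz=100, min_height=4))  # [200, 700, 1200]
```

The lowest reported peaks would be read as F1, F2, F3 and so on; the `min_height` threshold keeps small ripples in the envelope from being counted as formants.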

SLIDE 5

Special Frequencies

Certain sound frequencies are of special note. The hearing figures are for healthy young adults; as people age, their ability to hear sounds at the extremes of this range, especially high-frequency sounds, decreases.

Average Human Speaking Frequency (fundamental frequency)

Male      Female
120 Hz    210 Hz

Average Human Hearing Range

Low       High
20 Hz     20,000 Hz

Musical Notes using Equal-Tempered Tuning [A4 = 440 Hz], frequencies in Hz (rounded). Each column runs from A up to the next A; in scientific pitch notation octave numbers change at C instead, so 262 Hz is middle C (C4).

Note      Octave=1  Octave=2  Octave=3  Octave=4  Octave=5  Octave=6
A               55       110       220       440       880     1,760
A#/Bb           58       117       233       466       932     1,865
B               62       123       247       494       988     1,976
C               65       131       262       523     1,047     2,093
C#/Db           69       139       277       554     1,109     2,217
D               73       147       294       587     1,175     2,349
D#/Eb           78       156       311       622     1,245     2,489
E               82       165       330       659     1,319     2,637
F               87       175       349       698     1,397     2,794
F#/Gb           92       185       370       740     1,480     2,960
G               98       196       392       784     1,568     3,136
G#/Ab          104       208       415       831     1,661     3,322
A              110       220       440       880     1,760     3,520
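Every entry in the equal-tempered table follows one rule: f = 440 · 2^(n/12), where n is the number of semitones above (negative: below) A4. The sketch below regenerates the table from that rule; the helper names are hypothetical, and the column numbering deliberately mirrors the table's A-to-G#/Ab grouping rather than scientific pitch notation.

```python
A4_HZ = 440.0
NOTE_NAMES = ["A", "A#/Bb", "B", "C", "C#/Db", "D",
              "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab"]


def equal_tempered(semitones_from_a4, a4=A4_HZ):
    """Frequency of the note n semitones above (negative: below) A4."""
    return a4 * 2 ** (semitones_from_a4 / 12)


def table_frequency(name, octave):
    """Frequency using the table's layout, where each column runs from A up to G#/Ab.

    Column 4 starts at A4 = 440 Hz, so column c is (c - 4) octaves away from it.
    """
    semitones = NOTE_NAMES.index(name) + 12 * (octave - 4)
    return equal_tempered(semitones)


# Each printed line is one column of the table, rounded to the nearest hertz.
for octave in range(1, 7):
    print(octave, [round(table_frequency(name, octave)) for name in NOTE_NAMES])
```

Each semitone step multiplies the frequency by 2^(1/12) (about 1.0595), so twelve steps exactly double it, which is why every column is twice the one before.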

SLIDE 6

Voice, Hearing, and Microphones

When we speak, the vocal cords vibrate, repeatedly closing and opening the airway, which stops and starts the airflow. The air then resonates in the oral and nasal cavities. This stopping and starting of airflow creates what are known as voiced sounds (sounds that use the vocal cords, such as vowels), and it produces longitudinal waves of alternating compression and decompression. The faster the cords vibrate, the closer together the waves are, and thus the higher the frequency of the sound produced. Our eardrums pick up these compression/decompression waves by moving back and forth, which ultimately triggers neurons that send impulses to the brain to be deciphered. Dynamic microphones work in a similar way: a diaphragm moves in and out with the sound waves, carrying a coil of wire along a magnet. This movement of the wire through the magnet's field creates electrical impulses, which, once digitized, are what is saved in the computer. (Image from http://www.mediacollege.com/audio/microphones/dynamic.html)
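Faster vocal-cord vibration means a shorter period t between pulses and hence a higher frequency 1/t, and that relationship is what autocorrelation pitch detection exploits. The following is a minimal sketch, not a method from the slides: `glottal_pulses` is a hypothetical, very crude stand-in for voiced airflow, and `estimate_f0` picks the lag at which the signal best matches a shifted copy of itself, searching only lags that correspond to 50 to 400 Hz (an assumed range).

```python
import math


def glottal_pulses(f0_hz, sample_rate, n_samples):
    """Very rough stand-in for voiced airflow: one sharp, decaying pulse per
    vocal-cord cycle (a hypothetical model, not a real glottal waveform)."""
    period = sample_rate / f0_hz
    return [math.exp(-(t % period) / (0.1 * period)) for t in range(n_samples)]


def estimate_f0(signal, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency with a plain autocorrelation search.

    The lag (in samples) at which the signal best matches a shifted copy of
    itself is taken as the period t, and f0 = 1 / t (converted to Hz).
    """
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]  # remove the DC offset of the pulses
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)

    def autocorr(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag


# Hypothetical sample rate, chosen so 120 Hz and 210 Hz have whole-sample periods.
SAMPLE_RATE = 8400
for f0 in (120, 210):  # the average male and female speaking frequencies above
    pulses = glottal_pulses(f0, SAMPLE_RATE, 840)  # 0.1 s of "voice"
    print(f0, "Hz pulses ->", estimate_f0(pulses, SAMPLE_RATE), "Hz")
```

This should report 120.0 Hz and 210.0 Hz. Real pitch trackers add normalization, interpolation between lags, and voiced/unvoiced decisions, since recorded speech is far less regular than this pulse train.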

SLIDE 7

Applications

The purpose of studying voice and its constituent parts (frequency, energy, formants, etc.) is the variety of applications they make possible. Some of these topics have received little research so far and are gaining interest as the technology continues to improve.

Voice Recognition (has improved a lot in the past couple of years)
Voice Synthesis (using emotion and inflections to make it more realistic)
Voice Emotional Analysis (clinical and wellness applications)
Voice Stress Detection (lie detection, and operator state)
Etc.

Voice analysis is becoming more and more popular because its data capture is non-invasive (much like vision-based analysis of facial expressions).