Voice Capture and Analysis
Cody Narber
Computer and Information Science Department, Kansas State University
Frequency
Frequency is a measure of repeating events per unit time. In audio it is the number of air pressure pulses per second. The main unit of measurement is the Hertz (Hz), which is 1/t, where t is the period of the wave in seconds (shown below).
Any periodic signal can be expressed as a sum of sine and cosine terms. This is known as the Fourier Theorem and is the basis for the Fourier Transform, which decomposes a signal into these parts. Efficient algorithms exist to compute this decomposition for sampled signals (namely the Fast Fourier Transform, FFT). Thus we can apply the FFT to an audio signal to extract the frequency terms that comprise the signal.
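As a rough illustration of applying the FFT to extract a frequency, the sketch below builds a synthetic 440 Hz sine wave and recovers its frequency and period. The recursive radix-2 routine, the function names, and the sample rate are assumptions chosen for this example, not part of the original slides. Real audio work would normally use an optimized library FFT.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin below the Nyquist limit."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)


SAMPLE_RATE = 2048  # samples per second (assumed for this sketch)
N = 2048            # one second of samples, so each bin is 1 Hz wide
TONE_HZ = 440.0     # synthetic test tone

tone = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(N)]
peak_hz = dominant_frequency(tone, SAMPLE_RATE)
period_s = 1.0 / peak_hz  # Hz = 1/t, so t = 1/Hz
```

Because the tone completes a whole number of cycles in the analysed window, all of its energy lands in a single bin. A tone that falls between bins would be spread over neighbouring bins instead.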
Spectrum
The frequency spectrum is a plot of the frequencies present in a signal against their corresponding amplitudes. The amplitude is the height of the peaks of each sinusoidal wave that composes the signal, that is, the strength of that frequency in the signal. A spectrogram plots the frequency spectrum at each moment in time, with time on the x-axis, frequency on the y-axis, and darker areas indicating higher amplitudes.
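A spectrogram can be sketched as one spectrum per short slice of the signal. The example below is a minimal version under simplifying assumptions: non-overlapping frames, no window function, a naive DFT, and a synthetic two-tone test signal. All names and parameters are illustrative.

```python
import cmath
import math


def frame_amplitudes(frame):
    """Naive DFT of one frame; returns sinusoid amplitudes for bins 0..N/2-1."""
    n = len(frame)
    amplitudes = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # 2*|X[k]|/n recovers the amplitude of a sinusoid at bin k (k > 0).
        amplitudes.append(2 * abs(total) / n)
    return amplitudes


def spectrogram(samples, frame_size):
    """One spectrum per non-overlapping frame: rows are time, columns are frequency."""
    return [
        frame_amplitudes(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


SAMPLE_RATE = 1024  # samples per second (assumed)
FRAME_SIZE = 128    # 8 Hz per bin, 0.125 s per frame

# Test signal: 0.5 s of a 128 Hz tone, then 0.5 s of a quieter 256 Hz tone.
signal = [math.sin(2 * math.pi * 128 * i / SAMPLE_RATE) for i in range(512)]
signal += [0.5 * math.sin(2 * math.pi * 256 * i / SAMPLE_RATE) for i in range(512, 1024)]

spec = spectrogram(signal, FRAME_SIZE)
bin_hz = SAMPLE_RATE / FRAME_SIZE
peak_hz_per_frame = [
    max(range(len(row)), key=lambda k: row[k]) * bin_hz for row in spec
]
```

Each row of `spec` is the spectrum for one time slice, so plotting the rows side by side, with amplitude mapped to darkness, gives the picture described above. Practical spectrograms usually overlap the frames and apply a window function to reduce leakage between bins.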
Formants
Formants are peaks in the frequency spectrum, that is, the frequencies that are most prevalent in the signal. Several formants exist in spoken samples and are used for vocal recognition (the table below shows the average frequencies associated with vowels). These peaks correspond to resonances in sound sources such as musical instruments, or anything with sound chambers (for humans these are the nasal and oral cavities). The fundamental frequency (F0) is the pitch that humans detect; it is set by the vibration rate of the vocal cords rather than by a resonance, so the formants themselves are conventionally numbered from F1.
Vowel formant data from Peterson and Barney, 1952
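Since formants are described here as peaks in the spectrum, a very simple way to locate candidates is to pick local maxima above a threshold. The sketch below does this on a made-up toy spectrum. The values, the bin width, and the function name are illustrative assumptions, not measured vowel data. Practical formant estimators usually work on a smoothed spectral envelope rather than the raw spectrum.

```python
def find_peaks(magnitudes, bin_width_hz, min_height):
    """Return (frequency_hz, magnitude) for each local maximum >= min_height."""
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        m = magnitudes[k]
        # Strictly above the left neighbour, so a flat top is reported once.
        if m >= min_height and m > magnitudes[k - 1] and m >= magnitudes[k + 1]:
            peaks.append((k * bin_width_hz, m))
    return peaks


# Toy magnitude spectrum with 250 Hz bins (hypothetical values for illustration).
toy_spectrum = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.6, 0.2, 0.1, 0.05, 0.4, 0.1]
peaks = find_peaks(toy_spectrum, bin_width_hz=250, min_height=0.3)
peak_frequencies = [freq for freq, _ in peaks]
```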
Special Frequencies
Certain sound frequencies are of special note. The hearing figures are for a healthy young adult; as people age, their ability to hear the highest frequencies decreases.
Average Human Spoken Frequency
  Male      Female
  120 Hz    210 Hz
Average Human Hearing Frequency
  Low       High
  20 Hz     20,000 Hz
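Musical Notes using Equal-Tempered tuning [A4 = 440 Hz] (frequencies in Hz)
Note     Octave=1  Octave=2  Octave=3  Octave=4  Octave=5  Octave=6
A              55       110       220       440       880     1,760
A#/Bb          58       117       233       466       932     1,865
B              62       123       247       494       988     1,976
C              65       131       262       523     1,047     2,093
C#/Db          69       139       277       554     1,109     2,217
D              73       147       294       587     1,175     2,349
D#/Eb          78       156       311       622     1,245     2,489
E              82       165       330       659     1,319     2,637
F              87       175       349       698     1,397     2,794
F#/Gb          92       185       370       740     1,480     2,960
G              98       196       392       784     1,568     3,136
G#/Ab         104       208       415       831     1,661     3,322
A             110       220       440       880     1,760     3,520
Each column runs from one A up to the A an octave higher, so the octave label is that of the A at the top of the column. In scientific pitch notation the octave number changes at C, so, for example, the 262 Hz entry is middle C (C4).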
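The table values follow from the equal-tempered rule that each semitone multiplies the frequency by the twelfth root of two. The short sketch below regenerates the Octave=4 column from A4 = 440 Hz. The function name and its parameters are assumptions made for this example.

```python
def note_frequency(semitones_from_a4, a4_hz=440.0):
    """Equal-tempered frequency of the note n semitones above (or below) A4."""
    return a4_hz * 2 ** (semitones_from_a4 / 12)


NOTE_NAMES = ["A", "A#/Bb", "B", "C", "C#/Db", "D", "D#/Eb",
              "E", "F", "F#/Gb", "G", "G#/Ab", "A"]

# Rebuild the Octave=4 column: 0 to 12 semitones above A4, rounded to whole Hz.
octave4_column = [round(note_frequency(n)) for n in range(len(NOTE_NAMES))]
table_rows = list(zip(NOTE_NAMES, octave4_column))

# Moving 12 semitones down halves the frequency (the Octave=3 column).
a_one_octave_down = note_frequency(-12)
```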
Voice, Hearing, and Microphones
When we speak, the vocal cords vibrate, repeatedly closing and opening the airway, which stops and starts the air flow. The air then resonates in the oral and nasal cavities. It is this stopping and starting of airflow that creates what are known as voiced sounds (ones that use the vocal cords, such as vowels). Longitudinal waves are created by this stopping and starting of airflow. The faster the cords vibrate, the closer together the waves are, and thus the higher the frequency of the sound produced.
Our eardrums pick up these compression/decompression waves by moving back and forth, triggering neurons that send impulses to be deciphered by our brain.
Dynamic microphones work in the same way: a plate (the diaphragm) moves in and out, carrying a coil of wire along a magnet. This movement of the wire along the magnet creates electrical impulses, which are what is saved on the computer.
Image from http://www.mediacollege.com/audio/microphones/dynamic.html
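The statement that faster vibration puts the waves closer together can be made concrete with the relation wavelength = speed / frequency. The sketch below applies it to the average spoken frequencies quoted earlier. The speed of sound used (343 m/s, roughly dry air at 20 degrees C) and the function name are assumptions for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Distance in metres between successive compressions of the wave."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz


male_wavelength = wavelength_m(120.0)    # average male speaking frequency
female_wavelength = wavelength_m(210.0)  # average female speaking frequency
```

The higher frequency gives the shorter wavelength, which is the "closer together" effect described in the text.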
Applications
The purpose of studying voice and its constituent parts (frequency, energy, formants, etc.) is the variety of applications that can be explored. Some of these topics have had little research so far but are gaining a lot of interest as the technology improves.
- Voice Recognition (has improved a lot in the past couple of years)
- Voice Synthesis (using emotion and inflections to make it more realistic)
- Voice Emotional Analysis (clinical and wellness applications)
- Voice Stress Detection (lie detection, and operator state)
- Etc.
Voice analysis is becoming more popular because its data capture is non-invasive (much like vision-based analysis of facial expressions).