SLIDE 1

Micropower Electro-Magnetic (EM) Sensors for Speech Characterization: Recognition, Verification, and Other Applications

presented to

IBM Watson Research Laboratory Yorktown, New York

J.F. Holzrichter Lawrence Livermore National Laboratory holzrichter1@llnl.gov February 4, 1999

Work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.

SLIDE 2

JFH:2/4/99.tlb 2

We conjectured, a few years ago, that micropower EM sensors could provide useful additional information for many speech applications. This appears to be true.

  • EM sensors measure generalized positions versus time of speech articulator interfaces
    – They are very effective as articulator gesture detectors
    – They work well where the articulator being detected is isolated in space and characterized in frequency
  • They are useful for many signal-processing applications
    – Enable a good voiced excitation function
    – Enable noise reduction & many "housekeeping" activities
  • These sensors are compatible with most acoustic technologies
    – Very low, human-compatible output levels, <0.2 mW
    – Low cost, low power use, FCC & FDA ok, and small

SLIDE 3

Micropower EM sensors have been used primarily in the homodyne “field disturbance” mode

This image shows the present EM sensor, which costs about $15 in parts and about $200 to replicate (marginal cost). In production as chips, these are expected to cost a few dollars each.

SLIDE 4

Miniature micropower EM sensors can measure articulator positions and motions in real time

[Figure: sagittal diagram of the vocal tract — lips, teeth, tongue (tip and back), palate, oral tract, nasal tract, velum (shown in closed position), pharynx, glottis, and vocal folds — with EM sensors 1, 2, and 3, transmitted and reflected EM waves, and an acoustic microphone.]

SLIDE 5

There are many ways to include the EM sensors with normal speech transducers: e.g., hand-held microphones, mike booms, monitors, telephones, etc.

SLIDE 6

Micropower EM sensors have many applicable modalities when used for acoustics and speech applications

  • EM sensor modalities used for speech experiments are
    – Impulse transmit and impulse range gate
    – Wave-packet transmit and impulse range gate
    – Homodyne transmit and receive
    – High-pass filtering (i.e., field disturbance) or DC coupling
  • The homodyne sensors generate 10-ns EM wave trains at 2 GHz frequency (λ = 15 cm), with a 2-MHz prf (1/e tissue penetration is 3 to 5 cm)
  • The power levels are <1.0 nJ/pulse and <0.2 mW of average radiated power (can be <0.02 mW)
  • The power and energy densities on human tissues are one order of magnitude below international continuous-exposure levels of 1 mW/cm², and can be made 100-fold lower

SLIDE 7

The interpretation of EM sensor information depends upon its use in near, intermediate, or far field modes

Near Field Mode* (λ > A_eff^1/2, r < λ):
  Signal* = const × Δr × A_eff × (d/dr envelope) ≈ 0.02 V/mm/cm² at r = 6 cm

Far Field Mode# (i.e., radar; λ < A_eff^1/2, r >> λ):
  Signal# = const × A_eff × 1/r⁴

* Homodyne approximation   # Multiple-pulse radar approximation

[Figure: an antenna above an air/material interface at range r, in each mode.]
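The two scaling laws above can be evaluated numerically as a quick illustration. The antenna area, the interface displacement, and the Gaussian sensitivity envelope below are illustrative assumptions for the sketch, not calibrated sensor values:

```python
import numpy as np

A_eff = 4.0                          # effective antenna area, cm^2 (assumed)
r = np.linspace(2.0, 20.0, 50)       # range from antenna, cm

# Far-field (radar) mode: signal falls off as A_eff / r^4
far = A_eff / r**4

# Near-field homodyne mode: signal ~ dr * A_eff * d/dr(envelope).
# A Gaussian sensitivity envelope is assumed purely for illustration.
r0 = 8.0                             # envelope scale, cm (assumed)
dr = 0.1                             # interface displacement, cm (assumed)
envelope = np.exp(-(r / r0) ** 2)
near = dr * A_eff * np.gradient(envelope, r)
```

The steep 1/r⁴ far-field falloff is why the speech sensors are operated in the near-field homodyne mode at centimeter ranges.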

SLIDE 8

Good filtering makes small "relative motions" of interfaces measurable, e.g., >50 dB signal to noise

[Figure: sensitivity envelope versus position (cm); magnified view showing ~10 µm interface motions between time 1 and time 2 producing ~10⁻⁶ V changes in the reflected E-field amplitude.]

SLIDE 9

Multiple EM-wave cycles, homodyne detection, and AC filtering are used for glottal experiments

[Figure: block diagram for glottal experiments — transmit and receive antennas direct EM wave packets at tissue (ε = 50); a mixer multiplies the local and received wave trains; an integrator followed by high- and low-pass filters (70 Hz to 3 kHz) yields the glottal signal. Inset: sensitivity function versus distance of the object from the sensor.]
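The homodyne idea — multiply the received wave by the local oscillator, then filter to baseband — can be sketched numerically. The 2 GHz carrier matches the slide; the reflector motion, sample rate, and filter are invented for the demo:

```python
import numpy as np

c, f = 3e8, 2e9                      # speed of light; 2 GHz carrier (lambda = 15 cm)
lam = c / f
fs = 32e9                            # simulation sample rate (assumed)
t = np.arange(0, 200e-9, 1 / fs)

# Interface range: 6 cm with a small oscillation (motion rate is invented
# so the effect is visible within this short simulation window).
r = 0.06 + 5e-3 * np.sin(2 * np.pi * 1e7 * t)

tx = np.cos(2 * np.pi * f * t)                # local oscillator / transmit wave
rx = np.cos(2 * np.pi * f * (t - 2 * r / c))  # echo delayed by round trip 2r/c
mixed = tx * rx                               # mixer multiplies the wave trains

# Low-pass (moving average over 4 carrier cycles) removes the 2f sum term,
# leaving the baseband homodyne signal ~ 0.5*cos(4*pi*r/lambda).
k = int(fs / f) * 4
baseband = np.convolve(mixed, np.ones(k) / k, mode="same")
expected = 0.5 * np.cos(4 * np.pi * r / lam)
```

The baseband output tracks the phase term 4πr/λ, which is how sub-millimeter interface motion appears as a measurable voltage.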

SLIDE 10

An EM sensor signal reflected from the glottal area, calibrated against high-speed video of the glottal cycle, shows good signal timing agreement

SLIDE 11

Glottal EM Sensor (GEMS) tissue measurements show strong correlation with vocal-fold electroglottography (EGG) signal *

* Experiments conducted at the U. of Iowa, National Center for Voice and Speech, in collaboration with Prof. I. Titze, Dr. B. Story, and W. Lea of Speech Sciences.
SLIDE 12

EM-sensed glottal tissue data and pitch rate show distinct individuality on all users tested. Generalized signal structure agrees with EGG data.*

* Experiments conducted at the U. of Iowa, National Center for Voice and Speech, in collaboration with Prof. I. Titze, Dr. B. Story, and W. Lea of Speech Sciences.
SLIDE 13

Micropower EM sensors vastly increase the amount of information available to characterize an individual's articulator conditions during speech

  • Provide generalized locations of vocal articulator interfaces during human speech at >1 kHz rates
    — Measure vocal folds, tongue, lips, jaw, velum
    — Measure articulators not influencing the acoustics
  • Enable the measurement of the glottal cycle, definition of synchronous frames, and an estimate of a voiced excitation function, with frequencies from 70 Hz to 7 kHz
  • Obtain physiological values of each individual's speech organs and their EM wave reflection coefficients
  • Enable these measurements non-invasively, safely, and economically in the presence of acoustic noise

SLIDE 14

EM sensors enable much improved pitch measurements, especially in noisy environments

[Figure: EM sensor-based pitch measurement (zero-crossing time intervals, no noise) compared with three acoustic approaches.]

SLIDE 15

Glottal & Pharynx motions for unvoiced speech

EM sensors enable glottal cycle measurements accurately, <±1 Hz, and automatically (e.g., onset)

SLIDE 16

The EM sensor yields smoother, more accurate pitch contours relative to traditional methods

[Figure: pitch contours (Hz versus time) for the utterance "They were all good men," comparing the EM sensor with audio-based and cepstral (CEP) estimates.]

SLIDE 17

Unlike acoustic algorithms, EM sensor pitch algorithms work well in the presence of acoustic noise

[Figure: pitch tracks at the onset of a second speaker — the EM sensor remains stable, while the cepstral and autocorrelation estimates show problems.]

SLIDE 18

From tuning fork experiments, we determine the “pitch-estimating” performance of EM sensors

[Figure: error rate (%) versus computational cost (kflops, 1 to 10000) for cepstral, autocorrelation, and GEMS pitch estimators.]

  • Using known signals, we compute pitch using: cepstral, autocorrelation, & EM signal zero-crossings
  • EM sensor pitch estimates:
    – 100 times faster in computational efficiency
    – 5 to 20 times smaller errors
    – Insensitive to acoustic noise
    – Provide real-time pitch synchronous processing
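The zero-crossing pitch estimate in the comparison above can be sketched in a few lines. This is a minimal illustration, not the LLNL code; the 120 Hz sinusoid stands in for a clean GEMS trace:

```python
import numpy as np

def zero_crossing_pitch(x, fs):
    """Estimate pitch from positive-going zero crossings, with linear
    interpolation for sub-sample crossing times (assumes a clean,
    periodic signal such as an EM glottal trace)."""
    neg = np.signbit(x)
    idx = np.flatnonzero(neg[:-1] & ~neg[1:])    # negative -> non-negative
    frac = x[idx] / (x[idx] - x[idx + 1])        # sub-sample interpolation
    times = (idx + frac) / fs                    # crossing times, seconds
    return 1.0 / np.mean(np.diff(times))         # mean period -> pitch, Hz

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
gems = np.sin(2 * np.pi * 120.0 * t)             # stand-in for a GEMS signal
pitch = zero_crossing_pitch(gems, fs)            # close to 120 Hz
```

Because it is only interval counting and one division, the method costs orders of magnitude fewer operations than cepstral or autocorrelation pitch trackers.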

SLIDE 19

The human vocal tract has excitation source(s), E, followed by a sequence of tubes and resonators that can be described as H, using linear equations

[Figure: horizontal vocal tract model — a pressure reservoir (lungs) and subglottis drive the vocal folds, followed by four resonator chambers (pharynx, tongue, jaw, soft palate, lips); an EM sensor monitors the glottis, a second sensor the jaw/tongue/palate, and a microphone records the acoustic output. The spectral output, or transfer function, is H(ω) = A(ω) / E(ω).]
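The division H(ω) = A(ω)/E(ω) can be sketched directly with FFTs. The excitation and the toy tract response below are synthetic stand-ins invented for the demo, not sensor data:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(1024)         # stand-in excitation E (e.g., a GEMS trace)
n = np.arange(64)
h = 0.9**n * np.cos(0.3 * np.pi * n)  # toy vocal-tract impulse response (assumed)
a = np.convolve(e, h)                 # acoustic output: A = H * E in the spectrum

# Zero-pad both spectra to the full convolution length so the division
# recovers H exactly (up to floating point).
N = len(a)
H_est = np.fft.rfft(a, N) / np.fft.rfft(e, N)
H_true = np.fft.rfft(h, N)
```

With real data the excitation spectrum has nulls, so a regularized division or an averaged cross-spectral estimate would replace the bare ratio.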

SLIDE 20

The EM Sensor gives access to the real-time excitation function to use in an ARMA model

The vocal tract impulse response h(t) maps the glottal excitation function x(t) to the audio y(t); in the z-domain,

H(z) = Y(z)/X(z) = (B0 + B1·z⁻¹ + B2·z⁻² + …) / (A0 + A1·z⁻¹ + A2·z⁻² + …)

Poles of the transfer function are the resonances of the vocal tract; zeros of the transfer function are the anti-resonances of the vocal tract.
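One way to fit a pole/zero (ARMA) model is equation-error least squares — a simple stand-in sketch, not necessarily the estimator used in the talk; the function name and toy system are illustrative:

```python
import numpy as np

def arma_fit(x, y, p, q):
    """Equation-error least-squares fit of H(z) = B(z)/A(z):
    y[n] = sum_j b[j] x[n-j] - sum_k a[k] y[n-k], k = 1..p, j = 0..q."""
    m = max(p, q)
    rows, rhs = [], []
    for i in range(m, len(y)):
        past_y = [-y[i - k] for k in range(1, p + 1)]
        past_x = [x[i - j] for j in range(q + 1)]
        rows.append(past_y + past_x)
        rhs.append(y[i])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    a = np.concatenate(([1.0], coef[:p]))    # denominator: poles (resonances)
    b = coef[p:]                             # numerator: zeros (anti-resonances)
    return b, a

# Demo: recover a known stable 2-pole/1-zero system driven by white noise.
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
b_true = np.array([1.0, 0.5])
a_true = np.array([1.0, -1.2, 0.72])         # poles at 0.6 +/- 0.6i (stable)
y = np.zeros_like(x)
for i in range(len(x)):
    y[i] = b_true[0] * x[i]
    if i >= 1:
        y[i] += b_true[1] * x[i - 1] - a_true[1] * y[i - 1]
    if i >= 2:
        y[i] -= a_true[2] * y[i - 2]
b_est, a_est = arma_fit(x, y, p=2, q=1)
```

Having the measured excitation x(t) from the EM sensor is what makes this input/output fit possible; LPC and cepstral methods must instead infer the excitation from the audio alone.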

SLIDE 21

Using a synthetic transfer function, we see improved characterization of "zeros" in transfer functions when using pole/zero (e.g., ARMA) modeling

[Figure: a synthetic 4-pole/2-zero transfer function H(z) is driven by an EM-measured excitation X(z) to produce synthetic audio Y(z); the transfer function is then re-estimated by a 4-pole/2-zero ARMA model, a 4-pole LPC model, and a 16-coefficient cepstral model, plotted versus frequency (Hz).]

SLIDE 22

The ARMA model yields accurate and robust transfer functions, which compare well to traditional models

[Figure: transfer functions for the vowel /i/, showing formants F1 and F2 — 15-pole/15-zero ARMA model (blue), 20-coefficient cepstral model (black), and 15-pole LPC model (red).]

SLIDE 23

The EM sensor glottal information enables pitch synchronous signal processing of speech

The sample below illustrates a commonly confused segment of the phrase “recognize speech”
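The pitch-synchronous framing that underlies this processing can be sketched as cutting the audio at glottal-cycle boundaries supplied by the EM sensor. The function name, the random stand-in audio, and the constant 120 Hz cycle are all illustrative assumptions:

```python
import numpy as np

def pitch_synchronous_frames(audio, cycle_starts, fs):
    """Cut audio into frames spanning one glottal cycle each, using
    cycle start times (in seconds) from an EM glottal sensor."""
    idx = np.round(np.asarray(cycle_starts) * fs).astype(int)
    return [audio[a:b] for a, b in zip(idx[:-1], idx[1:])]

fs = 8000
audio = np.random.default_rng(2).standard_normal(fs)     # 1 s of stand-in audio
cycle_starts = np.linspace(0.0, 1.0, 120, endpoint=False)  # constant 120 Hz pitch
frames = pitch_synchronous_frames(audio, cycle_starts, fs)
```

Analyzing each frame over exactly one glottal cycle avoids smearing the excitation across analysis windows, which is what makes the ARMA fits above cleaner.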

SLIDE 24

Another example of pitch synchronous ARMA processing

The sample below illustrates a commonly confused segment of the phrase “wreck a nice beach”

SLIDE 25

Three separate EM radar sensors have been used to measure several articulator motions as the word “print” is spoken

SLIDE 26

Lower frequency, 0.2 - 10 Hz EM sensor channels can provide generalized articulator timing information

[Figure: vocal tract diagram (palate, nasal tract, velum in closed position, oral tract, pharynx) with EM sensors and an acoustic microphone; below, low-frequency EM sensor traces aligned with a spectrogram for the phones /ng/ /a/ /a/.]

SLIDE 27

We believe the EM sensor/acoustic approach will be useful: it is low cost, safe, effective, and small

  • Speech Recognition and Low-Bandwidth Coding
    – Onset, pause, and end-of-speech detection
    – Noise rejection
    – Articulator descriptions, directly or via transfer functions
    – High-quality, low-bandwidth vocoding
  • Speaker Validation
    – Highly accurate timing, articulator motions, & transfer functions
    – Noninvasive, fast, virtually impossible to fake
  • Speech Synthesis
    – Matched and personalized excitations and transfer functions
    – Straightforward intonation and prosody implementation
  • Speech Research, Disability Analysis, Speech Correction, as well as Teaching Language and Singing
    – Non-invasive, specific articulator targeting, etc.
    – Very accurate, low-cost pitch measurement
SLIDE 28

I would like to thank the following individuals for their contributions:

  • DAS/UC Davis and LLNL
    – Experiments and signal processing: G. C. Burnett, T. J. Gable, L. C. Ng, R. Leonard (UCD Medical), R. R. Freeman
    – Sensors: S. G. Azevedo, E. T. Rosenbury, C. Z. Liang & N. C. Luhman
  • UC Berkeley
    – N. Morgan & J. Ohala
  • Stanford University
    – S. Peters & N. Scott
  • University of Iowa
    – B. Story & I. Titze
  • Haskins Laboratory
    – L. Goldstein & A. Lofquist
  • Speech Sciences
    – W. Lea