Hardware Model and Software Validation for AcoustiGLASS (Autonomous Wearable Alert Device based on Sound Pattern Recognition)



SLIDE 1

Hardware Model and Software Validation for AcoustiGLASS

(Autonomous Wearable Alert Device based on Sound Pattern Recognition)

Kei Kojima March 2016

SLIDE 2

OBJECTIVE & DESIGN CRITERIA

The objective is to build an autonomous ‘hearing glass’ prototype that performs real-time sound object identification through a robust audio pattern recognition algorithm.

[Figure labels: microphones, RGB LEDs]

  • The ‘hearing glass’ is wearable in physical size.
  • The system is autonomous and performs the necessary tasks without human interaction.
  • The response time of the system is fast enough, typically less than 0.5 seconds, to notify the user of an alarming sound in time.
  • The system can discern the approximate orientation of the sound's origin, i.e., whether it comes from the left-hand side or the right-hand side.

SLIDE 3

‘HEARING’ GLASS PROTOTYPE

  • Two RGB LEDs are placed on the top right and left outer corners of the glass to maintain clear vision for the user.
  • Each RGB LED pin is connected to a specific GPIO pin of a microcomputer.
  • Two stereo microphones (SONY) are mounted on the glass and placed by the user’s ears for sound localization.
  • USB audio adapters allow the microphones to convey the audio signals to the Raspberry Pi.
  • It is powered by an external battery, which feeds 5 V into the Raspberry Pi’s power port for >8 hours of operation.

[Figure labels: microphones, RGB LEDs, Raspberry Pi 2]

SLIDE 4

METHODS

Audio Spectrogram Analysis

  • A Buffer block splits the raw sound wave into overlapping frames.
  • A Periodogram block estimates the PSD (power spectral density) of the signal through a fast Fourier transform (FFT).
  • A second Buffer block assembles the sound data into multi-dimensional spectrogram arrays.
  • The spectrogram is normalized by taking the mean value and dividing it by the max value of the signal.
  • Reference spectrograms are created from prerecorded audio or from wave files from sound libraries.

Two-Dimensional Cross-Correlation

After the spectrogram of the recorded audio has been constructed…

  • Pre-recorded reference spectrograms are cross-correlated two-dimensionally with the incoming spectrogram through 2-D convolution.
  • The vector mean of the cross-correlations is computed.

[Figure: Police Reference Spectrogram]
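The buffering and periodogram steps on this slide can be sketched in plain Python. This is only an illustration, not the Simulink model from the presentation: the function names, the hop size of 64 samples, and the direct DFT (used instead of an FFT to avoid dependencies) are assumptions. The frame size of 128 follows the FFT span quoted in the simulation tests.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split a signal into overlapping frames (the role of the first Buffer block)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def periodogram(frame):
    """One-sided power estimate of a frame, computed with a direct DFT.

    A practical implementation would use an FFT; the direct sum is slower
    but keeps this sketch dependency-free.
    """
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(acc) ** 2 / n)
    return power


def spectrogram(signal, size=128, hop=64):
    """Stack per-frame periodograms into a 2-D array (rows: time, columns: frequency)."""
    return [periodogram(f) for f in frames(signal, size, hop)]
```

For a pure tone that completes 8 cycles per 128-sample frame, every row of the result has its largest value in frequency bin 8.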

SLIDE 5

METHODS, CONTINUED

Peak Detection

  • The absolute value of the cross-correlation is taken, flipping all negative numbers to their positive counterparts.
  • Using the MATLAB function findpeaks, the peaks in the mean cross-correlations are detected, and their prominence relative to other peaks in the correlation is computed.
  • The cross-correlation results are examined to determine the best thresholds for peak height, location, and prominence.
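A simplified Python stand-in for the peak detection step is sketched below. It is not MATLAB's findpeaks and does not compute topographic prominence; instead it reports the ratio of the tallest peak to the second tallest, which is the quantity quoted in the peak detection results. The function names and the returned dictionary are hypothetical.

```python
def find_peaks(values):
    """Return (index, height) for every strict local maximum, tallest first."""
    peaks = [
        (i, values[i])
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]
    return sorted(peaks, key=lambda p: p[1], reverse=True)


def peak_summary(xcorr_mean):
    """Take absolute values, then describe the main peak and its ratio to the runner-up."""
    magnitudes = [abs(v) for v in xcorr_mean]
    peaks = find_peaks(magnitudes)
    if len(peaks) < 2:
        return None  # a ratio needs at least two peaks
    (loc1, h1), (_, h2) = peaks[0], peaks[1]
    return {"location": loc1, "height": h1, "ratio": h1 / h2}
```

For example, `peak_summary([0, 1, 0, -4, 0, 2, 0])` finds the main peak at index 3 with height 4 and a ratio of 2.0 to the second peak.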

Sound Object Visualization (simulation)

When the cross-correlation result meets all the preset thresholds, the code sends a command to display an LED light pattern or a text alert with a graphic icon for the specific sound object recognized.

[Figure: Bluejay to Bluejay 2-D cross-correlation]

SLIDE 6

SOUND OBJECT RECOGNITION ALGORITHM (SIMULATION ON MAC)

SLIDE 7

SOUND LOCALIZATION ALGORITHM (DEPLOYED TO HARDWARE)

SLIDE 8

SOUND LOCALIZATION TEST

  • Two microphones record sound independently, approximately 7 inches apart.
  • The audio signal is squared and then downsampled via FIR (finite impulse response) decimation.
  • The downsampled signals pass through a low-pass filter to eliminate high-frequency components.
  • A Buffer block assembles the sound data into multi-dimensional arrays (1x64), and a mean function takes the average of each array to increase the stability and accuracy of the sound localization.

[Figure: Steps to compute signal envelope, Left and Right channels; axis scale labels 10^-4 and 10^-3]
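The envelope steps listed on this slide (square, FIR decimation, low-pass filter, 1x64 buffering, mean) might look like the following in Python. The moving-average filter, the decimation factor of 4, and the 8 filter taps are assumptions made for illustration; the slides do not state the actual filter design.

```python
def moving_average(x, taps):
    """A simple FIR low-pass filter with `taps` equal coefficients (assumed design)."""
    return [sum(x[i - taps + 1:i + 1]) / taps for i in range(taps - 1, len(x))]


def envelope(samples, factor=4, taps=8, block=64):
    """Estimate a signal envelope as a list of per-block mean power values."""
    squared = [s * s for s in samples]

    # FIR decimation: low-pass filter, then keep every `factor`-th sample.
    decimated = moving_average(squared, taps)[::factor]

    # Second low-pass stage to remove remaining high-frequency components.
    smoothed = moving_average(decimated, taps)

    # Buffer into 1 x `block` arrays and take the mean of each array.
    return [
        sum(smoothed[i:i + block]) / block
        for i in range(0, len(smoothed) - block + 1, block)
    ]
```

A constant input of amplitude 0.5 yields an envelope of 0.25 in every block, since the signal is squared before averaging.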

SLIDE 9

SOUND LOCALIZATION DATA PROCESSING

  • The signals are not processed unless their amplitude exceeds a preset threshold, which keeps background noise from setting off the LEDs (i.e., a false alert).
  • The magnitudes of the two signal envelopes are compared with each other, and the signal with the higher magnitude indicates the orientation of the sound origin.
  • One limitation of the system is that it cannot discriminate the exact orientation.

Just as the human brain uses the difference in incoming sound volume between the two ears, the present electronics detect the minute difference in sound volume between the two microphones for localization.

[Figures: Envelope of Audio Wave (Left) and Envelope of Audio Wave (Right); the values 0.045 and 0.100 appear on the plots]
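The threshold gate and left/right comparison described above can be expressed in a few lines. This is a sketch under assumptions: the function name, the use of the peak envelope value per channel, and the example threshold are hypothetical and not taken from the deployed model.

```python
def locate(left_env, right_env, threshold):
    """Return 'left', 'right', or None when no side can be reported.

    `left_env` and `right_env` are envelope values for the two microphones.
    """
    left, right = max(left_env), max(right_env)
    if max(left, right) < threshold:
        return None  # too quiet: treat as background noise, no LED alert
    if left == right:
        return None  # no volume difference, so no orientation can be inferred
    return "left" if left > right else "right"
```

For example, `locate([0.01, 0.02], [0.03, 0.10], threshold=0.05)` returns `"right"`, while two quiet channels below the threshold return `None`.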

SLIDE 10

REFERENCE SPECTROGRAMS

POLICE SIREN
  • Mechanical sound, which makes detection relatively easy.
  • Has three distinct frequencies between 1 and 3 kHz.

GUNSHOT
  • Has a short, abrupt frequency content ranging from 0 to 64 kHz.

BLUEJAY CALL
  • Has distinct and discrete frequency peaks between two distinct frequencies, approximately 10 and 20 kHz.

SLIDE 11

BASICS OF 2-D CROSS CORRELATION

2-D cross-correlation computes element-by-element products and then sums them.

M1 is a 5-by-5 matrix and M2 is a 3-by-3 matrix, so the 2-D cross-correlation result is a (5+3-1)-by-(5+3-1), or 7-by-7, matrix:

 34  201  286  121  106  167   60
165  470  329  244  334  299  109
271  359  405  570  585  479  256
186  229  550  615  730  409  206
116  309  595  760  575  349  221
137  263  504  434  339  222   51
 66  119  256  181  256   25   72

For one alignment of the center element of M2 over M1, the corresponding output element is:

1 * 8 + 7 * 3 + 13 * 4 + 8 * 1 + 14 * 5 + 20 * 9 + 15 * 6 + 16 * 7 + 22 * 2 = 585

[Figure: values of the M1 matrix and the M2 matrix, showing the alignment of the center element of M2]
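The worked sum above can be reproduced with a short, dependency-free Python sketch of full 2-D cross-correlation. The matrix values for M1 and M2 are not given in the slide text; the 5-by-5 and 3-by-3 magic squares below are an assumption, chosen because they reproduce the 585 term and the 7-by-7 result shown. The function name `xcorr2` mirrors the MATLAB function but is a plain reimplementation.

```python
def xcorr2(a, b):
    """Full 2-D cross-correlation of matrix `a` with kernel `b` (lists of lists)."""
    ra, ca = len(a), len(a[0])
    rb, cb = len(b), len(b[0])
    out = [[0] * (ca + cb - 1) for _ in range(ra + rb - 1)]
    for i in range(ra + rb - 1):
        for j in range(ca + cb - 1):
            # Row and column lag of `b` relative to `a` for this output element.
            k, l = i - (rb - 1), j - (cb - 1)
            total = 0
            for m in range(rb):
                for n in range(cb):
                    r, c = m + k, n + l
                    if 0 <= r < ra and 0 <= c < ca:  # skip products outside `a`
                        total += a[r][c] * b[m][n]
            out[i][j] = total
    return out


# Assumed inputs: 5x5 and 3x3 magic squares.
M1 = [
    [17, 24, 1, 8, 15],
    [23, 5, 7, 14, 16],
    [4, 6, 13, 20, 22],
    [10, 12, 19, 21, 3],
    [11, 18, 25, 2, 9],
]
M2 = [
    [8, 1, 6],
    [3, 5, 7],
    [4, 9, 2],
]

C = xcorr2(M1, M2)
print(len(C), len(C[0]))  # 7 7
print(C[2][4])            # 585, third row and fifth column of the result
```

Each output element is the sum of products over the region where the shifted M2 overlaps M1, which is why the corners of the result contain a single product (for example, 17 * 2 = 34).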

SLIDE 12

2-D CROSS CORRELATION IMAGE

Dog Growl (input) vs. Dog Growl (reference)

Strong correlation:

  • Single, symmetric peak.
  • Higher peak height.
  • Peak location is at the center line.

☑ Sound category identified

Dog Growl (input) vs. Bird chirp (reference)

Weak correlation:

  • Multiple, asymmetric peaks.
  • Lower peak height.
  • Peak location(s) are off center.

☐ Sound category rejected

SLIDE 13

PEAK DETECTION

Averaged Police to Police cross-correlations (abs value)

  • The height of the max peak is 159.4 and its location is 64 on the x axis (64 is the exact middle of a graph that contains 127 columns), which is within the threshold range of 62 - 66.
  • The ratio of the max peak to the second highest peak is 2.582, which is within the threshold range of 2.4 - 2.6.
  • The location of the third peak is 105.1, which is within the threshold range of >102 and <110.

Averaged Police to Dog cross-correlations (abs value)

  • The height of the max peak is 172.1 and its location is 67 on the x axis, which is outside the threshold range of 62 - 66.
  • The ratio of the max peak to the second highest peak is 2.0605, which is outside the threshold range of 2.4 - 2.6.
  • The location of the third peak is 101, which is outside the threshold range of >102 and <110.
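The three threshold tests for the police siren reference can be written as one check. This is a hedged sketch: the function name is hypothetical, and treating the first two ranges as inclusive is an assumption, since the slide only states the third range with strict inequalities.

```python
def matches_police(max_peak_location, peak_ratio, third_peak_location):
    """Return True only if all three police-siren thresholds are met."""
    location_ok = 62 <= max_peak_location <= 66    # near the center column (64)
    ratio_ok = 2.4 <= peak_ratio <= 2.6            # max peak / second highest peak
    third_ok = 102 < third_peak_location < 110     # strict bounds as stated
    return location_ok and ratio_ok and third_ok


print(matches_police(64, 2.582, 105.1))  # True: values with max peak at 64
print(matches_police(67, 2.0605, 101))   # False: values with max peak at 67
```

Requiring all three conditions at once is what lets the thresholds reject an input even when its maximum peak height is large, as with the 172.1 peak located at 67.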

SLIDE 14

TRUE/FALSE TEST

P: Police, D: Dog, S: Smoke alarm, B: Bird chirping, G: Gunshot

Simulation Test 1:

  • Audio input: 22.05 kHz, 1 sec.
  • FFT span: 128 samples
  • Peak prominence 2.1 to 2.7
  • 20 runs

[Chart: results per sound category (P D B S G) with a REJECTION series; bar values as extracted: 20, 13, 7, 13, 7, 20, 6, 14]

Simulation Test 2 (improved peak detection scheme/thresholds):

  • Audio input: 22.05 kHz, 1 sec.
  • FFT span: 128 samples
  • Peak prominence 2.2 to 2.8
  • 30 runs

[Chart: results per sound category (P D B S G) with a REJECTION series; bar values as extracted: 30, 25, 5, 28, 2, 29, 1, 22, 8]
SLIDE 15

CONCLUSION

  • The ‘Hearing’ Glass is a bridging technology that can fill the gaps in the lives of deaf people.
  • A computer algorithm was developed in Simulink, a graphical programming environment; it uses spectrogram and two-dimensional cross-correlation methods to identify sound objects of interest.
  • A prototype was successfully built on a low-cost computer, a Raspberry Pi, with two LEDs to relay information.
  • Computer simulation of sound object recognition showed promising capability for 4 alarming sounds and 1 friendly sound.
  • One of the major technical challenges is noise treatment. Presently, a noisy background is overcome through various thresholds that the cross-correlation result must meet.
  • The similarity between the lower-frequency components of dog growling and typical background noise can cause false detections as the noise floor increases.
  • The sound localization algorithm succeeded in giving the orientation (right or left) of the sound object’s origin.
SLIDE 16

OUTLOOK

Applications

  • A. Environmental sound alert system for deaf people and people with hearing loss
  • B. Machine failure detection based on changes in sound
  • C. Sound-based object recognition or situation sensors for rescue and military robots (in addition to vision sensors)

Bag of Features

The bag-of-features technique, in which different features are taken from each reference spectrogram and from the recorded spectrograms, is one candidate approach.

Machine Learning

An endeavor into machine learning, which trains the computer or micro-controller to “learn” information directly from data without assuming a predetermined equation as a model, can be worthwhile in the longer outlook, when detection with 2-D cross-correlation between the input signal and hundreds of reference spectrograms may become computationally heavy.

Add-on Systems

The development of an algorithm that is compatible with the iPhone or Google Glass would be suitable. The computation rates would be much better, and the visualization (on Google Glass) would be much better. Although acquiring smart glasses would be costly, insurance and state-owned funds would minimize the cost for the user.