
Analysis of Everyday Sounds, Dan Ellis and Keansub Lee - PowerPoint PPT Presentation


  1. Analysis of Everyday Sounds
     Dan Ellis and Keansub Lee
     Laboratory for Recognition and Organization of Speech and Audio
     Dept. Electrical Eng., Columbia Univ., NY USA
     dpwe@ee.columbia.edu
     1. Personal and Consumer Audio
     2. Segmenting & Clustering
     3. Special-Purpose Detectors
     4. Generic Concept Detectors
     5. Challenges & Future
     2007-07-24, Analysis of Everyday Sounds - Ellis & Lee

  2. LabROSA Overview
     [Diagram labels: Information Extraction; Music, Speech, Environment; Recognition, Separation, Retrieval; Signal Processing, Machine Learning]

  3. 1. Personal Audio Archives
     • Easy to record everything you hear
       <2GB / week @ 64 kbps
     • Hard to find anything
       how to scan?
       how to visualize?
       how to index?
     • Need automatic analysis
     • Need minimal impact
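
The storage figure on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the slide gives the bitrate (64 kbps) but not how many hours per day are recorded, so the hours-per-day values and the function name `weekly_storage_gb` are assumptions made here.

```python
def weekly_storage_gb(bitrate_kbps: float, hours_per_day: float) -> float:
    """Storage in GB (10^9 bytes) for one week of continuous recording."""
    bytes_per_second = bitrate_kbps * 1000 / 8      # kilobits/s -> bytes/s
    seconds_per_week = hours_per_day * 3600 * 7
    return bytes_per_second * seconds_per_week / 1e9


# Hypothetical recording schedules (not stated on the slide).
for hours in (8, 16, 24):
    gb = weekly_storage_gb(64, hours)
    print(f"{hours} h/day at 64 kbps: {gb:.2f} GB/week")
```

Under these assumptions, roughly eight hours of recording per day stays below 2 GB per week, while round-the-clock recording would need more.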

  4. Personal Audio Applications
     • Automatic appointment-book history
       fills in when & where of movements
     • “Life statistics”
       how long did I spend in meetings this week?
       most frequent conversations
       favorite phrases?
     • Retrieving details
       what exactly did I promise?
       privacy issues...
     • Nostalgia
     • ... or what?

  5. Consumer Video
     • Short video clips as the evolution of snapshots
       10-60 sec, one location, no editing
       browsing?
     • More information for indexing...
       video + audio
       foreground + background

  6. Information in Audio
     • Environmental recordings contain info on:
       location – type (restaurant, street, ...) and specific
       activity – talking, walking, typing
       people – generic (2 males), specific (Chuck & John)
       spoken content ... maybe
     • but not:
       what people and things “looked like”
       day/night ...
       ... except when correlated with audible features

  7. A Brief History of Audio Processing
     • Environmental sound classification draws on earlier sound classification work as well as source separation...
     [Diagram labels: Speech Recognition; Speaker ID; GMM-HMMs; Music Audio Genre & Artist ID; Source Separation (One channel, Multi-channel; Model-based, Cue-based); Soundtrack & Environmental Recognition]

  8. 2. Segmentation & Clustering
     • Top-level structure for long recordings: Where are the major boundaries?
       e.g. for diary application
       support for manual browsing
     • Length of fundamental time-frame
       60s rather than 10ms?
       background more important than foreground
       average out uncharacteristic transients
     • Perceptually-motivated features
       .. so results have perceptual relevance
       broad spectrum + some detail

  9. MFCC Features
     • Need “timbral” features: Mel-Frequency Cepstral Coeffs (MFCCs)
       auditory-like frequency warping → log-domain → discrete cosine transform = orthogonalization
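
The three stages named on this slide (frequency warping, log, discrete cosine transform) can be sketched for a single frame as follows. This is a minimal illustration, not the authors' feature code: the mel formula `2595 * log10(1 + f / 700)`, the triangular filter shape, the Hann window, and the parameter defaults (`n_filters=20`, `n_ceps=13`) are all assumptions chosen here.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel-scale convention (an assumption, not from the slides).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (rows = filters)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[j] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc_frame(frame, sr, n_filters=20, n_ceps=13):
    """MFCCs of one frame: power spectrum -> mel warping -> log -> DCT-II."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energy = mel_filterbank(n_filters, len(frame), sr) @ power
    log_energy = np.log(mel_energy + 1e-10)          # log-domain
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return basis @ log_energy                        # unnormalized DCT-II


sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(coeffs.shape)
```

The DCT here is written out as an explicit cosine basis so that the "orthogonalization" step is visible; a library DCT would normally be used instead.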

  10. Long-Duration Features
     [Figure: six panels, freq / bark vs. time / min (50 to 450): Average Linear Energy (dB), Normalized Energy Deviation, Average Log Energy (dB), Log Energy Deviation (dB), Average Spectral Entropy (bits), Spectral Entropy Deviation (bits)]
     • Capture both average and variation
     • Capture a little more detail in subbands...
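
One way to read the panel titles above is that per-frame band energies are pooled over a long window into a mean and a deviation. The sketch below assumes exactly that; the function name, the dictionary keys, and the reading of "Normalized Energy Deviation" as standard deviation divided by mean are assumptions, not details taken from the slides.

```python
import numpy as np


def long_duration_features(band_energy, frames_per_window):
    """Pool (n_frames, n_bands) energies into per-window statistics.

    Returns a dict of arrays with shape (n_windows, n_bands).
    Trailing frames that do not fill a whole window are dropped.
    """
    eps = 1e-12
    n_frames, n_bands = band_energy.shape
    n_windows = n_frames // frames_per_window
    x = band_energy[: n_windows * frames_per_window]
    x = x.reshape(n_windows, frames_per_window, n_bands)

    mean_lin = x.mean(axis=1)
    log_x = 10.0 * np.log10(x + eps)                 # energies in dB
    return {
        "avg_linear": mean_lin,
        "norm_dev": x.std(axis=1) / (mean_lin + eps),  # assumed: sigma / mu
        "avg_log": log_x.mean(axis=1),
        "log_dev": log_x.std(axis=1),
    }


# Hypothetical input: 10 minutes of 1-second frames in 21 bands.
rng = np.random.default_rng(0)
energies = rng.uniform(0.1, 1.0, size=(600, 21))
feats = long_duration_features(energies, frames_per_window=60)
print(feats["avg_log"].shape)
```

The same pooling can be applied to per-band spectral entropies to obtain the entropy mean and deviation panels.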

  11. Spectral Entropy
     • Auditory spectrum: $A[n,j] = \sum_{k=0}^{N_F} w_{jk} X[n,k]$
     • Spectral entropy ≈ ‘peakiness’ of each band: $H[n,j] = -\sum_{k=0}^{N_F} \frac{w_{jk} X[n,k]}{A[n,j]} \cdot \log \frac{w_{jk} X[n,k]}{A[n,j]}$
     [Figure: FFT spectral magnitude and Auditory Spectrum (energy / dB vs. freq / Hz, 0 to 8000); per-band Spectral Entropies (rel. entropy / bits vs. freq / Hz, 30 to 7380)]
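
The two formulas on this slide translate fairly directly into array operations. The following is a sketch under stated assumptions: `X` holds spectral magnitudes per frame, `W` holds the band weights w_jk, and the logarithm is taken base 2 to match the "bits" axis of the figure (the slide writes only "log"). The small constant `eps` is added here to avoid division by zero and log of zero.

```python
import numpy as np


def spectral_entropy(X, W):
    """Auditory spectrum A[n, j] and per-band spectral entropy H[n, j].

    X: (n_frames, n_bins) spectral magnitudes.
    W: (n_bands, n_bins) nonnegative band weights w_jk.
    """
    eps = 1e-12
    A = X @ W.T                                       # A[n,j] = sum_k w_jk X[n,k]
    # p[n, j, k] = w_jk X[n,k] / A[n,j], a distribution over bins k per band.
    p = W[None, :, :] * X[:, None, :] / (A[:, :, None] + eps)
    H = -np.sum(p * np.log2(p + eps), axis=2)
    return A, H


# Two toy bands of four bins each: a flat band and a single-peak band.
W = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
X = np.array([[1, 1, 1, 1, 1, 0, 0, 0]], dtype=float)
A, H = spectral_entropy(X, W)
print(A, H)
```

In this toy case the flat band has the maximum entropy for four bins and the single-peak band has entropy close to zero, which is the "peakiness" contrast the slide describes.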

  12. BIC Segmentation
     • BIC (Bayesian Info. Crit.) compares models: $\log \frac{L(X_1; M_1)\, L(X_2; M_2)}{L(X; M_0)} \gtrless \frac{\lambda}{2} \log(N)\, \Delta\#(M)$
     [Figure: AvgLogAudSpec for recording 2004-09-10-1023_AvgLEnergy with the BIC score below, time / hr (13:30 to 16:00); annotations: last segmentation point, candidate boundary, current context limit; "boundary passes BIC" vs. "no boundary with shorter context"; spans labeled L(X_1; M_1), L(X_2; M_2), L(X; M_0)]
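
The comparison above can be sketched for a single candidate boundary. The slide does not say what the models M_0, M_1, M_2 are; the sketch below assumes each is one full-covariance Gaussian fit by maximum likelihood, so that the parameter-count difference is d + d(d+1)/2. The function names and the small covariance regularizer are introduced here for illustration.

```python
import numpy as np


def gaussian_loglik(X):
    """Log-likelihood of X (n, d) under its own ML Gaussian fit.

    A small ridge is added to the covariance for numerical stability,
    so the value is approximate.
    """
    n, d = X.shape
    cov = np.cov(X, rowvar=False, bias=True).reshape(d, d) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2.0 * np.pi) + logdet + d)


def bic_boundary_score(X, t, lam=1.0):
    """Likelihood-ratio gain minus BIC penalty for a boundary at frame t.

    A positive score means the two-segment model is preferred.
    """
    n, d = X.shape
    gain = gaussian_loglik(X[:t]) + gaussian_loglik(X[t:]) - gaussian_loglik(X)
    delta_params = d + d * (d + 1) / 2.0             # one extra Gaussian
    penalty = 0.5 * lam * np.log(n) * delta_params
    return gain - penalty


rng = np.random.default_rng(0)
changed = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                     rng.normal(5.0, 1.0, (200, 2))])
steady = rng.normal(0.0, 1.0, (400, 2))
print(bic_boundary_score(changed, 200), bic_boundary_score(steady, 200))
```

A full segmenter would scan candidate boundaries within a growing context window, as the figure annotations suggest; that search loop is left out of this sketch.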

  13. BIC Segmentation Results
     • Evaluate: 62 hr hand-marked dataset
       8 days, 139 segments, 16 categories
       measure Correct Accept % @ False Accept = 2%:

       Feature                   Correct Accept
       μ_dB                      80.8%
       μ_H                       81.1%
       σ_H/μ_H                   81.6%
       μ_dB + σ_H/μ_H            84.0%
       μ_dB + σ_H/μ_H + μ_H      83.6%
       mfcc                      73.6%

     [Figure: ROC curves, Sensitivity vs. 1 - Specificity (0 to 0.04), for μ_dB, μ_H, σ_H/μ_H, μ_dB + σ_H/μ_H, and μ_dB + μ_H + σ_H/μ_H]

  14. Segment Clustering
     • Daily activity has lots of repetition:
       Automatically cluster similar segments
       ‘affinity’ of segments as KL2 distances
     [Figure: segment affinities; labels not recoverable]
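
A possible reading of "KL2" is the symmetrized Kullback-Leibler divergence between models fit to each segment. The sketch below assumes one full-covariance Gaussian per segment and maps the distance to an affinity with `exp(-distance / scale)`; both the Gaussian model and the exponential mapping (with its hypothetical `scale` parameter) are assumptions, not details given on the slide.

```python
import numpy as np


def kl_gauss(mu_p, cov_p, mu_q, cov_q):
    """KL divergence KL(p || q) between two multivariate Gaussians."""
    d = len(mu_p)
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff
                  - d + logdet_q - logdet_p)


def kl2(mu_p, cov_p, mu_q, cov_q):
    """Symmetrized KL: KL(p || q) + KL(q || p)."""
    return (kl_gauss(mu_p, cov_p, mu_q, cov_q)
            + kl_gauss(mu_q, cov_q, mu_p, cov_p))


def affinity_matrix(segments, scale=1.0):
    """Affinity exp(-KL2 / scale) between Gaussians fit to each segment."""
    models = []
    for seg in segments:                              # seg: (n_frames, d)
        d = seg.shape[1]
        cov = np.cov(seg, rowvar=False).reshape(d, d) + 1e-6 * np.eye(d)
        models.append((seg.mean(axis=0), cov))
    n = len(models)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = np.exp(-kl2(*models[i], *models[j]) / scale)
    return A


rng = np.random.default_rng(1)
segs = [rng.normal(0, 1, (300, 2)), rng.normal(0, 1, (300, 2)),
        rng.normal(4, 1, (300, 2))]
aff = affinity_matrix(segs, scale=5.0)
print(np.round(aff, 3))
```

Segments drawn from the same distribution end up with affinity near 1, while the shifted segment has affinity near 0 to the other two.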

  15. Spectral Clustering
     • Eigenanalysis of affinity matrix: A = U•S•V′
       SVD components: u_k•s_kk•v_k′
       eigenvectors v_k give cluster memberships
     • Number of clusters?
     [Figure: Affinity Matrix and its SVD components for k=1, k=2, k=3, k=4]
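
The idea that the leading singular vectors indicate cluster membership can be illustrated on a toy affinity matrix. In the sketch below, each item is assigned to whichever of the top k right singular vectors has the largest magnitude at that item. This assignment rule is a simplification chosen here (practical spectral clustering often runs k-means on the vector entries instead), and k is supplied by hand rather than chosen automatically.

```python
import numpy as np


def spectral_memberships(A, k):
    """Assign each item to one of the k leading singular vectors of A."""
    U, s, Vt = np.linalg.svd(A)                      # A = U @ diag(s) @ Vt
    leading = np.abs(Vt[:k])                         # (k, n_items)
    return np.argmax(leading, axis=0)


# Toy affinity matrix: items 0-2 are similar, items 3-4 are similar,
# with a weak affinity of 0.05 across the two groups.
A = np.full((5, 5), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
labels = spectral_memberships(A, k=2)
print(labels)
```

The printed labels are arbitrary cluster indices; only the grouping of items is meaningful.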

  16. Clustering Results
     • Clustering of automatic segments gives ‘anonymous classes’
       BIC criterion to choose number of clusters
       make best correspondence to 16 GT clusters
     • Frame-level scoring gives ~70% correct
       errors when same ‘place’ has multiple ambiences

  17. Browsing Interface
     • Browsing / Diary interface
       links to other information (diary, email, photos)
       synchronize with note taking? (Stifelman & Arons)
       audio thumbnails
     • Release Tools + “how to” for capture
     [Figure: browsing interface screenshot; labels not recoverable]

  18. 3. Special-Purpose Detectors: Speech
     • Speech emerges as most interesting content
     • Just identifying speech would be useful
       goal is speaker identification / labeling
     • Lots of background noise
       conventional Voice Activity Detection inadequate
     • Insight: Listeners detect pitch track (melody)
       look for voice-like periodicity in noise
     [Figure: spectrogram of coffeeshop excerpt, Frequency (0 to 4000) vs. Time (0 to 4.5)]

  19. Voice Periodicity Enhancement
     • Noise-robust subband autocorrelation
     • Subtract local average
       suppresses steady background e.g. machine noise
       15 min test set; 88% acc (no suppression: 79%)
       also for enhancing speech by harmonic filtering
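
The effect of subtracting a local average can be shown with a simplified sketch. Unlike the subband method named on the slide, the code below computes a normalized autocorrelation on the full-band signal only, then subtracts at each lag the mean over neighboring frames. The frame sizes, the `context` width, and the synthetic hum-plus-burst test signal are all assumptions made for illustration.

```python
import numpy as np


def frame_autocorr(x, frame_len, hop, max_lag):
    """Normalized autocorrelation per frame, shape (n_frames, max_lag + 1)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    ac = np.zeros((n_frames, max_lag + 1))
    for i in range(n_frames):
        f = x[i * hop : i * hop + frame_len]
        f = f - f.mean()
        full = np.correlate(f, f, mode="full")
        lags = full[frame_len - 1 : frame_len + max_lag]   # lags 0..max_lag
        ac[i] = lags / (lags[0] + 1e-12)
    return ac


def subtract_local_average(ac, context):
    """At each lag, subtract the mean over +/- context neighboring frames."""
    kernel = np.ones(2 * context + 1) / (2 * context + 1)
    out = np.empty_like(ac)
    for lag in range(ac.shape[1]):
        padded = np.pad(ac[:, lag], context, mode="edge")
        out[:, lag] = ac[:, lag] - np.convolve(padded, kernel, mode="valid")
    return out


# Synthetic signal: steady 100 Hz "machine hum" plus a short 200 Hz burst.
sr = 8000
n = np.arange(2 * sr)
x = np.sin(2 * np.pi * 100 * n / sr)
x[3200:4800] += 2.0 * np.sin(2 * np.pi * 200 * n[3200:4800] / sr)

ac = frame_autocorr(x, frame_len=400, hop=160, max_lag=100)
residual = subtract_local_average(ac, context=20)
print(residual[24, 40], residual[80, 40])
```

In frames containing only the hum the residual is essentially zero, while the burst frames keep a clear peak at the burst's period (lag 40 at this sample rate). The averaging context has to be longer than the events of interest, otherwise they are suppressed along with the background.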
