SLIDE 1

  • 1. Personal and Consumer Audio
  • 2. Segmenting & Clustering
  • 3. Special-Purpose Detectors
  • 4. Generic Concept Detectors
  • 5. Challenges & Future

Analysis of Everyday Sounds

Dan Ellis and Keansub Lee

Laboratory for Recognition and Organization of Speech and Audio

  • Dept. Electrical Eng., Columbia Univ., NY USA

dpwe@ee.columbia.edu

SLIDE 2

LabROSA Overview

[Diagram: LabROSA work combines Information Extraction, Machine Learning, and Signal Processing, applied to speech, music, and environmental audio, for recognition, retrieval, and separation.]

SLIDE 3

  • 1. Personal Audio Archives
  • Easy to record everything you hear

< 2 GB / week @ 64 kbps (see the arithmetic sketch at the end of this slide)

  • Hard to find anything

how to scan? how to visualize? how to index?

  • Need automatic analysis
  • Need minimal impact
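A quick sanity check on the "< 2 GB / week" figure; a minimal sketch, assuming roughly 8 recorded hours per day (the duty cycle is our assumption, not stated on the slide):

```python
# Back-of-envelope storage for personal audio captured at 64 kbps.
bitrate_bps = 64_000                     # 64 kbps compressed audio
hours_per_day = 8                        # assumed recording hours per day
seconds_per_week = hours_per_day * 3600 * 7
bytes_per_week = (bitrate_bps / 8) * seconds_per_week
print(f"{bytes_per_week / 1e9:.2f} GB/week")   # -> 1.61 GB/week, under 2 GB
```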


SLIDE 4

Personal Audio Applications

  • Automatic appointment-book history

fills in when & where of movements

  • “Life statistics”

how long did I spend in meetings this week? most frequent conversations? favorite phrases?

  • Retrieving details

what exactly did I promise? privacy issues...

  • Nostalgia
  • ... or what?


SLIDE 5

Consumer Video

  • Short video clips as the evolution of snapshots

10-60 sec, one location, no editing; how to browse?

  • More information for indexing...

video + audio; foreground + background


SLIDE 6

Information in Audio

  • Environmental recordings contain info on:

location – type (restaurant, street, ...) and specific
activity – talking, walking, typing
people – generic (2 males), specific (Chuck & John)
spoken content ... maybe

  • but not:

what people and things “looked like”
day/night ...
... except when correlated with audible features


SLIDE 7

A Brief History of Audio Processing

  • Environmental sound classification

draws on earlier sound classification work

as well as source separation...

[Diagram: soundtrack & environmental recognition draws on speech recognition (GMM-HMMs), speaker ID, music audio (genre & artist ID), and source separation (one-channel model-based and cue-based, and multi-channel).]

SLIDE 8

  • 2. Segmentation & Clustering
  • Top-level structure for long recordings:

Where are the major boundaries?

e.g. for diary application; support for manual browsing

  • Length of fundamental time-frame

60 s rather than 10 ms?
background more important than foreground
average out uncharacteristic transients

  • Perceptually-motivated features

.. so results have perceptual relevance
broad spectrum + some detail


SLIDE 9

MFCC Features

  • Need “timbral” features:

Mel-Frequency Cepstral Coeffs (MFCCs)

auditory-like frequency warping
log domain
discrete cosine transform = orthogonalization
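A minimal sketch of this pipeline, assuming librosa is available (the file name and parameter choices are illustrative, not the talk's):

```python
import librosa  # assumed available; any MFCC implementation would do

y, sr = librosa.load("clip.wav", sr=16000)   # hypothetical input file
# mel filterbank energies: the auditory-like frequency warping
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
# log compression, then DCT across bands = orthogonalization
mfcc = librosa.feature.mfcc(S=librosa.power_to_db(S), n_mfcc=20)
print(mfcc.shape)  # (20, n_frames)
```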


SLIDE 10

Long-Duration Features

  • Capture both average and variation
  • Capture a little more detail in subbands...

[Figure: six panels of long-duration features over time (freq / bark vs. time / min): Average Linear Energy, Normalized Energy Deviation, Average Log Energy, Log Energy Deviation (dB), Average Spectral Entropy, Spectral Entropy Deviation (bits).]
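A minimal sketch of the idea, assuming frame-level subband energies are already computed (the array shapes and the 60 s window are assumptions):

```python
import numpy as np

def long_duration_stats(band_db, frames_per_window):
    """Collapse frame-level subband energies (n_frames x n_bands, in dB)
    into per-window mean and deviation, like the panels above."""
    n = (band_db.shape[0] // frames_per_window) * frames_per_window
    blocks = band_db[:n].reshape(-1, frames_per_window, band_db.shape[1])
    avg_log_energy = blocks.mean(axis=1)   # cf. "Average Log Energy"
    log_energy_dev = blocks.std(axis=1)    # cf. "Log Energy Deviation"
    return avg_log_energy, log_energy_dev

# e.g. with 10 ms frames and 60 s windows: frames_per_window = 6000
```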


SLIDE 11

Spectral Entropy

  • Auditory spectrum:
  • Spectral entropy ≈ ‘peakiness’ of each band:

$$H[n,j] = -\sum_{k=0}^{N_F} \frac{w_{jk}\,X[n,k]}{A[n,j]} \log \frac{w_{jk}\,X[n,k]}{A[n,j]}, \qquad A[n,j] = \sum_{k=0}^{N_F} w_{jk}\,X[n,k]$$
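A direct numpy transcription of the formula; a sketch, assuming X is one STFT magnitude frame and W holds the auditory (e.g. Bark) band weights w_jk:

```python
import numpy as np

def spectral_entropy(X, W):
    """Per-band spectral entropy H[n, j] of one spectral frame.
    X: (n_fft_bins,) magnitudes; W: (n_bands, n_fft_bins) band weights.
    Returns entropy in bits, one value per band."""
    num = W * X[None, :]                   # w_jk * X[n, k]
    A = num.sum(axis=1, keepdims=True)     # band energies A[n, j]
    p = num / np.maximum(A, 1e-12)         # normalized in-band distribution
    return -np.sum(p * np.log2(np.where(p > 0, p, 1.0)), axis=1)
```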

[Figure: FFT spectral magnitude (energy / dB vs. freq / Hz), the warped Auditory Spectrum (bands centered 30-7380 Hz), and the per-band Spectral Entropies (rel. entropy / bits).]

SLIDE 12

BIC Segmentation

  • BIC (Bayesian Info. Crit.) compares models:

$$\log \frac{L(X_1; M_1)\,L(X_2; M_2)}{L(X; M_0)} \;\gtrless\; \frac{\lambda}{2} \log(N)\,\Delta\#(M)$$
[Figure: average log auditory spectrum for one afternoon (13:30-16:00) with the BIC score trace; each candidate boundary is tested within the current context, back to the last segmentation point, and either passes or is retried with a shorter context; models L(X; M0) vs. L(X1; M1) and L(X2; M2).]

SLIDE 13

BIC Segmentation Results

  • Evaluate: 62 hr hand-marked dataset

8 days, 139 segments, 16 categories
measure: Correct Accept % @ False Accept = 2%:

Feature               Correct Accept
μdB                   80.8%
μH                    81.1%
σH/μH                 81.6%
μdB + σH/μH           84.0%
μdB + σH/μH + μH      83.6%
MFCC                  73.6%

[Figure: ROC curves (Sensitivity vs. 1 - Specificity) for the same feature combinations.]

SLIDE 14

Segment Clustering

  • Daily activity has lots of repetition:

Automatically cluster similar segments

‘affinity’ of segments as KL2 distances
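A sketch of the symmetrized-KL ("KL2") distance between two segments, each summarized by a full-covariance Gaussian over its features (the exp(-d) affinity mapping at the end is an assumption):

```python
import numpy as np

def kl2_gauss(mu1, S1, mu2, S2):
    """Symmetrized KL divergence between two Gaussian segment models."""
    def kl(mu_a, Sa, mu_b, Sb):
        d = len(mu_a)
        iSb = np.linalg.inv(Sb)
        dm = mu_b - mu_a
        return 0.5 * (np.trace(iSb @ Sa) + dm @ iSb @ dm - d
                      + np.linalg.slogdet(Sb)[1] - np.linalg.slogdet(Sa)[1])
    return kl(mu1, S1, mu2, S2) + kl(mu2, S2, mu1, S1)

# affinity between segments i, j, e.g.: A[i, j] = np.exp(-kl2_gauss(...))
```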

[Figure: segment-affinity matrix with hand-labeled categories; the recoverable labels include lecture1, lecture2, break, billiard, barber, karaoke, meeting, supermkt, campus, library, restaurant, street, bowling, and home.]

SLIDE 15

Spectral Clustering

  • Eigenanalysis of affinity matrix: A = U•S•V′

eigenvectors vk give cluster memberships
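A minimal sketch of that eigenanalysis (taking the scaled top-k singular vectors as soft memberships; choosing k is the open question on this slide):

```python
import numpy as np

def cluster_memberships(A, k):
    """SVD of the affinity matrix A = U S V'; rows of the scaled top-k
    singular vectors act as soft cluster memberships."""
    U, s, Vt = np.linalg.svd(A)
    return U[:, :k] * s[:k]     # one column per 'anonymous class'
```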

  • Number of clusters?

[Figure: ~900x900 segment affinity matrix and its first four SVD components u_k s_kk v_k', k = 1..4.]

SLIDE 16

Clustering Results

  • Clustering of automatic segments gives

‘anonymous classes’

BIC criterion to choose number of clusters
make best correspondence to 16 ground-truth clusters

  • Frame-level scoring gives ~70% correct

errors when same ‘place’ has multiple ambiences


SLIDE 17

Browsing Interface

  • Browsing / Diary interface

links to other information (diary, email, photos)
synchronize with note taking? (Stifelman & Arons)
audio thumbnails

  • Release Tools + “how to” for capture

[Figure: diary-view calendar of one week (2004-09-13 to 2004-09-17, 08:00-17:00), with automatically clustered segments labeled e.g. preschool, cafe, lecture, office, outdoor, lab, group, meeting2, seminar, and annotations of people met.]

SLIDE 18

  • 3. Special-Purpose Detectors: Speech
  • Speech emerges as most interesting content
  • Just identifying speech would be useful

goal is speaker identification / labeling

  • Lots of background noise

conventional Voice Activity Detection inadequate

  • Insight: Listeners detect pitch track (melody)

look for voice-like periodicity in noise

[Figure: spectrogram of a coffeeshop excerpt (0-4.5 s, 0-4 kHz).]

SLIDE 19

Voice Periodicity Enhancement

  • Noise-robust subband autocorrelation

15 min test set: 88% accuracy (without suppression: 79%)
also useful for enhancing speech by harmonic filtering


  • Subtract local average

suppresses steady background, e.g. machine noise
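A loose sketch of the idea, assuming framed audio and an 80-400 Hz voice-pitch range; here the "local average" is taken as the mean correlogram over the excerpt, which cancels steady machine-like backgrounds:

```python
import numpy as np

def voice_salience(frames, sr, fmin=80.0, fmax=400.0):
    """Per-frame voice-pitch salience from normalized autocorrelation,
    minus the average correlogram so steady backgrounds cancel.
    frames: (n_frames, frame_len) array; parameter values illustrative."""
    lo, hi = int(sr / fmax), int(sr / fmin)    # lag range for voice pitch
    ac = []
    for x in frames:
        r = np.correlate(x, x, mode="full")[len(x) - 1:]
        ac.append(r[:hi] / (r[0] + 1e-12))
    ac = np.asarray(ac)
    ac -= ac.mean(axis=0, keepdims=True)       # subtract (local) average
    return ac[:, lo:].max(axis=1)              # high => voice-like period
```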

SLIDE 20

Detecting Repeating Events

  • Recurring sound events can be informative

indicate similar circumstance... but:
define “event” – sound organization
define “recurring event” – how similar?
.. and how to find them – tractable?

  • Idea: Use hashing (fingerprints)

index points to other occurrences of each hash; intersection of hashes points to match

  • much quicker search

use a fingerprint insensitive to background?
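A minimal sketch of the hash-index idea: each (time, hash) landmark votes for clips that share it at a consistent time offset, so matches are found by intersection rather than exhaustive comparison (the min_hits threshold is an assumption):

```python
from collections import defaultdict

index = defaultdict(list)          # hash -> [(clip_id, time), ...]

def add_clip(clip_id, hashes):
    for t, h in hashes:            # hashes: iterable of (time, hash)
        index[h].append((clip_id, t))

def query(hashes, min_hits=3):
    votes = defaultdict(int)
    for t, h in hashes:
        for clip_id, t2 in index[h]:
            votes[(clip_id, round(t2 - t, 2))] += 1   # consistent offset
    return {k: v for k, v in votes.items() if v >= min_hits}
```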


with Jim Ogle

SLIDE 21

Shazam Fingerprints

  • Prominent spectral onsets are landmarks;

Use relations {f1, f2, t} as hashes

intrinsically robust to background noise
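A rough sketch of landmark hashing in the Shazam style; real systems use careful peak picking, but a per-frame argmax is enough to show the {f1, f2, Δt} hash structure (the fan-out and FFT size are assumptions):

```python
import numpy as np
from scipy.signal import stft

def landmark_hashes(x, sr, fan_out=3):
    """Pick one prominent peak per STFT frame as a landmark, then hash
    pairs of nearby landmarks as (f1, f2, dt), anchored at frame t1."""
    f, t, Z = stft(x, fs=sr, nperseg=512)
    mag = np.abs(Z)
    peaks = [(ti, int(mag[:, ti].argmax())) for ti in range(mag.shape[1])]
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1: i + 1 + fan_out]:  # pair with neighbors
            hashes.append((t1, (f1, f2, t2 - t1)))
    return hashes
```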

[Figure: Shazam fingerprint landmarks overlaid on a phone-ring spectrogram (0-3 s, 0-4 kHz).]

SLIDE 22

Exhaustive Search for Repeats

  • More selective hashes →

few hits required to confirm a match (faster; better precision)
but less robust to background (reduced recall)


  • Works well when exact structure repeats

recorded music, electronic alerts
no good for “organic” sounds, e.g. a garage door

SLIDE 23

Music Detector

  • Two characteristic features for music

strong, sustained periodicity (notes)
clear, rhythmic repetition (beat)
at least one should be present!

  • Noise-robust pitch detector

looks for high-order autocorrelation

  • Beat tracker

.. from Music IR work

[Diagram: music audio feeds two paths, a pitch-range subband autocorrelation with a local stability measure, and a rhythm-range envelope autocorrelation with a perceptual rhythm model; both feed a fused music classifier.]
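A sketch of the rhythm branch: autocorrelate a crude onset envelope and look for a strong peak at beat-range lags (the hop size and the 0.25-2 s period range are assumptions):

```python
import numpy as np

def rhythm_strength(x, sr, hop=512):
    """Onset-envelope autocorrelation peak in the beat range; a strong
    peak suggests clear rhythmic repetition."""
    n = len(x) // hop * hop
    env = np.abs(x[:n]).reshape(-1, hop).mean(axis=1)   # coarse envelope
    env = np.maximum(np.diff(env, prepend=env[0]), 0)   # half-wave rect.
    r = np.correlate(env, env, mode="full")[len(env) - 1:]
    r /= r[0] + 1e-12
    fps = sr / hop                                      # envelope rate
    lo, hi = int(0.25 * fps), int(2.0 * fps)            # beat-period lags
    return r[lo:hi].max()
```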

SLIDE 24

  • 4. Generic Concept Detectors
  • Consumer Video application: how to assist browsing?

system automatically tags recordings
tags chosen by usefulness, feasibility

  • Initial set of 25 tags defined:

“animal”, “baby”, “cheer”, “dancing” ...
human annotation of 1300+ videos
evaluate by average precision

  • Multimodal detection

separate audio + visual low-level detectors (then fused...)


SLIDE 25

MFCC Covariance Representation

  • Each clip/segment → fixed-size statistics

similar to speaker ID and music genre classification

  • Full Covariance matrix of MFCCs

maps the kinds of spectral shapes present

  • Clip-to-clip distances for SVM classifier

by KL divergence between the 2nd-order (mean + covariance) Gaussian clip models
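A sketch of the representation, assuming MFCC frames per clip; the kernel mapping and its gamma are assumptions, with kl2_gauss as in the clustering sketch earlier:

```python
import numpy as np

def clip_signature(mfcc):
    """Fixed-size clip statistics: mean and full covariance of the
    clip's MFCC frames (mfcc: (n_frames, n_dims))."""
    return mfcc.mean(axis=0), np.cov(mfcc, rowvar=False)

def svm_kernel(sig_a, sig_b, gamma=0.05):
    """Turn the clip-to-clip distance into an SVM kernel value."""
    return np.exp(-gamma * kl2_gauss(*sig_a, *sig_b))
```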

[Figure: a video soundtrack spectrogram (freq / kHz vs. time / sec), its MFCC features (MFCC bin vs. time), and the resulting 20x20 MFCC covariance matrix.]

SLIDE 26

GMM Histogram Representation

  • Want a more ‘discrete’ description

.. to accommodate nonuniformity in MFCC space
.. to enable other kinds of models...

  • Divide up feature space with a single Gaussian Mixture Model

.. then represent each clip by the components used
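A minimal sketch with scikit-learn; all_clip_mfccs (a hypothetical list of per-clip (n_frames, n_dims) arrays) and the 64-component size are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# one global GMM tiles MFCC space across the whole collection
gmm = GaussianMixture(n_components=64, covariance_type="diag")
gmm.fit(np.vstack(all_clip_mfccs))       # all_clip_mfccs: hypothetical list

def clip_histogram(mfcc):
    """Represent a clip by the GMM components its frames fall into."""
    counts = np.bincount(gmm.predict(mfcc), minlength=gmm.n_components)
    return counts / counts.sum()
```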

[Figure: MFCC frames scattered in (MFCC(0), MFCC(1)) space, the global Gaussian Mixture Model tiling that space, and per-category histograms of mixture-component counts over 15 components.]

SLIDE 27

Latent Semantic Analysis (LSA)

  • Probabilistic LSA (pLSA) models each histogram as a mixture of several ‘topics’

.. each clip may have several things going on

  • Topic sets optimized through EM

$$p(\mathrm{ftr} \mid \mathrm{clip}) = \sum_{\mathrm{topics}} p(\mathrm{ftr} \mid \mathrm{topic})\; p(\mathrm{topic} \mid \mathrm{clip})$$

use p(topic | clip) as per-clip features
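A compact EM sketch for that factorization; the iteration count, smoothing constant, and random initialization are assumptions:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """counts: (n_ftrs, n_clips) histogram matrix. Returns p(ftr|topic)
    and p(topic|clip), each with columns summing to 1."""
    rng = np.random.default_rng(seed)
    F, C = counts.shape
    p_f_t = rng.random((F, n_topics)); p_f_t /= p_f_t.sum(0)
    p_t_c = rng.random((n_topics, C)); p_t_c /= p_t_c.sum(0)
    for _ in range(n_iter):
        R = counts / (p_f_t @ p_t_c + 1e-12)      # E-step ratio
        p_f_t *= R @ p_t_c.T;  p_f_t /= p_f_t.sum(0)
        R = counts / (p_f_t @ p_t_c + 1e-12)
        p_t_c *= p_f_t.T @ R;  p_t_c /= p_t_c.sum(0)
    return p_f_t, p_t_c
```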

[Diagram: the (GMM histogram features x AV clips) matrix p(ftr | clip) factors as p(ftr | topic) times p(topic | clip).]

SLIDE 28

Audio-Only Results

  • Wide range of results:

audio concepts (music, ski) vs. non-audio (group, night)
large AP uncertainty on infrequent classes

[Figure: per-concept Average Precision (roughly 0.1-0.8) over the concepts Animal, Baby, Beach, Birthday, Boat, Crowd, Group of 3+, Group of 2, Museum, Night, One person, Park, Picnic, Playground, Show, Sports, Sunset, Wedding, Dancing, Parade, Singing, Cheer, Music, and Ski, comparing 1G + KL, 1G + Mah, GMM Hist. + pLSA, and guessing.]

SLIDE 29

How does it ‘feel’?

  • Browser impressions: How wrong is wrong?

[Figure: top 8 hits for “Baby”.]

SLIDE 30

Confusion analysis

  • Where are the errors coming from?

[Figure: (a) matrix of overlapped manual labels and (b) confusion matrix of classified labels over the concept set, from Animal through Music.]

SLIDE 31

Fused Results - AV Joint Boosting

  • Audio helps in many classes

[Figure: per-concept AP (roughly 0.1-0.9) plus MAP for Random Baseline, Video Only, Audio Only, and A+V Fusion over all concepts, animal through wedding.]

SLIDE 32

  • 5. Future: Temporal Focus
  • Global vs. local class models

tell-tale acoustics may be ‘washed out’ in statistics
try iterative realignment of HMMs:
“background” (bg) model shared by all clips

[Figure: a YouTube “baby” clip spectrogram (0-15 s, 0-4 kHz) labeled two ways: old way, all frames contribute to each label (voice, baby, laugh); new way, labels get limited temporal extents with shared bg segments in between.]

SLIDE 33

Handling Sound Mixtures

  • MFCCs of mixtures ≠ mix of MFCCs

recognition despite widely varying background?
factorial models / Nonnegative Matrix Factorization
sinusoidal / landmark techniques
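A minimal sketch of the NMF route, assuming a magnitude spectrogram of the mixture; which basis columns belong to the voice is an assumption that in practice needs pre-trained or labeled bases:

```python
import numpy as np
from sklearn.decomposition import NMF

V = np.abs(stft_of_mixture)            # (n_freq, n_frames); hypothetical
model = NMF(n_components=16, init="nndsvd", max_iter=400)
W = model.fit_transform(V)             # spectral bases
H = model.components_                  # per-frame activations
n_voice = 8                            # assumed number of voice bases
V_voice = W[:, :n_voice] @ H[:n_voice] # estimated voice-only spectrogram
```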

[Figure: spectrograms (0-4 kHz) of a solo voice and a M+F voice mix, each alongside its MFCC noise resynthesis; audio examples crm-11737.wav, crm-11737-noise.wav, crm-11737+16515.wav, crm-11737+16515-noise.wav.]

SLIDE 34

Larger Datasets

  • Many detectors are visibly data-limited

getting data is ~hard
labeling data is expensive

  • Bootstrap from YouTube etc.

lots of web video is edited/dubbed...

  • need a “consumer video” detector?
  • Preliminary YouTube results disappointing

downloaded data needed extensive clean-up
models did not match Kodak data

  • (Freely available data!)


SLIDE 35

Conclusions

  • Environmental sound contains information

.. that’s why we hear!
.. computers can hear it too

  • Personal audio can be segmented, clustered

find specific sounds to help navigation/retrieval

  • Consumer video can be ‘tagged’

.. even in unpromising cases
audio is complementary to video

  • Interesting directions for better models
