MUSIC AND NOISE FINGERPRINTING AND REFERENCE CANCELLATION APPLIED TO FORENSIC AUDIO ENHANCEMENT
ANIL ALEXANDER (1), OSCAR FORTH (1) AND DONALD TUNSTALL (2)
(1) Oxford Wave Research Ltd, United Kingdom, {anil|oscar}@oxfordwaveresearch.com
(2) Digital Audio Corporation, USA, dtunstall@dacaudio.com
Audio Engineering Society 46th Conference on Audio Forensics, Denver, Colorado, June 14-16, 2012

Introduction
In surveillance audio recordings, it is common to come across:
• Interfering music or a television playing in the background in locations like pubs, cafes, cars, etc.
• Other speakers in the background who mask the speech of interest
• Target speakers who turn on their music players or televisions as they begin to speak, especially when they suspect they are being monitored, in order to mask their speech
The loud music or background noise drowns out the words or makes the speech of the speakers hard to decipher and transcribe.

Research Questions
Is it possible to reduce or remove:
I. interfering music from non-contemporaneous reference material, and to bring the voice of the speaker to the forefront?
II. background noises, speech of other speakers, music, etc. from contemporaneous recordings made in the same acoustic environment, to bring the voice of the main speaker to the forefront?

Example (1, 2): Car or Hotel Room
• Hotel room. Noise sources: radio, television, music player
• In a car. Noise sources: road noise, car radio, other passengers

Example (3): Pub/Hall with Music
Noise sources: television, jukebox, radio, bar noise, other speakers

Research Question (I) Is it possible to reduce or remove interfering music from non-contemporaneous reference material and to bring the voice of the speaker to the forefront? (Alexander and Forth, 2011)

Why is this difficult?
"Is it possible to reduce or remove interfering music and to bring the voice of the speaker to the forefront?"
• Straightforward subtraction of the audio will not remove the music, because it does not account for the effects of the room
• Cancellation is sensitive to clipping and compression
• It often has to be applied to a single channel of audio (without simultaneous reference recordings)
• The exact song that is playing has to be identified and perfectly time-aligned, which is time- and labour-intensive

Reducing Background Music
Tasks involved:
• Identifying the music/song being played
• Aligning the tracks to the exact moment in time, within the file being analysed, at which the song or music begins
• Applying a noise- and distortion-robust echo cancellation algorithm to remove or reduce the music while leaving the target speech mostly intact

Automatic Music Identification
• Commercial applications of acoustic fingerprinting include identifying tunes, songs, videos, advertisements and radio broadcasts, as well as anti-piracy initiatives
• Music identification systems such as Shazam™ have proliferated recently
• A short segment of audio (noisy, distorted or otherwise poor) is sent to an internet-based recognition server for identification
• The server compares features of this recording to a pre-indexed database of songs
• It selects the most probable candidate(s) for the song

Noise-Robust Audio Fingerprinting
Attributes for a 'fingerprint' [Wang (2003)]:
• Temporally localized
• Translation invariant
• Robust
• Sufficiently entropic
Spectral peak pairs are thus temporally localized and robust to noise and transmission distortions.
[Figure: spectrograms of the query audio (0 to 25 s) and of the matched reference audio (180 to 205 s), frequency axis 0 to 4000. Match: 1-05 The Road To Hell (Part 2) at 179.744 sec]

Landmark-based Audio Fingerprinting Algorithm (1)
• Peaks are chosen on the basis of having higher energy than their neighbours.
• The spectrogram is reduced to a 'constellation map' containing the spectral peaks.
• Pairs of peaks are selected as landmark 'hashes' that provide reference anchor points in time and frequency.
• Landmark hash extraction is performed on the query audio.
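The peak-picking and pairing steps on this slide can be illustrated with a minimal sketch. It assumes the spectrogram is already available as a small grid of energies indexed as `spec[time][frequency]`. The function names `find_peaks` and `landmark_hashes`, and the `radius`, `max_dt` and `fan_out` parameters, are hypothetical choices for illustration and are not taken from the presentation or from any particular fingerprinting library.

```python
from typing import List, Tuple

Peak = Tuple[int, int]            # (time frame, frequency bin)
Hash = Tuple[int, int, int, int]  # (f1, f2, dt, anchor time)


def find_peaks(spec: List[List[float]], radius: int = 1) -> List[Peak]:
    """Return cells whose energy is strictly higher than every neighbour
    within `radius` cells in time and frequency (the constellation map)."""
    n_t, n_f = len(spec), len(spec[0])
    peaks = []
    for t in range(n_t):
        for f in range(n_f):
            neighbours = [
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
                if (tt, ff) != (t, f)
            ]
            if all(spec[t][f] > v for v in neighbours):
                peaks.append((t, f))
    return peaks


def landmark_hashes(peaks: List[Peak], max_dt: int = 3, fan_out: int = 3) -> List[Hash]:
    """Pair each anchor peak with up to `fan_out` later peaks that lie
    within `max_dt` frames; (f1, f2, dt) is the hash, t1 the anchor time."""
    ordered = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(ordered):
        paired = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break          # later peaks are even further away in time
            if dt == 0:
                continue       # skip peaks in the same frame
            hashes.append((f1, f2, dt, t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes
```

Because the hash key `(f1, f2, dt)` holds only the two frequencies and the time difference, shifting all peaks in time leaves the key unchanged, which is the translation invariance the fingerprint attributes call for.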

Landmark-based Audio Fingerprinting Algorithm (2)
• Constellation maps are then compared to obtain the position in time at which some of the hashes match between the query and reference audio.
• The file with the largest number of hash matches is selected as the reference audio file.
• An accurate estimate of the time of match is also returned by this algorithm.
[Figure: landmark hashes from the query audio are matched against the reference audio containing music or noise; the matching hashes give the time of match (t), with ∆t marked.]
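The matching step can be sketched as a vote over time offsets. This is a minimal illustration, assuming hashes are stored as `(f1, f2, dt, anchor_time)` tuples. Counting only the matches that agree on the same offset is a common refinement of "largest number of hash matches" and is an assumption here, as are the name `best_match` and the layout of `reference_db`.

```python
from collections import Counter, defaultdict


def best_match(query_hashes, reference_db):
    """Return (file name, time offset, vote count) for the best candidate,
    or None if no hash matches.

    reference_db maps a file name to its list of (f1, f2, dt, t) hashes.
    """
    # Index every reference hash by its (f1, f2, dt) key.
    index = defaultdict(list)
    for name, hashes in reference_db.items():
        for f1, f2, dt, t_ref in hashes:
            index[(f1, f2, dt)].append((name, t_ref))

    # Each matching hash votes for (file, reference time - query time).
    votes = Counter()
    for f1, f2, dt, t_query in query_hashes:
        for name, t_ref in index.get((f1, f2, dt), []):
            votes[(name, t_ref - t_query)] += 1

    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count
```

The winning offset is in spectrogram frames; multiplying by the frame hop duration would give the time of match in seconds.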

Landmark-based Audio Fingerprinting Algorithm (3)
[Figure: spectrograms of the query audio (0 to 25 s) and of the matched reference audio (180 to 205 s), frequency axis 0 to 4000. Match: 1-05 The Road To Hell (Part 2) at 179.744 sec. Source: Ellis (2009), Robust Landmark-Based Audio Fingerprinting]

Result Example - Time Domain
[Figure: three waveform panels, amplitude -0.5 to 0.5 against Time (s) from 0 to 4.0: Original Signal (Speech and Music); Identified Music Signal; Resulting Speech (Original - Music)]
Marked reduction in the noise floor

Result Example - Frequency Domain

Echo Cancellation (1)
• Echo cancellation addresses a similar problem: audio is played back through the speakers while the microphones record simultaneously, and the playback should not 'seep in' to the microphone recording
• An acoustic echo canceller could provide a good solution to the problem
• Echo cancellation algorithms are generally LMS-based (Least Mean Squares); either time-domain or frequency-domain approaches can be used
• In this application we use an echo canceller software module (compliant with the ITU-T G.167 and G.168 specifications) using the Intel Performance Primitives (IPP) library and the DAC CARDINAL

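LMS-based Echo Cancellation
[Diagram: the primary input, speech + noise/music (S+N), feeds the positive input of a summing junction. The identified, time-aligned noise/music (N') passes through the electronic response estimate to give N'', which feeds the negative input. The output is speech + residual noise/music (S+N' - N''); the residual is fed back to the response estimate.]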

LMS / NLMS Coefficient Update
Each FIR coefficient $h_n$ (index $n$) is updated at each sample interval $i$ as follows:
$$h_n(i+1) = h_n(i) + \Delta h_n(i)$$
The update increment $\Delta h_n(i)$ is computed by the LMS algorithm as follows:
$$\Delta h_n(i) = \mu \, e(i) \, x(i-n)$$
NLMS uses a slightly different $\mu$ value, as follows:
$$\Delta h_n(i) = \mu' \, e(i) \, x(i-n)$$
where $\mu'$ is the specified $\mu$ value (or "adapt rate"), scaled inversely to the average input signal power.
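The coefficient update can be tried out with a small pure-Python sketch. It is not the IPP-based module mentioned in the presentation. The function name `nlms_cancel`, the default step size, the `eps` regulariser and the toy room response are assumptions for illustration. The NLMS branch divides µ by the summed input power over the tap window, a common way of scaling inversely to input power.

```python
import random


def nlms_cancel(primary, reference, n_taps, mu=0.5, normalised=True, eps=1e-8):
    """Adaptive reference cancellation.

    primary   : recording containing speech plus room-filtered music/noise
    reference : identified, time-aligned music/noise, x(i)
    Returns (residual e(i), final FIR coefficients h_n).
    """
    h = [0.0] * n_taps
    history = [0.0] * n_taps          # history[n] holds x(i - n)
    residual = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(hn * xn for hn, xn in zip(h, history))
        e = d - estimate              # error = primary minus response estimate
        if normalised:                # NLMS: mu' = mu / input power
            step = mu / (eps + sum(xn * xn for xn in history))
        else:                         # plain LMS: fixed mu
            step = mu
        # h_n(i+1) = h_n(i) + step * e(i) * x(i - n)
        h = [hn + step * e * xn for hn, xn in zip(h, history)]
        residual.append(e)
    return residual, h


# Toy demonstration: white-noise "music" heard through a hypothetical
# 4-tap room response, with no speech present.
rng = random.Random(0)
music = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
room = [0.0, 0.8, 0.3, -0.2]
heard = [
    sum(room[n] * music[i - n] for n in range(len(room)) if i - n >= 0)
    for i in range(len(music))
]
residual, h = nlms_cancel(heard, music, n_taps=4)
```

In this noise-free toy case the coefficients `h` settle close to `room` and the residual decays towards zero. With speech present in the primary signal, the residual would instead contain the speech plus whatever music the filter fails to model.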

Electronic Response Estimate
• FIR filter coefficients represent an electronic simulation of the room's acoustical environment
• The filter must have a sufficient number of taps, N, to account not only for the direct acoustic path (A) but also for the longest significant reverberation path (B)
• At a 16000 Hz sample rate, the required N for the example in the figure would be 0.070 s × 16000/s = 1120 taps
• We typically estimate the minimum required filter length in milliseconds as 5 times the largest dimension of the room in feet (sound: 1 ft ~ 1 msec)
[Figure: room diagram with a 15' dimension marked, showing A, the direct path (13'), and B, the longest significant path (70')]
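The tap-count arithmetic on this slide can be written out as a short sketch. The function names are hypothetical, and treating the 15 ft figure label as the room's largest dimension is an assumption made only for the second example.

```python
import math


def taps_for_path(path_ms: float, sample_rate_hz: int) -> int:
    """Number of FIR taps needed to span a path of `path_ms` milliseconds."""
    return math.ceil(path_ms * sample_rate_hz / 1000)


def rule_of_thumb_ms(largest_room_dimension_ft: float) -> float:
    """Minimum filter length in ms: 5 times the largest room dimension in feet
    (using the approximation that sound travels about 1 ft per ms)."""
    return 5 * largest_room_dimension_ft


# Slide example: 70 ft longest significant path, about 70 ms, at 16 kHz.
slide_taps = taps_for_path(70, 16000)                    # 1120
# Rule of thumb for a room whose largest dimension is assumed to be 15 ft.
rule_taps = taps_for_path(rule_of_thumb_ms(15), 16000)   # 75 ms, 1200 taps
```

The rule-of-thumb figure comes out slightly longer than the 70 ft path requires, so it leaves a small margin beyond the longest significant reverberation path.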

Time Alignment Drift
• If there is a speed differential between the primary and reference tracks, the time alignment will "drift" as the processing progresses
• This can be observed in the FIR coefficient response as a movement of the "big spike" (the large coefficient associated with the direct path signal correlation), either to the right or the left
• If the drift is sufficiently fast (e.g. more than 1-2 coefficients every 5-10 seconds), the LMS algorithm will never be able to converge the FIR coefficients to an optimal solution
• Also, should the spike drift beyond either the beginning or the end of the filter, all cancellation will be lost
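To give a feel for the drift rates quoted above, the sketch below converts a constant relative speed error between the two tracks into spike movement in coefficients. The function names and the assumption of a constant speed differential are illustrative and are not taken from the presentation.

```python
def drift_taps(speed_error: float, sample_rate_hz: float, seconds: float) -> float:
    """Coefficients (samples) the direct-path spike moves in `seconds`,
    for a relative speed error such as 12.5e-6 (12.5 parts per million)."""
    return speed_error * sample_rate_hz * seconds


def seconds_until_spike_exits(speed_error: float, sample_rate_hz: float,
                              taps_of_margin: float) -> float:
    """Time until the spike has drifted `taps_of_margin` coefficients,
    i.e. until it reaches the nearer end of the filter."""
    return taps_of_margin / (abs(speed_error) * sample_rate_hz)
```

At a 16000 Hz sample rate, a speed error of 12.5 parts per million moves the spike by one coefficient every 5 seconds, which is at the edge of the range the slide describes as too fast for convergence. With 100 coefficients of margin, the spike would leave the filter after 500 seconds.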

Research Question (II)
• "Is it possible to reduce or remove, from contemporaneous recordings made in the same acoustic environment, interfering music, background noises, and speech of other speakers, to bring the voice of the main speaker to the forefront?"
• Will having two microphones in the same environment allow for effective cancellation?

Applying 'Audio Fingerprinting' to Background Noise
• Having two microphones in the same acoustic environment, perfectly time-aligned, can greatly help bring out the voice of one speaker over the other
• This rarely happens in practice
• Aligning noise is a more difficult problem, as sufficient spectral peaks may not be available in both recordings
• By applying less stringent matching criteria, we can accurately time-align audio from two independent recorders in the same acoustic environment

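Applications to Noise Identification
[Diagram: the primary input, speech + noise/music (S+N), feeds the positive input of a summing junction. The identified, time-aligned noise/music (N') passes through the electronic response estimate to give N'', which feeds the negative input. The output is speech + residual noise/music (S+N' - N''); the residual is fed back to the response estimate.]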

Scenarios
• Scenario 1: Two independent recordings using two smartphones in the same acoustic environment
• Scenario 2: Two fixed microphones in the same acoustic environment
• Scenario 3: White noise interference
