Acoustic Fingerprinting Soundz Jake Runzer June 28, 2018 Jake - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Acoustic Fingerprinting

Soundz Jake Runzer June 28, 2018

Jake Runzer Acoustic Fingerprinting June 28, 2018 1 / 35

slide-2
SLIDE 2

Outline

1 What is Acoustic Fingerprinting
2 Fingerprinting for Music Identification
3 Spectrograms
4 History
5 My Implementation
6 Demo
7 References


slide-3
SLIDE 3

Overview of Acoustic Fingerprints

An audio fingerprint is a compact signature that summarizes an audio signal.


slide-4
SLIDE 4

Requirements for Acoustic Fingerprints

A fingerprint should have the following properties:
• Is unique to that specific audio signal
• Does not depend on the binary representation of the audio
• Represents how humans hear the audio


slide-5
SLIDE 5

Overview of Music Identification (I)

Use a database of fingerprints belonging to known sources to identify a fingerprint belonging to an unknown source.


slide-6
SLIDE 6

Overview of Music Identification (II)


slide-7
SLIDE 7

Examples

You have probably used, or know of, apps that use audio fingerprinting for music identification:
• Shazam
• SoundHound


slide-8
SLIDE 8

Requirements for Music Identification

Music identification is often done on mobile devices in noisy environments.
• Size of data generated is as small as possible
• Low computational footprint
• Length of audio required to get a match is short (under 10 seconds)
• Noise/distortion agnostic


slide-9
SLIDE 9

Typical Pipeline

1 Capture audio on mobile device
2 Create fingerprint on device and send to matching server
3 Database is normally an inverted index of fingerprint -> song
4 Approximate nearest neighbour search is performed to find best candidates
5 Temporal alignment step applied to most similar matches
6 Return best matched song to mobile device
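
As a rough illustration of step 3, an inverted index can be sketched as a mapping from each fingerprint hash to the songs (and time offsets) in which it occurs. The function names `build_index` and `candidates` and the toy hash values are hypothetical. The lookup here is an exact match, a simplification of the approximate nearest neighbour search named in step 4.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Map each fingerprint hash to a list of (song_id, time_offset)."""
    index = defaultdict(list)
    for song_id, fingerprints in songs.items():
        for fp_hash, offset in fingerprints:
            index[fp_hash].append((song_id, offset))
    return index


def candidates(index, query_hashes):
    """Count how many query hashes each song shares (exact match only)."""
    votes = Counter()
    for fp_hash in query_hashes:
        for song_id, _offset in index.get(fp_hash, []):
            votes[song_id] += 1
    return votes.most_common()


# Toy data: hash values are arbitrary integers chosen for illustration.
songs = {
    "song_a": [(101, 0), (202, 1), (303, 2)],
    "song_b": [(202, 0), (404, 3)],
}
index = build_index(songs)
ranked = candidates(index, [202, 303, 999])
```

Here `ranked` lists `song_a` first because it shares two of the three query hashes, while `song_b` shares one.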

slide-10
SLIDE 10

Why Spectrograms?

Almost all fingerprinting techniques rely on audio spectrograms.
• More closely represents how humans hear audio compared to the binary representation
• Time and frequency resolution can be adjusted to make the algorithm more robust to noise


slide-11
SLIDE 11

What are Spectrograms?

Visual representation of the spectrum of frequencies of sound as they vary with time.


slide-12
SLIDE 12

Short-Time Fourier Transform (STFT)

Used to determine the frequency content of local sections of a signal as it changes over time. An overlapping window is moved over the audio. At each step the Fourier transform is computed using the FFT.
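
The sliding-window procedure above can be sketched in NumPy as follows. This is a minimal illustration, not the code from the talk: the function name `stft_magnitude`, the 8 kHz test tone, and the choice to drop trailing samples that do not fill a whole window are all assumptions.

```python
import numpy as np


def stft_magnitude(signal, window_size=1024, overlap=0.5):
    """Return a (frames, bins) magnitude spectrogram of a 1-D signal."""
    hop = int(window_size * (1 - overlap))  # samples between window starts
    window = np.hamming(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic test tone: a 1 kHz sine sampled at an assumed 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec = stft_magnitude(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(spec[0].argmax())
peak_hz = peak_bin * sample_rate / 1024
```

For this tone the strongest bin in the first frame corresponds to 1000 Hz, since 1 kHz falls exactly on a bin centre at this sample rate and FFT size.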


slide-13
SLIDE 13

STFT Parameters

These values can be configured and modified to change how the spectrogram is generated:
• Window length
• FFT length
• Overlap amount


slide-14
SLIDE 14

Approaches

A few common approaches that have been used. All rely on audio spectrograms.
• Computer vision based
• Wavelet based
• Peak based


slide-15
SLIDE 15

Computer Vision for Music Identification

Intuition: 1D audio signals can be processed as conventional images when viewed in the time-frequency spectrogram representation.
• Spectrogram is treated as a set of overlapping images
• Train AdaBoost classifiers on box-filters
• Output of each classifier is a binary value representing the difference between values aggregated in two sub-rectangular regions
• Use concatenated output of the classifiers as the fingerprint

https://ieeexplore.ieee.org/document/1467322/
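
To make the box-filter idea concrete, here is a small sketch of how one bit could be derived from two rectangular regions of a spectrogram patch. The filters and the zero threshold below are hand-picked and hypothetical; in the approach described above they are selected by AdaBoost, which this sketch does not implement. All function names are assumptions.

```python
import numpy as np


def integral_image(patch):
    """Summed-area table with a zero row and column prepended."""
    table = np.zeros((patch.shape[0] + 1, patch.shape[1] + 1))
    table[1:, 1:] = patch.cumsum(axis=0).cumsum(axis=1)
    return table


def box_sum(table, top, left, height, width):
    """Sum of patch[top:top+height, left:left+width] in constant time."""
    bottom, right = top + height, left + width
    return (table[bottom, right] - table[top, right]
            - table[bottom, left] + table[top, left])


def filter_bit(table, box_a, box_b, threshold=0.0):
    """1 if box_a holds more energy than box_b by more than threshold."""
    return int(box_sum(table, *box_a) - box_sum(table, *box_b) > threshold)


# Hypothetical hand-picked filters: pairs of (top, left, height, width).
filters = [
    ((0, 0, 2, 4), (2, 0, 2, 4)),  # top half vs bottom half
    ((0, 2, 4, 2), (0, 0, 4, 2)),  # right half vs left half
]
patch = np.array([
    [5.0, 5.0, 1.0, 1.0],
    [5.0, 5.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
])
table = integral_image(patch)
bits = [filter_bit(table, a, b) for a, b in filters]
```

Concatenating the bits from many such filters gives the fingerprint for one patch. The summed-area table is a common trick for making each box sum cheap; whether the original work uses it is not stated in the slides.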


slide-16
SLIDE 16

Wavelet-Based

• Compute overlapping spectrogram images
• Decompose images using multi-resolution Haar wavelets
• Retain only the top-t wavelets, where t is much smaller than the size of the spectrogram
• Only keep sign information
• Compare two spectrograms by computing byte-wise Hamming distance

https://www.sciencedirect.com/science/article/pii/S0031320308001702
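
The steps above can be sketched in plain Python. This is a deliberately simplified illustration: it applies a 1-D Haar transform to a short list rather than a 2-D transform to a spectrogram image, and it compares signatures position by position rather than byte by byte. The function names are hypothetical.

```python
def haar_1d(values):
    """Multi-level, unnormalised Haar transform; length must be a power of 2."""
    coeffs = list(values)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def sign_signature(coeffs, top_t):
    """Keep only the sign of the top_t largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    signature = [0] * len(coeffs)
    for i in order[:top_t]:
        signature[i] = 1 if coeffs[i] > 0 else -1
    return signature


def hamming_distance(sig_a, sig_b):
    """Number of positions at which two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


original = sign_signature(haar_1d([9, 7, 3, 5]), top_t=2)
noisy = sign_signature(haar_1d([9.5, 7, 3, 5.5]), top_t=2)
different = sign_signature(haar_1d([3, 5, 9, 7]), top_t=2)
```

The slightly perturbed input yields the same signature as the original, while the rearranged input differs in one position, which is the kind of robustness the sign-only representation is meant to provide.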


slide-17
SLIDE 17

Peak-Pair Hashing

The original Shazam algorithm.
• Look only at spectrogram peaks
• Peaks are more likely to survive ambient noise
• A peak analysis of music and noise together will contain spectral peaks due to the music and noise as if they were analyzed separately
• Look at pairs of peaks and create lots of fingerprints per audio sample

https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf


slide-18
SLIDE 18

Peak-Pair Improvements

Improvement on Wang’s algorithm (peak-pair hashing).

"Fingerprints are generated using a modulated complex lapped transform-based non-repeating foreground audio extraction and an adaptive threshold method for prominent peak detection".


slide-19
SLIDE 19

My Implementation

I implemented music identification using peak-pair hashing (Shazam's original algorithm) in Python using the NumPy and SciPy libraries.


slide-20
SLIDE 20

Architecture


slide-21
SLIDE 21

Working Example

Throughout the next slides we will look at the song "Kids" by MGMT.

https://www.youtube.com/watch?v=aBd46BbdTfs


slide-22
SLIDE 22

Spectrogram Creation

I use the following parameters to create the spectrogram:
• Window: Hamming
• Window size: 1024
• Overlap: 0.5
• FFT size: 1024
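
To see what these parameters imply for resolution, the arithmetic can be worked through as below. The sample rate of 44.1 kHz is an assumption for illustration only; the slides do not state it.

```python
window_size = 1024      # samples per window (from the slide)
fft_size = 1024         # FFT length (from the slide)
overlap = 0.5           # fraction of each window shared with the next
sample_rate = 44_100    # Hz; ASSUMED, not stated in the slides

hop = int(window_size * (1 - overlap))      # samples between frames
freq_bins = fft_size // 2 + 1               # non-negative frequency bins
freq_resolution = sample_rate / fft_size    # Hz per bin
time_resolution = hop / sample_rate         # seconds per frame
```

Under that assumed sample rate, each spectrogram column is about 11.6 ms apart and each of the 513 frequency bins is about 43 Hz wide. A longer window would narrow the bins at the cost of coarser timing.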


slide-23
SLIDE 23

Constellations

Time-frequency peaks are found using an image local maxima filter with a neighbourhood of 15 pixels (freq + time axes). For Kids, there are 14425 peaks.
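
A local maxima filter of this kind can be sketched with `scipy.ndimage.maximum_filter`. This is a minimal illustration rather than the talk's actual code: the function name `find_peaks` and the `min_amplitude` floor (used here to ignore flat, silent regions) are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter


def find_peaks(spectrogram, neighbourhood=15, min_amplitude=0.0):
    """Return (row, col) indices of local maxima in a 2-D spectrogram."""
    # Each cell is replaced by the maximum of its square neighbourhood.
    local_max = maximum_filter(spectrogram, size=neighbourhood)
    # A cell is a peak if it equals that maximum and exceeds the floor.
    is_peak = (spectrogram == local_max) & (spectrogram > min_amplitude)
    return [tuple(int(v) for v in idx) for idx in np.argwhere(is_peak)]


# Synthetic spectrogram: silence with two isolated bright cells.
spec = np.zeros((64, 64))
spec[10, 20] = 5.0
spec[40, 50] = 3.0
peaks = find_peaks(spec)
```

The resulting list of coordinates is the "constellation" that the later pairing step works on.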


slide-24
SLIDE 24

Finding Pairs

Each peak is paired with its closest 15 neighbouring peaks within 200 seconds. For Kids, there are 8514 fingerprints.
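
One way to express this pairing is sketched below. It assumes that "closest" means closest in time and that each peak is only paired with peaks that come after it; neither detail is stated on the slide. The name `pair_peaks` is hypothetical, and `max_dt` is in whatever unit the peak times use.

```python
def pair_peaks(peaks, fan_out=15, max_dt=200):
    """Pair each peak with up to fan_out later peaks at most max_dt away.

    peaks is a list of (time, frequency) tuples.
    """
    ordered = sorted(peaks)  # sort by time, then frequency
    pairs = []
    for i, (t1, f1) in enumerate(ordered):
        paired = 0
        for t2, f2 in ordered[i + 1:]:
            if t2 - t1 > max_dt or paired == fan_out:
                break
            pairs.append(((t1, f1), (t2, f2)))
            paired += 1
    return pairs


# Small limits so the effect is visible on four peaks.
peaks = [(0, 40), (1, 52), (3, 47), (10, 40)]
pairs = pair_peaks(peaks, fan_out=2, max_dt=5)
```

In this toy run the last peak is never paired because it lies more than `max_dt` after every other peak.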


slide-25
SLIDE 25

Creating Hashes (I)

A hash is created for each pair (not a cryptographic hash). Each hash is composed of
• the frequency of point 1
• the frequency of point 2
• the difference in their times
The hash is combined with the time offset of the first point, as it will be necessary for matching, to create a fingerprint.

fingerprint = hash:time = [f1, f2, t2 - t1]:t1
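
The three components can be packed into a single integer so that the hash is cheap to store and index. The bit layout below (10 bits per frequency bin, 12 bits for the time difference) is an assumption chosen for illustration, as are the function names; the slides do not say how the hash is encoded.

```python
def make_fingerprint(peak_1, peak_2):
    """Return (hash, t1) for a pair of (time, frequency_bin) peaks."""
    (t1, f1), (t2, f2) = peak_1, peak_2
    dt = t2 - t1
    # ASSUMED layout: 10 bits per frequency bin, 12 bits for the time delta.
    if not (0 <= f1 < 1024 and 0 <= f2 < 1024 and 0 <= dt < 4096):
        raise ValueError("value does not fit the assumed bit layout")
    fp_hash = (f1 << 22) | (f2 << 12) | dt
    return fp_hash, t1


def unpack_hash(fp_hash):
    """Recover (f1, f2, dt) from a packed hash."""
    return fp_hash >> 22, (fp_hash >> 12) & 0x3FF, fp_hash & 0xFFF


fp_hash, t1 = make_fingerprint((100, 40), (103, 52))
components = unpack_hash(fp_hash)
```

Because the hash depends only on the two frequencies and the time difference, the same pair of peaks produces the same hash wherever it occurs in a recording. The absolute time `t1` is kept alongside it for the alignment step.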


slide-26
SLIDE 26

Creating Hashes (II)


slide-27
SLIDE 27

Database

Information about the source songs and each fingerprint is stored in a PostgreSQL database.

Song: Id, Artist, Album, Title, Track, Year, Duration

Fingerprint: Id, Hash, Time offset, Song Id
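
The two tables could be declared roughly as follows. To keep the sketch self-contained it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The column types, the index on the hash column, and the sample rows are all assumptions; only the table and column names come from the slide.

```python
import sqlite3

SCHEMA = """
CREATE TABLE song (
    id       INTEGER PRIMARY KEY,
    artist   TEXT,
    album    TEXT,
    title    TEXT,
    track    INTEGER,
    year     INTEGER,
    duration REAL
);
CREATE TABLE fingerprint (
    id          INTEGER PRIMARY KEY,
    hash        INTEGER NOT NULL,
    time_offset INTEGER NOT NULL,
    song_id     INTEGER NOT NULL REFERENCES song(id)
);
-- Lookups during identification are by hash, so index that column.
CREATE INDEX fingerprint_hash_idx ON fingerprint(hash);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO song (id, artist, title) VALUES (?, ?, ?)",
    (1, "MGMT", "Kids"),
)
conn.executemany(
    "INSERT INTO fingerprint (hash, time_offset, song_id) VALUES (?, ?, ?)",
    [(111, 0, 1), (222, 5, 1)],  # made-up hash values
)
rows = conn.execute(
    "SELECT song_id, time_offset FROM fingerprint WHERE hash IN (?, ?)",
    (222, 999),
).fetchall()
```

The final query mirrors what identification needs: given the hashes from an unknown sample, fetch every stored fingerprint with a matching hash along with its song and time offset.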


slide-28
SLIDE 28

Identification

When an unknown audio sample needs to be identified:
• Fingerprints are created
• Matching fingerprints are retrieved from the database
• Fingerprints are aligned
• Song associated with the best matched set of fingerprints is returned


slide-29
SLIDE 29

Fingerprint Aligning (I)

• We cannot know the time offset the unknown audio was recorded at
• We can find matched fingerprints that occur successively after each other
• The time offsets from the unknown fingerprints are subtracted from the time offsets of the matched fingerprints
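
The subtraction described above can be turned into a simple vote: true matches from the same song all share one time shift, while coincidental matches scatter. The sketch below assumes matches arrive as `(song_id, db_offset, query_offset)` tuples; that format and the name `best_alignment` are hypothetical.

```python
from collections import Counter


def best_alignment(matches):
    """Pick the song whose matches agree on a single time shift.

    matches is a list of (song_id, db_offset, query_offset) tuples, one per
    query fingerprint whose hash was found in the database.
    """
    votes = Counter(
        (song_id, db_offset - query_offset)
        for song_id, db_offset, query_offset in matches
    )
    (song_id, shift), count = votes.most_common(1)[0]
    return song_id, shift, count


matches = [
    ("kids", 120, 0), ("kids", 125, 5), ("kids", 131, 11),  # consistent shift
    ("other", 40, 0), ("other", 7, 5),                      # coincidental hits
]
result = best_alignment(matches)
```

Here all three "kids" matches agree on a shift of 120, so that song wins, and the shift indicates where in the song the sample was recorded.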


slide-30
SLIDE 30

Fingerprint Aligning (II)

A diagonal is present where matched fingerprints occur successively after each other.


slide-31
SLIDE 31

A Match!


slide-32
SLIDE 32

Source

The source code can be found on GitHub.

github.com/coffee-cup/soundz


slide-33
SLIDE 33

Demo

And now. . . a demo!


slide-34
SLIDE 34

Thanks

Thanks for listening!


slide-35
SLIDE 35

References

• A review of audio fingerprinting
• Computer Vision for Music Identification
• Waveprint: Efficient wavelet-based audio fingerprinting
• A Review of algorithms for audio fingerprinting
• Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications
• Landmark-based music recognition systems optimisation using genetic algorithms
• An Industrial Strength Audio Search Algorithm
• Robust audio fingerprinting using peak-pair-based hash of non-repeating foreground audio in a real environment
