Sounds in Visual Space Yuan Hao Dept. of Computer Science & - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Monitoring and Mining Animal Sounds in Visual Space

Yuan Hao

Dept. of Computer Science & Engineering

University of California, Riverside

slide-2
SLIDE 2

Task

  • Monitoring animals by examining the sounds they produce
  • Building an animal sound recognition/classification framework

[Figure: forty-second spectrogram of a Common Virtuoso Katydid (Amblycorypha longinicta) call; frequency axis in kHz]

slide-3
SLIDE 3

Outline

  • Motivation
  • Our approach
  • Experimental evaluation
  • Conclusion & future work

slide-4
SLIDE 4

Motivation: applications

Monitoring animals:

Outdoors

  • The density and variety of animal sounds can act as a measure of biodiversity

Laboratory setting

  • Researchers create control groups of animals, expose them to different settings, and test for different outcomes

Commercial application:

Acoustic animal detection can save money

slide-5
SLIDE 5

Motivation: difficulties

Most current bioacoustic classification tools have significant limitations. They:

  • require careful tuning of many parameters
  • are too computationally expensive to run on sensors
  • are not accurate enough
  • are too specialized

slide-6
SLIDE 6

Related Work

  • Dietrich et al. (MCS 01): several classification methods for insect sounds
    – Preprocessing and complicated feature extraction
    – Up to eighteen parameters
    – Learned on a data set containing just 108 exemplars
  • Brown et al. (J. Acoust. Soc. 09): analysis of Australian anurans (frogs and toads)
    – Identify the species of the frogs with an average accuracy of 98%
    – Requires extracting features from syllables
    – “Once the syllables have been properly segmented, a set of features can be calculated to represent each syllable”

slide-7
SLIDE 7

Outline

  • Motivation
  • Our approach
    – Visual space: spectrogram
    – CK distance measure
    – Sound fingerprint searching
  • Experimental evaluation
  • Conclusion & future work

slide-8
SLIDE 8

Intuition of Our Approach

  • Classify animal sounds in visual space by treating the texture of their spectrograms as an “acoustic fingerprint”, using a recently introduced parameter-free texture measure as the distance measure

[Figure: a one-second subset of a common cricket's sound spectrogram, which can be considered the “fingerprint” for this sound]


slide-10
SLIDE 10

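Our Approach

[Figure: overview of the approach, showing the positive set P, the universe U, the candidate length range minLen to maxLen, and a threshold T = 0.43]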

slide-11
SLIDE 11

Visual Space

Spectrogram

  • Algorithmic analysis needed instead of manual inspection
  • Significant noise artifacts
  • Avoid any type of data cleaning or explicit feature extraction, and use the raw spectrogram

[Figure: forty-second spectrogram of a Common Virtuoso Katydid (Amblycorypha longinicta) call; frequency axis in kHz]
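The slides work on the raw spectrogram rather than on extracted features. To show what that image is, here is a minimal pure-Python sketch of a magnitude spectrogram (a short-time Fourier transform). The window length `win`, hop size `hop`, and Hann window are our own assumptions, since the slides do not give these settings, and a real pipeline would use an FFT library rather than this naive DFT.

```python
import cmath
import math


def spectrogram(signal, win=64, hop=32):
    """Magnitude spectrogram via a naive DFT (illustrative, O(win^2) per frame).

    Returns a list of frames; frames[t][k] is the magnitude of frequency
    bin k (k = 0 .. win // 2) in the t-th window.
    """
    # Hann window (assumed choice) to reduce leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win - 1)) for n in range(win)]
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(win)]
        row = []
        for k in range(win // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / win) for n in range(win))
            row.append(abs(acc))
        frames.append(row)
    return frames
```

Each row is one time frame and each column one frequency bin, so a pure tone at bin 8 shows up as the largest magnitude in every row; stacking the rows gives the 2-D texture that a texture measure can compare.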

slide-12
SLIDE 12

CK Distance Measure

  • Distance measure of texture similarity
  • Robustly extracting features from noisy field recordings is non-trivial
  • Expands the scope of compression-based similarity measurements to real-valued images by exploiting the compression technique used by MPEG video encoding
  • Effective on images as diverse as moths, nematodes, wood grains, tire tracks, etc. (SDM 10)

$$d_{CK}(x, y) = \frac{C(x \mid y) + C(y \mid x)}{C(x \mid x) + C(y \mid y)} - 1$$

where C(x | y) is the compressed size of x given y.
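To make the shape of the formula concrete, here is a rough sketch that substitutes Python's `zlib` for MPEG encoding. This swap is an assumption for illustration only: CK itself relies on MPEG video compression of images, whereas here C(a | b) is approximated by the compressed size of b followed by a, so that a can reuse patterns from b. The names `ck_like_distance` and `_c` are hypothetical.

```python
import zlib


def _c(data: bytes) -> int:
    """Compressed size in bytes (zlib stands in for MPEG here)."""
    return len(zlib.compress(data, 9))


def ck_like_distance(x: bytes, y: bytes) -> float:
    """d(x, y) = (C(x|y) + C(y|x)) / (C(x|x) + C(y|y)) - 1.

    Assumption: C(a|b) is approximated by compressing b followed by a.
    This only illustrates the formula, not the MPEG-based CK measure.
    """
    c_x_given_y = _c(y + x)
    c_y_given_x = _c(x + y)
    c_x_given_x = _c(x + x)
    c_y_given_y = _c(y + y)
    return (c_x_given_y + c_y_given_x) / (c_x_given_x + c_y_given_y) - 1
```

Because the formula is symmetric in x and y, d(x, y) equals d(y, x), and in this approximation d(x, x) is exactly 0.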

slide-13
SLIDE 13

Sanity Check

CK as a tool for taxonomy

[Figure: CK distances among recordings of Gryllus rubens and Gryllus firmus (family Gryllidae); axis from 0.2 to 0.4]

National Geographic article: “the sand field cricket (Gryllus firmus) and the southeastern field cricket (Gryllus rubens) look nearly identical and inhabit the same geographical areas”

slide-14
SLIDE 14

Outline

  • Motivation
  • Our approach
    – Visual space: spectrogram
    – CK distance measure
    – Sound fingerprint searching
  • Experimental evaluation
  • Conclusion & future work

slide-15
SLIDE 15

Difficulties

  • Do not have carefully extracted prototypes for each class
    – Only have a collection of sound files
  • Do not know the call duration
  • Do not know how many occurrences of it appear in each file
  • May have mislabeled data
  • Noisy: most of the recordings are made in the wild


slide-17
SLIDE 17

Example: Discrete Text Strings

Assume three observations that correspond to a particular species: P = {rrbbcxcfbb, rrbbfcxc, rrbbrrbbcxcbcxcf}

We are also given access to a universe of sounds that are known not to contain any example in P:

U = {rfcbc, crrbbrcb, rcbbxc, rbcxrf,..,rcc }

Our task is equivalent to asking: is there a substring that appears only in P and not in U?

Candidates: T1 = rrbb, T2 = rrbbc, T3 = cxc
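The discrete version of the problem can be checked by brute force. The sketch below (function name hypothetical) enumerates substrings of the shortest string in P, since any substring common to all of P must occur there, and keeps those that occur in every element of P and in no element of U. It uses only the five members of U written out on the slide; the elided ones are unknown. Of the listed candidates, T1 = rrbb also occurs in crrbbrcb ∈ U and T2 = rrbbc is missing from rrbbfcxc, so only T3 = cxc qualifies.

```python
def fingerprint_substrings(positives, universe):
    """Return substrings found in every string of P and in no string of U,
    shortest first."""
    shortest = min(positives, key=len)
    found = set()
    for i in range(len(shortest)):
        for j in range(i + 1, len(shortest) + 1):
            cand = shortest[i:j]
            in_all_p = all(cand in p for p in positives)
            in_any_u = any(cand in u for u in universe)
            if in_all_p and not in_any_u:
                found.add(cand)
    return sorted(found, key=lambda s: (len(s), s))


P = ["rrbbcxcfbb", "rrbbfcxc", "rrbbrrbbcxcbcxcf"]
# Only the members of U listed on the slide; the elided ones are unknown.
U = ["rfcbc", "crrbbrcb", "rcbbxc", "rbcxrf", "rcc"]

print(fingerprint_substrings(P, U))  # ['cxc']
```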

slide-18
SLIDE 18

Case Studies

[Figure: one-second regions, numbered 1 to 12, from recordings of Grylloidea and Tettigonioidea]

Six pairs of recordings of various Orthoptera; we visually determined and extracted one-second similar regions. One size does not fit all when it comes to the length of the sound sequence.

slide-19
SLIDE 19

Sound Fingerprint

Given U and P:

  • P: contains examples only from the “positive” species class
  • U: non-target species sounds

Task: find a subsequence of one of the objects in P that is close to at least one subsequence in each element of P, but far from all subsequences in every element of U.

[Figure: a potential sound fingerprint]

slide-20
SLIDE 20

Example

Task: find a subsequence of one of the objects in P that is close to at least one subsequence in each element of P, but far from all subsequences in every element of U.

[Figure: candidate being tested; objects 1 to 5 (from P) and A to D (from U) ordered by distance to the candidate, with a split point (threshold)]

slide-21
SLIDE 21

How Hard Is This?

The number of candidate fingerprints is

$$\sum_{l = L_{\min}}^{L_{\max}} \sum_{S_i \in P} (M_i - l + 1)$$

where l is a candidate length, M_i is the length of sound sequence S_i in P, and L_min and L_max are the (possibly user-defined) minimum and maximum lengths of the sound fingerprint.

[Figure: candidate being tested; objects 1 to 5 and A to D with a split point (threshold)]
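The double sum can be evaluated directly. The helper below is a hypothetical illustration; `lengths` holds the M_i values, and sequences shorter than l contribute no windows. For the three text strings of the earlier example (lengths 10, 8 and 16) with lengths 3 to 4 there are 53 candidates; for five hypothetical sequences of 1,000 columns each with lengths 50 to 500, the count already exceeds 1.6 million, and each candidate must still be slid across every object in P and U.

```python
def count_candidates(lengths, l_min, l_max):
    """Sum over l in [l_min, l_max] and over each M_i of the (M_i - l + 1)
    windows of length l; sequences shorter than l contribute nothing."""
    return sum(max(m - l + 1, 0) for l in range(l_min, l_max + 1) for m in lengths)


print(count_candidates([10, 8, 16], 3, 4))        # 53
print(count_candidates([1000] * 5, 50, 500))      # 1637130
```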

slide-22
SLIDE 22

Brute-Force Search

Step 1: Given P and U, generate all possible subsequences of length m from the objects in P as sound fingerprint candidates.

Step 2: Using a sliding window of the same size as the candidate, locate the minimum distance to each object in P and U.

Step 3: Apply an evaluation mechanism that splits the dataset into two groups.

Step 4: Choose as the sound fingerprint the candidate with the best split point, i.e., the one that produces the largest information gain in separating the two classes.

[Figure: generate and evaluate candidates]
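The four steps above can be sketched end to end as follows. This is a minimal sketch under simplifying assumptions: objects are 1-D numeric sequences rather than spectrogram images, and Euclidean distance (`math.dist`) stands in for CK, which would be passed in through the `dist` parameter. All function names are hypothetical.

```python
import math


def entropy(n_p, n_u):
    """Base-2 entropy of a two-class set with n_p positives and n_u negatives."""
    total = n_p + n_u
    return -sum(n / total * math.log2(n / total) for n in (n_p, n_u) if n)


def min_window_distance(candidate, series, dist):
    """Step 2: slide a window of len(candidate) over series; keep the minimum distance."""
    m = len(candidate)
    return min(dist(candidate, series[i:i + m]) for i in range(len(series) - m + 1))


def best_split(distances, labels):
    """Steps 3-4: order objects by distance, try every split point, and return
    (best information gain, threshold halfway between the two sides)."""
    order = sorted(zip(distances, labels))
    n, n_p = len(order), labels.count("P")
    n_u = n - n_p
    base = entropy(n_p, n_u)
    best_gain, best_threshold = 0.0, None
    left_p = left_u = 0
    for k in range(1, n):
        if order[k - 1][1] == "P":
            left_p += 1
        else:
            left_u += 1
        after = (k * entropy(left_p, left_u)
                 + (n - k) * entropy(n_p - left_p, n_u - left_u)) / n
        if base - after > best_gain:
            best_gain = base - after
            best_threshold = (order[k - 1][0] + order[k][0]) / 2
    return best_gain, best_threshold


def brute_force_fingerprint(P, U, m, dist=math.dist):
    """Step 1: every length-m subsequence of every object in P is a candidate."""
    objects = P + U
    labels = ["P"] * len(P) + ["U"] * len(U)
    best = (-1.0, None, None)  # (gain, threshold, candidate)
    for obj in P:
        for i in range(len(obj) - m + 1):
            cand = obj[i:i + m]
            dists = [min_window_distance(cand, o, dist) for o in objects]
            gain, threshold = best_split(dists, labels)
            if gain > best[0]:
                best = (gain, threshold, cand)
    return best


# Hypothetical toy data: every P sequence contains the pattern [0, 5, 0, 5].
P = [[1, 1, 0, 5, 0, 5, 1, 1], [2, 0, 5, 0, 5, 2, 2, 2], [0, 5, 0, 5, 3, 3, 3, 3]]
U = [[1] * 8, [2] * 8, [3] * 8]
gain, threshold, fingerprint = brute_force_fingerprint(P, U, 4)
print(gain, threshold, fingerprint)
```

On this toy data the search returns [0, 5, 0, 5] with an information gain of 1.0, because that candidate is at distance 0 from every object in P and farther than the threshold from every object in U.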

slide-23
SLIDE 23

Evaluation Mechanism

Step 3: Use information gain to evaluate candidate splitting rules.

E(D) = -p(X)log(p(X)) - p(Y)log(p(Y)), where X and Y are the two classes in D (logarithms are base 2)

Gain = E(D) - E'(D), where E(D) and E'(D) are the entropies before and after partitioning D into D1 and D2, respectively

E'(D) = f(D1)E(D1) + f(D2)E(D2), where f(D1) and f(D2) are the fractions of objects in D1 and D2

slide-24
SLIDE 24

Example

A total of nine objects: five from P and four from U. This gives the entropy of the unsorted data: [-(5/9)log(5/9)-(4/9)log(4/9)] = 0.991

Four objects from P are the only four objects on the left side of the split point. Of the five objects to the right of the split point, four are from U and just one is from P: (4/9)[-(4/4)log(4/4)]+(5/9)[-(4/5)log(4/5)-(1/5)log(1/5)] = 0.401

Information Gain = 0.991 - 0.401 = 0.590

[Figure: candidate being tested; objects 1 to 5 and A to D ordered by distance, with the split point (threshold)]
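The arithmetic in this example can be reproduced with a few lines of Python. Logarithms are base 2, which is what the value 0.991 implies; the helper names are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(left, right):
    """Gain = E(D) - E'(D), with E'(D) = f(D1)E(D1) + f(D2)E(D2).

    left and right are [#P, #U] counts on each side of the split point.
    """
    whole = [l + r for l, r in zip(left, right)]
    n = sum(whole)
    after = sum(left) / n * entropy(left) + sum(right) / n * entropy(right)
    return entropy(whole) - after


# 4 P objects left of the split; 1 P and 4 U objects to the right.
print(round(entropy([5, 4]), 3))                   # 0.991
print(round(information_gain([4, 0], [1, 4]), 3))  # 0.59
```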

slide-25
SLIDE 25

Outline

  • Motivation
  • Our approach
    – Visual space: spectrogram
    – CK distance measure
    – Sound fingerprint searching
  • Experimental evaluation
    – Brute-force search evaluation
    – Speedup and efficiency
  • Conclusion & future work

slide-26
SLIDE 26

Example

A demonstration of the brute-force search algorithm and the discriminative ability of the CK measure. A short template of insect sound is scanned along a long sequence of sound that contains one example of the target sound plus three examples of commonly confused insect sounds.

[Figure: distance value of the template along the sequence, with a recognition threshold; inset: the sound fingerprint and the distance ordering of P and U]

slide-27
SLIDE 27

[Figure: information gain during the brute-force search until it terminates. Running time: 7.5 hours. Inset: the sound fingerprint and the distance ordering of P and U, with P = Atlanticus dorsalis]

slide-28
SLIDE 28

Speedup by Entropy-Based Pruning

Before split: [-(5/9)log(5/9)-(4/9)log(4/9)] = 0.991

After split: (3/9)[-(3/3)log(3/3)]+(6/9)[-(4/6)log(4/6)-(2/6)log(2/6)] = 0.612

Upper bound on information gain = 0.991 - 0.612 = 0.379

Best-so-far information gain = 0.991 - 0.401 = 0.590

Because the upper bound (0.379) is less than the best-so-far information gain (0.590), this candidate can be pruned.
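The pruning test can be sketched as follows. We assume a placement rule that the slide does not spell out: objects whose distances have not been computed yet are placed at whichever end of the ordering helps the split most, which gives an optimistic estimate of the final gain. The partial ordering below is hypothetical, chosen so that the optimistic split matches the slide's numbers (3 P objects on the left; 2 P and 4 U objects on the right).

```python
import math


def entropy(n_p, n_u):
    """Base-2 entropy of a two-class set."""
    total = n_p + n_u
    return -sum(n / total * math.log2(n / total) for n in (n_p, n_u) if n)


def best_gain(ordered_labels):
    """Best information gain over every split point of a distance-ordered label list."""
    n = len(ordered_labels)
    n_p = ordered_labels.count("P")
    n_u = n - n_p
    base = entropy(n_p, n_u)
    best, left_p, left_u = 0.0, 0, 0
    for k, label in enumerate(ordered_labels[:-1], start=1):
        if label == "P":
            left_p += 1
        else:
            left_u += 1
        after = k / n * entropy(left_p, left_u) + (n - k) / n * entropy(n_p - left_p, n_u - left_u)
        best = max(best, base - after)
    return best


def optimistic_gain(seen, remaining_p, remaining_u):
    """Optimistic gain when only some distances are known (assumed placement rule).

    seen: (distance, label) pairs computed so far. Remaining P objects go to
    one end of the ordering and remaining U objects to the other; both
    orientations are tried and the better one is kept.
    """
    order = [label for _, label in sorted(seen)]
    p_first = ["P"] * remaining_p + order + ["U"] * remaining_u
    u_first = ["U"] * remaining_u + order + ["P"] * remaining_p
    return max(best_gain(p_first), best_gain(u_first))


def can_prune(seen, remaining_p, remaining_u, best_so_far):
    """Abandon a candidate whose optimistic gain cannot beat the best so far."""
    return optimistic_gain(seen, remaining_p, remaining_u) < best_so_far


# Hypothetical partial ordering: 7 of 9 distances computed, 2 P objects pending.
seen = [(0.1, "P"), (0.2, "U"), (0.3, "P"), (0.4, "U"), (0.5, "P"), (0.6, "U"), (0.7, "U")]
print(round(optimistic_gain(seen, 2, 0), 3))     # 0.379
print(can_prune(seen, 2, 0, best_so_far=0.590))  # True
```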

slide-29
SLIDE 29

[Figure: information gain during brute-force search and entropy-pruning search. Running time: 7.5 hours (brute force) vs. 1.9 hours (entropy pruning). Inset: the sound fingerprint and the distance ordering of P and U, with P = Atlanticus dorsalis]

slide-30
SLIDE 30

[Figure: termination points of brute-force search and entropy-pruning search]

Speedup intuition

In brute-force search, we test candidates left to right, top to bottom. Is there a better order? How can we find a good candidate earlier? The earlier we find a good candidate, the higher the best-so-far information gain, and the more candidates we can prune. But how do we resolve this “chicken and egg” paradox?

  • Euclidean distance is much faster to compute than CK
  • So let us use Euclidean distance to approximate the best search order for CK
  • This will only work if Euclidean distance is a good proxy for CK (next slide)

slide-31
SLIDE 31

Euclidean Distance Measure Pruning

[Figure: comparison of CK and Euclidean distance values]
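The reordering idea can be sketched as a loop that ranks candidates by a cheap proxy score before paying for the expensive one. This is a simplified, hypothetical sketch: in the slides the bound comes from partially computed CK distance orderings, whereas here `upper_bound` is just a callable, and the numbers in the usage example are made up to show the effect of a good versus an uninformative proxy.

```python
def search_with_reordering(candidates, cheap_gain, expensive_gain, upper_bound):
    """Evaluate candidates in decreasing order of a cheap proxy score.

    cheap_gain(c):     proxy score, e.g. information gain under Euclidean distance.
    expensive_gain(c): the real score, e.g. information gain under CK.
    upper_bound(c):    optimistic bound on expensive_gain(c); the candidate is
                       skipped when its bound cannot beat the best so far.
    Returns (best candidate, best gain, number of full evaluations).
    """
    best_cand, best_gain, full_evals = None, float("-inf"), 0
    for c in sorted(candidates, key=cheap_gain, reverse=True):
        if upper_bound(c) < best_gain:
            continue  # pruned without a full expensive evaluation
        full_evals += 1
        g = expensive_gain(c)
        if g > best_gain:
            best_cand, best_gain = c, g
    return best_cand, best_gain, full_evals


# Toy setting (hypothetical): candidate c has true gain c / 10 and a bound 0.05 above it.
def true_gain(c):
    return c / 10


def bound(c):
    return c / 10 + 0.05


def no_information(c):
    return 0  # uninformative proxy: stable sort keeps the original order


cands = list(range(10))
print(search_with_reordering(cands, true_gain, true_gain, bound))       # (9, 0.9, 1)
print(search_with_reordering(cands, no_information, true_gain, bound))  # (9, 0.9, 10)
```

With a perfect proxy the best candidate is evaluated first and every other candidate is pruned; with an uninformative proxy all ten candidates need a full evaluation.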

slide-32
SLIDE 32

Performance of Optimizations

[Figure: information gain during the search. Brute-force search terminates after 7.5 hours, entropy-pruning search after 1.9 hours, and search with all optimizations after 1.1 hours]

slide-33
SLIDE 33

Case Study (1)

[Figure: distance value with a recognition threshold]

For a more visual explanation, please see the video on YouTube.

slide-34
SLIDE 34

Case Study (2)

[Figure: distance value with a recognition threshold]

slide-35
SLIDE 35

Classification

Insect classification accuracy:

Dataset      Species-level problem         Genus-level problem
             Default rate   Fingerprint    Default rate   Fingerprint
10 species   0.10           0.70           0.70           0.93
20 species   0.05           0.44           0.60           0.77

Benchmark of insect classification: the data consist of twenty species of insects, eight of which are Gryllidae (crickets) and twelve of which are Tettigoniidae (katydids).

Problems: either a twenty-species-level problem or a two-class genus-level problem.

Method: we predicted each testing exemplar's class label (e.g., the pink one shown on the left) by sliding each fingerprint across it and recording the fingerprint that produced the minimum distance as the exemplar's nearest neighbor (the pink fingerprint).

[Figure: 20 sound fingerprints and the testing dataset]
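The classification rule described above is a nearest-neighbor search over fingerprints. The sketch below illustrates it on 1-D sequences, with length-normalized Euclidean distance standing in for CK on spectrograms; the normalization by the square root of the fingerprint length (so that fingerprints of different lengths are comparable), the example fingerprints, and all names are our assumptions.

```python
import math


def min_sliding_distance(fingerprint, exemplar):
    """Minimum length-normalised Euclidean distance between fingerprint and any
    same-length window of exemplar (a stand-in for CK on spectrogram images)."""
    m = len(fingerprint)
    return min(math.dist(fingerprint, exemplar[i:i + m]) / math.sqrt(m)
               for i in range(len(exemplar) - m + 1))


def classify(exemplar, fingerprints):
    """Predict the label whose fingerprint comes closest to some part of exemplar.

    fingerprints: dict mapping class label -> fingerprint sequence.
    """
    return min(fingerprints,
               key=lambda label: min_sliding_distance(fingerprints[label], exemplar))


fingerprints = {"A": [0, 5, 0, 5], "B": [5, 5, 0, 0]}  # hypothetical fingerprints
print(classify([1, 1, 5, 5, 0, 0, 1, 1], fingerprints))  # B
print(classify([2, 0, 5, 0, 5, 2], fingerprints))        # A
```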

slide-36
SLIDE 36

Scalability of Fingerprint Discovery

[Figure: information gain versus number of calls to the CK distance measure; termination points of brute-force search, entropy-pruning search, and search with the reordering optimization]

To test the speedup beyond our toy problem, we reran these experiments with a more realistically sized universe U containing 200 objects from other insects, birds, trains, helicopters, etc. The result is shown above.

slide-37
SLIDE 37

Mislabeled Data Sanity Check

Same dataset used for the mislabeling check. Left: all instances assumed to be labeled correctly. Right: two instances in the positive class mislabeled.

[Figure: for each case, the sound fingerprint and the distance ordering of P and U (distance values 0.1 to 0.7), with P = Atlanticus dorsalis]

slide-38
SLIDE 38

Mislabeled Data Sanity Check

Same dataset used for the mislabeling check. Top: all instances assumed to be labeled correctly. Bottom: two instances in the positive class mislabeled.

[Figure: for each case, distance value along the recording with a recognition threshold and the P and U regions]

slide-39
SLIDE 39

Noise background experiment

[Figure: distance value with a recognition threshold under four conditions: no noise, noise at +5 dB, noise at -5 dB, and noise at -4 dB]

slide-40
SLIDE 40

Classification

Dataset      Species-level problem         Genus-level problem
             Default rate   Fingerprint    Default rate   Fingerprint
10 species   0.10           0.70           0.70           0.93
20 species   0.05           0.44           0.60           0.77

Twenty insect species datasets: eight of them are Gryllidae (crickets) and twelve of them are Tettigoniidae (katydids).

slide-41
SLIDE 41

Other Animals: Frogs

[Figure: CK distance value with a recognition threshold]

slide-42
SLIDE 42

Outline

  • Motivation
  • Our approach
    – Visual space: spectrogram
    – CK distance measure
    – Sound fingerprint searching
  • Experimental evaluation
    – CK as a tool for taxonomy
    – Speedup and efficiency
  • Conclusion & future work

slide-43
SLIDE 43

Conclusion & Future Work

  • Our approach to analyzing insect sounds in visual space is parameter-free
  • Our optimizations can speed up the brute-force search
  • We will test more species and datasets
  • We will further speed up the algorithm

slide-44
SLIDE 44

Thank you

Code and Data: http://www.cs.ucr.edu/~yhao/animalsoundfingerprint.html