Monitoring and Mining Animal Sounds in Visual Space
Yuan Hao
Dept. of Computer Science & Engineering
University of California, Riverside
Task
– Monitoring animals by examining the sounds they produce
– Build animal sound
[Figure: forty-second spectrogram of a Common Virtuoso Katydid (Amblycorypha longinicta); vertical axis: frequency (kHz)]
– Preprocessing and complicated feature extraction
– Up to eighteen parameters
– Learned on a data set containing just 108 exemplars
(frogs and toads)
– Identify the species of the frogs with an average accuracy of 98%
– Requires extracting features from syllables
– “Once the syllables have been properly segmented, a set of features can be calculated to represent each syllable”
A one-second subset of a common cricket’s sound spectrogram. It can be considered the “fingerprint” for this sound.
[Figure labels: threshold T = 0.43; candidate lengths minLen and maxLen; datasets P and U]
[Figure: CK distance measure; labels: Gryllus rubens and Gryllus firmus (Gryllidae)]
National Geographic article: “the sand field cricket (Gryllus firmus) and the southeastern field cricket (Gryllus rubens) look nearly identical and inhabit the same geographical areas”
[Figure: twelve one-second recordings, numbered 1 to 12, from Grylloidea and Tettigonioidea]
Six pairs of recordings of various Orthoptera. Similar one-second regions were determined visually and extracted. One size does not fit all when it comes to the length
[Figure: a potential sound fingerprint (the candidate being tested); objects 1 to 5 and A to D on a line with a split point (threshold)]
$P = \{ S_i \}, \quad L_{\min} \le l \le L_{\max}$
where $l$ is a certain length of candidate $S_i$ from $M_i$, and $L_{\min}$ and $L_{\max}$ are possible user-defined lengths
Step 1: Given P and U, generate all possible subsequences of length m from the objects in P as the sound fingerprint candidates.
Step 2: Using a sliding window with the same size
for each object in P and U
Step 3: Apply an evaluation mechanism for splitting the datasets into two groups.
Step 4: Choose the sound fingerprint with the best splitting point, i.e. the one that produces the largest information gain when separating the two classes.
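The four steps above can be sketched as a brute-force search. This is only an illustrative sketch under stated assumptions: plain Euclidean distance stands in for the CK measure, 1-D lists of numbers stand in for spectrograms, every object is assumed to be at least m long, and all function names (`min_window_distance`, `best_split`, `find_fingerprint`) are hypothetical.

```python
import math

def entropy(n_pos, n_neg):
    """Binary entropy (base 2) of a group with the given class counts."""
    total = n_pos + n_neg
    h = 0.0
    for n in (n_pos, n_neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def euclidean(a, b):
    # Assumption: a stand-in for the CK distance used in the slides.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_window_distance(candidate, obj, dist=euclidean):
    """Step 2: slide a window of the candidate's size over obj, keep the minimum."""
    m = len(candidate)
    return min(dist(candidate, obj[i:i + m]) for i in range(len(obj) - m + 1))

def best_split(scored):
    """Step 3: scored is a list of (distance, is_positive); return (gain, threshold)."""
    ordered = sorted(scored)
    n = len(ordered)
    n_pos = sum(1 for _, lab in ordered if lab)
    before = entropy(n_pos, n - n_pos)
    best_gain, best_thr = 0.0, None
    left_pos = 0
    for k in range(1, n):
        left_pos += ordered[k - 1][1]
        d, nxt = ordered[k - 1][0], ordered[k][0]
        if d == nxt:
            continue  # no threshold fits between equal distances
        left_neg = k - left_pos
        after = (k * entropy(left_pos, left_neg)
                 + (n - k) * entropy(n_pos - left_pos, (n - n_pos) - left_neg)) / n
        if before - after > best_gain:
            best_gain, best_thr = before - after, (d + nxt) / 2
    return best_gain, best_thr

def find_fingerprint(P, U, m):
    """Steps 1 and 4: try every length-m subsequence of P, keep the best gain."""
    best = (0.0, None, None)  # (gain, threshold, candidate)
    for obj in P:
        for start in range(len(obj) - m + 1):
            cand = obj[start:start + m]
            scored = [(min_window_distance(cand, o), True) for o in P]
            scored += [(min_window_distance(cand, o), False) for o in U]
            gain, thr = best_split(scored)
            if gain > best[0]:
                best = (gain, thr, cand)
    return best

# Toy data (hypothetical): both positive objects contain the bump 1, 5, 1.
P = [[0, 0, 1, 5, 1, 0], [0, 1, 5, 1, 0, 0]]
U = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
gain, thr, cand = find_fingerprint(P, U, 3)
```

Searching a range of lengths would wrap `find_fingerprint` in a loop over m; that loop is omitted here to keep the sketch short.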
Information Gain = 0.991 - 0.401 = 0.590
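The arithmetic can be checked with a few lines of code. The class counts are an assumption: five objects in P and four in U (objects 1 to 5 and A to D), split so that one side holds four P objects and the other holds one P and four U objects, which is one split consistent with the 0.401 figure. Logarithms are taken base 2.

```python
import math

def entropy(counts):
    """Entropy (base 2) of a group described by its per-class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = sum(parent)
    after = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - after

before = entropy([5, 4])                               # 5 in P, 4 in U
gain = information_gain([5, 4], [[4, 0], [1, 4]])      # assumed split
print(round(before, 3), round(gain, 3))
```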
A demonstration of the brute-force search algorithm and the discrimination ability of the CK measure. One short template of insect sound is scanned along a long sequence of sound, which contains one example of the target sound plus three examples of commonly confused insect sounds.
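The scan described above can be sketched as follows. This is a minimal sketch, not the method from the slides: Euclidean distance replaces the CK measure, a 1-D list replaces the spectrogram, and the names `scan`, `stream` and `threshold` are hypothetical.

```python
import math

def euclidean(a, b):
    # Assumption: a stand-in for the CK distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scan(template, stream, threshold, dist=euclidean):
    """Slide the template along the stream; report where the distance
    drops below the recognition threshold."""
    m = len(template)
    profile = [dist(template, stream[i:i + m])
               for i in range(len(stream) - m + 1)]
    hits = [i for i, d in enumerate(profile) if d < threshold]
    return profile, hits

# Toy stream (hypothetical): one exact occurrence of the template at index 7,
# plus two similar-looking bumps that should stay above the threshold.
template = [1, 5, 1]
stream = [0, 0, 1, 4, 1, 0, 0, 1, 5, 1, 0, 2, 3, 2, 0]
profile, hits = scan(template, stream, threshold=0.5)
```

The `profile` list plays the role of the distance-value curve, and `hits` are the positions where it falls under the recognition threshold.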
[Figure: distance value and recognition threshold; the sound fingerprint and the distance ordering of P and U]
[Figure: information gain during the search, P = Atlanticus dorsalis. Brute-force search terminates (running time: 7.5 hours).]
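Before split: [-(5/9)log(5/9)-(4/9)log(4/9)] = 0.991
After split: (3/9)[-(3/3)log(3/3)]+(6/9)[-(4/6)log(4/6)-(2/6)log(2/6)] = 0.612
Upper bound Information Gain = 0.991 - 0.612 = 0.379
Best-so-far Information Gain = 0.991 - 0.401 = 0.590
The upper bound (0.379) is below the best-so-far information gain (0.590), so this candidate cannot become the best and can be pruned.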
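One way such an upper bound could be computed is sketched here. It assumes an optimistic placement rule: objects whose distance has not yet been measured are imagined at whichever end of the distance ordering helps the split most, and both arrangements are tried. The function names and the toy distances are hypothetical; only the counts (5 in P, 4 in U) and the resulting 0.379 come from the numbers above.

```python
import math

def entropy(n_pos, n_neg):
    """Binary entropy (base 2) of a group with the given class counts."""
    total = n_pos + n_neg
    h = 0.0
    for n in (n_pos, n_neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def best_gain(labels):
    """Best information gain over every split position of an ordered label list."""
    n = len(labels)
    n_pos = sum(labels)
    before = entropy(n_pos, n - n_pos)
    best, left_pos = 0.0, 0
    for k in range(1, n):
        left_pos += labels[k - 1]
        left_neg = k - left_pos
        after = (k * entropy(left_pos, left_neg)
                 + (n - k) * entropy(n_pos - left_pos, (n - n_pos) - left_neg)) / n
        best = max(best, before - after)
    return best

def upper_bound_gain(known, rem_pos, rem_neg):
    """known: (distance, is_positive) pairs measured so far.
    rem_pos / rem_neg: how many P and U objects are still unmeasured."""
    ordered = [lab for _, lab in sorted(known)]
    pos_near = [True] * rem_pos + ordered + [False] * rem_neg
    neg_near = [False] * rem_neg + ordered + [True] * rem_pos
    return max(best_gain(pos_near), best_gain(neg_near))

# Hypothetical partial state: 7 of 9 distances measured, 1 P and 1 U remaining.
known = [(0.1, True), (0.2, True), (0.3, False), (0.4, False),
         (0.5, True), (0.6, False), (0.7, True)]
bound = upper_bound_gain(known, rem_pos=1, rem_neg=1)
best_so_far = 0.590
can_prune = bound < best_so_far
```

When `can_prune` is true, the remaining distance computations for this candidate can be skipped.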
[Figure: information gain during the search, P = Atlanticus dorsalis. Brute-force search terminates (running time: 7.5 hours); entropy pruning search terminates (running time: 1.9 hours).]
Speedup intuition
In brute-force search, we search left to right, top to bottom. Is there a better order? How can we find a good candidate earlier? The earlier we find a good candidate, the higher the best-so-far information gain and the more instances we can prune. But how do we resolve this “chicken and egg” paradox?
best search order for CK
proxy for CK…. (next slide)
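The intuition can be illustrated with a toy count of how many candidates need a full evaluation under two visiting orders. Everything here is hypothetical: the candidate names, their gains, their optimistic bounds and the cheap proxy scores are made-up numbers, and each candidate is given a fixed bound for simplicity, whereas a real search would tighten the bound as distances are computed.

```python
def count_full_evaluations(order, true_gain, upper_bound):
    """Visit candidates in the given order. A candidate is skipped when its
    optimistic bound cannot beat the best gain found so far."""
    best, full = 0.0, 0
    for cand in order:
        if upper_bound[cand] <= best:
            continue  # pruned without a full evaluation
        full += 1
        best = max(best, true_gain[cand])
    return best, full

# Hypothetical candidates and scores.
true_gain = {"a": 0.20, "b": 0.30, "c": 0.59, "d": 0.10}
upper_bound = {"a": 0.40, "b": 0.50, "c": 0.90, "d": 0.30}
proxy_score = {"a": 0.25, "b": 0.20, "c": 0.50, "d": 0.10}  # cheap to compute

default_order = ["a", "b", "c", "d"]
proxy_order = sorted(true_gain, key=proxy_score.get, reverse=True)

best_default, evals_default = count_full_evaluations(default_order, true_gain, upper_bound)
best_proxy, evals_proxy = count_full_evaluations(proxy_order, true_gain, upper_bound)
```

Both orders reach the same best gain, but visiting the proxy's favourite first raises the best-so-far value early, so more of the remaining candidates are pruned.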
[Figure: CK and Euclidean]
[Figure: information gain during the search. Brute-force search terminates (running time: 7.5 hours); entropy pruning search terminates (running time: 1.9 hours); “All … search” terminates (running time: 1.1 hours).]
[Figure: distance value and recognition threshold]
For more visual understanding, please take a look at the video on YouTube.
            species-level problem        genus-level problem
            default rate   fingerprint   default rate   fingerprint
10 species  0.10           0.70          0.70           0.93
20 species  0.05           0.44          0.60           0.77
Benchmark of insect classification
Data: twenty species of insects, eight of which are Gryllidae (crickets) and twelve of which are Tettigoniidae (katydids).
Problems: either a twenty-species-level problem or a two-class genus-level problem.
Method: predicted the testing exemplar’s class label (as the pink
recording the fingerprint that produced the minimum value as the exemplar’s nearest neighbor (the pink fingerprint).
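The nearest-fingerprint rule can be sketched as below, assuming one fingerprint per class. As a simplification, Euclidean distance over 1-D lists replaces the CK measure over spectrograms, and the names `classify`, `fingerprints` and the class labels are hypothetical.

```python
import math

def euclidean(a, b):
    # Assumption: a stand-in for the CK distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_window_distance(fingerprint, exemplar, dist=euclidean):
    """Smallest distance between the fingerprint and any same-size window."""
    m = len(fingerprint)
    return min(dist(fingerprint, exemplar[i:i + m])
               for i in range(len(exemplar) - m + 1))

def classify(exemplar, fingerprints, dist=euclidean):
    """Return the label of the fingerprint that produces the minimum value."""
    return min(fingerprints,
               key=lambda label: min_window_distance(fingerprints[label], exemplar, dist))

# Hypothetical fingerprints, one per class.
fingerprints = {"species_A": [1, 5, 1], "species_B": [2, 3, 2]}
label_1 = classify([0, 0, 2, 3, 2, 0], fingerprints)
label_2 = classify([0, 1, 5, 1, 0, 0], fingerprints)
```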
[Figure: 20 sound fingerprints and the testing dataset]
Insect classification accuracy
[Figure: information gain versus number of calls to the CK distance measure, for brute-force search, entropy pruning search, and search with reordering]
To test the speedup on our toy problem shown on the left, we reran these experiments with a more realistically sized universe U, containing 200 objects from other insects, birds, trains, helicopters, etc. The result is shown above.
Mislabel check on the same dataset: one panel assumes all instances are labeled correctly; the other has two instances in the positive class mislabeled.
[Figure: the sound fingerprint and the distance ordering of P and U, P = Atlanticus dorsalis]
[Figure: distance value and recognition threshold for P and U, in four panels: no noise, noise +5dB, noise -5dB, noise -4dB]
Twenty insect species datasets: eight of them are Gryllidae (crickets) and twelve of them are Tettigoniidae (katydids).
[Figure: CK distance value and recognition threshold]