SLIDE 1

K-shot Learning of Acoustic Context

NIPS-2017 ML4AUDIO workshop, 8-Dec 2017

Ivan Bocharov, Tjalling Tjalkens and Bert de Vries

Eindhoven University of Technology, the Netherlands Email bert.de.vries@tue.nl

SLIDE 2

Use Case / Problem Statement

SLIDE 3

ACOUSTIC MODEL SPECIFICATION – Define a generative probabilistic model for acoustic signals that contains scenes as latent states.

TRAINING
  • 1. "Representation training": unsupervised offline training on a large database of acoustic signals across many scenes
  • 2. Train new scenes: continue with supervised training on a small set of scene-labeled waveforms recorded online

CLASSIFICATION – Goal: assign future streaming acoustic data to the correct (or similar) scenes

Approach: probabilistic modeling
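The two-stage training workflow above can be sketched as a pipeline. This is a minimal illustration only: the function names are hypothetical and the stand-in "models" are simple summary statistics, whereas the real stages fit HSMM parameters.

```python
def pretrain(unlabeled_clips):
    """Stage 1 (unsupervised, offline): fit shared representation parameters
    on a large unlabeled database. Stand-in: the global mean of all samples."""
    flat = [x for clip in unlabeled_clips for x in clip]
    return sum(flat) / len(flat)

def k_shot_update(representation, labeled_clips):
    """Stage 2 (supervised, online): adapt per-scene models from a few labeled
    clips. Stand-in: each scene model is its mean offset from the shared part."""
    return {scene: sum(clip) / len(clip) - representation
            for scene, clip in labeled_clips.items()}

rep = pretrain([[0.0, 2.0], [1.0, 1.0]])            # offline stage
scene_models = k_shot_update(rep, {"office": [1.0, 1.0], "street": [3.0, 5.0]})
print(rep, scene_models)  # 1.0 {'office': 0.0, 'street': 3.0}
```

The point of the split is that stage 1 is expensive and done once, while stage 2 touches only the few per-scene parameters, which is what makes k-shot adaptation feasible.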

SLIDE 4

[Model diagram: (Mixture of) Hidden Semi-Markov Models (HSMM) – small samples, hierarchically structured, with duration modeling]

  • scenes ("classes"): $d = 1, \dots, D$
  • segments (s): durations $e_1, \dots, e_L$, grouping the observations as $y_{1:e_1},\ y_{e_1+1:e_1+e_2},\ \dots,\ y_{\sum_{j=1}^{L-1} e_j + 1 \,:\, \sum_{j=1}^{L} e_j}$
  • features: 60 MFCCs per 40 ms frame, 20 ms hop
  • parameters: $A_0, A_1, \dots, A_L$ and $\iota$
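The feature settings on the slide (40 ms frames with a 20 ms hop) fix how many 60-dimensional MFCC vectors a clip yields; a quick sketch of that arithmetic, using the 30 s example length from the data-preparation slide:

```python
def num_frames(clip_s: float, frame_s: float = 0.040, hop_s: float = 0.020) -> int:
    """Count full frames of length frame_s starting every hop_s in a clip_s-second clip."""
    if clip_s < frame_s:
        return 0
    # round() guards against floating-point error in the division
    return int(round((clip_s - frame_s) / hop_s)) + 1

frames = num_frames(30.0)  # one 30 s example
print(frames)              # 1499 frames, each a 60-dim MFCC vector
```

So a single k-shot example already provides on the order of 1500 feature vectors for the HSMM to segment.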

SLIDE 5

[Equations on slide: dynamics, parameters, class prior, generative model]

SLIDE 6
  • Collected by Tampere University of Technology
  • 15 acoustic scenes
  • ~40 min of audio per class

Data Preparation
  • Data set 1: draw one example (30 s) from each of 11 randomly chosen scenes
  • Data set 2: draw one example from each of the remaining 4 classes
  • Classify: test on the remaining examples of data set 2

Data set: TUT Acoustic Scenes 2016
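The split described above can be sketched as follows. The scene names are hypothetical placeholders, and drawing a 30 s example per scene is abstracted away; only the 11-vs-4 class split from the slide is modeled.

```python
import random

def split_scenes(scenes, n_set1=11, seed=0):
    """Randomly pick n_set1 scenes for data set 1; the rest form data set 2
    (here: 11 of the 15 TUT scenes vs. the remaining 4 classes)."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    set1 = rng.sample(scenes, n_set1)
    set2 = [s for s in scenes if s not in set1]
    return set1, set2

scenes = [f"scene_{i:02d}" for i in range(15)]  # 15 acoustic scenes (names illustrative)
set1, set2 = split_scenes(scenes)
print(len(set1), len(set2))  # 11 4
```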

SLIDE 7

[HSMM model diagram repeated from SLIDE 4: scenes $d = 1, \dots, D$, segment durations $e_1, \dots, e_L$, MFCC features, parameters $A_0, A_1, \dots, A_L, \iota$]

Step 1: Train Duration Models


SLIDE 9

Duration distributions (initialization: Pois(·))
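The duration models are initialized as Poisson distributions, i.e. the prior probability of a segment lasting $d$ frames is $p(d \mid \lambda) = \lambda^d e^{-\lambda} / d!$. A minimal sketch of that pmf (the rate of 5 frames is illustrative, not a value from the slides):

```python
import math

def poisson_pmf(d: int, lam: float) -> float:
    """P(duration = d) under Pois(lam): lam**d * exp(-lam) / d!"""
    return lam ** d * math.exp(-lam) / math.factorial(d)

# Probabilities over the first 20 durations for an illustrative rate of 5 frames:
probs = [poisson_pmf(d, 5.0) for d in range(20)]
print(sum(probs))  # close to 1: nearly all mass lies below 20 frames
```

Training then moves these initial duration distributions toward the empirical segment lengths of each scene, as the next slide shows.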

SLIDE 10

Duration distributions (after training)

SLIDE 11

[HSMM model diagram repeated from SLIDE 4: scenes $d = 1, \dots, D$, segment durations $e_1, \dots, e_L$, MFCC features, parameters $A_0, A_1, \dots, A_L, \iota$]

Step 2: One-shot Training


SLIDE 13

[HSMM model diagram repeated from SLIDE 4, with a "?" marking the unlabeled test waveform to be assigned to a scene]

Classification
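Classification assigns a test waveform to the scene whose trained model scores it highest. A hedged sketch of that decision rule, with stand-in scoring functions (a real system would evaluate each trained HSMM's log-likelihood here):

```python
def classify(features, log_likelihood_by_scene):
    """Pick the scene whose model assigns the test features the highest log-likelihood."""
    scores = {scene: ll(features) for scene, ll in log_likelihood_by_scene.items()}
    return max(scores, key=scores.get)

# Stand-in scorers: negative squared distance to a per-scene "prototype" value.
models = {
    "office": lambda x: -sum((xi - 0.0) ** 2 for xi in x),  # prefers values near 0
    "street": lambda x: -sum((xi - 1.0) ** 2 for xi in x),  # prefers values near 1
}
print(classify([0.9, 1.1, 1.0], models))  # street
```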

SLIDE 14

Results

SLIDE 15
  • Ongoing research on in-situ one-shot learning of a personalized acoustic scene classifier
  • Use case is hearing aid personalization, but also applicable to urban monitoring, elderly care, etc.
  • Generative modeling approach, inspired by the one-shot learning work of (a.o.) Brendan Lake et al. (2014) and Matthew Johnson et al. (2013)
  • An HSMM-based probabilistic classifier shows promising performance on a one-shot learning task compared to 1NN-DTW.
  • Specifically, learned priors for the segment duration model parameters help the classifier recognize new classes from a single example.
  • Future work includes more thorough analysis and exploration of competing models.

Summary and Future Plans

SLIDE 16

Thank you

  • Matthew Johnson et al. for the pyhsmm package (https://github.com/mattjj/pyhsmm)

Acknowledgements