SLIDE 1

Voice Activity Detection

Introduction

Voice Activity Detection Speaker Recognition

Feature Extraction Algorithms

Threshold VAD Gaussian Mixture Model VAD

Experiments

Results Discuss

Voice Activity Detection

Victor Lenoir

LRDE: Laboratoire de Recherche et Développement de l’EPITA

July 3, 2011 http://lrde.epita.fr/

1 / 31 Victor Lenoir

SLIDE 2

Outline

◮ Introduction
  • Voice Activity Detection
  • Speaker Recognition
◮ Feature Extraction
◮ Algorithms
  • Threshold VAD
  • Gaussian Mixture Model VAD
◮ Experiments
  • Results
  • Discussion

SLIDE 4

Voice Activity Detection

In audio processing, we often want to remove noise and silence. Voice Activity Detection (VAD) is used to detect human speech in an audio recording.

SLIDE 5

Applications

Voice Activity Detection has many applications, such as:

◮ Speech encoding (GSM)
◮ Audio conferencing
◮ Speech/Speaker recognition

SLIDE 6

Speaker Recognition

SLIDE 8

Feature Extraction: Short-term Analysis

SLIDE 9

Features

◮ Energy
◮ Zero-Crossing Rate
◮ Spectral Flatness Measure
◮ Mel-Frequency Cepstral Coefficients (MFCCs)
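
To make the first three features concrete, here is a minimal pure-Python sketch that computes them on a single short-term frame. The function names, the test frames, and the exact definitions are assumptions for illustration and are not taken from the slides: energy is taken as the mean squared amplitude, zero crossings as sign changes between consecutive samples, and spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum. MFCCs are left out because they need a mel filter bank and a DCT.

```python
import cmath
import math

def energy(frame):
    """Mean squared amplitude of the frame (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum over bins 1..n/2 (DC bin excluded)."""
    n = len(frame)
    spec = []
    for k in range(1, n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(s) ** 2)
    return spec

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum, in (0, 1]."""
    spec = [p + eps for p in power_spectrum(frame)]  # eps avoids log(0)
    geo = math.exp(sum(math.log(p) for p in spec) / len(spec))
    arith = sum(spec) / len(spec)
    return geo / arith

# Hypothetical frames: a pure tone (4 cycles over 64 samples) and an impulse.
tone = [math.sin(2 * math.pi * 4 * (t + 0.5) / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63

tone_features = (energy(tone), zero_crossings(tone), spectral_flatness(tone))
impulse_flatness = spectral_flatness(impulse)
```

The tone has a peaked spectrum, so its flatness is close to 0, while the impulse has a flat spectrum and a flatness close to 1. This contrast is what makes the measure useful for separating tonal, voiced frames from noise-like ones.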

SLIDE 11

Threshold VAD: Structure

SLIDE 12

VAD Threshold: Feature Extraction

A new feature is computed for each frame:

$$\Delta(X) = \frac{E(X)}{\delta + \mathrm{SFM}(X) \cdot Z(X)}$$

where:

◮ E(X): energy
◮ Z(X): number of zero crossings
◮ SFM(X): Spectral Flatness Measure
◮ δ: constant that prevents division by 0
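
The combined feature can be sketched in a few lines of Python. The function name, the default value of δ, and the example frame values are assumptions chosen for illustration. The slides only say that δ is a constant.

```python
def combined_feature(energy, sfm, zero_crossings, delta=1e-3):
    """Delta(X) = E(X) / (delta + SFM(X) * Z(X)); delta value is an assumption."""
    return energy / (delta + sfm * zero_crossings)

# Speech-like frame: high energy, tonal spectrum (low SFM), few zero crossings.
speech_like = combined_feature(energy=0.5, sfm=0.01, zero_crossings=7)
# Noise-like frame: low energy, flat spectrum (high SFM), many zero crossings.
noise_like = combined_feature(energy=0.01, sfm=0.9, zero_crossings=30)
# Digital silence: SFM * Z is 0, and delta keeps the ratio defined.
silence = combined_feature(energy=0.0, sfm=0.0, zero_crossings=0)
```

With these made-up values, the speech-like frame scores several orders of magnitude higher than the noise-like one, which is the separation a single threshold needs.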

SLIDE 15

VAD Threshold: Initialization

Speech Threshold = 45
Noise Threshold = 5
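
The figure that showed how these two values are obtained is not recoverable from the transcript. One plausible reading, based only on the 10% assumption mentioned in the discussion slide, is that the frames with the highest Δ values are taken as speech and those with the lowest as noise. The sketch below follows that reading. The function name, the `fraction` parameter, and the feature values are hypothetical, and the values were chosen so that the result matches the 45 and 5 shown on the slide.

```python
def initial_thresholds(features, fraction=0.10):
    """Average the top and bottom `fraction` of frame features (assumed scheme)."""
    ordered = sorted(features)
    k = max(1, int(len(ordered) * fraction))
    noise_thr = sum(ordered[:k]) / k      # mean of the lowest values
    speech_thr = sum(ordered[-k:]) / k    # mean of the highest values
    return speech_thr, noise_thr

# Hypothetical per-frame Delta values for a 20-frame recording.
features = [4, 6, 8, 9, 12, 15, 16, 18, 20, 21,
            22, 25, 27, 28, 30, 33, 35, 38, 44, 46]
speech_thr, noise_thr = initial_thresholds(features)
```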

SLIDE 19

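VAD Threshold: Learning

$$\text{Final Threshold} = \frac{45 + 5}{2} = 25$$

$$\text{New Speech Threshold} = \frac{35 + 28 + 27 + 26 + 45}{5} = 32$$

$$\text{New Noise Threshold} = \frac{5 + 21 + 6 + 16 + 6}{5} = 10$$

$$\text{New Final Threshold} = \frac{32 + 10}{2} = 21$$

The slide reports integer values: 161/5 = 32.2 and 54/5 = 10.8 appear truncated to 32 and 10.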
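
The arithmetic above is consistent with the following update rule, given here as a hedged sketch rather than the author's implementation. Frames are classified against the current final threshold, and each class threshold is then replaced by the mean of its previous value and the features of the frames assigned to that class. The function name and the order of the frames are assumptions. Integer division is used only so that the output matches the truncated values on the slide.

```python
def update_thresholds(features, speech_thr, noise_thr):
    """One learning step: classify frames, then re-average both thresholds."""
    final_thr = (speech_thr + noise_thr) // 2
    speech = [f for f in features if f > final_thr]
    noise = [f for f in features if f <= final_thr]
    # Each new threshold averages the old threshold with the frames of its class.
    new_speech = (sum(speech) + speech_thr) // (len(speech) + 1)
    new_noise = (sum(noise) + noise_thr) // (len(noise) + 1)
    new_final = (new_speech + new_noise) // 2
    return new_speech, new_noise, new_final

# The eight frame values that appear in the sums on the slide.
frames = [35, 21, 28, 6, 27, 16, 26, 6]
result = update_thresholds(frames, speech_thr=45, noise_thr=5)
```

With the initial thresholds 45 and 5, the final threshold is 25, so 35, 28, 27 and 26 are counted as speech and 21, 6, 16 and 6 as noise, which reproduces the sums shown above.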

SLIDE 20

Gaussian Mixture Model

A Gaussian Mixture Model (GMM) is a probabilistic model used to represent a probability distribution. It is defined as a weighted sum of Gaussian components.
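
As a small illustration of the definition, the sketch below evaluates a one-dimensional GMM density as a weighted sum of Gaussian densities. The function names and the parameter values are made up for the example. A real VAD model would work on multi-dimensional feature vectors.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, weights, means, variances):
    """p(x) = sum_k w_k * N(x; mean_k, var_k), with the weights summing to 1."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture.
weights, means, variances = [0.3, 0.7], [0.0, 5.0], [1.0, 2.0]
density_at_zero = gmm_pdf(0.0, weights, means, variances)
```

Because the weights sum to 1 and each component integrates to 1, the mixture is itself a valid probability density.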

SLIDE 21

Voice Activity Detection using Gaussian Mixture Models

SLIDE 22

VAD GMM: Feature Extraction

We extract the same features as in the previous algorithm, plus the MFCCs. We also compute the same combined feature as in the previous algorithm:

$$\Delta(X) = \frac{E(X)}{\delta + \mathrm{SFM}(X) \cdot Z(X)}$$

where:

◮ E(X): energy
◮ Z(X): number of zero crossings
◮ SFM(X): Spectral Flatness Measure
◮ δ: constant that prevents division by 0

SLIDE 26

VAD GMM: Initialization

SpeechGMM = Expectation Maximization (E)
NoiseGMM = Expectation Maximization (F)
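
The initialization can be illustrated with a minimal Expectation Maximization sketch in pure Python. Several things here are assumptions: E and F are read as the sets of frames initially taken to be speech and noise, the features are one-dimensional (the slides use MFCCs plus Δ, which are multi-dimensional), and the function names, the initialization of the means, the variance floor, and the sample values are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_em(data, n_components=2, n_iter=50, min_var=1e-3):
    """Fit a 1-D GMM with EM; returns (weights, means, variances)."""
    ordered, n = sorted(data), len(data)
    # Deterministic start: means spread over the sorted data, shared variance.
    means = [ordered[(2 * k + 1) * n // (2 * n_components)] for k in range(n_components)]
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, min_var)
    variances = [var0] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)  # floor avoids collapsed components
    return weights, means, variances

def log_likelihood(x, gmm):
    weights, means, variances = gmm
    p = sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))

def is_speech(x, speech_gmm, noise_gmm):
    """Label a frame as speech if the speech model explains it better."""
    return log_likelihood(x, speech_gmm) > log_likelihood(x, noise_gmm)

# Hypothetical frame features: E assumed to be speech, F assumed to be noise.
E = [30.0, 35.0, 40.0, 42.0, 45.0, 47.0, 50.0, 52.0]
F = [2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 9.0]
speech_gmm = fit_gmm_em(E)
noise_gmm = fit_gmm_em(F)
```

Once both models are fitted, each frame can be labelled by comparing its likelihood under the two models, as `is_speech` does. This is one common way to segment with two GMMs and may differ from the exact rule used in the presentation.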

SLIDE 27

VAD GMM: Learning and Segmentation

SLIDE 29

Comparison

◮ VAD_Ref
◮ VAD_Threshold
◮ VAD_GMM

SLIDE 30

Context

◮ Speaker Recognition System: LRDE NIST-SRE 2010
◮ Audio Files: NIST SRE 2010
  • Telephone
  • Microphone
  • Interview

SLIDE 31

Benchmark - Male

[Figure: Miss probability (in %) plotted against False Alarms probability (in %)]

Legend values as shown on the plot:
male_ref        9.900  0.178
male_threshold  9.900  0.264
male_gmm        9.900  0.193

SLIDE 32

Benchmark - Female

[Figure: Miss probability (in %) plotted against False Alarms probability (in %)]

Legend values as shown on the plot:
female_ref        9.900  0.259
female_threshold  9.900  0.292
female_gmm        9.900  0.281

SLIDE 33

Discussion

Advantages

◮ Robust to noise

Disadvantages

◮ Not on-the-fly
◮ Assumes that 10% of the audio file is speech/noise

Solution

◮ Train a big model on a large number of audio files
◮ Don’t care

SLIDE 34

Conclusion and Perspectives

The work done

◮ Implementation and tests of working Voice Activity Detection algorithms

The future work

◮ Improve the algorithms to get better results
◮ Improve the GMM algorithm with a Hidden Markov Model (HMM)

SLIDE 35

Questions?
