Age and Gender Recognition from Speech Patterns Based on Supervised Nonnegative Matrix Factorization




slide-1
SLIDE 1

Age and Gender Recognition from Speech Patterns Based on Supervised Nonnegative Matrix Factorization

July 2011

  • Mohamad Hasan Bahari
  • Hugo Van hamme

slide-2
SLIDE 2

Outline

  • Introduction and Motivations
  • Age and Gender Recognition
  • Corpora
  • Supervised Nonnegative Matrix Factorization
  • Proposed Method
  • Results
  • Conclusions and Future Research

slide-3
SLIDE 3

Introduction

  • Confirming the identity of individuals
  • Biometric Characteristics
    • Fingerprint
    • Face
    • Iris
    • Hand Geometry
    • Ear Shape
    • …
  • Choosing a characteristic
    • Availability
    • Reliability
slide-4
SLIDE 4

Motivation

  • In many real-world cases, only speech patterns are available (kidnapping, threatening calls, …)
  • Speech patterns can carry a great deal of useful information
    • Gender
    • Age
    • Dialect (original or previous regions)
    • Membership of a particular social group
    • …
slide-5
SLIDE 5

Goal

  • To extract different physical and psychological characteristics of the speaker from his/her voice patterns
  • Physical:
    1. Gender
    2. Age
    3. Accent
    4. …
  • Psychological:
    1. Anxiousness
    2. Stress
    3. Confidence
    4. …

slide-6
SLIDE 6

Age and Gender Recognition

slide-7
SLIDE 7

Age and Gender Recognition


slide-8
SLIDE 8

Age and Gender Recognition

[1] W. S. Brown, R. J. Morris, H. Hollien, and E. Howell, Journal of Voice, vol. 5, pp. 310–315, 1991.

slide-9
SLIDE 9
Age and Gender Recognition
slide-10
SLIDE 10

Age and Gender Recognition


slide-11
SLIDE 11

Corpora

  • 555 speakers from the N-best evaluation corpus [1]
  • The corpus contains live and read commentaries, news, interviews, and reports broadcast in Belgium
  • Different age groups and genders

Age                  18–35   18–35   36–45   36–45   46–81   46–81
Number of Speakers   85      53      160     41      191     25

[1] D. A. Van Leeuwen, J. Kessens, E. Sanders, and H. van den Heuvel, in Proc. Interspeech, pp. 2571–2574, 2009.

slide-12
SLIDE 12

SNMF

[1] H. Van hamme, in Proc. Interspeech, Australia, pp. 2554–2557, 2008.

slide-13
SLIDE 13

SNMF

Problem Statement:

  • Given a training dataset $S_{tr} = \{(x_1, y_1), \ldots, (x_n, y_n), \ldots, (x_N, y_N)\}$
  • $x_n$ is a vector of observed characteristics for the $n$-th data item
  • $y_n$ is a label vector representing the class that $x_n$ belongs to

Goal:

  • Approximate a classifier function $g$ such that $\hat{y} = g(x_{tst})$ is as close as possible to the true label, where $x_{tst}$ is an unseen observation
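The label vector $y_n$ can be illustrated with a small sketch. The slides do not say how labels are encoded, so the one-hot encoding and the class names below are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical class inventory (three age groups x two genders); not taken from the slides.
CLASSES = ["young_m", "young_f", "middle_m", "middle_f", "senior_m", "senior_f"]

def one_hot_labels(class_names, classes=CLASSES):
    """Return a (num_classes, N) matrix whose n-th column is the label vector y_n."""
    index = {name: i for i, name in enumerate(classes)}
    Y = np.zeros((len(classes), len(class_names)))
    for n, name in enumerate(class_names):
        Y[index[name], n] = 1.0  # exactly one active entry per column
    return Y

# Three toy training items and their label vectors, stored column-wise.
Y_example = one_hot_labels(["young_f", "senior_m", "young_f"])
```

A nonnegative encoding like this one is convenient because the label matrix can then be factorized together with nonnegative features.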

slide-14
SLIDE 14

SNMF

SNMF in Training Phase:

  • First step
  • Second step
  • Extended Kullback-Leibler divergence
  • Multiplicative updating formula
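A training phase of this kind can be sketched with the standard multiplicative updates for NMF under the extended Kullback-Leibler divergence. This is a minimal sketch, not the authors' exact formulation: stacking the label matrix on top of the feature matrix, the names `Y`, `X`, `W_y`, `W_x`, and `H`, and the plain unweighted updates are all assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def kl_divergence(V, V_hat):
    """Extended (generalized) Kullback-Leibler divergence D(V || V_hat)."""
    V_hat = V_hat + EPS
    # Entries with V == 0 contribute only V_hat to the sum.
    log_term = np.where(V > 0, V * np.log((V + EPS) / V_hat), 0.0)
    return float(np.sum(log_term - V + V_hat))

def train_snmf(Y, X, rank, n_iter=200, seed=0):
    """Factorize the stacked matrix [Y; X] ~ [W_y; W_x] H (assumed setup).

    Y: (num_classes, N) nonnegative label matrix, one column per training item.
    X: (num_features, N) nonnegative feature matrix.
    """
    rng = np.random.default_rng(seed)
    V = np.vstack([Y, X])
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    history = []
    for _ in range(n_iter):
        # Update the activations H with the dictionary W held fixed.
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)
        # Update the dictionary W with the activations H held fixed.
        R = V / (W @ H + EPS)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)
        history.append(kl_divergence(V, W @ H))
    n_labels = Y.shape[0]
    return W[:n_labels], W[n_labels:], H, history
```

Because every update only multiplies by nonnegative ratios, the factors stay nonnegative as long as they are initialized that way.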

slide-15
SLIDE 15

SNMF

SNMF in Testing Phase:

  • First step
  • Second step
  • Extended Kullback-Leibler divergence
  • Multiplicative updating formula
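A testing phase can be sketched in two steps under the same assumptions: first estimate activations for the unseen features with the feature dictionary held fixed, then map those activations through the label dictionary. The names `W_x`, `W_y`, and `X_tst`, and the final arg-max decision, are hypothetical choices for illustration rather than the authors' stated procedure.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def snmf_predict(W_y, W_x, X_tst, n_iter=200, seed=0):
    """Estimate label scores for unseen data with fixed dictionaries.

    Step 1: find H >= 0 such that X_tst ~ W_x H (KL multiplicative updates).
    Step 2: compute Y_hat = W_y H and pick the highest-scoring class per column.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W_x.shape[1], X_tst.shape[1])) + EPS
    col_sums = W_x.sum(axis=0)[:, None] + EPS
    for _ in range(n_iter):
        R = X_tst / (W_x @ H + EPS)
        H *= (W_x.T @ R) / col_sums  # only H changes; W_x stays fixed
    Y_hat = W_y @ H
    return Y_hat, Y_hat.argmax(axis=0)

# Toy check with identity dictionaries: each test column should map to its dominant feature.
scores, labels = snmf_predict(np.eye(2), np.eye(2), np.array([[0.9, 0.1], [0.1, 0.9]]))
```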

slide-16
SLIDE 16

Proposed Method


slide-17
SLIDE 17

Proposed Method

Speech Signal → Feature selection → Feature Vectors

slide-18
SLIDE 18

Proposed Method

  • Speaker Independent Model
  • Speaker Adaptation Method

slide-19
SLIDE 19

Proposed Method

3. Supervector making procedure

  • Gaussian Mixture Model (GMM) of each speaker-adapted HMM
  • Three types of supervectors:
    1. Means
    2. Variances
    3. Weights
  • Weights supervectors
  • The result of this step is 555 supervectors, one for each of the 555 speakers
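A weights supervector can be pictured as the concatenation of the mixture weights of a speaker's GMMs. The sketch below assumes that construction; the function name, the input layout (one weight list per GMM), and the renormalization step are hypothetical and not taken from the slides.

```python
import numpy as np

def weight_supervector(gmm_weights):
    """Concatenate the mixture weights of several GMMs into one supervector.

    gmm_weights: list of 1-D sequences, one per GMM of a speaker-adapted model.
    """
    parts = []
    for w in gmm_weights:
        w = np.asarray(w, dtype=float)
        if np.any(w < 0):
            raise ValueError("mixture weights must be nonnegative")
        parts.append(w / w.sum())  # renormalize so each GMM's weights sum to 1
    return np.concatenate(parts)

# Toy speaker with two GMMs holding 2 and 3 mixture components.
sv = weight_supervector([[0.2, 0.8], [0.5, 0.25, 0.25]])
```

Mixture weights are nonnegative by definition, which makes this type of supervector a natural input for a nonnegative factorization.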

slide-20
SLIDE 20

Proposed Method

slide-21
SLIDE 21

Results

Evaluation Methodology

  • 5-fold cross-validation (five independent runs)
  • In each of the five runs:
    • Training set: speech data of 444 speakers
    • Testing set: speech data of 111 speakers
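The evaluation protocol can be sketched as a speaker-level 5-fold split. The slides do not state how speakers were assigned to folds, so the random, unstratified partition and the function name below are assumptions.

```python
import numpy as np

def five_fold_splits(n_speakers=555, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs of speaker indices, one pair per run."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_speakers)
    folds = np.array_split(order, n_folds)  # 555 / 5 = 111 speakers per fold
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

splits = list(five_fold_splits())
```

Each run holds out one fold of 111 speakers for testing and trains on the remaining 444, so every speaker is tested exactly once across the five runs.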

slide-22
SLIDE 22

Results

  • Gender recognition rate: 96%
  • Relative confusion matrix
  • Age group recognition
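A relative confusion matrix is commonly obtained by dividing each row of the count matrix by the number of test items in that true class. The sketch below assumes that row-wise normalization; the function name and the toy labels are made up for illustration and are not results from the slides.

```python
import numpy as np

def relative_confusion_matrix(true_labels, predicted_labels, n_classes):
    """Row-normalized confusion matrix: entry (i, j) is the fraction of class i predicted as j."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no samples stay all-zero instead of dividing by zero.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)

# Toy example with two classes and six test items.
rel = relative_confusion_matrix([0, 0, 1, 1, 1, 1], [0, 1, 1, 1, 1, 0], n_classes=2)
```

Row-wise normalization keeps classes with few speakers comparable to classes with many, which matters when class sizes are as uneven as in a corpus with 25 to 191 speakers per group.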

slide-23
SLIDE 23

Conclusions and Future Research

slide-24
SLIDE 24