SPEAKER, ENVIR ONMENT AND CHANNEL CHANGE DETECTION AND - PDF document

SPEAKER, ENVIR ONMENT AND CHANNEL CHANGE DETECTION AND CLUSTERING VIA THE BA YESIAN INF ORMA TION CRITERION Sc ott Shaobing Chen & P.S. Gop alakrish nan IBM T.J. Watson R ese ar ch Center email: schen@watson.ibm.c om V arious segmen tation algorithms ha v e b een prop osed in ABSTRA CT the literation [2, 4, 6, 8 , 10, 14 ], whic h can b e categorized In this pap er, w e are in terested in detecting c hanges in as follo ws: sp eak er iden tit y , en vironmen tal condition and c hannel con- Deco der-guided segmen tation. The input audio stream dition; w e call this the problem of ac oustic change dete c- � can b e �rst deco ded; then the desired segmen ts can b e tion . The input audio stream can b e mo deled as a Gauss- pro duced b y cutting the input at the silence lo cations ian pro cess in the cepstral space. W e presen t a maxim um generated from the deco der [14, 8]. Other informations lik eliho o d approac h to detect turns of a Gaussian pro cess; the decision of a turn is based on the from the deco der, suc h as the gender information, could Bayesian Informa- also b e utilized in the segmen tation [8]. tion Criterion (BIC), a mo del selection criterion w ell-kno wn in the statistics literature. The BIC criterion can also b e Mo del-based segmen tation. [2] prop osed to build dif- � applied as a termination criterion in hierarc hical metho ds feren t mo dels, e.g. Gaussian mixture mo dels, for a for clustering of audio segmen ts: t w o no des can b e merged �xed set of acoustic classes, suc h as telephone sp eec h, only if the merging increases the BIC v alue. Our exp eri- pure m usic, etc, from a training corpus; the incoming men ts on the Hub4 1996 and 1997 ev aluation data sho w that audio stream can b e classi�ed b y maxim um lik eliho o d our segmen tation algorithm can successfully detect acoustic selection o v er a sliding windo w; segmen tation can b e c hanges; our clustering algorithm can pro duce clusters with made at the lo cations where there is a c hange in the high purit y , leading to impro v emen ts in accuracy through acoustic class. unsup ervised adaptation as m uc h as the ideal clustering b y Metric-based segmen tation. [4, 6, 10] prop osed to seg- the true sp eak er iden tities. � men t the audio stream at maxima of the distances b et w een neigh b oring windo ws placed at ev ery sample; distances suc h as the KL distance, the generalized lik e- 1. INTR ODUCTION liho o d ratio distance ha v e b een in v estigated. Automatic segmen tation of an audio stream and automatic In our opinion, these metho ds are not v ery successful in clustering of audio segmen ts according to sp eak er iden ti- detection the acoustic c hanges presen t in the data. The ties, en vironmen tal conditions and c hannel conditions ha v e deco der-guided segmen tation only places b oundaries at si- receiv ed quite a bit of atten tion recen tly [4, 8, 6, 10 ]. F or lence lo cations, whic h in general has no direct connection example, in the task of automatic transcription of broadcast with the acoustic c hanges in the data. Both the mo del- news [3], the data con tains clean sp eec h, telephone sp eec h, based segmen tation and the metric-based segmen tation rely m usic segmen ts, sp eec h corrupted b y m usic or noise, etc. on thresholding of measuremen ts whic h lac k stabilit y and There are no explicit cues for the c hanges in sp eak er iden- robustness. Besides, the mo del-based segmen tation do es tit y , en vironmen t condition and c hannel condition. Also not generalize to unseen acoustic conditions. the same sp eak er ma y app ear m ultiple times in the data. Clustering of audio segmen ts is often p erformed via hier- In order to transcrib e the sp eec h con ten t in audio streams arc hical clustering [10, 8]. First, a distance matrix is com- of this nature, puted; the common practice is to mo del eac h audio segmen t w e w ould lik e to se gment the audio stream in to homo- � as one Gaussian in the cepstral space and to use the KL geneous regions according to sp eak er iden tit y , en viron- distance or the generalized lik eliho o d ratio as the distance men tal condition and c hannel condition so that regions measure [6 ]. Then b ottom-up hierarc hical clustering can b e of di�eren t nature can b e handled di�eren tly: for ex- p erformed to generate a clustering tree. It is often di�cult ample, regions of pure m usic and noise can b e rejected; to determine the n um b er of clusters. One can heuristicall y also, one migh t design a separate recognition system pre-determine the n um b er of clusters or the minim um size for telephone sp eec h. of eac h cluster; accordingly , one can go do wn the tree to w e w ould lik e to cluster sp eec h segmen ts in to homoge- � obtain desired clustering [14]. Another heuristic solution neous clusters according to sp eak er iden tit y , en vironmen t and c hannel; unsup ervised adaptation can then is to threshold the distance measures during the hierarc hical pro cess; the thresholding lev el is tuned on a training b e p erformed on eac h cluster. [8, 10] sho w ed that a go o d clustering pro cedure can greatly impro v e the p er- set [10]. Jin et al. [7] shed some ligh t on automatically formance of unsup ervised adaptation suc h as MLLR. c ho osing a clustering solution.

SPEAKER, ENVIR ONMENT AND CHANNEL CHANGE DETECTION AND - PDF document

SPEAKER, ENVIR ONMENT AND CHANNEL CHANGE DETECTION AND CLUSTERING VIA THE BA YESIAN INF ORMA TION CRITERION Sc ott Shaobing Chen & P.S. Gop alakrish nan IBM T.J. Watson R ese ar ch Center email:

ZFS file system in deskt op l a b envir onment Dr . Al exei Kot el nikov & Tr ent Ha ndl

CHANNEL ALLOCATION Channel Language Translation Channel Translation Language Channel 1 German

ANNUAL ACCOUNTS PRESS CONFERENCE CHANNEL ALLOCATION. Channel Language Translation Channel

Channel Assignment and Channel Hopping in IEEE 802.11 Operating Channels for 802.11b Europe

Ap Appr proaches oaches in in As Assessing sessing En Envir vironme onment nt Andr drew

O ver view of t he W il dl if e & Nat ur al Envir onment (Scot l and) Act 2011 Pr of .

En Envir ironm onment ent For Production URL : : htt ttps://www /www.sh .shgse sewb

Su Sustai taining ning th the e En Envir ironment onment Gai ains ns Pos ost t COVID

Res esilienc iliency in in a a changing hanging en envir ironment onment Cons onsider

Channel design Channel coverage Intensive Selective Exclusive Channel

ANNUAL ACCOUNTS PRESS CONFERENCE LANGUAGE CHANNELS. Channel Language Channel (translation)

Detection of neutral particles detection of neutrons detection of neutrinons detection of low

1 Simultaneous interpretation EN channel 1 FR channel 2 ES channel 3 DE channel 4 2 The Future

Speech Processing 15-492/18-492 Speaker ID Who is speaking? Speaker ID, Speaker Recognition

Dual-Channel Acoustic Detection of X. Niu & J. van Santen Nasalization Statuses

Formal Modeling in Cognitive Science 1 Noisy Channel Model Channel Capacity Lecture 29: Noisy

The Presentation of Cultural Heritage Models in Epoch Sven Havemann 1 , Volker Settgast 1 , Dieter

IFS INTERNATIONAL FERTILISER SOCIETY TECHNICAL CONFERENCE 2015 London 22 - 23 June 2015 40

English Learner Advisory Committee Third Meeting Chaparral High School January 28, 2020 1

ADJUSTED OPERATING PROFIT CONTINUED TO INCREASE Half-year Financial Review January-June 2018 24

Gary Rea MD PhD Gary Rea MD PhD Medical Director Medical Director OSU Comprehensive Spine

Lumbar HNP & Radiculopathy When to operate? John G. Heller, MD Baur Professor of Orthopaedic

THE BACK THE SPINAL CORD THE SPINAL CORD The structures in the vertebral canal: the spinal

The Physiotherapy Expert Witness in Clinical Negligence &

SPEAKER, ENVIR ONMENT AND CHANNEL CHANGE DETECTION AND - PDF document

SPEAKER, ENVIR ONMENT AND CHANNEL CHANGE DETECTION AND CLUSTERING VIA THE BA YESIAN INF ORMA TION CRITERION Sc ott Shaobing Chen & P.S. Gop alakrish nan IBM T.J. Watson R ese ar ch Center email:

ZFS file system in deskt op l a b envir onment Dr . Al exei Kot el nikov &amp; Tr ent Ha ndl

CHANNEL ALLOCATION Channel Language Translation Channel Translation Language Channel 1 German

ANNUAL ACCOUNTS PRESS CONFERENCE CHANNEL ALLOCATION. Channel Language Translation Channel

Channel Assignment and Channel Hopping in IEEE 802.11 Operating Channels for 802.11b Europe

Ap Appr proaches oaches in in As Assessing sessing En Envir vironme onment nt Andr drew

O ver view of t he W il dl if e &amp; Nat ur al Envir onment (Scot l and) Act 2011 Pr of .

En Envir ironm onment ent For Production URL : : htt ttps://www /www.sh .shgse sewb

Su Sustai taining ning th the e En Envir ironment onment Gai ains ns Pos ost t COVID

Res esilienc iliency in in a a changing hanging en envir ironment onment Cons onsider

Channel design Channel coverage Intensive Selective Exclusive Channel

ANNUAL ACCOUNTS PRESS CONFERENCE LANGUAGE CHANNELS. Channel Language Channel (translation)

Detection of neutral particles detection of neutrons detection of neutrinons detection of low

1 Simultaneous interpretation EN channel 1 FR channel 2 ES channel 3 DE channel 4 2 The Future

Speech Processing 15-492/18-492 Speaker ID Who is speaking? Speaker ID, Speaker Recognition

Dual-Channel Acoustic Detection of X. Niu &amp; J. van Santen Nasalization Statuses

Formal Modeling in Cognitive Science 1 Noisy Channel Model Channel Capacity Lecture 29: Noisy

The Presentation of Cultural Heritage Models in Epoch Sven Havemann 1 , Volker Settgast 1 , Dieter

IFS INTERNATIONAL FERTILISER SOCIETY TECHNICAL CONFERENCE 2015 London 22 - 23 June 2015 40

English Learner Advisory Committee Third Meeting Chaparral High School January 28, 2020 1

ADJUSTED OPERATING PROFIT CONTINUED TO INCREASE Half-year Financial Review January-June 2018 24

Gary Rea MD PhD Gary Rea MD PhD Medical Director Medical Director OSU Comprehensive Spine

Lumbar HNP &amp; Radiculopathy When to operate? John G. Heller, MD Baur Professor of Orthopaedic

THE BACK THE SPINAL CORD THE SPINAL CORD The structures in the vertebral canal: the spinal

The Physiotherapy Expert Witness in Clinical Negligence &amp;

ZFS file system in deskt op l a b envir onment Dr . Al exei Kot el nikov & Tr ent Ha ndl

O ver view of t he W il dl if e & Nat ur al Envir onment (Scot l and) Act 2011 Pr of .

Dual-Channel Acoustic Detection of X. Niu & J. van Santen Nasalization Statuses

Lumbar HNP & Radiculopathy When to operate? John G. Heller, MD Baur Professor of Orthopaedic

The Physiotherapy Expert Witness in Clinical Negligence &