
SLIDE 1

DCU at the NTCIR-11 SpokenQuery&Doc Task

David N. Racca, Gareth J.F. Jones

CNGL Centre for Global Intelligent Content, School of Computing, Dublin City University, Dublin, Ireland

SLIDE 2

Overview

― We participated in the slide-group SQ-SCR task.
― General idea:

  • Augment text-retrieval methods with prosodic features: pitch (F0), loudness, and duration.
  • Compute an acoustic score for each term.
  • Promote the rank of segments containing acoustically prominent terms.

SLIDE 3

Motivation

― Prosody:

  • Rhythm, stress, intonation, duration, loudness.

― Shown useful in many speech processing tasks:

  • Emotion recognition, discourse structure, speech acts, speaker ID, topic segmentation.

― Prominent speech units stand out from their context.
― Information status: old vs. new information.

SLIDE 4

Related Work

― Crestani [1]: possible correlation between acoustic stress and TF-IDF scores (English).
― Chen et al. [2]: signal amplitude and duration in a spoken document retrieval (SDR) task (Mandarin).
― Guinaudeau and Hirschberg [3]: F0 and RMS energy in a topic tracking task (French).
― Racca et al. [4]: F0, loudness, and duration in SCR (English).

SLIDE 5

Data Pre-processing

[Pre-processing pipeline diagram, summarised:]

― Inputs (provided by the organisers): lecture, query, and IPU WAV files, plus manually annotated transcripts.
― Audio: VAD, then openSMILE extracts F0 and loudness every 10 ms; both are lecture-normalised: vnorm = (vraw − minv) / (maxv − minv).
― ASR: Julius LVCSR produces 10-best hypotheses per IPU; ChaSen performs morphological analysis ("%m %M %y" output formats).
― Manual transcripts: annotation removal, capitalisation, and forced alignment.
― Outputs: enriched manual transcripts and enriched ASR transcripts (manual, 1-best word MATCH, and UnmatchAMLM conditions).

SLIDE 6

Prosodic Features

— Raw duration; lecture-normalised F0 and loudness.
— Example (one occurrence k of term i in segment j):

  • Pitch (F0): max(f0_ij^k) = 280.44 Hz raw → 0.58 normalised.
  • Loudness: max(l_ij^k) = 1.16 raw → 0.37 normalised.
  • Duration: d = 2.36 s − 1.02 s = 1.34 s.

— Lecture normalisation: vnorm = (vraw − minv) / (maxv − minv).
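The lecture-level min-max normalisation above can be sketched in Python (a minimal illustration; the function and variable names are mine, not the paper's):

```python
def lecture_normalise(values):
    """Min-max normalise raw frame values (e.g. F0 or loudness) over a
    whole lecture: v_norm = (v_raw - min_v) / (max_v - min_v)."""
    min_v, max_v = min(values), max(values)
    if max_v == min_v:  # constant signal: avoid division by zero
        return [0.0 for _ in values]
    return [(v - min_v) / (max_v - min_v) for v in values]

# Hypothetical F0 frames (Hz); the lecture maximum of 280.44 Hz maps to 1.0
f0_frames = [80.0, 120.5, 280.44, 150.2, 60.0]
norm = lecture_normalise(f0_frames)
print(round(max(norm), 2))  # prints 1.0
```

Normalising per lecture rather than globally compensates for speaker- and recording-specific differences in pitch range and loudness.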

SLIDE 7

Prosodic Features

— F0, loudness, and duration features for term i in segment j, aggregated over its occurrences k:

  f0(i,j)      = max_k { max(f0_ij^k) }
  l(i,j)       = max_k { max(l_ij^k) }
  f0range(i,j) = max_k { max(f0_ij^k) } − min_k { min(f0_ij^k) }
  d(i,j)       = max_k { d_ij^k }
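The per-term aggregations above can be written as a short Python sketch (the dict layout and function name are illustrative assumptions, not the paper's code):

```python
def term_features(occurrences):
    """Aggregate prosodic features over all occurrences k of term i in
    segment j. Each occurrence is a dict with per-frame 'f0' and
    'loudness' lists (normalised) and a scalar 'duration' in seconds."""
    f0 = max(max(occ["f0"]) for occ in occurrences)           # f0(i,j)
    loud = max(max(occ["loudness"]) for occ in occurrences)   # l(i,j)
    f0_range = (f0
                - min(min(occ["f0"]) for occ in occurrences)) # f0range(i,j)
    dur = max(occ["duration"] for occ in occurrences)         # d(i,j)
    return {"f0": f0, "loudness": loud, "f0range": f0_range, "duration": dur}

# Two hypothetical occurrences of one term in one segment
occs = [
    {"f0": [0.20, 0.58], "loudness": [0.37, 0.30], "duration": 1.34},
    {"f0": [0.10, 0.40], "loudness": [0.25, 0.20], "duration": 0.90},
]
feats = term_features(occs)
print(feats["f0"], feats["duration"])  # prints 0.58 1.34
```

Taking the maximum over occurrences means a term only needs to be prominent once in a segment to receive a high feature value.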

SLIDE 8

Acoustic Score

— We experimented with six definitions for the acoustic score of term i in segment j:

  ac(i,j) =  f0(i,j)                Pitch        [P]
             l(i,j)                 Loudness     [L]
             d(i,j)                 Duration     [Dur]
             f0range(i,j)           Pitch Range  [Pr]
             l(i,j) · f0(i,j)                    [LP]
             l(i,j) · f0range(i,j)               [LPr]
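The six variants map directly onto the aggregated per-term features; a minimal sketch (feature values below are hypothetical):

```python
def acoustic_score(feats, variant):
    """Return ac(i,j) for one of the six variants named on the slide,
    given the aggregated per-term features f0, l, f0range, d."""
    scores = {
        "P":   feats["f0"],                           # Pitch
        "L":   feats["loudness"],                     # Loudness
        "Dur": feats["duration"],                     # Duration
        "Pr":  feats["f0range"],                      # Pitch Range
        "LP":  feats["loudness"] * feats["f0"],       # Loudness * Pitch
        "LPr": feats["loudness"] * feats["f0range"],  # Loudness * Pitch Range
    }
    return scores[variant]

# Hypothetical feature values for one term in one segment
feats = {"f0": 0.58, "loudness": 0.37, "f0range": 0.48, "duration": 1.34}
print(acoustic_score(feats, "LPr"))  # 0.37 * 0.48
```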

SLIDE 9

Indexing

[Indexing diagram: Enriched Transcripts → IPUs with prosody → IPU Grouping → slide-group segments with prosody → Terrier Indexing → Segment Index]

― Slide-group segments are indexed using the Terrier IR framework.
― The index stores F0, loudness, and duration for each term occurrence, along with text statistics.
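A toy in-memory version of such an index might look as follows (a conceptual sketch only; the paper uses the Terrier IR framework, and all names here are illustrative):

```python
from collections import defaultdict

def build_index(segments):
    """Build a toy inverted index whose postings keep per-occurrence
    prosodic features next to the usual term statistics."""
    index = defaultdict(list)
    for seg_id, tokens in segments.items():
        for term, feats in tokens:
            # one posting per term occurrence: segment id + prosody
            index[term].append({"segment": seg_id, **feats})
    return index

# Hypothetical slide-group segment with one term occurrence
segments = {
    "lec01-sg003": [
        ("kensaku", {"f0": 0.58, "loudness": 0.37, "duration": 1.34}),
    ],
}
idx = build_index(segments)
print(len(idx["kensaku"]))  # prints 1
```

Storing prosody per occurrence (rather than per term) lets the retrieval stage aggregate features over occurrences at query time.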

SLIDE 10

Retrieval

― Probabilistic model with BM25 weighting:

  tf(i,j) = (k1 · tf_ij) / (tf_ij + k1 · (1 − b + b · dl_j / avdl))
  idf(i,C) = log(N / n_i + 1)
  rel(q, s_j) = Σ_{i=1..M} w(i,j)

― Three definitions for w(i,j) were explored:

  w(i,j) =  idf(i,C) · [α · tf(i,j) + (1 − α) · ac(i,j)]               Linear interpolation [LI]
            [θir · tf(i,j) · idf(i,C) + θac · ac(i,j)] / (θir + θac)   Combination [G]
            tf(i,j) · idf(i,C)                                         Baseline [TF_IDF]
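The three weighting schemes can be sketched in Python as follows (a hedged illustration: function names and default k1/b values are my assumptions, and the (θir + θac) normalisation in the G variant is my reading of the slide):

```python
import math

def bm25_tf(tf_raw, dl, avdl, k1=1.2, b=0.75):
    """BM25 term-frequency component tf(i,j)."""
    return (k1 * tf_raw) / (tf_raw + k1 * (1 - b + b * dl / avdl))

def idf(N, n_i):
    """idf(i,C) = log(N / n_i + 1)."""
    return math.log(N / n_i + 1)

def weight_li(tf, idf_v, ac, alpha):
    """[LI] linear interpolation of tf and the acoustic score inside idf."""
    return idf_v * (alpha * tf + (1 - alpha) * ac)

def weight_g(tf, idf_v, ac, theta_ir, theta_ac):
    """[G] weighted combination of the text score and the acoustic score."""
    return (theta_ir * tf * idf_v + theta_ac * ac) / (theta_ir + theta_ac)

def weight_tfidf(tf, idf_v):
    """[TF_IDF] text-only baseline."""
    return tf * idf_v

def rel(weights):
    """rel(q, s_j): sum of w(i,j) over the M query terms in segment j."""
    return sum(weights)
```

With alpha = 1 the LI scheme reduces to the TF_IDF baseline, which is why alpha tunes the influence of prosody directly.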

SLIDE 11

Parameter Tuning

― Tuned on SpokenDoc-2 passage retrieval (120 text queries).

  Lecture Transcript  w(i,j)   ac(i,j)  α / (θir, θac)  uMAP   pwMAP  fMAP
  Manual              LI       LPr      0.7             .1369  .0976  .1005
                      LI       Pr       0.7             .1369  .0951  .0995
                      G        LP       (1, 1)          .1326  .0960  .0989
                      TF-IDF                            .1270  .0950  .0972
  Match               LI       LPr      0.5             .0842  .0508  .0524
                      LI       Dur      0.3             .0819  .0498  .0521
                      G        Pr       (1, 1)          .0786  .0473  .0499
                      LI       Pr       0.7             .0778  .0490  .0501
                      TF-IDF                            .0682  .0477  .0486
  UnmatchAMLM         G        P        (3, 1)          .0288  .0208  .0131
                      LI       LP       0.5             .0278  .0210  .0135
                      LI       LPr      0.2             .0271  .0205  .0132
                      LI       P        0.9             .0227  .0206  .0129
                      TF-IDF                            .0222  .0203  .0128

SLIDE 12

Results: SpokenQuery&Doc

[Bar chart: MAP on manual transcripts for runs LI-Pr-0.7, LI-LPr-0.7, and TF_IDF, across spoken query types Manual, Match, and UnmatchAMLM; MAP axis 0.02–0.14.]

SLIDE 13

Results: SpokenQuery&Doc

[Bar chart: MAP on MATCH transcripts for runs LI-LPr-0.5, LI-Pr-0.7, LI-Dur-0.3, and TF_IDF, across spoken query types Manual, Match, and UnmatchAMLM.]

SLIDE 14

Results: SpokenQuery&Doc

[Bar chart: MAP on UnmatchAMLM transcripts for runs LI-LPr-0.2, LI-LPr-0.5, LI-P-0.9, and TF_IDF, across spoken query types Manual, Match, and UnmatchAMLM.]

SLIDE 15

Results: SpokenQuery&Doc

[Bar chart: Query 1, prosodic-based run vs. TF_IDF baseline; AveP by spoken query type (Manual, Match, Unmatch). This query has 2 relevant segments.]

SLIDE 16

Conclusions & Further Work

― Continued exploring whether prosodic prominence can be used to improve retrieval effectiveness.
― No significant differences between prosodic and text-based runs (Student's t-test, 95% confidence level).
― Transcript quality affects retrieval effectiveness.
― Prosodic-based models may be useful for some queries/target segments:

  • Future work: predict when this happens.
SLIDE 17

References

— [1] Crestani. Towards the use of prosodic information for spoken document retrieval. SIGIR '01, 2001.
— [2] Chen et al. Improved spoken document retrieval by exploring extra acoustic and linguistic cues. INTERSPEECH '01, 2001.
— [3] Guinaudeau and Hirschberg. Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news. INTERSPEECH '11, 2011.
— [4] Racca et al. DCU search runs at MediaEval 2014 Search and Hyperlinking. MediaEval 2014 Multimedia Benchmark Workshop, 2014.

SLIDE 18

Questions?