
DCU at the NTCIR-11 SpokenQuery&Doc Task

David N. Racca, Gareth J.F. Jones. CNGL Centre for Global Intelligent Content, School of Computing, Dublin City University, Dublin, Ireland.


  1. DCU at the NTCIR-11 SpokenQuery&Doc Task. David N. Racca, Gareth J.F. Jones. CNGL Centre for Global Intelligent Content, School of Computing, Dublin City University, Dublin, Ireland.

  2. Overview
     - We participated in the slide-group SQ-SCR task.
     - General idea:
       - Augment text-retrieval methods with prosodic features: pitch (F0), loudness, and duration.
       - Compute an acoustic score for each term.
       - Promote the rank of segments containing acoustically prominent terms.

  3. Motivation
     - Prosody: rhythm, stress, intonation, duration, loudness.
     - Shown to be useful in many speech processing tasks: emotions, discourse structure, speech acts, speaker ID, topic segmentation.
     - Prominent speech units stand out from their context.
     - Information status: old vs. new information.

  4. Related Work
     - Crestani [1]: possible correlation between acoustic stress and TF-IDF scores (English).
     - Chen et al. [2]: signal amplitude and duration in a spoken document retrieval (SDR) task (Mandarin).
     - Guinaudeau and Hirschberg [3]: F0 and RMS energy in a topic tracking task (French).
     - Racca et al. [4]: F0, loudness, and duration in SCR (English).

  5. Data Pre-processing
     - 1-best WORD match, unmatchAMLM, and manual transcripts, provided by the organisers.
     - [Diagram: pre-processing pipeline. Labels recoverable from the figure: lecture and query WAV files; Julius LVCSR 10-best ASR hypotheses per IPU; ChaSen with output format "%m %M %y"; capitalisation; manual transcripts with manual IPUs, VAD, annotation removal, and forced alignment; OpenSMILE extraction of F0 and loudness every 10 ms; lecture normalisation $v_{norm} = \frac{v_{raw} - \min v}{\max v - \min v}$; enriched ASR transcripts and enriched manual transcripts.]

  6. Prosodic Features
     - Raw duration, lecture-normalised F0 and loudness.
     - Lecture normalisation: $v_{norm} = \frac{v_{raw} - \min v}{\max v - \min v}$
     - Example (one term occurrence, starting at about 1.02 s and ending at about 2.36 s; figure label: tf-idf):
       - Duration: $d = 2.36\,\text{s} - 1.02\,\text{s} = 1.34\,\text{s}$
       - Loudness: raw $\max(l_{i,j}^{k}) = 1.16$, normalised $\max(l_{i,j}^{k}) = 0.37$
       - Pitch (F0): raw $\max(f0_{i,j}^{k}) = 280.44$ Hz, normalised $\max(f0_{i,j}^{k}) = 0.58$
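The lecture normalisation and duration computation on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the frame values are hypothetical, and the handling of a constant-valued lecture (returning zeros) is an assumption the slides do not address.

```python
def lecture_normalise(values):
    """Min-max normalise the frame-level values of one lecture to [0, 1].

    Implements v_norm = (v_raw - min v) / (max v - min v).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant signal maps to 0.0 (not specified in the slides).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def duration(start, end):
    """Raw duration of a term occurrence in seconds."""
    return end - start


# Hypothetical F0 values (Hz), one per 10 ms frame, for a whole lecture.
lecture_f0 = [80.0, 120.0, 200.0, 280.0]
normalised_f0 = lecture_normalise(lecture_f0)

# Duration of the occurrence shown on the slide (start ~1.02 s, end ~2.36 s).
d = duration(1.02, 2.36)
```

Normalising per lecture rather than per collection keeps the features comparable across speakers and recording conditions, which is presumably why the slide uses the lecture-level minimum and maximum.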

  7. Prosodic Features
     - F0, loudness, and duration for term $i$ in segment $j$ (the superscript $k$ indexes the occurrences of term $i$ in segment $j$):
       - $f0(i,j) = \max_k \{ \max(f0_{i,j}^{k}) \}$
       - $l(i,j) = \max_k \{ \max(l_{i,j}^{k}) \}$
       - $d(i,j) = \max_k \{ d_{i,j}^{k} \}$
       - $f0_{range}(i,j) = \max_k \{ \max(f0_{i,j}^{k}) \} - \min_k \{ \min(f0_{i,j}^{k}) \}$
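A minimal sketch of how these term-level features might be aggregated, assuming that k indexes the occurrences of a term within a segment and that each occurrence carries its frame-level F0 and loudness values plus start and end times. The dictionary layout and the sample values are hypothetical.

```python
def term_features(occurrences):
    """Aggregate prosodic features over all occurrences of one term in one segment.

    Each occurrence is a dict with (hypothetical) keys:
      'f0'       - frame-level F0 values for the occurrence
      'loudness' - frame-level loudness values for the occurrence
      'start', 'end' - occurrence boundaries in seconds
    """
    f0_max = max(max(o["f0"]) for o in occurrences)
    f0_min = min(min(o["f0"]) for o in occurrences)
    return {
        "f0": f0_max,                                          # f0(i, j)
        "l": max(max(o["loudness"]) for o in occurrences),     # l(i, j)
        "d": max(o["end"] - o["start"] for o in occurrences),  # d(i, j)
        "f0_range": f0_max - f0_min,                           # f0_range(i, j)
    }


# Two hypothetical occurrences of the same term, with lecture-normalised values.
occurrences = [
    {"f0": [0.20, 0.50], "loudness": [0.10, 0.37], "start": 1.02, "end": 2.36},
    {"f0": [0.30, 0.58], "loudness": [0.20, 0.30], "start": 5.00, "end": 5.50},
]
features = term_features(occurrences)
```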

  8. Acoustic Score
     - We experimented with six definitions for the acoustic score of term $i$ in segment $j$:

     $$ac(i,j) = \begin{cases} f0(i,j) & \text{Pitch [P]} \\ l(i,j) & \text{Loudness [L]} \\ d(i,j) & \text{Duration [Dur]} \\ f0_{range}(i,j) & \text{Pitch Range [Pr]} \\ l(i,j) \cdot f0(i,j) & \text{[LP]} \\ l(i,j) \cdot f0_{range}(i,j) & \text{[LPr]} \end{cases}$$
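The six acoustic score definitions can be expressed as a small lookup table. The sketch below assumes the term-level features are already available in a dictionary; the key names and example values are hypothetical, while the labels P, L, Dur, Pr, LP, and LPr follow the slide.

```python
# Map each label from the slide to a function of the term-level features.
ACOUSTIC_SCORES = {
    "P": lambda f: f["f0"],                  # pitch
    "L": lambda f: f["l"],                   # loudness
    "Dur": lambda f: f["d"],                 # duration
    "Pr": lambda f: f["f0_range"],           # pitch range
    "LP": lambda f: f["l"] * f["f0"],        # loudness x pitch
    "LPr": lambda f: f["l"] * f["f0_range"], # loudness x pitch range
}


def acoustic_score(features, kind):
    """Return ac(i, j) for the chosen definition; raises KeyError if unknown."""
    return ACOUSTIC_SCORES[kind](features)


# Hypothetical term-level features for one term in one segment.
example = {"f0": 0.58, "l": 0.37, "d": 1.34, "f0_range": 0.38}
scores = {kind: acoustic_score(example, kind) for kind in ACOUSTIC_SCORES}
```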

  9. Indexing
     - [Diagram: indexing pipeline. Labels recoverable from the figure: enriched transcripts; IPU grouping; IPUs with prosody; slide-group segments with prosody; Terrier indexing; segment index.]
     - Slide-group segments are indexed using the Terrier IR framework.
     - The index stores F0, loudness, and duration for each term occurrence along with text statistics.

  10. Retrieval
      - Probabilistic model with BM25 weighting: $rel(q, s_j) = \sum_{i}^{M} w(i,j)$
      - Three definitions for $w(i,j)$ were explored:

      $$w(i,j) = \begin{cases} idf(i,C)\,[\alpha \cdot tf(i,j) + (1-\alpha)\,ac(i,j)] & \text{[LI]} \\ \dfrac{\theta_{ir} \cdot tf(i,j) \cdot idf(i,C) + \theta_{ac} \cdot ac(i,j)}{\theta_{ir} + \theta_{ac}} & \text{[G]} \\ tf(i,j) \cdot idf(i,C) & \text{[TF\_IDF]} \end{cases}$$

      - where $idf(i,C) = \log\left(\frac{N}{n_i} + 1\right)$ and $tf(i,j) = \dfrac{k_1 \cdot tf_{i,j}}{tf_{i,j} + k_1 \left(1 - b + b\,\frac{dl_j}{avdl}\right)}$
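The three weighting schemes can be sketched directly from the formulas on this slide. Several things here are assumptions rather than facts from the slides: the default values `k1=1.2` and `b=0.75` are common BM25 settings, not reported ones; N is read as the number of segments in the collection, n_i as the number of segments containing term i, dl_j as the segment length, and avdl as the average segment length; all function names are hypothetical.

```python
import math


def idf(n_segments, n_containing):
    """idf(i, C) = log(N / n_i + 1)."""
    return math.log(n_segments / n_containing + 1)


def bm25_tf(tf_raw, dl, avdl, k1=1.2, b=0.75):
    """Length-normalised term frequency tf(i, j); k1 and b defaults are assumed."""
    return (k1 * tf_raw) / (tf_raw + k1 * (1 - b + b * dl / avdl))


def w_tfidf(tf, idf_value):
    """TF_IDF baseline: tf(i, j) * idf(i, C)."""
    return tf * idf_value


def w_li(tf, idf_value, ac, alpha):
    """LI: linear interpolation of tf and the acoustic score, scaled by idf."""
    return idf_value * (alpha * tf + (1 - alpha) * ac)


def w_g(tf, idf_value, ac, theta_ir, theta_ac):
    """G: weighted mean of the text weight and the acoustic score."""
    return (theta_ir * tf * idf_value + theta_ac * ac) / (theta_ir + theta_ac)


def relevance(term_weights):
    """rel(q, s_j): sum of w(i, j) over the query terms."""
    return sum(term_weights)


# Hypothetical statistics for one query term in one segment.
tf = bm25_tf(tf_raw=1, dl=100, avdl=100)
idf_value = idf(n_segments=10, n_containing=10)
baseline = w_tfidf(tf, idf_value)
interpolated = w_li(tf, idf_value, ac=0.14, alpha=0.7)
```

With `alpha=1` the LI weight reduces to the TF_IDF baseline, and with `theta_ac=0` the G weight does the same, so both schemes can be checked against the baseline before any acoustic evidence is added.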

  11. Parameter Tuning
      - SpokenDoc-2 passage retrieval: 120 text queries.

      Lecture transcript  w(i,j)  ac(i,j)  α    θ_ir  θ_ac  uMAP   pwMAP  fMAP
      Manual              LI      LPr      0.7  -     -     .1369  .0976  .1005
      Manual              LI      Pr       0.7  -     -     .1369  .0951  .0995
      Manual              G       LP       -    1     1     .1326  .0960  .0989
      Manual              TF-IDF  -        -    -     -     .1270  .0950  .0972
      Match               LI      LPr      0.5  -     -     .0842  .0508  .0524
      Match               LI      Dur      0.3  -     -     .0819  .0498  .0521
      Match               G       Pr       -    1     1     .0786  .0473  .0499
      Match               LI      Pr       0.7  -     -     .0778  .0490  .0501
      Match               TF-IDF  -        -    -     -     .0682  .0477  .0486
      UnmatchAMLM         G       P        -    3     1     .0288  .0208  .0131
      UnmatchAMLM         LI      LP       0.5  -     -     .0278  .0210  .0135
      UnmatchAMLM         LI      LPr      0.2  -     -     .0271  .0205  .0132
      UnmatchAMLM         LI      P        0.9  -     -     .0227  .0206  .0129
      UnmatchAMLM         TF-IDF  -        -    -     -     .0222  .0203  .0128

  12. Results: SpokenQuery&Doc, Manual Transcripts
      - [Bar chart: MAP (y-axis, 0 to 0.14) by spoken query type (Manual, Match, UnmatchAMLM) for runs LI-Pr-0.7, LI-LPr-0.7, and TF_IDF.]

  13. Results: SpokenQuery&Doc, Match Transcripts
      - [Bar chart: MAP (y-axis, 0 to 0.14) by spoken query type (Manual, Match, UnmatchAMLM) for runs LI-LPr-0.5, LI-Pr-0.7, LI-Dur-0.3, and TF_IDF.]

  14. Results: SpokenQuery&Doc, UnmatchAMLM Transcripts
      - [Bar chart: MAP (y-axis, 0 to 0.14) by spoken query type (Manual, Match, UnmatchAMLM) for runs LI-LPr-0.2, LI-LPr-0.5, LI-P-0.9, and TF_IDF.]

  15. Results: SpokenQuery&Doc
      - Query 1 (2 relevant segments): prosodic-based vs. TF_IDF.
      - [Bar chart: AveP (x-axis, 0 to 0.8) for TF_IDF and prosodic-based runs, grouped by transcript type and spoken query type (Manual, Match, Unmatch).]

  16. Conclusions & Further Work
      - We continued exploring whether prosodic prominence can be used to improve retrieval effectiveness.
      - No significant differences between prosodic and text-based runs (Student's t-test, 95% confidence level).
      - Transcript quality affects retrieval effectiveness.
      - Prosodic-based models may be useful for some queries and target segments.
        - Future work: predict when this happens.
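As an illustration of the kind of significance check mentioned in the conclusions, the sketch below computes a paired t statistic over per-query average precision for two runs. The slides do not say whether the test was paired, so that is an assumption, and the per-query scores are invented purely for the example; the critical value 2.776 is the standard two-sided 95% value for 4 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean per-query difference between two runs.

    Assumes both lists are aligned by query and that the differences are
    not all identical (otherwise the standard error is zero).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err


# Hypothetical per-query average precision for five queries.
prosodic_ap = [0.12, 0.18, 0.33, 0.41, 0.49]
tfidf_ap = [0.10, 0.20, 0.30, 0.40, 0.50]

t_stat = paired_t_statistic(prosodic_ap, tfidf_ap)

# Two-sided critical value at the 95% level for n - 1 = 4 degrees of freedom.
T_CRIT_DF4_95 = 2.776
significant = abs(t_stat) > T_CRIT_DF4_95
```

With these invented scores the statistic stays well below the critical value, which mirrors the shape of the reported outcome but says nothing about the actual runs.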

  17. References
      - [1] Crestani. Towards the use of prosodic information for spoken document retrieval. SIGIR'01, 2001.
      - [2] Chen et al. Improved spoken document retrieval by exploring extra acoustic and linguistic cues. INTERSPEECH'01, 2001.
      - [3] Guinaudeau and Hirschberg. Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news. INTERSPEECH'11, 2011.
      - [4] Racca et al. DCU search runs at MediaEval 2014 Search and Hyperlinking. MediaEval 2014 Multimedia Benchmark Workshop, 2014.

  18. Questions?
