
Narrative Theme Navigation for Sitcoms Supported by Fan-generated Scripts (PowerPoint presentation)



  1. Narrative Theme Navigation for Sitcoms Supported by Fan-generated Scripts
     Gerald Friedland, Luke Gottlieb, Adam Janin
     International Computer Science Institute (ICSI)
     Presented by: Katya Gonina

  2. What?
     - A novel method for generating indexing information to navigate TV content

  3. Why?
     - Lots of different ways to watch videos: DVD, Blu-ray, on-demand, Internet
     - Lots of videos out there!
     - Need better ways to navigate content:
       - Show a particular scene
       - Show where a favorite actor talks
       - Support random seek into videos

  4. Example: Sitcoms
     - Specifically “Seinfeld”
     - Strict set of rules:
       - Every scene transition is marked by music
       - Every punchline is marked by artificial laughter
     - Video: http://www.youtube.com/watch?v=PaPxSsK6ZQA

  5. Outline
     1. Original Joke-O-Mat (2009): system setup, evaluation, limitations
     2. Enhanced version (2010): system setup, evaluation
     3. Future work

  6. Outline
     1. Original Joke-O-Mat (2009): system setup, evaluation, limitations
     2. Enhanced version (2010): system setup, evaluation
     3. Future work

  7. Joke-O-Mat
     - Original system (2008-2009)
     - Ability to navigate basic narrative elements:
       - Scenes
       - Punchlines
       - Dialog segments
     - Per-actor filter
     - Ability to skip certain parts
     - “Surf” the episode
     Reference: G. Friedland, L. Gottlieb, and A. Janin, “Using Artistic Markers and Speaker Identification for Narrative-Theme Navigation of Seinfeld Episodes,” Proceedings of the 11th IEEE International Symposium on Multimedia (ISM2009), San Diego, California, pp. 511-516.

  8. Joke-O-Mat
     Two main elements:
       1. Pre-processing step
       2. Online video browser

  9. Joke-O-Mat
     Two main elements:
       1. Pre-processing and analysis step

  10. Acoustic Event & Speaker Identification
      - Goal: train GMMs for different audio events:
        - Jerry, Kramer, Elaine, George
        - Male and female supporting actors
        - Laughter
        - Music
        - Non-speech (i.e., other noises)
      - Use a 1-minute audio sample
      - Compute 19-dimensional MFCCs
      - Train 20-component GMMs
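The slide gives the training recipe but not the estimation procedure. The sketch below assumes diagonal covariances and plain expectation-maximization (EM), both assumptions, and takes already-computed feature vectors (for example, the 19-dimensional MFCC frames of the 1-minute sample) as input; MFCC extraction is not shown. The names `train_gmm` and `gmm_loglik` are hypothetical, and the pure-Python loops are for illustration rather than speed.

```python
import math
import random


def _log_gauss_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_loglik(model, x):
    """Log-likelihood of one feature frame under a (weights, means, variances) GMM."""
    weights, means, variances = model
    return _logsumexp([math.log(w) + _log_gauss_diag(x, m, v)
                       for w, m, v in zip(weights, means, variances)])


def train_gmm(frames, n_components=20, n_iter=10, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM (an assumption) to feature vectors with EM."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialise means from random frames and every variance from the global variance.
    means = [list(f) for f in rng.sample(frames, n_components)]
    mu = [sum(f[d] for f in frames) / n for d in range(dim)]
    global_var = [max(sum((f[d] - mu[d]) ** 2 for f in frames) / n, var_floor)
                  for d in range(dim)]
    variances = [list(global_var) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + _log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = _logsumexp(logs)
            resp.append([math.exp(lp - norm) for lp in logs])
        # M-step: re-estimate weights, means and floored variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                        for d in range(dim)]
            variances[k] = [max(sum(r[k] * (x[d] - means[k][d]) ** 2
                                    for r, x in zip(resp, frames)) / nk, var_floor)
                            for d in range(dim)]
    return weights, means, variances
```

One model would be trained per class on the slide (each main actor, the two supporting-actor classes, laughter, music, non-speech), and `gmm_loglik` then scores every frame against each model.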

  11. Audio Segmentation
      - Given the trained GMMs:
        - A 2.5 s window at a 10 ms frame step = 250 frames
        - Compute the likelihood of each frame's features under each GMM
        - Use a majority vote over the window to classify it as one of the speakers or as laughter/music/non-speech
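A minimal sketch of the voting step, assuming per-frame log-likelihoods have already been computed for every class GMM and are passed as one dictionary per frame. Each frame votes for its most likely class and each 250-frame window takes the most frequent vote. `classify_windows` and the input format are hypothetical; the 10 ms step and 2.5 s window come from the slide.

```python
from collections import Counter

FRAME_STEP_S = 0.010  # 10 ms per feature frame
WINDOW_S = 2.5        # 2.5 s voting window
FRAMES_PER_WINDOW = round(WINDOW_S / FRAME_STEP_S)  # 250 frames


def classify_windows(frame_scores, frames_per_window=FRAMES_PER_WINDOW):
    """frame_scores: list of dicts mapping class label -> log-likelihood (hypothetical format).

    Returns one label per window: the majority of the per-frame argmax decisions."""
    labels = []
    for start in range(0, len(frame_scores), frames_per_window):
        window = frame_scores[start:start + frames_per_window]
        votes = Counter(max(scores, key=scores.get) for scores in window)
        labels.append(votes.most_common(1)[0][0])
    return labels
```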

  12. Narrative Theme Analysis
      - Transforms the acoustic event segmentation and speaker detection into narrative theme segments
      - Rule-based system:
        - Dialog = single contiguous speech segment
        - Punchline = dialog + laughter
        - Top-5 punchlines = the 5 punchlines followed by the longest laughter
        - Scene = segment of at least 10 s between two music events
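A sketch of these rules, assuming the segmentation is a time-ordered list of `(label, start_s, end_s)` tuples in which every label other than `laughter`, `music`, and `nonspeech` is a speaker. Reading "dialog + laughter" as "a speech segment immediately followed by a laughter segment" is an interpretation, and `narrative_themes` is a hypothetical name.

```python
NON_SPEECH = {"laughter", "music", "nonspeech"}


def narrative_themes(segments, min_scene_s=10.0, top_n=5):
    """Apply the rule set to a time-ordered list of (label, start_s, end_s)."""
    # Dialog: every contiguous speech (speaker-labelled) segment.
    dialogs = [s for s in segments if s[0] not in NON_SPEECH]
    # Punchline: a dialog immediately followed by laughter (assumed adjacency).
    punchlines = []  # (dialog segment, duration of the laughter after it)
    for cur, nxt in zip(segments, segments[1:]):
        if cur[0] not in NON_SPEECH and nxt[0] == "laughter":
            punchlines.append((cur, nxt[2] - nxt[1]))
    # Top-N punchlines: those followed by the longest laughter.
    ranked = sorted(punchlines, key=lambda p: p[1], reverse=True)
    top = [dialog for dialog, _ in ranked[:top_n]]
    # Scene: the span between two consecutive music events, if long enough.
    music = [s for s in segments if s[0] == "music"]
    scenes = [(a[2], b[1]) for a, b in zip(music, music[1:])
              if b[1] - a[2] >= min_scene_s]
    return {"dialogs": dialogs,
            "punchlines": [dialog for dialog, _ in punchlines],
            "top": top,
            "scenes": scenes}
```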

  13. Narrative Theme Analysis
      - Creates icons for the GUI
      - Sitcom rule: an actor has to be shown once a certain speaking time is exceeded
      - Actor icon = median frame of the actor's longest speech segment
        - A visual approach could be used here instead
      - Other events (scene, punchline, dialog) use the median frame of the event
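A sketch of the actor-icon rule, assuming speech segments are `(actor, start_s, end_s)` tuples and taking the median frame of a segment to be the frame at its temporal midpoint. The 29.97 fps (NTSC) default and the name `actor_icon_frames` are assumptions.

```python
def actor_icon_frames(segments, fps=29.97):
    """Map each actor to the frame index at the midpoint of their longest speech segment.

    segments: (actor, start_s, end_s) tuples; fps is an assumed NTSC default."""
    longest = {}
    for actor, start, end in segments:
        if actor not in longest or end - start > longest[actor][1] - longest[actor][0]:
            longest[actor] = (start, end)
    return {actor: int((start + end) / 2 * fps)
            for actor, (start, end) in longest.items()}
```

The same midpoint computation would apply to the scene, punchline, and dialog icons.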

  14. Online Video Browser
      - Shows the video
      - Allows play/pause and seeking to random positions
      - The navigation panel allows browsing directly to a:
        - Scene
        - Punchline
        - Top-5 punchline
        - Dialog element
      - Select/deselect actors
      - http://www.icsi.berkeley.edu/jokeomat/HD/auto/index.html
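The slide does not describe how the browser is implemented. One way a navigation button could be mapped to a seek target, assuming each element type is stored as a sorted list of start times in seconds, is a binary search. `next_event`, `previous_event`, `filter_by_actor`, and the 1-second tolerance are hypothetical.

```python
import bisect


def next_event(starts, t):
    """First element start strictly after playback time t, or None at the end."""
    i = bisect.bisect_right(starts, t)
    return starts[i] if i < len(starts) else None


def previous_event(starts, t, tolerance=1.0):
    """Start of the current element, or of the one before it if the current
    element began within `tolerance` seconds (hypothetical player behaviour)."""
    i = bisect.bisect_left(starts, t - tolerance)
    return starts[i - 1] if i > 0 else None


def filter_by_actor(dialogs, selected):
    """Keep only (actor, start_s, end_s) dialog elements of the selected actors."""
    return [d for d in dialogs if d[0] in selected]
```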

  15. Evaluation
      Phase                       Performance      For a 25-min episode
      Training                    30% real-time    2.7 min
      Classification              10% real-time    2.5 min
      Narrative theme analysis    10% real-time    2.5 min
      Total                                        7.7 min
      - Diarization Error Rate (DER) = 46%
        - 5% per class
      - Winner of the ACM Multimedia Grand Challenge 2009
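The times in the table are consistent with the stated real-time factors if training processes one minute of audio for each of the nine classes on the acoustic-event slide (four main actors, two supporting-actor classes, laughter, music, non-speech). That nine-minute training total is an inference, not a figure from the slide:

```latex
\begin{aligned}
t_{\text{train}}    &= 0.30 \times (9 \times 1\ \text{min}) = 2.7\ \text{min} \\
t_{\text{classify}} &= 0.10 \times 25\ \text{min} = 2.5\ \text{min} \\
t_{\text{theme}}    &= 0.10 \times 25\ \text{min} = 2.5\ \text{min} \\
t_{\text{total}}    &= 2.7 + 2.5 + 2.5 = 7.7\ \text{min}
\end{aligned}
```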

  16. Limitations of the original Joke-O-Mat
      - Requires manual training of speaker models
      - Requires 60 seconds of training data for each speaker
      - Cannot support actors with minor roles
      - Does not take into account what was said

  17. Outline
      1. Original Joke-O-Mat (2009): system setup, evaluation, limitations
      2. Enhanced version (2010): system setup, evaluation
      3. Future work

  18. Extended System
      - Enhanced Joke-O-Mat (2010)
      - + Speech recognition
        - Keyword search
      - Automatic alignment of speaker ID and ASR with:
        - Fan-generated scripts
        - Closed captions
      - Significantly reduces manual intervention

  19. New Joke-O-Mat System

  20. New Joke-O-Mat System

  21. Context Augmentation
      - Producing transcripts can be costly
      - Luckily, we have the Internet!
      - Scripts and closed captions produced by fans

  22. Fan-generated Data
      - Fan-sourced scripts:
        - Tend to be very accurate
        - However, do not contain any timing information
      - Closed captions:
        - Contain timing information
        - However, do not contain speaker attribution
        - Less accurate, often intentionally altered
      - Normalize and merge them together

  23. Fan-generated Data
      - Normalize the scripts and the closed captions
      - Then use minimum edit distance to align the two sources
      - Start & end words in the script = start & end words in the caption
      - Use timing from the closed caption and the speaker from the script
      - If one speaker: single-speaker segment
      - If multiple speakers: multi-speaker segment (37.3%)
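A sketch of the merge, assuming scripts arrive as `(speaker, text)` lines and captions as `(start_s, end_s, text)` cues. Normalization here is only lowercasing and stripping punctuation (an assumption; the slide does not specify it), and the alignment is a word-level Levenshtein dynamic program with a backtrace. All function names are hypothetical. The table grows as n*m in the two word counts, so a full episode would be slow in pure Python.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, split into words (assumed normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align(a, b):
    """Minimum-edit-distance alignment of word lists a and b.

    Returns {index in a: index in b} for matched or substituted word pairs."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    pairs, i, j = {}, n, m
    while i > 0 and j > 0:  # backtrace from the bottom-right cell
        cost = 0 if a[i - 1] == b[j - 1] else 1
        if d[i][j] == d[i - 1][j - 1] + cost:
            pairs[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1  # script word missing from the captions
        else:
            j -= 1  # caption word missing from the script
    return pairs


def merge(script, captions):
    """Return [(start_s, end_s, speakers, words)]: timing from captions, speakers from script."""
    script_words, script_speakers = [], []
    for speaker, text in script:
        for w in normalize(text):
            script_words.append(w)
            script_speakers.append(speaker)
    cap_words, cap_index = [], []
    for k, (_, _, text) in enumerate(captions):
        for w in normalize(text):
            cap_words.append(w)
            cap_index.append(k)
    cap_to_script = {c: s for s, c in align(script_words, cap_words).items()}
    segments = []
    for k, (start, end, _) in enumerate(captions):
        positions = [cap_to_script[c] for c, idx in enumerate(cap_index)
                     if idx == k and c in cap_to_script]
        if not positions:
            continue  # caption could not be anchored in the script
        first, last = min(positions), max(positions)
        speakers = list(dict.fromkeys(script_speakers[first:last + 1]))
        segments.append((start, end, speakers, script_words[first:last + 1]))
    return segments
```

A caption whose aligned script span covers more than one speaker yields a multi-speaker segment.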

  24. Forced Alignment
      - Transcript + audio = alignment
      - Generates detailed timing information for each word
      - Performs all steps of a speech recognizer on the audio, but instead of a language model it uses only the word sequence of the transcript
      - Also performs speaker adaptation over segments
        - Therefore more accurate on speaker-homogeneous segments
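The key constraint of forced alignment is that the decoder may only follow the transcript's word order. The sketch keeps just that constraint: given a hypothetical matrix of per-frame acoustic scores for each transcript word (in a real recognizer these come from phone-level HMMs, which are not shown), a Viterbi pass finds the best left-to-right assignment of frames to words, with each word taking at least one frame. The function name and score layout are assumptions.

```python
def forced_align(scores):
    """scores[t][k]: log-likelihood of frame t under the k-th transcript word (hypothetical).

    Returns (first_frame, last_frame) for each word along the best left-to-right path."""
    T, K = len(scores), len(scores[0])
    if T < K:
        raise ValueError("need at least one frame per word")
    NEG = float("-inf")
    best = [[NEG] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    best[0][0] = scores[0][0]  # the path must start on the first word
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1][k]                           # remain on the same word
            advance = best[t - 1][k - 1] if k > 0 else NEG  # move to the next word
            if advance > stay:
                best[t][k], back[t][k] = advance + scores[t][k], k - 1
            else:
                best[t][k], back[t][k] = stay + scores[t][k], k
    # Trace back from the last word at the last frame.
    path, k = [0] * T, K - 1
    for t in range(T - 1, -1, -1):
        path[t] = k
        k = back[t][k]
    spans = []
    for k in range(K):
        frames = [t for t, w in enumerate(path) if w == k]
        spans.append((frames[0], frames[-1]))
    return spans
```

With a 10 ms frame step (an assumption for this step), multiplying the frame indices by 0.01 gives word start and end times in seconds.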

  25. Forced Alignment
      - Run forced alignment on each segment
      - For the 10 episodes tested, 90% of the segments aligned at the first step
      - Output:
        - Start and end time of each word
        - Speaker attribution

  26. Forced Alignment
      - Pool the segments for each speaker and train speaker models
      - Also train a garbage model:
        - On the audio that falls between the segments
        - Assuming that this audio contains only laughter, music, and other non-speech
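A sketch of the pooling step, assuming successfully aligned segments are `(speaker, start_s, end_s)` tuples. It collects each speaker's intervals and treats every gap between aligned segments as garbage-model training audio, following the slide's assumption that those gaps hold only laughter, music, and other non-speech. `pool_training_data` is a hypothetical name; the models themselves would then be trained on the audio in these intervals.

```python
from collections import defaultdict


def pool_training_data(aligned, episode_end):
    """aligned: (speaker, start_s, end_s) segments that passed forced alignment.

    Returns ({speaker: [(start, end), ...]}, [(gap_start, gap_end), ...])."""
    per_speaker = defaultdict(list)
    garbage = []
    cursor = 0.0  # end of the latest aligned speech seen so far
    for speaker, start, end in sorted(aligned, key=lambda s: s[1]):
        per_speaker[speaker].append((start, end))
        if start > cursor:
            garbage.append((cursor, start))  # unaligned gap -> garbage audio
        cursor = max(cursor, end)
    if episode_end > cursor:
        garbage.append((cursor, episode_end))
    return dict(per_speaker), garbage
```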

  27. Forced Alignment
      - For the failed single-speaker segments:
        - Still use the segment start and end time
        - No way to index the exact temporal location of each word
      - For each failed multi-speaker segment:
        - Generate an HMM alternating:
          - Speaker states
          - Garbage states

  28. Forced Alignment
      - For each time step, advance along an arc and collect its probability
        - Example: when moving across the “Patrice” arc, invoke the “Patrice” speaker model at that time step
      - Segmentation = most probable path through the HMM
      - The garbage model allows for arbitrary noise between speakers
      - Minimum duration for each speaker
        - In practice, the system was not sensitive to the duration
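A sketch of the alternating speaker/garbage HMM, assuming the speaker order comes from the script and that per-frame log-likelihoods are available for each speaker model and for the garbage model. Making garbage states optional (skippable) and enforcing the minimum duration with a chain of `min_frames` states per speaker, where only the last one self-loops, are design assumptions; `speaker_hmm_segment` is a hypothetical name. The most probable path, found with Viterbi, splits the multi-speaker segment into single-speaker segments.

```python
def speaker_hmm_segment(loglik, speakers, min_frames=3):
    """loglik[t] maps every speaker name and "garbage" to the log-likelihood of frame t.

    speakers: speaker order from the script. Returns [(speaker, first_frame, last_frame)]."""
    # Left-to-right chain: garbage, speaker 1, garbage, speaker 2, ..., garbage.
    states = [("garbage", True, 0)]  # (label, has_self_loop, group id)
    for i, spk in enumerate(speakers):
        group = 2 * i + 1
        states += [(spk, False, group)] * (min_frames - 1) + [(spk, True, group)]
        states.append(("garbage", True, group + 1))
    S, T, NEG = len(states), len(loglik), float("-inf")
    if not speakers or T < len(speakers) * min_frames:
        raise ValueError("segment is too short for the speaker sequence")

    def predecessors(j):
        preds = [j] if states[j][1] else []
        if j >= 1:
            preds.append(j - 1)
        if j >= 2 and states[j - 1][0] == "garbage":
            preds.append(j - 2)  # garbage between speakers is optional
        return preds

    best = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    best[0][0] = loglik[0]["garbage"]     # start in leading garbage
    best[0][1] = loglik[0][states[1][0]]  # or directly in the first speaker
    for t in range(1, T):
        for j in range(S):
            p = max(predecessors(j), key=lambda i: best[t - 1][i])
            if best[t - 1][p] > NEG:
                best[t][j] = best[t - 1][p] + loglik[t][states[j][0]]
                back[t][j] = p
    # End in the trailing garbage state or in the last speaker's final state.
    state = max((S - 1, S - 2), key=lambda j: best[T - 1][j])
    path = [state]
    for t in range(T - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    segments = []  # [label, first_frame, last_frame, group]
    for t, j in enumerate(path):
        label, _, group = states[j]
        if segments and segments[-1][3] == group:
            segments[-1][2] = t
        else:
            segments.append([label, t, t, group])
    return [(label, a, b) for label, a, b, _ in segments if label != "garbage"]
```

Each returned single-speaker segment can then go through forced alignment with ASR again.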

  29. Forced Alignment
      - Multi-speaker segments => many single-speaker segments
      - Run forced alignment with ASR again

  30. Music & Laughter Segmentation
      - Laughter is decoded using the Shout speech/non-speech decoder
      - Music models are trained separately (same as in the original Joke-O-Mat)

  31. Putting it all together
      - http://www.icsi.berkeley.edu/jokeomat/HD/auto/index.html

  32. Evaluation
      Compare to expert-annotated ground truth:
        1. DER
           - False alarms: closed captions spanning multiple dialog segments
           - Missed speech: truncation of words in forced alignment
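DER is conventionally defined (as in the NIST Rich Transcription evaluations) as missed speech plus false-alarm speech plus speaker confusion, divided by the total reference speech time. The frame-level sketch below is a simplification: it assumes one speaker per frame, no forgiveness collar, and hypothesis labels that already use the reference speaker names, so it skips the optimal speaker mapping a real scorer performs. `diarization_error_rate` is a hypothetical name.

```python
def diarization_error_rate(reference, hypothesis):
    """Simplified frame-level DER; labels are speaker names, None means non-speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")
    speech = missed = false_alarm = confusion = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1      # reference speech with no hypothesis speech
            elif hyp != ref:
                confusion += 1   # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1     # hypothesis speech where the reference has none
    return (missed + false_alarm + confusion) / speech
```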

  33. Evaluation
      Compare to expert-annotated ground truth:
        2. User study
           - 25 participants
           - Randomly shown expert- and fan-annotated episodes
           - Asked to state their preference

  34. Outline
      1. Original Joke-O-Mat (2009): system setup, evaluation, limitations
      2. Enhanced version (2010): system setup, evaluation
      3. Future work

  35. Limitations & Future Work
      - Laughter and scene-transition music models are trained manually
      - Requires scripts and closed captions
        - Available from show producers
      - Failed single-speaker segments: how to handle them?
        - Retrain speaker models
        - HMM for the whole episode
      - Look at other genres (dramas, soap operas, lectures?)
        - New rules
      - Add visual data

  36. Thanks!
