Visual Language Perception from Videos. Mohit Gupta. Advisor: Amitabha Mukerjee

  1. Visual Language Perception from Videos
     Mohit Gupta
     Advisor: Amitabha Mukerjee

  2. Introduction and Motivation
     - Humans process and store what they perceive in a highly abstracted, condensed format
       - For e.g. …
     - Computers, on the other hand, are much less efficient at this
     - Possibilities if computers could condense perception
       - Significant reduction in information size (lower memory requirement)
       - "Show me who the villain is in this movie and when he enters" becomes a valid question for a computer
     - New direction; no similar work has been done

  3. Methodology
     - Scene Segmentation
       - Using the change-in-histogram method
       - Heuristic for the start or end of a speech-silence boundary
       - Strong heuristic for a change in speaker
     - Sound Segmentation
       - Classifying voice, silence and miscellaneous (music, audience laughing etc.)
       - Thresholding energy of signal, zero-crossing rate, pitch detection by the YIN algorithm
     - Diarization of voices (separating voices of different speakers)
       - Voice features like MFCCs are most significant for speaker recognition
     - Associating faces with speech
       - Detect faces in frames containing speech using Haar-based features
       - Tag face with the speech stream for a speaker based on a majority-first approach
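
The change-in-histogram idea for scene segmentation can be illustrated with a minimal sketch. It is not the authors' implementation: the frame representation (a flat list of 0..255 grayscale values), the bin count, the L1 distance and the threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumed representation: one frame = flat sequence of grayscale values in 0..255.
Frame = Sequence[int]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a single frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1  # avoid dividing by zero on an empty frame
    return [c / total for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def scene_boundaries(frames: Sequence[Frame],
                     threshold: float = 0.5,  # hypothetical cut-off, needs tuning
                     bins: int = 16) -> List[int]:
    """Indices i at which frame i differs strongly from frame i - 1."""
    hists = [gray_histogram(f, bins) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Three dark frames followed by three bright frames: one cut, at index 3.
dark, bright = [10] * 64, [200] * 64
cuts = scene_boundaries([dark] * 3 + [bright] * 3)
```

In this reading, each detected cut is a candidate for a speech-silence boundary or a change of speaker, which is how the slide uses scene changes as heuristics.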

  4. Methodology
     - Sound Segmentation
       - Classifying voice, silence and miscellaneous (music, audience laughing etc.)
       - Thresholding energy of signal, zero-crossing rate, pitch detection
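
As a rough illustration of thresholding on energy and zero-crossing rate, the sketch below labels fixed-length audio frames. The decision rule, the frame length and both thresholds are assumptions made for this example, and the pitch-detection step named on the slide is left out.

```python
from typing import List, Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def classify_frames(samples: Sequence[float],
                    frame_len: int = 400,        # assumed: 25 ms at 16 kHz
                    energy_floor: float = 1e-4,  # hypothetical threshold
                    zcr_ceiling: float = 0.25    # hypothetical threshold
                    ) -> List[str]:
    """Label each non-overlapping frame as 'silence', 'voice' or 'misc'.

    Assumed rule: very low energy is silence; otherwise a low
    zero-crossing rate is treated as voice and a high one as misc.
    """
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if short_time_energy(frame) < energy_floor:
            labels.append("silence")
        elif zero_crossing_rate(frame) <= zcr_ceiling:
            labels.append("voice")
        else:
            labels.append("misc")
    return labels


# Synthetic signal: a silent frame, a slow square wave, a fast alternating signal.
silent = [0.0] * 400
slow = ([0.5] * 40 + [-0.5] * 40) * 5
fast = [0.5, -0.5] * 200
labels = classify_frames(silent + slow + fast)
```

On real recordings the thresholds would have to be tuned, and a pitch estimate would be needed to tell voiced speech from music more reliably.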

  6. Methodology
     - Associating faces with speech
       - Detect faces in frames containing speech
       - Using acquired speech boundaries and detecting faces in each segment
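
One possible reading of tagging a face to each speaker's speech stream by majority is sketched below. It assumes face detection and diarization have already been run, so each speech segment arrives as a speaker label plus the face labels seen in its frames. The input layout and the function name `tag_speakers_with_faces` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Assumed input: (speaker_id, [face_id for each frame in which a face was detected]).
Segment = Tuple[Hashable, List[Hashable]]


def tag_speakers_with_faces(segments: Iterable[Segment]
                            ) -> Dict[Hashable, Optional[Hashable]]:
    """Assign each speaker the face seen most often across that speaker's segments."""
    votes: Dict[Hashable, Counter] = {}
    for speaker, faces in segments:
        votes.setdefault(speaker, Counter()).update(faces)
    return {
        speaker: (counter.most_common(1)[0][0] if counter else None)
        for speaker, counter in votes.items()
    }


segments = [
    ("spk0", ["A", "A", "B"]),  # face B appears briefly while spk0 talks
    ("spk1", ["B", "B"]),
    ("spk0", ["A"]),
    ("spk2", []),               # speech with no detected face (e.g. off-screen)
]
tags = tag_speakers_with_faces(segments)
```

Pooling votes over all segments of a speaker makes the tag robust to frames where a listener, rather than the speaker, is on screen.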

  7. Subtitles and speech
     - The pitch plot also separates words, with high recall but low precision
     - Subtitle alignment in the small-error domain is achieved by maximizing the number of common pitch-subtitle boundaries
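
Maximizing common pitch-subtitle boundaries can be sketched as a search over small time shifts. This is an illustration under stated assumptions, not the authors' procedure: boundaries are integer milliseconds, one global offset is applied to all subtitles, and the search range, step and matching tolerance are hypothetical.

```python
from typing import Sequence, Tuple


def count_matches(pitch_bounds: Sequence[int], sub_bounds: Sequence[int],
                  offset: int, tol: int) -> int:
    """Number of shifted subtitle boundaries within `tol` ms of a pitch boundary."""
    return sum(
        1 for s in sub_bounds
        if any(abs(s + offset - p) <= tol for p in pitch_bounds)
    )


def best_offset(pitch_bounds: Sequence[int], sub_bounds: Sequence[int],
                max_shift: int = 2000,  # assumed "small-error" range, in ms
                step: int = 50,         # assumed search resolution, in ms
                tol: int = 20           # assumed matching tolerance, in ms
                ) -> Tuple[int, int]:
    """Return (offset_ms, matches) maximizing the common boundaries.

    Ties are broken in favor of the smaller absolute shift (an assumption).
    """
    best = (0, -1)
    for offset in range(-max_shift, max_shift + 1, step):
        score = count_matches(pitch_bounds, sub_bounds, offset, tol)
        if score > best[1] or (score == best[1] and abs(offset) < abs(best[0])):
            best = (offset, score)
    return best


# Pitch boundaries include spurious extras (high recall, low precision).
pitch = [1000, 1400, 2500, 3100, 4000, 5200]
subs = [500, 2000, 3500, 4700]  # subtitles running 500 ms early
offset, matches = best_offset(pitch, subs)
```

Because extra pitch boundaries only add candidates, low precision does not hurt this search as long as the true boundaries are present, which fits the high-recall observation on the slide.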

  8. Applications
     - Surround Sound Effects
       - Using the knowledge of who is speaking in a frame and the location of the speaker's face
       - Background sounds separated from speech and attenuated to emphasize vocals
     - Information abstraction and retrieval
       - Efficiency in memory usage
       - Model voice, face and scene; use text to produce speech and video on the fly
       - Asking the computer to seek the video to the instant the villain is first seen
