MUSCLE WP5 Showcase:
Real-Time Audio-Visual Automatic Speech Recognition Demonstrator

TSI-TUC (leader), ICCS-NTUA, INRIA-TEXMEX

April 2007

MUSCLE MUSCLE

ICCS-NTUA

Groups and Researchers Involved

TSI-TUC

A. Potamianos (showcase leader)
M. Perakakis
E. Sanchez-Soto

ICCS-NTUA

P. Maragos (group leader)
G. Papandreou (visual/fusion)
A. Katsamanis (audio/fusion)
V. Pitsikalis (audio/fusion)

INRIA-TEXMEX

P. Gros (group leader)
G. Gravier (fusion)

Audio-Visual Automatic Speech Recognition

[Diagram: Audio + Video → Recognized Speech]

- Audio-only automatic speech recognition (ASR) degrades in noisy conditions
- Use video (lip-reading) to boost ASR performance

Showcase Main Points

Shortcomings of current AV-ASR systems:
- Research-level set-ups: videos shot under carefully controlled conditions
- Processing is performed off-line

Goal: build a proof-of-concept, practically deployable, laptop-based AV-ASR prototype which:
- uses a low-end consumer microphone and camera to capture the speaker
- performs visual/audio feature extraction, as well as speech recognition, on the laptop in real time
- is robust to failures of a single modality, such as visual occlusion of the speaker's face

Tasks

T1: Visual Front-end

- Face detector (DONE)
- Face tracking and feature extraction (DONE)
- Optimization for real-time performance (IN PROGRESS)

T2: Audio-Visual Recognition Model and Fusion

- Advanced baseline audio front-end (DONE)
- HMM-based recognition back-end (DONE)
- Model training on audio-visual corpora (DONE)
- Adaptive audio-visual fusion (IN PROGRESS)

T3: System Integration

- Laptop-based system (IN PROGRESS)
- Usable for live AV-ASR demonstrations (IN PROGRESS)

Project duration: December 2006 – June 2007

Visual Front-End

[Figure: face model built as weighted sums of image components, with weights p1, p2 and λ]

- Analyze facial expression and appearance
- Real-time feature extraction algorithms
- Excellent performance in AV-ASR experiments
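The weighted-sum figure suggests a linear face model in the style of active appearance models: a face shape (or appearance) is written as a mean vector plus a weighted sum of modes, and the weights (p1, p2, ..., λ) serve as compact visual features. The sketch below assumes that model and an orthonormal set of modes; the function names and the tiny 4-dimensional "face" are hypothetical and not taken from the showcase.

```python
from typing import List, Sequence

Vector = List[float]


def synthesize(mean: Sequence[float], modes: Sequence[Sequence[float]],
               weights: Sequence[float]) -> Vector:
    """Return mean + sum_i weights[i] * modes[i] (e.g. s = s0 + p1*s1 + p2*s2).

    Hypothetical helper illustrating an assumed linear (AAM-style) face model.
    """
    out = list(mean)
    for w, mode in zip(weights, modes):
        for k, value in enumerate(mode):
            out[k] += w * value
    return out


def project(sample: Sequence[float], mean: Sequence[float],
            modes: Sequence[Sequence[float]]) -> Vector:
    """Recover the weights (visual features) of a sample.

    Assumes the modes are orthonormal, so each weight is simply the dot
    product between the mean-removed sample and the corresponding mode.
    """
    centered = [x - m for x, m in zip(sample, mean)]
    return [sum(c * v for c, v in zip(centered, mode)) for mode in modes]


if __name__ == "__main__":
    # Hypothetical 4-D "face" with two orthonormal modes.
    mean = [1.0, 2.0, 3.0, 4.0]
    modes = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.6, 0.8, 0.0]]
    face = synthesize(mean, modes, [0.5, -2.0])
    print(face)
    print(project(face, mean, modes))  # approximately [0.5, -2.0]
```

Projection onto an orthonormal basis keeps feature extraction to a few dot products per frame, which fits the real-time goal; a full appearance-model fit would additionally have to search for the shape parameters, which this sketch does not attempt.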

Feature Fusion

Goal: adaptive fusion of heterogeneous information streams

- Stream weights improve recognition performance
- Test alternative techniques for stream weight computation:

- Minimum classification error (MCE)
- Feature measurement uncertainty compensation
- Previous work by all three partners
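Feature measurement uncertainty compensation is commonly formulated by treating each observed feature as a noisy measurement of the clean one: if the measurement error is modeled as zero-mean Gaussian with variance σ_e², a Gaussian state likelihood is evaluated with its variance inflated from σ² to σ² + σ_e². The sketch below assumes diagonal-covariance Gaussians and a per-dimension uncertainty supplied by the front-end; the function name and numbers are illustrative, not the partners' implementation.

```python
import math
from typing import Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float],
                   uncertainty: Sequence[float] = ()) -> float:
    """Log-likelihood of x under a diagonal Gaussian N(x; mean, var).

    If per-dimension measurement uncertainties (error variances) are given,
    they are added to the model variances, i.e. N(x; mean, var + uncertainty).
    This is an illustrative sketch of uncertainty compensation.
    """
    if not uncertainty:
        uncertainty = [0.0] * len(x)
    total = 0.0
    for xi, mi, vi, ui in zip(x, mean, var, uncertainty):
        v = vi + ui  # model variance inflated by the measurement uncertainty
        total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v)
    return total


if __name__ == "__main__":
    x, mu, var = [3.0], [0.0], [1.0]
    print(log_gauss_diag(x, mu, var))         # reliable measurement
    print(log_gauss_diag(x, mu, var, [8.0]))  # unreliable measurement
```

With a large uncertainty the likelihood becomes flatter, so an unreliable feature (for example, a poorly tracked lip region) has less influence on which state wins.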

Stream weight adaptation

- Depends on the audio signal-to-noise ratio (SNR)
- Either static or fully dynamic
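In a multi-stream HMM, the per-state audio and visual log-likelihoods are commonly combined with exponent weights, log b(o) = λ_A log b_A(o_A) + λ_V log b_V(o_V), with λ_A + λ_V = 1. The sketch below illustrates SNR-dependent adaptation using a hypothetical logistic mapping from audio SNR to λ_A; the midpoint, slope, and clipping values are placeholders, not values from the showcase. Static weights would be computed once per utterance from an SNR estimate, while fully dynamic weights would be recomputed every frame. As an assumed, simple way to stay robust to a failing modality, the sketch falls back to audio only when no visual score is available.

```python
import math
from typing import Optional


def audio_weight(snr_db: float, midpoint_db: float = 10.0,
                 slope: float = 0.3, floor: float = 0.1,
                 ceil: float = 0.9) -> float:
    """Map an audio SNR (dB) to the audio stream weight lambda_A.

    Hypothetical logistic mapping with placeholder parameters:
    high SNR -> trust audio more, low SNR -> trust video more.
    The weight is clipped to [floor, ceil] so neither stream is ignored.
    """
    w = 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))
    return min(ceil, max(floor, w))


def fused_log_likelihood(log_b_audio: float, log_b_visual: Optional[float],
                         snr_db: float) -> float:
    """Combine per-state stream log-likelihoods with weights summing to 1.

    If the visual score is missing (e.g. the face is occluded or not
    detected), fall back to the audio stream alone.
    """
    if log_b_visual is None:
        return log_b_audio
    lam_a = audio_weight(snr_db)
    lam_v = 1.0 - lam_a
    return lam_a * log_b_audio + lam_v * log_b_visual


if __name__ == "__main__":
    for snr in (0.0, 10.0, 20.0):
        print(snr, audio_weight(snr))
    print(fused_log_likelihood(-10.0, -20.0, snr_db=10.0))
    print(fused_log_likelihood(-10.0, None, snr_db=10.0))
```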

Audio-Only ASR Live Demo

- Real-time continuous-digit ASR
- Models trained on the WSJ database
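An HMM-based back-end decodes continuous speech by searching for the most likely state sequence. The following is a minimal log-domain Viterbi sketch over a toy two-state HMM; the states, transition matrix, and per-frame emission scores are invented for illustration and stand in for the digit models, which the slides do not detail. In an audio-visual system the emission scores could be the fused per-state log-likelihoods.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Most likely HMM state path, computed in the log domain.

    log_init[j]     : log P(first state = j)
    log_trans[i][j] : log P(state j at t+1 | state i at t)
    log_emit[t][j]  : log-likelihood of frame t under state j
    Returns (best state path, its log score).
    """
    n_states = len(log_init)
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor i for state j at time t.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    best_score = delta[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_score


if __name__ == "__main__":
    log = math.log
    init = [log(0.9), log(0.1)]
    trans = [[log(0.7), log(0.3)], [log(0.2), log(0.8)]]
    emit = [[log(0.8), log(0.1)], [log(0.6), log(0.3)], [log(0.1), log(0.9)]]
    path, score = viterbi(init, trans, emit)
    print(path, math.exp(score))
```

Working in the log domain avoids numerical underflow over long utterances; a real continuous-digit recognizer would additionally connect word models through a digit-loop grammar, which this sketch leaves out.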

Audio-Visual Speech Recognition Demo

[Demo video panels labeled "AV" (audio-visual) and "A" (audio-only)]