EE 8108

Multimedia Processing & Communications

Course Instructor: Prof. Ling Guan

Department of Electrical & Computer Engineering, Room 315, ENG Building, Tel: (416) 979-5000 ext. 6072, Email: lguan@ee.ryerson.ca

Participating Instructor:

  • Dr. Lei Gao

1 9/8/2018

Course Outline

– Introduction and the MPEG standards
– Introduction to statistical pattern recognition & neural networks
– Feature coding and multimodal information fusion

  • Why and How?
  • Data/Feature level
  • Interaction level
  • Score/Decision level

– Media indexing and retrieval

  • Past, present and future
  • Content‐based retrieval (CBR)
  • Metasearch engines


Course Outline (2)

– Human‐signature recognition

  • Overview
  • Human body movement analysis and recognition
  • Human emotion recognition
  • Human hand gesture recognition
  • Multimedia in immersive environment

– Introduction to multimedia in immersive environment

  • Virtual reality (VR)
  • Augmented reality (AR)


Lectures and Assessment

  • Lecture time
    – 3 hours/week from week 1 to week 10 (including a one-week break)
  • Assessment
    – Project: 60% (presentation 10%, report 50%)
    – In-class test (Test 1): 20%
    – Final test (Test 2): 20%
  • Project
    – Choose your own topic
    – Speak to me if you cannot find a suitable topic
    – Submit your topic and a one-page proposal before Reading Week
    – Presentation time: week 13, during class
    – Report due: to be determined
  • Test 1: week 7, the week after Reading Week (1 hour, in classroom)
  • Test 2: week 12 (1 hour, in classroom)


Teaching Material

  • Lecture notes will be available on the course website (check your EE8108 D2L page).
  • References
    – Multimedia Image and Video Processing, edited by L. Guan, Y. He and S.-Y. Kung, CRC Press, 2012, 2nd edition
    – IEEE Transactions on Multimedia
    – ACM Multimedia
    – Other IEEE/ACM Transactions (talk to me if you need more information)
    – Proceedings of the IEEE Int. Conf. on Multimedia and Expo (ICME)
    – Proceedings of the ACM Multimedia Conference


Project Requirement

  • You are required to work on a technical topic, either chosen by yourself in consultation with the instructor or provided by the instructor. You are encouraged to choose your own topic.
  • The topic of your project could be one of the following:
    – comparison of two or more methods you found in the literature
    – further development/analysis of an existing method/idea
    – a novel approach/technique, analysis or algorithm
  • A project that is purely a literature review is not acceptable.


Project Requirement (2)

Electronically submit your project proposal on Tuesday, October 2 to lguan@ee.ryerson.ca. You may use any programming language (MATLAB, C/C++, etc.); the choice is yours.

You are required to demonstrate that your system and/or algorithm works as described in your report/presentation. Ideally, you should give the demonstration during your presentation. You are encouraged to work in a team of two students.


A Note on Academic Integrity

  • Please familiarize yourself with Ryerson’s Regulation on Academic Integrity by:
    – reading Ryerson Senate Policy 60: Academic Integrity, pages 1–4, and acting accordingly (http://www.ryerson.ca/senate/policies/pol60.pdf)
    – attending the mandatory departmental graduate seminar series, offered every semester, which covers research methods, research writing, the library, ethics and integrity.


Introduction and the MPEG Standards


What is Multimedia?

  • What is multimedia?
    – A brief history of multimedia is available at http://people.ucalgary.ca/~edtech/688/hist.htm
  • What is multimedia processing & communications (MMPC)?
  • What impact has signal processing had on multimedia technology?
  • Where are multimedia technologies taking us?
  • …?


Are These the Answers?

  • Multimedia is a multifaceted domain
  • It is easy to define each facet individually, but challenging to consider them together as a combined entity
  • Coherent integration of media content obtained from different sources/sensors
  • Humans are natural and generic multimedia processing machines (human intelligence). Can we teach computers/machines to do the same via machine learning and, more generally, artificial intelligence?


What Are We Sure about MMPC?

  • It offers a forum for interaction among researchers in several media processing areas
  • MMPC opens up opportunities for information processing that falls between the domains of traditional areas, such as speech, audio, music, text, graphics, image and video
  • MMPC brings together the signal processing community with computer, communication and systems engineers
  • IEEE Conference on Multimedia & Expo
  • ACM Multimedia Conference
  • Various IEEE and ACM Transactions and Journals
  • …


Current Trend in MMPC

  • Single media vs. multimedia: about 50% of the research in multimedia is still concerned with single media
  • Due to the maturity of standards, coding to some extent dictates the direction of research in multimedia
  • Multiple media vs. multimedia
  • Real multimedia
  • Multimedia in immersive environments (VR/AR)
  • Intelligent multimedia plays a foundational role in big data analytics
  • So there is plenty of room for new research, and your participation and contribution to this important area are very welcome


What can be categorized as MMPC?

  • Media coding and compression
    – Media compression
    – Compressed domain processing
    – Joint audio-video coding and processing
  • Multimedia databases
    – Indexing, retrieval, archiving, and management
    – Authoring, sharing and editing
    – Content recommendation
    – Digital library
  • Multimodal information fusion
    – Fusibility
    – Fusion levels
    – Fusion of methodology
  • Human-machine interaction and perception
    – Content recognition/analysis/synthesis
    – Emotion/intention and attention recognition
    – Analysis and recognition of human gestures and activities
    – Perceptual quality and human factors


What can be categorized as MMPC (2)?

  • Multimedia communications
    – Transport protocols
    – QoS control
    – Media streaming
    – Error concealment and loss recovery
    – Rate control and hierarchical coding
    – Multimedia cloud computing
  • Multimedia in immersive environments
  • Media security and watermarking
  • Multimedia applications
  • Standards and related issues
    – ITU-T H-series standards for audio/video communications
    – MPEG standards
    – JPEG standards
    – Convergence of ITU-T H-series and MPEG → H.264
    – MHEG, MJPEG, HTML, VRML and more


Why Standards?

  • Instead of hiding and protecting your inventions, you publicly share your ideas with your colleagues
  • Standards encourage experts to collaborate on a particular topic
  • Due to increased commercial interest in video communications, the need for image/video compression standards arose
  • Standardization has proven to be a powerful vehicle for promoting new technology
  • Competition is very intense


The MPEG Standards

  • Coding & multimedia standards developed and managed by the Moving Picture Experts Group (MPEG)
  • MPEG-1: VCD
  • MPEG-2: DVD, HDTV
  • MPEG-3: ???
  • MPEG-4: content-based video coding
  • MPEG-7: multimedia indexing and retrieval
  • MPEG-21: ???
  • MPEG-A/B/C/D/E/V/M/U/H/DASH
  • FTV standard
  • For more information on MPEG standards: http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group


The MPEG‐1 Standard

  • Released in 1992
  • A standard for the coded representation of
    – moving pictures
    – associated audio
    – and their combination
    when used for storage and retrieval on digital media at bit rates of up to 1.5 Mbit/s
  • Typical application: video CD (VCD)
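To put the 1.5 Mbit/s figure in perspective, the sketch below estimates the compression ratio MPEG-1 has to reach and the storage one hour of such a stream occupies. The source format (SIF, 352×240 luma at 30 frames/s, 8-bit samples with 4:2:0 chroma) is an assumption chosen for illustration; the slide does not specify it.

```python
def raw_bitrate(width, height, fps, bits_per_sample=8, chroma_factor=1.5):
    """Uncompressed bit rate in bit/s; chroma_factor=1.5 models 4:2:0 sampling."""
    return width * height * chroma_factor * bits_per_sample * fps

def storage_bytes(bitrate_bps, seconds):
    """Bytes needed to store a constant-rate stream for the given duration."""
    return bitrate_bps * seconds / 8

# Assumed source: SIF at 30 frames/s (illustrative, not from the slide).
raw = raw_bitrate(352, 240, 30)
target = 1.5e6  # MPEG-1 upper bit rate from the slide, in bit/s
print(f"raw: {raw / 1e6:.1f} Mbit/s")
print(f"required compression ratio: {raw / target:.1f}:1")
print(f"one hour at 1.5 Mbit/s: {storage_bytes(target, 3600) / 1e6:.0f} MB")
```

Under these assumptions the raw video is about 30.4 Mbit/s, so the encoder must compress by roughly 20:1, and an hour of video fits in about 675 MB, which is in the range of a CD.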


The MPEG‐2 Standard

  • Released in 1994; still one of the most popular standards
  • A standard providing video quality no lower than NTSC/PAL at target bit rates between 2 and 10 Mbit/s
  • Applications
    – Digital cable TV distribution
    – Networked database service via ATM
    – Digital video tape recorder (VTR)
    – Satellite and terrestrial digital broadcasting distribution
  • It also supports HDTV applications, and so pre-empted the MPEG-3 standard
  • Lost to JPEG-2000 (MJPEG) in the coding competition for digital cinema in 2002


The MPEG‐4 Standard

  • First released in 1998, targeted at content-based multimedia applications and low bit-rate video coding
  • Algorithms and tools for coding and flexible representation of audio/video to meet the challenges of multimedia applications
  • It addresses the needs for
    – universal accessibility and robustness in error-prone environments
    – highly interactive functionality
    – coding of natural and synthetic data (image/graphics), setting the stage for AR
    – scalable coding
    – high compression efficiency
  • Bit rates:
    – PSTN: 5–64 kbit/s
    – TV/film: 4 Mbit/s
  • Ironically, the objective of low bit-rate video coding was later accomplished by ITU-T H.264, the convergence of H.263 and MPEG-2.


The MPEG‐7 Standard

  • First released in 2001
  • Official name: Multimedia Content Description Interface
  • Objective:
    – To allow efficient search for multimedia content using standardized descriptors
  • The main research issues:
    – Optimum search engine
    – (Content-based) feature analysis & query design
  • Unfortunately, the challenges were underestimated…


MPEG-7 Architecture

  • The MPEG-7 Working Group focuses on description interchange (the normative components, shown as shaded blocks in the XM)
  • The rest is left open to competition from industry and research organizations

[Figure: MPEG-7 Experimental Model (XM) architecture. Blocks: media data/AV file, AV decoder, feature extraction, D/DS, coding scheme, MPEG-7 file, decoding scheme, matching and filtering.]


MPEG‐7 Research Issues

  • Optimal search (retrieval) engine in Internet/wireless multimedia communications
  • Feature extraction and query design in information retrieval, especially image/video retrieval
  • In general, the success of MPEG-7 is limited, but some specialized versions seem to work in well-defined applications.

[Figure: Scope of MPEG-7. Feature extraction → standard description → search engine.]
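As a concrete instance of the low-level feature extraction and matching that MPEG-7 leaves open to competition, here is a minimal sketch of a quantized RGB color histogram compared by histogram intersection. It is a generic illustration, not one of MPEG-7's normative color descriptors (such as the Scalable Color or Color Layout descriptors); the 4-bins-per-channel quantization and the toy "images" are assumptions.

```python
def color_histogram(pixels, levels=4):
    """Normalized histogram of (r, g, b) pixels, each 8-bit channel quantized to `levels` bins."""
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        # c * levels // 256 maps 0..255 onto bin indices 0..levels-1
        idx = ((r * levels // 256) * levels + (g * levels // 256)) * levels + (b * levels // 256)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank(query_pixels, database, levels=4):
    """Return database keys ordered from most to least similar to the query."""
    q = color_histogram(query_pixels, levels)
    scores = {name: intersection(q, color_histogram(px, levels)) for name, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical database of tiny "images" given as lists of RGB pixels.
database = {
    "sunset": [(250, 100, 20)] * 10,
    "sea": [(10, 60, 200)] * 10,
}
query = [(240, 110, 30)] * 5 + [(10, 60, 200)]
print(rank(query, database))
```

The query is mostly orange with a little blue, so "sunset" ranks first. Only the description (the histogram) would need to be interoperable; the quantization and matching rule are exactly the parts a vendor is free to choose.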

The MPEG‐21 Standard

  • MPEG for the 21st century!
  • Aims to define a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain
  • Provide content creators, producers, distributors and service providers with equal opportunities in the MPEG-21-enabled open market
  • Benefit content consumers by giving them access to a large variety of content in an interoperable manner
  • Never got close to this bold objective! Abolished in 2009.

http://www.chiariglione.org/mpeg/standards/mpeg‐21/mpeg‐21.htm


MPEG‐A

  • Multimedia Application Formats (MAFs)

    – Facilitate the swift development of multimedia applications and services
    – Standardize MAFs for multimedia products and software
    – Stimulate the increased use of MPEG technology through interoperability of different media types


MPEG‐B

  • Systems technologies
  • MPEG‐B Part 1: Binary MPEG format for XML

    – Standard defining a generic binary format for encoding XML documents
    – Relies on schema knowledge shared by the encoder and decoder to achieve high compression efficiency
    – Provides fragmentation mechanisms to ensure transmission flexibility
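Why shared schema knowledge helps can be shown with a toy encoder: since both sides already know the element vocabulary, tag names travel as one-byte indices instead of text. This is a hypothetical format for illustration only, not the actual MPEG-B (BiM) bitstream; `SCHEMA`, `encode` and `decode` are made-up names, and the sketch ignores attributes and tail text and assumes fewer than 256 element types and children per element.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary known to both encoder and decoder (stands in for a schema).
SCHEMA = ["Mpeg7", "Description", "Title", "Creator"]

def encode(root, schema=SCHEMA):
    """Serialize each element as: tag index, child count, 2-byte text length, text bytes."""
    code = {name: i for i, name in enumerate(schema)}
    out = bytearray()

    def walk(elem):
        text = (elem.text or "").encode("utf-8")
        out.append(code[elem.tag])               # 1 byte instead of the tag string
        out.append(len(elem))                    # number of children (< 256 assumed)
        out.extend(len(text).to_bytes(2, "big"))
        out.extend(text)
        for child in elem:
            walk(child)

    walk(root)
    return bytes(out)

def decode(data, schema=SCHEMA):
    """Inverse of encode(): rebuild the element tree from the byte string."""
    pos = 0

    def read():
        nonlocal pos
        tag, n_children = schema[data[pos]], data[pos + 1]
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 4
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        elem = ET.Element(tag)
        elem.text = text or None
        for _ in range(n_children):
            elem.append(read())
        return elem

    return read()

xml_text = "<Mpeg7><Description><Title>Demo</Title><Creator>Lab</Creator></Description></Mpeg7>"
root = ET.fromstring(xml_text)
packed = encode(root)
print(len(xml_text.encode("utf-8")), "->", len(packed), "bytes")
```

Here an 84-byte document shrinks to 23 bytes and decodes back to the same tree. The saving comes entirely from not transmitting what the schema already tells the decoder.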


MPEG‐E

  • Multimedia middleware (M3W)
  • Interfaces for audio/video broadcast decoding, processing, and rendering
  • Interfaces of the support API (application program interface): interaction with remote services, resource management, component download, fault management, integrity management


MPEG‐V

  • Media context and control
  • Provides an architecture and specifies the associated information between
    – virtual worlds (digital content, gaming simulation) and the real world (sensors, vision, rendering, robotics)
  • A well-defined connection between the virtual and the real world for better design methodology and tools


MPEG‐DASH

  • Suite of standards for efficient streaming of multimedia using existing Internet infrastructure
    – servers, CDNs, as well as proxies and caches
  • Supports on-demand and live streaming
  • Supports the MPEG-4 file format and MPEG-2 Transport Streams
  • Streaming sessions are controlled by the DASH client
  • Enables dynamic ad insertion and on-demand content
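DASH standardizes the manifest and segment formats, but how the client picks a bitrate is left to the client implementation. A common family of heuristics is throughput-based; the sketch below chooses the highest-bitrate representation that fits under a safety margin of the harmonic mean of recent segment throughputs. The bitrate ladder, the 0.8 margin and the function names are assumptions for illustration, not part of the standard.

```python
def harmonic_mean(values):
    """Harmonic mean; less sensitive than the arithmetic mean to short throughput spikes."""
    return len(values) / sum(1.0 / v for v in values)

def choose_representation(bitrates_kbps, recent_throughput_kbps, safety=0.8):
    """Return the highest bitrate <= safety * estimated throughput, else the lowest one."""
    budget = safety * harmonic_mean(recent_throughput_kbps)
    ladder = sorted(bitrates_kbps)
    feasible = [b for b in ladder if b <= budget]
    return feasible[-1] if feasible else ladder[0]

# Hypothetical bitrate ladder advertised in the manifest, in kbit/s.
ladder = [250, 500, 1000, 3000]
print(choose_representation(ladder, [1200, 1500, 900]))  # prints 500
```

With measured throughputs of 1200, 1500 and 900 kbit/s the estimate is about 1149 kbit/s, the budget about 919 kbit/s, so the client requests the 500 kbit/s representation for the next segment rather than risk a stall at 1000 kbit/s.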


Other MPEG Standards

  • MPEG-C: a suite of video standards that do not fall within other well-established MPEG standards
  • MPEG-D: a suite of standards for audio technologies that do not fall within other MPEG standards
  • MPEG-M: a suite of standards to enable the design and implementation of media-handling value chains
  • MPEG-U: provides a general-purpose technology with innovative functionality that enables its use in heterogeneous scenarios such as broadcast, mobile, home network and web domains
  • MPEG-H: a suite of standards for the delivery, in heterogeneous environments, of audio-visual information compressed with high efficiency


FTV Standard

  • FTV: Free Viewpoint TV
  • Started in January 2004
  • Objective:
    – To achieve an efficient and standard method for coding and view generation of FTV
  • 1st phase standardization
    – Standardize the coding part of FTV as Multiview Video Coding (MVC)
    – Completed in May 2009
  • 2nd phase standardization
    – Targeted the standardization of 3-D Video (3DV)
    – In progress


MVC Standard

  • Goal: remove the correlation among multiview video data
  • Major approaches:
    – Combining inter-view and temporal prediction
    – Motion compensation
  • 20–30% bitrate savings compared to simulcast coding using H.264

[Figure: Inter-view prediction structure for MVC]
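Combining inter-view and temporal prediction can be sketched as reference selection: for each block, search both the previous frame of the same view (temporal, the displacement is motion) and the co-timed frame of a neighboring view (inter-view, the displacement is disparity), then keep whichever gives the smaller sum of absolute differences (SAD). The 1-D rows, block size, search range and function names are toy assumptions; an actual MVC encoder works on 2-D blocks with rate-distortion optimized mode decisions.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def block_search(block, ref_row, start, search_range):
    """Best (displacement, cost) for `block`, located at `start`, searched in `ref_row`."""
    best = None
    n = len(block)
    for d in range(-search_range, search_range + 1):
        s = start + d
        if s < 0 or s + n > len(ref_row):
            continue  # candidate would fall outside the reference row
        cost = sad(block, ref_row[s:s + n])
        if best is None or cost < best[1]:
            best = (d, cost)
    return best

def choose_prediction(block, start, temporal_ref, interview_ref, search_range=2):
    """Pick the reference (temporal or inter-view) with the lower matching cost."""
    t = block_search(block, temporal_ref, start, search_range)
    v = block_search(block, interview_ref, start, search_range)
    return ("temporal",) + t if t[1] <= v[1] else ("inter-view",) + v

# Toy rows: current frame of view 1, its previous frame, and view 0 at the same instant.
view1_t = [9, 14, 30, 62, 80, 41, 12, 7]
view1_prev = [9, 14, 25, 50, 70, 35, 12, 7]   # same view, content changed over time
view0_t = [14, 30, 62, 80, 41, 12, 7, 3]      # neighboring view, shifted by a disparity
block = view1_t[2:6]
print(choose_prediction(block, 2, view1_prev, view0_t))
```

In this toy case the neighboring view matches exactly at a disparity of -1, while the best temporal match still leaves a SAD of 33, so inter-view prediction wins. That is the kind of redundancy simulcast coding cannot exploit.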


3DV Standard

 Motivation:

  • Decouple production from the coding format
  • MVC is optimized only for 2D color video, not for depth information

 The main research issues:

  • Data format
  • 3D video coding method
  • Intermediate view generation

 Current progress:

  • Define FTV reference model
  • Adopt N view + N depth format as FTV Data Unit (FDU)


An Instance


“Bullet time” from “The Matrix”. Warner Bros. 2000


Multimedia Information Retrieval

(Driven by Machine Learning)

  • Content‐based Image and Video Retrieval

    – Low-level visual features
    – Relevance feedback
  • Information Fusion for Retrieval
    – Combining visual and audio information
    – Combining audio/visual and contextual information
  • Visual Re-ranking
  • Large-scale Search
    – Data-driven approaches
    – Annotation and indexing
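Relevance feedback refines a query using the user's judgments on returned results. One classic formulation, shown here only as an example (the course may use other schemes), is Rocchio's update on feature vectors: q' = αq + β·mean(relevant) − γ·mean(non-relevant). The weights α = 1, β = 0.75, γ = 0.15 are values often used in practice and are assumed here, as are the toy 3-bin feature vectors.

```python
def mean_vector(vectors, dim):
    """Component-wise mean; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant examples and away from non-relevant ones."""
    dim = len(query)
    rel = mean_vector(relevant, dim)
    non = mean_vector(non_relevant, dim)
    updated = [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]
    return [max(0.0, x) for x in updated]  # clip negatives, common for non-negative features

# Toy 3-bin feature vectors (e.g., coarse color histograms).
query = [0.5, 0.5, 0.0]
relevant = [[0.2, 0.8, 0.0], [0.0, 1.0, 0.0]]
non_relevant = [[1.0, 0.0, 0.0]]
print(rocchio(query, relevant, non_relevant))
```

After one round the second feature dominates the query (about 1.175 vs. 0.425), so the next search favors images resembling those the user marked as relevant.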


Research Challenges Ahead

[Figure labels: computer-computer interactions, semantic gaps, human-computer interactions]


Large‐scale Visual Indexing, Search and Application

  • Large-scale visual search and analysis is important.
  • Efficient feature extraction
  • Suitable data structures
    – Bag-of-visual-words
    – Tree structure
  • Search mechanism: term frequency–inverse document frequency (TF-IDF)
  • Applications
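The bag-of-visual-words plus TF-IDF pipeline can be sketched in a few lines, assuming local features have already been extracted and quantized to visual-word IDs (for example by a vocabulary tree or a k-means codebook, not shown). Each image becomes a sparse, L2-normalized TF-IDF vector and images are ranked by cosine similarity to the query. The function names and the toy collection are hypothetical, and the linear scan is for clarity only; large-scale systems use an inverted file over the visual words.

```python
import math
from collections import Counter

def tfidf_vector(words, idf):
    """L2-normalized TF-IDF weights for one image's list of visual-word IDs."""
    counts = Counter(words)
    total = len(words)
    vec = {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def build_index(images):
    """Compute IDF over the collection and a TF-IDF vector per image."""
    n = len(images)
    df = Counter()
    for words in images:
        df.update(set(words))  # document frequency: number of images containing each word
    idf = {w: math.log(n / c) for w, c in df.items()}
    return idf, [tfidf_vector(words, idf) for words in images]

def search(query_words, idf, vectors):
    """Rank image indices by cosine similarity to the query (linear scan)."""
    q = tfidf_vector(query_words, idf)
    scores = [sum(q.get(w, 0.0) * v for w, v in vec.items()) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)

# Toy collection: each image already quantized to visual-word IDs.
images = [[1, 1, 2, 3], [4, 5, 5, 6], [1, 7, 8, 9]]
idf, vectors = build_index(images)
print(search([1, 1, 2], idf, vectors))
```

Image 0 ranks first because it shares the rare word 2 with the query, while the common word 1 contributes less (its IDF is lower); a word that appeared in every image would get an IDF of zero and not affect ranking at all.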


  • Iris recognition
  • Fingerprint recognition
  • Speech recognition
  • Face recognition
  • Hand gesture recognition
  • Emotion recognition
  • Human movement modeling and recognition
  • HSR for multimodal HCI and modality fusion


Iris Recognition

  • Iris: the most accurate and reliable biometric
  • 249 degrees of freedom (DOF) and good discrimination entropy
  • Changes little with aging
  • Reliably recognizing 9 million with no false positives [Daugman 2002]
  • Projected false-positive rate: 1 in 10 billion, a population larger than that of the planet
  • Requires the complete cooperation of the people being screened, so it is highly invasive
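Daugman-style iris matching compares binary iris codes with a fractional Hamming distance, counting disagreeing bits only where both masks mark the iris as valid (not occluded by eyelids or reflections). A minimal sketch using Python integers as bit vectors, assuming codes and masks have already been computed; the 0.32 decision threshold and the 16-bit codes are illustrative assumptions, and rotation compensation (trying several circular bit shifts and keeping the minimum distance) is omitted.

```python
def fractional_hd(code_a, code_b, mask_a, mask_b):
    """Fraction of disagreeing bits among the bits valid in both masks."""
    valid = mask_a & mask_b
    n_valid = bin(valid).count("1")
    if n_valid == 0:
        raise ValueError("no overlapping valid bits")
    return bin((code_a ^ code_b) & valid).count("1") / n_valid

def same_iris(code_a, code_b, mask_a, mask_b, threshold=0.32):
    """Declare a match when the distance falls below an (assumed) threshold."""
    return fractional_hd(code_a, code_b, mask_a, mask_b) < threshold

a = 0b1011_0110_1100_0011
b = 0b1011_0110_1100_1111   # differs from a in 2 of 16 bits
full = 0xFFFF
print(fractional_hd(a, b, full, full))  # 2 / 16 = 0.125
```

Codes of unrelated eyes disagree on about half of their bits, so their distance clusters near 0.5; the large number of independent degrees of freedom is what makes a low threshold so unlikely to be crossed by chance.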


Fingerprint Recognition

  • Also very accurate and inexpensive
  • Used extensively by police and in security checks (try getting across the US border today!)
  • Artificial fingers made of cheap, readily available gelatin can fool sensors, a serious flaw
  • Highly invasive


Face Recognition

  • One of the most actively studied areas in HSR
  • Potential applications (commercial and law enforcement):
    – Allow access to an ATM
    – Control entry to restricted areas
    – Recognize people in specific areas (bank, store)
    – Retrieve people from a specific database (police)
  • The accuracy is not yet at the required level, and it is at least semi-invasive
  • The most popular methods in face recognition:
    – EigenFaces
    – Hidden Markov Model recognition
    – Compressed sensing


Speech Recognition

  • One of the most actively studied areas in HSR
  • The modeling is very elegant for English: it is based on 50 phonemes, the smallest contrastive phonetic units
  • Prosodic and phonetic features
  • Hidden Markov Models in recognition
  • Can be very accurate given millions of features and lengthy training
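HMM-based recognition scores an observation sequence against each word (or phoneme) model and picks the most likely one. The core computation is the forward algorithm, sketched below for discrete observation symbols. The two word models and all their probabilities are invented for illustration; practical recognizers use continuous (e.g., Gaussian-mixture) emission densities and log-domain arithmetic to avoid underflow on long utterances.

```python
def forward(obs, start, trans, emit):
    """P(obs | model) via the forward algorithm for a discrete-observation HMM.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][k]: probability of emitting symbol k in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Sum over all predecessor states, then emit the next symbol.
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o] for j in range(n)]
    return sum(alpha)

def recognize(obs, models):
    """Return the name of the model with the highest likelihood for `obs`."""
    return max(models, key=lambda name: forward(obs, *models[name]))

# Two hypothetical 2-state word models over a 2-symbol alphabet {0, 1}.
models = {
    "yes": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]]),
    "no":  ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
}
print(forward([0, 1], *models["yes"]))
print(recognize([0, 1, 1], models))
```

The forward pass costs O(N²T) for N states and T observations instead of enumerating all Nᵀ state paths, which is what makes HMM scoring practical.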


Hand Gesture Recognition

  • One of the most studied areas of human body movement
  • It has found many applications
  • It contributed significantly to computer vision-based full-body human movement recognition


[Figure panels: marker-based gesture HCI; natural free-hand HCI (Magic Leap)]

Human Emotion Recognition

  • Vocal emotion recognition
  • Emotion recognition using visual cues
  • Bimodal emotion recognition
  • 3D techniques in emotion recognition
  • Language, culture and context independence
  • Realistic data collection
  • It is not a well-studied field


Human Movement Recognition

  • Marker/tracker-based approaches (since 1970)
    – Require extensive hardware setup (lessened recently)
    – Require significant setup time (lessened recently)
    – Invasive in nature
  • Computer vision-based approaches
    – Mainly software based
    – 2D vs. 3D
    – Rigid vs. non-rigid models
    – Highly dependent on the state of the art in image processing and computer vision


HSR in Multimodal HCI

  • Dynamic fusion of multimodal biometrics
    – Combine face, fingerprints, emotion, hand gesture, speech, and gait/action
    – Dynamically select the level of multimodality
  • The significance
    – The difficulty of simultaneously forging multiple biometrics
    – More effective HCI in the design of immersive systems
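Score-level fusion, one of the fusion levels listed in the course outline, can be sketched as: normalize each modality's match scores to a common range, then combine them with a weighted sum. Min-max normalization, the fixed weights, the two modalities and the identities below are illustrative assumptions; a dynamic system of the kind described above would adapt the weights (or the set of modalities) to the situation.

```python
def min_max(scores):
    """Map one modality's scores to [0, 1]; higher means a better match."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return {ident: (s - lo) / span for ident, s in scores.items()}

def fuse(modalities, weights):
    """Weighted-sum fusion of normalized scores (all modalities score the same identities)."""
    fused = {}
    for name, scores in modalities.items():
        for ident, s in min_max(scores).items():
            fused[ident] = fused.get(ident, 0.0) + weights[name] * s
    return fused

# Hypothetical match scores for three enrolled identities, on different native scales.
modalities = {
    "face": {"alice": 0.9, "bob": 0.5, "carol": 0.1},
    "voice": {"alice": 20.0, "bob": 80.0, "carol": 50.0},
}
weights = {"face": 0.6, "voice": 0.4}
fused = fuse(modalities, weights)
print(max(fused, key=fused.get))
```

Face alone would pick "alice", voice alone "bob"; the fused scores (0.6, 0.7, 0.2) favor "bob". Normalization matters because the raw voice scores live on a scale 100 times larger than the face scores and would otherwise dominate the sum.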


Example Areas Where HCI Plays a Critical Role

  • Multimedia information mining
  • Media indexing and retrieval
  • Media manipulation
  • Smart city/smart home
  • Security and surveillance
  • Learning for people with special needs
  • Bioinformatics
  • Creation of lifelike experience


Transmission of Multimedia Data

  • Coding and compression
  • Distributed video coding
  • Multiple-description coding
  • Streaming media
  • Peer-to-peer networks
  • Multi-path transmission
  • Resource allocation
  • Multimedia in the cloud


H.264 PFGS Video Codec

[Figure: H.264 PFGS encoder (courtesy Microsoft Research). Base layer: intra prediction, ME/MC, DCT/quantization and inverse, loop filters, frame buffers and entropy coding. Enhancement layer: DCT and bit-plane VLC. Both layers feed UEP channel coders.]
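The enhancement layer in this encoder is coded bit plane by bit plane, which is what makes it finely scalable: the bitstream can be cut after any plane and the decoder still improves on the base layer. A minimal sketch of the decomposition and of partial reconstruction, assuming non-negative integer residual magnitudes; sign coding, the DCT and the VLC of each plane are omitted, and the function names are hypothetical.

```python
def to_bit_planes(values, n_planes):
    """Split non-negative ints into bit planes, most significant plane first."""
    return [[(v >> p) & 1 for v in values] for p in range(n_planes - 1, -1, -1)]

def reconstruct(planes, n_planes, keep):
    """Rebuild values from only the first `keep` (most significant) planes."""
    values = [0] * len(planes[0])
    for idx, plane in enumerate(planes[:keep]):
        weight = 1 << (n_planes - 1 - idx)
        values = [v + bit * weight for v, bit in zip(values, plane)]
    return values

residual = [5, 3, 0, 7]          # toy enhancement-layer magnitudes
planes = to_bit_planes(residual, 3)
for keep in range(4):
    print(keep, reconstruct(planes, 3, keep))
```

Each extra plane halves the maximum reconstruction error, so a streaming server can match any available bandwidth simply by truncating the enhancement layer.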

[Figure: the original image and its one-level DWT]
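A one-level 2-D DWT splits an image into a low-pass approximation (LL) and three detail subbands, each a quarter of the original size. Below is a minimal sketch using the Haar wavelet with simple averaging normalization; the choice of Haar is an assumption (the slide does not say which wavelet was used, and JPEG 2000, for example, uses longer filters), and the code assumes even image dimensions.

```python
def haar_dwt2(img):
    """One-level 2-D Haar DWT of an even-sized image given as a list of rows.

    For each 2x2 block [[a, b], [c, d]] (LH/HL naming conventions vary between texts):
      LL = average, LH = top minus bottom, HL = left minus right, HH = diagonal.
    """
    h, w = len(img), len(img[0])
    bands = {k: [[0.0] * (w // 2) for _ in range(h // 2)] for k in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            bands["LL"][i // 2][j // 2] = (a + b + c + d) / 4
            bands["LH"][i // 2][j // 2] = (a + b - c - d) / 4
            bands["HL"][i // 2][j // 2] = (a - b + c - d) / 4
            bands["HH"][i // 2][j // 2] = (a - b - c + d) / 4
    return bands

def haar_idwt2(bands):
    """Exact inverse of haar_dwt2."""
    ll, lh, hl, hh = bands["LL"], bands["LH"], bands["HL"], bands["HH"]
    h, w = len(ll) * 2, len(ll[0]) * 2
    img = [[0.0] * w for _ in range(h)]
    for i in range(h // 2):
        for j in range(w // 2):
            s, v, hz, dg = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = s + v + hz + dg
            img[2 * i][2 * j + 1] = s + v - hz - dg
            img[2 * i + 1][2 * j] = s - v + hz - dg
            img[2 * i + 1][2 * j + 1] = s - v - hz + dg
    return img

img = [[10, 12, 50, 50], [10, 12, 50, 50], [0, 0, 8, 4], [0, 0, 8, 4]]
bands = haar_dwt2(img)
print(bands["LL"])
```

Most of the energy ends up in LL, a half-resolution thumbnail, while smooth regions leave the detail bands at or near zero. That energy compaction is what makes wavelet coefficients cheap to code.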


Close Relationship Between Content & Traffic

[Figure: traffic trace of “Forrest Gump” (Bocheck-Chang)]

Dynamic Resource Allocation System

[Figure blocks: VBR streams from Source 1, Source 2, …, Source N; resource allocation and admission control; buffer/scheduler; network link]

Need to determine
  • when to renegotiate
  • how much resource to request

[Plot: reserved bandwidth vs. time, with renegotiation points marked]
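The two questions above (when to renegotiate, how much to request) can be illustrated with a simple threshold heuristic: renegotiate whenever the observed rate exceeds the current reservation or falls well below it, and request the new rate plus some headroom. This is a hypothetical policy for illustration, not necessarily the scheme discussed in the lecture; the 20% headroom, the 50% low-water mark and the toy trace are arbitrary assumptions, and content-aware systems would predict upcoming rates rather than react to the instantaneous one.

```python
def renegotiation_schedule(rates, headroom=1.2, low_water=0.5):
    """Return (interval index, reserved rate) pairs at which the reservation changes.

    rates: measured source rate per interval (e.g., Mbit/s per GOP)
    headroom: multiplier applied to the rate when (re)requesting bandwidth
    low_water: release bandwidth if the rate falls below this fraction of the reservation
    """
    reserved = rates[0] * headroom
    points = [(0, reserved)]
    for t, rate in enumerate(rates[1:], start=1):
        if rate > reserved or rate < low_water * reserved:
            reserved = rate * headroom
            points.append((t, reserved))
    return points

trace = [1.0, 1.1, 1.3, 1.2, 0.4, 0.5]   # toy VBR trace, Mbit/s
for t, bw in renegotiation_schedule(trace):
    print(f"interval {t}: reserve {bw:.2f} Mbit/s")
```

The parameters trade signaling overhead against wasted bandwidth: more headroom means fewer renegotiations but more idle reserved capacity, which is why predicting traffic from content is attractive.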


Cloud Computing

A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Source: National Institute of Standards and Technology (NIST), http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

Other Research Issues

  • Media security
    – Data hiding & watermarking
    – Multimedia for security/surveillance
  • Multimedia computing in the immersive environment
  • Wireless multimedia
  • Multimedia information mining
  • Hardware design for multimedia
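As the simplest example of data hiding, a payload can be embedded in the least significant bits (LSBs) of pixel values. This LSB scheme is shown only to illustrate the idea: it is fragile (recompression or filtering destroys it) and is not how robust watermarking systems work. The pixel values and payload are made-up examples.

```python
def embed_lsb(pixels, bits):
    """Replace the least significant bit of the first len(bits) pixels with the payload."""
    if len(bits) > len(pixels):
        raise ValueError("payload larger than cover image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked

def extract_lsb(pixels, n_bits):
    """Read the payload back from the least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]

cover = [200, 13, 54, 77, 128, 255, 0, 91]   # 8-bit grayscale samples
payload = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_lsb(cover, payload)
print(marked)
print(extract_lsb(marked, len(payload)) == payload)
```

Each pixel changes by at most one gray level, which is imperceptible, but the same property means any processing that perturbs LSBs erases the mark. Robust watermarks instead spread the payload over perceptually significant transform coefficients.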


Other Research Issues (2)

  • Pattern recognition/computer vision
  • Image processing and analysis
  • Speech processing/recognition/synthesis
  • Information and pattern mining
  • Artificial/computational intelligence
  • Bioinformatics

My Perception on Multimedia

  • Indexing & retrieval: arguably the core
    – You get what you want
  • Multimodality and fusion: real multimedia
  • Human-computer interaction: user friendly
  • Coding & transmission: efficiency & quality in storage & delivery of multimedia data
  • Immersive 3D: creation of lifelike experience
  • Media security: IP and business considerations
  • Wireless multimedia: make it handy


Applications of Multimedia

  • Healthcare & telemedicine
  • Life science
  • Arts and cultural heritage
  • Digital asset management
  • Security/surveillance/military
  • Education (distance education)
  • Business/service on demand
  • Entertainment (digital cinema)
  • Gaming
  • Smart universe/city/grid…


Application in Distributed Environment

[Figure blocks: Servers 1–3, Clients 1–4, backbone network, external network]


What can multimedia do for us?
