Online Learning for Energy-Efficient Multimedia Systems



SLIDE 1

Multimedia Communications and Systems Laboratory

Online Learning for Energy-Efficient Multimedia Systems

Nick Mastronarde, nhmastro@ee.ucla.edu
PhD Defense, May 6, 2011

SLIDE 2

Resource-intensive multimedia applications are booming over a variety of resource-constrained networks and systems: surveillance, video conferencing, sensor networks, data centers, in home

  • Old: Higher multimedia quality is better

Optimize rate-distortion performance (e.g., H.264/AVC)

Minimize delay

Minimize distortion

  • New: Quality costs power

Delay, Distortion, Energy

My Focus! Energy-efficient resource management

SLIDE 3

Performance Metrics and High-level System Model

  • Performance metric depends on the system and application

Minimize energy subject to QoS constraint

Optimize QoS subject to energy budget

  • For example: E[Cost] = E[Energy] + µE[Delay]

  • QoS: Delay, Distortion

SLIDE 4

Two types of optimization objectives

  • Myopic: Minimize expected immediate cost (Suboptimal!)

  • Foresighted: Minimize expected immediate cost + expected future cost (My Focus!)

  • E[Cost] = E[Energy] + µE[Delay]

Why?

  • Power & Delay: The time taken to transmit the current packet affects the time available (and the power required) to transmit future packets before their deadlines

  • Multimedia Utility: Scheduling decisions at the current time affect future scheduling decisions due to source-coding dependencies

SLIDE 5

Foresighted Optimization

  • How does foresighted optimization work?

In time slot n, take the transmission action that minimizes: current cost + expected future cost

  • Dynamics: State (time n) → Action → State (time n+1)

State: channel, buffer backlog, multimedia (MM) data state

Action: scheduling, adaptive modulation and coding (AMC)

Dynamics: channel, data arrivals, transmission errors

Myopic solutions are suboptimal because they ignore the expected future utility

SLIDE 6

  • Challenges
  • Challenge 1: Unknown dynamic environments

Dynamic traffic and channel conditions

Lack of statistical knowledge of dynamics

Fast learning algorithms

  • Challenge 2: Heterogeneous multimedia data

Different deadlines, priorities, dependencies

  • Challenge 3: Multi-user

Coupling due to shared resources

Curse of dimensionality

SLIDE 7

  • Existing Solutions (1/2)
  • Cross-layer optimization in multimedia communications and systems

Myopic: Ignore the impact of current decisions on future performance [Nahrstedt 2006, 2007, He 2005, Sachs 2003, Mohapatra 2005, van der Schaar 2003, 2007]

  • Single-layer optimizations

Hardware layer (dynamic power management) [Benini 1999, Chung 2002, Marculescu 2005]

  • Learning solutions require too much memory or are too complex

Physical layer (transmission power-control)

  • Optimal solutions require statistical knowledge of dynamics [Berry 2002]
  • Learning solutions are slow to converge [Borkar 2008]

Application layer (multimedia rate-control) [Ortega 1994]

  • Rate-distortion characteristics are assumed to be known
SLIDE 8

  • Existing solutions (2/2)
  • Multi-user network optimization

Network utility maximization [Chiang 2007]

  • Static utility function
  • Ignores network dynamics
  • Ignores packet deadlines, priorities, and dependencies
  • No learning for unknown environments

Stability-constrained optimization [Neely 2006]

  • Guarantees queue stability, but achieves suboptimal power consumption in the low-delay region

  • Ignores packet deadlines, priorities, and dependencies
SLIDE 9

  • Improvement over state-of-the-art

The proposed framework achieves...

Problem setting | Previous state-of-the-art | Achieved improvement
Point-to-point energy-efficient wireless communication [Mastronarde 2011b] | Heuristic policy [Nahrstedt 2007] | Reduce power by up to 33% for same delay (in non-stationary environment)
Point-to-point energy-efficient wireless communication [Mastronarde 2011b] | Reinforcement learning [Borkar, 2008] | Reduce delay and power by up to 50% and 23%, respectively, after 3000 learning steps
Cooperative multi-user video transmission [Mastronarde 2011a] | Non-cooperative multi-user video transmission [Fu, van der Schaar, 2010] | Improve 5 – 10 dB PSNR for nodes with feeble direct signals
Cross-layer multimedia system optimization* [Mastronarde 2010, 2009b] | Cross-layer adaptation [Nahrstedt 2005] | Improve up to 7 dB PSNR and reduce power by 21%

*Prior work presented during Qualifying Exam

SLIDE 10

  • Overview
  • Part I: Fast reinforcement learning for energy-efficient wireless communication [Mastronarde, 2011b]

Post-decision state learning

Virtual experience learning

  • Part II: A distributed cross-layer approach to cooperative video transmission [Mastronarde, 2011a]

Multi-user Markov decision process formulation

Mitigating the curse of dimensionality


SLIDE 12

  • The Solved Energy-efficient Wireless Communication Problem (1/2)

  • Point-to-point time-slotted wireless communication system
  • Minimize power consumption subject to buffer delay constraint

Little’s law: Average buffer delay is proportional to average buffer occupancy
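
As a rough illustration of the Little's law remark above, the sketch below converts an average buffer occupancy into an average buffer delay using L = λW. The function name is hypothetical, and the numbers (200 packets/second arrival rate, 4-packet holding cost constraint) are borrowed from the simulation parameters listed later in the deck purely as an example; the calculation also assumes that all arrivals are admitted to the buffer.

```python
def average_delay_seconds(avg_occupancy_packets, arrival_rate_pps):
    """Little's law: L = lambda * W, hence W = L / lambda.

    avg_occupancy_packets: average number of packets in the buffer (L)
    arrival_rate_pps: average arrival rate in packets per second (lambda)
    """
    if arrival_rate_pps <= 0:
        raise ValueError("arrival rate must be positive")
    return avg_occupancy_packets / arrival_rate_pps


# Example: 4 packets of average backlog at 200 packets/second.
delay = average_delay_seconds(4.0, 200.0)
print(f"average delay: {delay * 1000:.1f} ms")
```

Under these assumptions, constraining the average backlog is equivalent to constraining the average delay, which is why the deck treats the holding cost as a proxy for delay.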

SLIDE 13

  • The Solved Energy-efficient Wireless Communication Problem (2/2)

  • System variables

Buffer occupancy state:

Channel state: finite-state Markov chain (e.g., Rayleigh fading)

Power management state:

Data arrivals: i.i.d.

  • Decision variables (actions)

Packet throughput:

Bit-error probability:

Power management action:

  • Goodput
SLIDE 14

  • Buffer Model
  • Buffer state:

Buffer recursion:

Controlled Markov chain with transition probabilities:
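
The buffer recursion itself did not survive in this transcript. A minimal sketch of one common form, assuming the next backlog equals the current backlog minus the goodput plus the new arrivals, clipped at the buffer size, is shown below. The function name, the argument names, and the default capacity of 25 packets (taken from the simulation setup later in the deck) are illustrative assumptions, not the author's notation.

```python
def next_buffer_state(backlog, goodput, arrivals, capacity=25):
    """Assumed recursion: b[n+1] = min(b[n] - f[n] + l[n], B).

    backlog:  packets currently queued, b[n]
    goodput:  packets successfully delivered this slot, f[n] (at most b[n])
    arrivals: packets arriving this slot, l[n]
    capacity: buffer size B
    Returns (next_backlog, overflow), where overflow counts dropped packets.
    """
    if not 0 <= goodput <= backlog:
        raise ValueError("goodput must be between 0 and the current backlog")
    if arrivals < 0:
        raise ValueError("arrivals must be non-negative")
    unclipped = backlog - goodput + arrivals
    next_backlog = min(unclipped, capacity)
    return next_backlog, unclipped - next_backlog


print(next_buffer_state(10, 4, 3))   # no overflow
print(next_buffer_state(24, 0, 5))   # buffer fills and packets are dropped
```

Because the goodput and the arrivals are random, repeated application of this recursion yields the controlled Markov chain over buffer states mentioned above.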

SLIDE 15

  • Power Management Model
  • Power management state:

Controlled Markov chain with transition probabilities [Benini 1999]

  • Switch “on”

Switch “off”

  • Switching wireless card “on” or “off”

Incurs transition power penalty (watts):

Incurs expected transition delay:

SLIDE 16

  • Costs

We want to achieve the optimal power subject to a buffer constraint

  • Power cost:
  • Buffer cost:

Holding cost: proportional to the delay (by Little’s law)

Overflow cost: provides incentive to transmit packets instead of dropping them

SLIDE 17

  • Formulation as Markov Decision Process (MDP)
  • State:
  • Action:
  • Policy:
  • Cost:
  • Transition probability:
SLIDE 18

Value Functions

  • State-value function:
  • Optimal state-value function:
  • Optimal policy:
  • If the cost and transition probability functions are known, this is a simple numerical problem…
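
To make the "simple numerical problem" concrete, here is a minimal sketch of value iteration for a cost-minimizing MDP with known costs and transition probabilities. The two-state, two-action model is a made-up toy (state 0 = empty buffer, state 1 = backlogged; action 0 = idle, action 1 = transmit) and is not taken from the deck; only the discount factor 0.98 matches the simulation setup.

```python
def value_iteration(costs, trans, gamma=0.98, tol=1e-9, max_iters=100_000):
    """Value iteration for a finite, cost-minimizing MDP.

    costs[s][a]    : immediate cost of action a in state s
    trans[s][a][t] : probability of moving from s to t under a
    Returns (values, policy), with policy[s] the greedy action in state s.
    """
    n_states, n_actions = len(costs), len(costs[0])
    values = [0.0] * n_states
    for _ in range(max_iters):
        # Action values under the current value estimate.
        q = [[costs[s][a] + gamma * sum(trans[s][a][t] * values[t]
                                        for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        new_values = [min(row) for row in q]
        change = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if change < tol:
            break
    policy = [min(range(n_actions), key=lambda a, s=s: q[s][a])
              for s in range(n_states)]
    return values, policy


# Hypothetical toy model: idling while backlogged costs 2 (delay),
# transmitting costs 1 (energy) and usually empties the buffer.
costs = [[0.0, 1.0],
         [2.0, 1.0]]
trans = [[[0.5, 0.5], [0.5, 0.5]],
         [[0.0, 1.0], [0.9, 0.1]]]
values, policy = value_iteration(costs, trans)
print(policy)
```

When the transition probabilities are unknown, this computation is not available, which motivates the learning algorithms on the following slides.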
SLIDE 19

Conventional Reinforcement Learning Algorithm: Q-Learning

  • Problems with Q-learning (three are flagged on the slide)
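
The update equation on this slide was lost in extraction. As a stand-in, the sketch below shows the textbook Q-learning update for a cost-minimizing problem: the revised estimate blends the starting estimate with a new sample built from the observed cost and the best action value at the next state. The function name, the dictionary-based table, and the parameter values are assumptions for illustration and may differ from the notation used in the talk.

```python
from collections import defaultdict


def q_learning_update(q, state, action, cost, next_state, actions,
                      alpha=0.1, gamma=0.98):
    """One Q-learning step for cost minimization.

    q is a mapping from (state, action) to the current action-value estimate.
    new sample       = cost + gamma * min over a' of Q(next_state, a')
    revised estimate = (1 - alpha) * starting estimate + alpha * new sample
    """
    best_next = min(q[(next_state, a)] for a in actions)
    sample = cost + gamma * best_next
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * sample
    return q[(state, action)]


# Example with a hypothetical two-action problem.
q = defaultdict(float)
q[("s1", 0)] = 2.0
q[("s1", 1)] = 1.0
revised = q_learning_update(q, "s0", 0, cost=3.0, next_state="s1",
                            actions=[0, 1], alpha=0.5, gamma=0.9)
print(revised)
```

Each step touches a single state-action pair and uses no prior knowledge of the dynamics, which is consistent with the drawbacks the deck goes on to address.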

SLIDE 20

  • Post-Decision State Definition

Definition: An intermediate state after the known dynamics take place, but before the unknown dynamics take place.

State (time n) → Post-decision state (time n) → State (time n+1)

Dynamics | Known | Unknown
Deterministic | PM state transition; Power cost | N/A
Stochastic | Goodput distribution; Holding cost | Traffic arrival distribution; Channel state distribution; Overflow cost
SLIDE 21

  • Post-Decision State Generalization
  • Transition probability function: factors into known and unknown components
  • Cost function: decomposes into known and unknown components
  • This decomposition applies in a large class of wireless systems

SLIDE 22

  • Post-Decision State Value Function

State (time n) → Post-decision state (time n) → State (time n+1)

  • (a) known, (b) unknown

The PDS value function must be learned

SLIDE 23

Post-Decision State Learning

  • No Exploration!
  • Integrates known information!
  • Problem
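
The equations for this slide were lost, so the following is only a sketch of how post-decision state (PDS) learning is commonly structured, under the assumption that the action is chosen greedily from the known cost plus the learned PDS value, and that only the PDS value is updated from the observed unknown cost and next state. All function names and the toy buffer model (linear energy and holding costs, deterministic goodput) are hypothetical.

```python
def greedy_value(state, feasible_actions, known_cost, known_pds, pds_values):
    """Return (best_action, value), minimizing known cost + expected PDS value.

    known_pds(state, action) returns {post_decision_state: probability}.
    No exploration is needed because the greedy action uses only known
    quantities and the current PDS value estimates.
    """
    def score(action):
        expected = sum(prob * pds_values.get(pds, 0.0)
                       for pds, prob in known_pds(state, action).items())
        return known_cost(state, action) + expected

    best_action = min(feasible_actions(state), key=score)
    return best_action, score(best_action)


def pds_learning_update(pds_values, pds, unknown_cost, next_state,
                        feasible_actions, known_cost, known_pds,
                        alpha=0.1, gamma=0.98):
    """Blend the old PDS value with: unknown cost + gamma * value of next state."""
    _, next_value = greedy_value(next_state, feasible_actions,
                                 known_cost, known_pds, pds_values)
    sample = unknown_cost + gamma * next_value
    pds_values[pds] = (1 - alpha) * pds_values.get(pds, 0.0) + alpha * sample
    return pds_values[pds]


# Hypothetical toy model: the state is the backlog, the action is the number
# of packets sent (at most 2), and the PDS is the backlog left after sending.
def feasible_actions(backlog):
    return range(min(backlog, 2) + 1)


def known_cost(backlog, sent):
    return 1.0 * sent + 2.0 * (backlog - sent)   # energy + holding cost


def known_pds(backlog, sent):
    return {backlog - sent: 1.0}


pds_values = {0: 0.0, 1: 5.0}
print(greedy_value(1, feasible_actions, known_cost, known_pds, pds_values))
print(pds_learning_update(pds_values, 0, 0.0, 1, feasible_actions,
                          known_cost, known_pds, alpha=0.5, gamma=0.9))
```

In this arrangement the arrival and channel statistics never need to be estimated explicitly; they enter only through the observed next state and unknown cost.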

SLIDE 24

  • Virtual Experience Learning
  • Problem: PDS learning only updates one PDS in each time slot
  • Observation: The unknown dynamics (traffic arrival and channel state distributions) are independent of the buffer and power management states

Learn about all buffer and power management states in each time slot!

Improve adaptation speed at the expense of increased complexity.

  • Actual PDS experience tuple:
  • Set of virtual experience tuples:

Current VE state, next VE state, VE cost
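
A minimal sketch of the virtual experience idea follows, assuming the observed arrivals and next channel state can be replayed for every post-decision backlog that shares the current channel state. The power management state is omitted for brevity, and the function name, the `(backlog, channel)` key layout, the `state_value` callback, and the overflow penalty are all illustrative assumptions rather than the author's formulation.

```python
def virtual_experience_update(pds_values, channel, arrivals, next_channel,
                              buffer_size, state_value, overflow_penalty,
                              alpha=0.1, gamma=0.98):
    """Apply one PDS-style update to every post-decision backlog.

    pds_values maps (post_decision_backlog, channel) to a value estimate.
    state_value(backlog, channel) returns the current value of a full state;
    here it is supplied by the caller (in PDS learning it would come from
    the greedy minimization over known costs and PDS values).
    """
    for backlog in range(buffer_size + 1):
        unclipped = backlog + arrivals
        next_backlog = min(unclipped, buffer_size)
        # Unknown cost: penalty for packets dropped due to overflow.
        unknown_cost = overflow_penalty * (unclipped - next_backlog)
        sample = unknown_cost + gamma * state_value(next_backlog, next_channel)
        key = (backlog, channel)
        pds_values[key] = (1 - alpha) * pds_values.get(key, 0.0) + alpha * sample
    return pds_values


# Example with a hypothetical value function equal to the backlog.
table = virtual_experience_update({}, "h0", arrivals=1, next_channel="h1",
                                  buffer_size=2,
                                  state_value=lambda b, h: float(b),
                                  overflow_penalty=10.0, alpha=0.5, gamma=0.5)
print(table)
```

The loop over all backlogs is what raises the per-slot update cost, which matches the trade-off noted above; updating only every few slots is one way to limit that cost.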
SLIDE 25

  • Comparison of Learning Algorithms

Algorithm | Action Selection Complexity | Learning Update Complexity
Q-learning | |
PDS learning | |
Virtual experience learning | |
SLIDE 26

  • Simulation Setup
  • PHY layer: QAM square constellations + Gray code
  • Unknown channel transition and packet arrival distributions

Simulation Parameters

Parameter | Value
Arrival rate | 200 packets/second
Buffer size | 25 packets
Channel states | {-18.82, -13.79, -11.23, -9.37, -7.80, -6.30, -4.68, -2.08} dB
Holding cost constraint | 4 packets
Transmission actions* | {0, 1, 2, … , 10} packets/time slot
Discount factor | 0.98
Noise power spectral density (watts/Hz) |
Bits per symbol |
Packet loss rates | {1, 2, 4, 8, 16} %
Power management actions |
Power management states |
Time slot duration | 10 ms
“Off” power | 0 watts
“On” power | 80 mW, 160 mW, or 320 mW
Transition power | Set equal to
*Symbol rate (symbols/s) |
Packet size (bits) |
SLIDE 27

Learning Algorithm Performance Comparison (1/2)

[Figure: three panels showing holding cost, power (mW), and θoff versus time slot n (up to 6 x 10^4) for PDS Learning, PDS Learning (No DPM), Q-learning, and PDS + Virtual Experience (update period = 1). *[Borkar, 2008]]

SLIDE 28

Learning Algorithm Performance Comparison (2/2)

[Figure: three panels showing holding cost, power (mW), and θoff versus time slot n (up to 6 x 10^4) for PDS + Virtual Experience with update periods of 1, 10, 25, and 125, and for PDS Learning]

SLIDE 29

  • Comparison to State-of-the-Art
  • Threshold-k [Nahrstedt, ’07]

If backlog exceeds k, then turn on wireless card and transmit all packets

After transmitting, turn card off

Ignore channel conditions

  • Non-stationary dynamics

Markov modulated arrival process using unobservable 5-state Markov chain

Time-varying channel transition probabilities

[Figure: power (mW) versus holding cost (packets), Proposed versus Threshold-k]

11% – 33% improvement for same holding cost

*Update period for proposed: T = 50 time slots
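
The Threshold-k baseline described above is simple enough to sketch directly. The code below is an illustrative reading of the three rules on the slide (turn on and flush the buffer when the backlog exceeds k, stay off otherwise, never look at the channel); the function names and the assumption that a full flush completes within one time slot are not from the source.

```python
def threshold_k_decision(backlog, k):
    """Return (card_on, packets_to_send) for the current time slot.

    The card is switched on only when the backlog exceeds k, in which case
    the whole backlog is sent; otherwise the card stays off. Channel
    conditions are deliberately ignored.
    """
    if backlog > k:
        return True, backlog
    return False, 0


def simulate_threshold_k(arrivals_per_slot, k):
    """Run the policy over a list of per-slot arrivals; return packets sent per slot."""
    backlog, sent_log = 0, []
    for arrivals in arrivals_per_slot:
        _, sent = threshold_k_decision(backlog, k)
        backlog = backlog - sent + arrivals
        sent_log.append(sent)
    return sent_log


print(simulate_threshold_k([2, 2, 2, 2], k=3))
```

Because the decision depends only on the backlog, the policy may transmit during poor channel states, which is one plausible source of the power gap reported on this slide.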

SLIDE 30

  • Summary: Fast learning for energy-efficient wireless communications

  • Proposed first unified power management framework for delay-sensitive wireless communication

Integrate system-level and physical-layer centric power management

  • Exploited structure of the problem to improve learning performance

Post-decision state

  • Separation of known and unknown dynamics
  • Eliminate need for exploration

Virtual experience learning

  • Independence of unknown dynamics and components of state
SLIDE 31

  • Overview
  • Part I: Fast reinforcement learning for energy-efficient wireless communication [Mastronarde, 2011b]

Post-decision state learning

Virtual experience learning

  • Part II: A distributed cross-layer approach to cooperative video transmission [Mastronarde, 2011a]

Multi-user Markov decision process formulation

Mitigating the curse of dimensionality

SLIDE 32

  • Multi-user Wireless Video Network With Cooperation

  • Direct mode: Transmit at data rate
  • Cooperative mode:

Phase I: transmit at data rate

Phase II: transmit at data rate

Cooperative: data rate

Cooperative phase II uses randomized space-time block coding rule

Symbols defined on the slide: time slot duration (seconds); transmission time fraction in [0,1]; Phase I time fraction in [0,1]

SLIDE 33

  • Prior Work
  • Throughput-maximizing (opportunistic) multiple access policies [Knopp 1995], [Viswanath 2002], [Tse 2005]

Schedule nodes with good fades

Ignore delay deadlines, priorities, and dependencies

  • Cross-layer solutions [Katsaggelos 2007, 2008], [Su 2007], [van der Schaar 2010], [Melodia 2010]

Balance between scheduling easy nodes and most important nodes

Underlying inefficiency in network resource usage

  • Users with high priority data, but worse fades, get access to the channel

Cooperation reduces inefficiency! Enables users with feeble direct signals, but high priority data, to exploit channel diversity

SLIDE 34

  • A Sophisticated Traffic Model for Video
  • Traffic state:

Schedulable frame set:

Buffer state:

[Figure: illustrative traffic state for a simple IBPB IBPB... GOP structure]
SLIDE 35

  • Traffic State Transition Illustration

Time | Traffic state | Scheduling action
t | F_t = (1,2,3), b_t = (4,3,2) | y_t = (4,0,0)
t+1 | F_{t+1} = (2,3,1,4), b_{t+1} = (3,2,6,1) | y_{t+1} = (0,0,2,0)
t+2 | F_{t+2} = (1,4,2,3), b_{t+2} = (4,-1,4,1) | y_{t+2} = (4,0,1,0)

SLIDE 36

  • Multi-User Markov Decision Process Formulation
  • States

Channel state (i.i.d.):

Traffic state:

  • Actions

Scheduling action:

Cooperation decision:

  • Utility and Transition Probability
  • Distortion reduction for packets belonging to frame j
SLIDE 37

Feasible Scheduling Actions

  • Constraint set:

Buffer constraint:

Packet constraint:

Dependency constraint:

SLIDE 38

  • Optimization Objective
  • Decision variables:

Scheduling action:

Cooperation decision:

  • Dynamic programming equation:
  • Subject to
  • Challenges:

Complexity is quadratic in , which scales exponentially in and

Traffic state information is local to users , where

SLIDE 39

Mitigating the Curse of Dimensionality

  • Problem 1: Complexity scales exponentially in

Theorem: Cooperation decision that maximizes immediate throughput is long-term optimal [Mastronarde, 2011a]

Implications of theorem:

  • Instead of tracking track maximum transmission rates
  • Use an opportunistic cooperation scheme for cooperation decision
  • Problem 2: Complexity scales exponentially in

Solution [Fu, van der Schaar, 2010]: Lagrangian relaxation with a resource price

  • The resulting MU-MDP can be decomposed into one local MDP per user
  • Optimal resource price can be determined using subgradient method
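
The subgradient search for the resource price can be sketched on a static toy problem. In the sketch below, each user's local MDP is replaced by a closed-form stand-in (the maximizer of `weight * log(1 + x) - price * x` over a time fraction x in [0, 1]), and the shared constraint is assumed to be that the time fractions sum to at most 1. The utility form, the weights, the step-size rule, and all function names are assumptions for illustration and are not taken from [Fu, van der Schaar, 2010] or the deck.

```python
def local_demand(weight, price):
    """Toy stand-in for a user's local MDP solution at a given price.

    Maximizes weight * log(1 + x) - price * x over x in [0, 1], whose
    unconstrained maximizer is x = weight / price - 1.
    """
    return min(1.0, max(0.0, weight / price - 1.0))


def find_resource_price(weights, price=1.0, iterations=500):
    """Subgradient update: raise the price when total demand exceeds the resource.

    price <- max(small, price + step * (sum of demands - 1)), with a
    diminishing step size 1 / (k + 1).
    """
    for k in range(iterations):
        total = sum(local_demand(w, price) for w in weights)
        step = 1.0 / (k + 1)
        price = max(1e-6, price + step * (total - 1.0))
    return price


weights = [1.0, 2.0, 3.0]          # hypothetical per-user importance
price = find_resource_price(weights)
shares = [local_demand(w, price) for w in weights]
print(round(price, 4), [round(s, 3) for s in shares])
```

Each user only needs the current price to compute its own demand, which is the property that lets the multi-user problem decompose into one local problem per user.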

SLIDE 40

  • Simulation Setup
  • Scenarios:

Homogeneous

  • Foreman (CIF, 30 Hz, 1.5 Mb/s)

Heterogeneous 1

  • Coastguard (CIF, 30 Hz, 1.5 Mb/s)
  • Mobile (CIF, 30 Hz, 2.0 Mb/s)
  • Foreman (CIF, 30 Hz, 1.5 Mb/s)

Heterogeneous 2

  • Coastguard (CIF, 30 Hz, 1.5 Mb/s)
  • Foreman (CIF, 30 Hz, 1.5 Mb/s)
  • Mobile (CIF, 30 Hz, 2.0 Mb/s)

Description | Value
Length of the STBC | 2
Rate of orthogonal STBC rule | 1
Self-selection parameter | 0.20
Packet size | 8000 bits
Bit error probability target |
Path loss exponent | 3
WLAN coverage radius (5 dB SNR at boundary) | 100 m
Number of nodes (excluding the AP) | 50
Discount factor | 0.80
Symbol rate (symbols per second) | 625000 or 1250000

SLIDE 41

Network Topology

[Figure: network topology showing the AP, video sources, and potential relays; both axes span -100 to 100]

Source | Distance to AP | Angle
1 | 20 m | 25º
2 | 45 m | -30º
3 | 80 m | 0º

SLIDE 42

  • Transmission Rates
  • A: Feeble direct
  • B: Strong direct
  • C: Cooperative gains
  • D: Homogeneous allocation
  • E: Heterogeneous allocation

[Figure: transmission rates of video sources 1, 2, and 3 for Cooperative (Low Congestion), Direct (Low Congestion), Cooperative (High Congestion), and Direct (High Congestion); panels: Homogeneous (Foreman), Heterogeneous 1 (Coastguard, Mobile, Foreman), Heterogeneous 2 (Coastguard, Foreman, Mobile)]

SLIDE 43

  • Video Quality Comparison
  • A: Feeble direct video undecodable at receiver
  • B: Cooperation achieves 5-10 dB PSNR improvement for nodes with feeble direct signals
  • C: Cooperation minimally impacts nodes with strong direct signals

All entries are PSNR under Low / High congestion.

Streaming Scenario | Transmission Mode | Video User 1 @ 20 m | Video User 2 @ 45 m | Video User 3 @ 80 m
Homogeneous | Video | Foreman | Foreman | Foreman
Homogeneous | Direct | 36.82 dB / 36.51 dB | 35.85 dB / 30.20 dB | 29.89 dB / --- dB
Homogeneous | Cooperative | 36.69 dB / 35.82 dB | 36.58 dB / 34.83 dB | 36.04 dB / 27.12 dB
Homogeneous | Change | -0.13 dB / -0.69 dB | 0.73 dB / 4.63 dB | 6.15 dB / --- dB
Heterogeneous 1 | Video | Coastguard | Mobile | Foreman
Heterogeneous 1 | Direct | 32.30 dB / 31.09 dB | 26.74 dB / 24.53 dB | 25.94 dB / --- dB
Heterogeneous 1 | Cooperative | 31.94 dB / 30.89 dB | 27.14 dB / 25.8 dB | 35.69 dB / 27.12 dB
Heterogeneous 1 | Change | -0.36 dB / -0.20 dB | 0.4 dB / 1.27 dB | 9.75 dB / --- dB
Heterogeneous 2 | Video | Coastguard | Foreman | Mobile
Heterogeneous 2 | Direct | 31.91 dB / 31.72 dB | 35.16 dB / 32.75 dB | 21.85 dB / --- dB
Heterogeneous 2 | Cooperative | 31.56 dB / 30.97 dB | 35.72 dB / 32.39 dB | 26.53 dB / 22.03 dB
Heterogeneous 2 | Change | -0.35 dB / -0.75 dB | 0.56 dB / -0.36 dB | 4.68 dB / --- dB

SLIDE 44

Video Quality Example

  • Video user 3 @ 80 m
  • Low congestion

[Figure: sample frames. Original; Direct Transmission, 26.9 dB PSNR; Cooperative Transmission, 34.7 dB PSNR]

SLIDE 45

  • Optimal Resource Price

Streaming Scenario | Transmission Mode | Resource Price (Low / High)
Homogeneous | Direct | 45.79 / 42.97
Homogeneous | Cooperative | 38.72 / 52.56
Homogeneous | Change | -6.93 / 9.59
Heterogeneous 1 | Direct | 51.01 / 53.17
Heterogeneous 1 | Cooperative | 48.02 / 71.94
Heterogeneous 1 | Change | -2.99 / 18.77
Heterogeneous 2 | Direct | 68.24 / 41.48
Heterogeneous 2 | Cooperative | 62.61 / 72.86
Heterogeneous 2 | Change | -5.63 / 31.38
SLIDE 46

  • Summary:

Multi-user cooperative video transmission

  • Multi-user MDP based approach

Enables high priority nodes to exploit diversity of channel fading states in the network

Improves video quality of feeble (distant) nodes by 5-10 dB PSNR

  • Reduces quality of nodes with strong direct signals by < 1 dB

Resource price for managing congestion

  • Increases in congested networks
  • Decreases in uncongested networks
  • Mitigate complexity

Opportunistic cooperation is long-term optimal

Decompose problem into local MDPs for each user

SLIDE 47

Impact in Industry

Company | Impact
Sanyo | (i) Energy-efficient point-to-point wireless communication (ii) Cooperative video transmission
Intel | Optimal video encoder mode decisions
IBM | Learning for data exploration
Skype | Rigorous modeling and optimization using MDP and reinforcement learning

SLIDE 48

  • Thank you!

http://www.ee.ucla.edu/~nhmastro/ http://medianetlab.ee.ucla.edu/

SLIDE 49

  • My Journal Papers

1.

[Mastronarde, 2011b] N. Mastronarde and M. van der Schaar, “Fast reinforcement learning for energy efficient wireless communications,” in review.

2.

[Mastronarde, 2011a] N. Mastronarde, F. Verde, D. Darsena, A. Scaglione, and M. van der Schaar, “Transmitting important bits and sailing high radio waves: a decentralized cross-layer approach to cooperative video transmission,” in review.

3.

[Mastronarde, 2010] N. Mastronarde and M. van der Schaar, “Online reinforcement learning for dynamic multimedia systems,” IEEE Trans. on Image Processing, vol. 19, no. 2, pp. 290-305, Feb. 2010.

4.

[Mastronarde, 2009c] N. Mastronarde and M. van der Schaar, “Designing autonomous layered video coders,” Elsevier Journal Signal Processing: Image Communication – Special Issue on Scalable Coded Media Beyond Compression, vol. 24, no. 6, pp. 417-436, July 2009.

5.

[Mastronarde, 2009b] N. Mastronarde and M. van der Schaar, “Towards a General Framework for Cross-Layer Decision Making in Multimedia Systems,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, no. 5, pp. 719-732, May 2009.

6.

[Mastronarde, 2009a] N. Mastronarde and M. van der Schaar, “Automated bidding for media services at the edge of a content delivery network,” IEEE Trans. on Multimedia, vol. 11, no. 3, pp. 543-555, Apr. 2009.

7.

[Mastronarde, 2008] N. Mastronarde and M. van der Schaar, “A bargaining theoretic approach to quality-fair system resource allocation for multiple decoding tasks,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 3, Mar. 2008.

8.

[Mastronarde, 2007b] N. Mastronarde and M. van der Schaar, "A queuing-theoretic approach to task scheduling and processor selection for video decoding applications," IEEE Trans. Multimedia, vol. 8, no. 7, pp. 1493-1507, Nov. 2007.

9.

[Mastronarde, 2007a] N. Mastronarde, D. S. Turaga, and M. van der Schaar. “Collaborative resource exchanges for peer-to-peer video streaming over wireless mesh networks,” IEEE J. on Select. Areas in Communications Peer-to-peer Communications and Applications, vol. 25, no. 1, pp. 108-118, Jan. 2007.

10.

[Mastronarde, 2006] Y. Andreopoulos, N. Mastronarde, and M. van der Schaar, “Cross-layer optimized video streaming over wireless multi-hop mesh networks,” IEEE J. on Select. Areas in Communications Multi-Hop Wireless Mesh Networks, vol. 24, no. 11, pp. 2104-2115, Nov. 2006.

SLIDE 50

  • References
  • Myopic Cross-Layer (Multimedia systems and communications)

[He, 2005] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, “Power-rate-distortion analysis for wireless video communication under energy constraints,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 5, pp. 645-658, May 2005.

[Sachs, 2003] D. G. Sachs, S. Adve, D. L. Jones, “Cross-layer adaptive video coding to reduce energy on general-purpose processors,” in Proc. International Conference on Image Processing, vol. 3, pp. III-109-112 vol. 2, Sept. 2003.

[Nahrstedt, 2006] W. Yuan, K. Nahrstedt, S. V. Adve, D. L. Jones, R. H. Kravets, “GRACE-1: cross-layer adaptation for multimedia quality and battery energy,” IEEE Trans. on Mobile Computing, vol. 5, no. 7, pp. 799-815, July 2006.

[Nahrstedt, 2007] K. Nahrstedt, W. Yuan, S. Shah, Y. Xue, and K. Chen, “QoS support in multimedia wireless environments,” in Multimedia Over IP and Wireless Networks, ed. M. van der Schaar and P. Chou, Academic Press, 2007.

[Mohapatra, 2005] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, N. Venkatasubramanian, “A cross-layer approach for power-performance optimization in distributed mobile systems,” 19th IEEE International Parallel and Distributed Processing Symposium, 2005.

[Pillai, 2003] P. Pillai, H. Huang, and K.G. Shin, “Energy-Aware Quality of Service Adaptation,” Technical Report CSE-TR-479-03, Univ. of Michigan, 2003.

[van der Schaar 2003] M. van der Schaar, S. Krishnamachari, S. Choi, and X. Xu, “Adaptive cross-layer protection strategies for robust scalable video transmission over 802.11 WLANs,” IEEE JSAC, vol. 21, no. 10, pp. 1752-1763.

[van der Schaar 2007] M. van der Schaar, Y. Andreopoulos, and Z. Hu, “Optimized scalable video streaming over 802.11 a/e HCCA wireless networks under delay constraints,” IEEE Trans. on Mobile Computing, vol. 5, no. 6, pp. 755-768, June 2006.

  • Foresighted Single-Layer (no learning, or heuristic)

[Benini, 1999] L. Benini, A. Bogliolo, G. A. Paleologo, G. D. Micheli, “Policy optimization for dynamic power management,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, pp. 813-833, June 1999.

[Ortega, 1994] A. Ortega, K. Ramchandran, M. Vetterli, “Optimal trellis-based buffered compression and fast approximations,” IEEE Trans. on Image Processing, vol. 3, no. 1, pp. 26-40, Jan. 1994.

[Berry, 2002] R. Berry and R. G. Gallager, “Communications over fading channels with delay constraints,” IEEE Trans. Info. Theory, vol. 48, no. 5, pp. 1135-1149, May 2002.

SLIDE 51

  • References
  • Foresighted Single Layer (with learning)

[Chung, 2002] E.-Y. Chung, L. Benini, A. Bogliolo, Y.-H. Lu, and G. De Micheli, “Dynamic power management for nonstationary service requests,” IEEE Trans. on Computers, vol. 51, no. 11, Nov. 2002.

[Marculescu, 2005] Z. Ren, B. H. Krogh, R. Marculescu, “Hierarchical adaptive dynamic power management,” IEEE Trans. on Computers, vol. 54, no. 4, Apr. 2005.

[Borkar, 2008] N. Salodkar, A. Bhorkar, A. Karandikar, V. S. Borkar, “An on-line learning algorithm for energy efficient delay constrained scheduling over a fading channel,” IEEE JSAC, vol. 26, no. 4, pp. 732-742, Apr. 2008.

[Krishnamurthy] M. H. Ngo and V. Krishnamurthy, “Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ,” IEEE Trans. on Signal Processing, vol. 58, no. 1, pp. 438-451, Jan. 2010.

  • Multiuser network optimization

[Neely, 2010] L. Huang, S. Moeller, M. J. Neely and B. Krishnamachari, “LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff,” Aug. 2010, ArXiv Technical Report, arXiv:1008.4895v1.

[Neely, 2009] M. J. Neely and R. Urgaonkar, "Optimal Backpressure Routing in Wireless Networks with Multi-Receiver Diversity," Ad Hoc Networks (Elsevier), vol. 7, no. 5, pp. 862-881, July 2009.

[Neely, 2006] M. J. Neely, "Energy Optimal Control for Time Varying Wireless Networks", IEEE Trans. On Information Theory, vol. 52, no. 7, pp. 2915-2934, July 2006.

[Fu, van der Schaar, 2010] F. Fu and M. van der Schaar, “A systematic framework for dynamically optimizing multi-user video transmission,” IEEE JSAC, vol. 28, pp. 308-320, Apr. 2010.

[Chiang 2007] M. Chiang, S. H. Low, A. R. Calderbank, and J.C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proc. of IEEE, vol. 95, no. 1, 2007.

SLIDE 52

  • References
  • Other

[Katsaggelos 2008] J. Huang, Z. Li, M. Chiang, and A.K. Katsaggelos, “Joint Source Adaptation and Resource Allocation for Multi- User Wireless Video Streaming,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, issue 5, 582-595, May 2008.

[Katsaggelos 2007] E. Maani, P. Pahalawatta, R. Berry, T.N. Pappas, and A.K. Katsaggelos, “Resource Allocation for Downlink Multiuser Video Transmission over Wireless Lossy Networks,” IEEE Transactions on Image Processing, vol. 17, issue 9, 1663- 1671, September 2008.

[Su 2007] G.-M. Su, Z. Han, M. Wu, and K.J.R. Liu, “Joint Uplink and Downlink Optimization for Real-Time Multiuser Video Streaming Over WLANs,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 2, pp. 280-294, August 2007.

[Knopp 1995] R. Knopp and P. A. Humblet, “Information capacity and power control in single-cell multiuser communications,” Proc. IEEE ICC, 1995.

[Viswanath 2002] P. Viswanath, D. N. C. Tse, R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. on Information Theory, vol. 48, no. 6, June 2002.

[Tse 2005] D. N. C. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.

[Alay 2009] O. Alay, P. Liu, Z. Guo, L. Wang, Y. Wang, E. Erkip, and S. Panwar, “Cooperative layered video multicast using randomized distributed space time codes”, IEEE INFOCOM Workshops 2009, Rio de Janeiro, Brazil, Oct. 2009, pp. 1–6.

[Laneman 2003] J.N. Laneman and G.W. Wornell, “Distributed space-time block coded protocols for exploiting cooperative diversity in wireless networks,” IEEE Trans. Inf. Theory, vol. 49, pp. 2415–2425, Oct. 2003.

[Sendonaris 2003] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation diversity – Part I & II,” IEEE Trans. Commun., vol. 51, pp. 1927–1948, Nov. 2003.

[Melodia, 2010] T. Melodia and W. Heinzelman, “Cross-layer optimization in video sensor networks,” IEEE COMSOC MMTC E-Letter, vol. 5, no. 3, May 2010.

SLIDE 53

  • Supplementary Slides
SLIDE 54

  • Multimedia Application Characteristics
  • Characteristics

Stringent delay constraints

Sophisticated source-coding dependency structures

Mixed priorities

Intense resource requirements

(a) Sequential Dependencies

(b) Typical Hybrid Coder Dependencies (MPEG-2, H.264/AVC) [Chou, 2006]

(c) Scalable Coding Dependencies

[Figure: decoding complexity profile over time for decoding four layers, Silent.CIF at 1.5 Mb/s; x-axis: Time (seconds), y-axis: Normalized Processor Ticks]

slide-55
SLIDE 55

Reinforcement Learning Architecture

  • Policy

[Diagram blocks: Policy, Traffic and Channel Dynamics, Value Function, Error. Signals: state, action, cost.]

slide-56
SLIDE 56

  • Conventional Reinforcement Learning Algorithm
  • Q-learning

Experience tuple:

Q-learning update:

  • Exploration vs. Exploitation

Given the current state, how do we choose the action?

Exploitation: Take greedy action w.r.t. action-value function estimate

  • Prevents the discovery of potentially better actions

Exploration: Take suboptimal action w.r.t. action-value function estimate

  • Sacrifice immediate performance for possibly improved future performance
[Q-learning update annotations: starting estimate, new sample, revised estimate]
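The experience tuple and update equation on this slide did not survive extraction. As a minimal sketch, the standard tabular Q-learning update for a cost-minimization problem, together with epsilon-greedy action selection, could look like the following. The function names, the dictionary-based table, and the parameters `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration probability) are illustrative assumptions, not the slide's notation.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon; otherwise exploit the lowest-cost action."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # exploration: possibly suboptimal action
    return min(actions, key=lambda a: q[(state, a)])  # exploitation: greedy action


def q_update(q, state, action, cost, next_state, actions, alpha, gamma):
    """Blend the starting estimate with a new sample to form the revised estimate."""
    starting_estimate = q[(state, action)]
    # New sample: observed cost plus discounted best (lowest) cost-to-go estimate.
    new_sample = cost + gamma * min(q[(next_state, a)] for a in actions)
    q[(state, action)] = (1 - alpha) * starting_estimate + alpha * new_sample
    return q[(state, action)]


# Hypothetical two-state, two-action example with an all-zero initial table.
actions = [0, 1]
q = {(s, a): 0.0 for s in range(2) for a in actions}
revised = q_update(q, state=0, action=1, cost=2.0, next_state=1,
                   actions=actions, alpha=0.5, gamma=0.9)
# revised = 0.5 * 0.0 + 0.5 * (2.0 + 0.9 * 0.0) = 1.0
greedy = epsilon_greedy(q, state=0, actions=actions, epsilon=0.0)
```

With `epsilon=0.0` the agent always exploits, which is exactly the case the slide warns about: action 1 in state 0 now looks worse than action 0 and would never be tried again.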

slide-57
SLIDE 57

  • Partially Known Dynamics
  • Known dynamics

Goodput:

PM state transition distribution

Power cost

Holding cost

  • Unknown dynamics

Packet arrival distribution:

Channel state transition:

Overflow cost

  • Post-decision state

An intermediate state

  • After known dynamics take place
  • Before unknown dynamics take place
slide-58
SLIDE 58

  • Decomposition into Known and Unknown Components
  • Known and unknown transition probabilities
  • Known and unknown costs

(Each expression separates a known component from an unknown component.)
  • The unknown components do not depend on the action
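The two equations on this slide were lost in extraction. In the usual post-decision state formulation, the decomposition is commonly written as below. The symbols are assumptions chosen for illustration ($\tilde{s}$ for the post-decision state, subscripts $k$ and $u$ for known and unknown) and may differ from the slide's own notation.

```latex
\begin{align}
p(s' \mid s, a) &= \sum_{\tilde{s}} p_u(s' \mid \tilde{s}) \, p_k(\tilde{s} \mid s, a) \\
c(s, a) &= c_k(s, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s, a) \, c_u(\tilde{s})
\end{align}
```

Written this way, the action $a$ appears only in the known terms $p_k$ and $c_k$, which matches the statement that the unknown components do not depend on the action.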
slide-59
SLIDE 59

  • Post-Decision State Learning
  • Post-decision state learning (online):

PDS experience tuple:

(a)

(b) The PDS value function must be learned

[PDS learning update annotations: starting estimate, new sample, revised estimate]
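Because the update equations on this slide were lost, the following is only a sketch of how a post-decision state (PDS) update is typically structured: the state value is computed from the known cost and known transition to the PDS, and only the PDS value function is learned from samples. The helper names, the toy buffer model (one power unit per transmitted packet, holding cost observed after transmission), and the parameters `alpha` and `gamma` are hypothetical.

```python
def state_value(state, actions, known_cost, known_pds_probs, pds_value):
    """State value: best known cost plus expected PDS value (uses known dynamics only)."""
    return min(
        known_cost(state, a)
        + sum(p * pds_value[pds] for pds, p in known_pds_probs(state, a).items())
        for a in actions
    )


def pds_update(pds_value, pds, unknown_cost, next_state, actions,
               known_cost, known_pds_probs, alpha, gamma):
    """Blend the starting estimate with a new sample to form the revised PDS estimate."""
    starting_estimate = pds_value[pds]
    # New sample: observed unknown cost plus discounted value of the observed next state.
    new_sample = unknown_cost + gamma * state_value(
        next_state, actions, known_cost, known_pds_probs, pds_value)
    pds_value[pds] = (1 - alpha) * starting_estimate + alpha * new_sample
    return pds_value[pds]


# Hypothetical toy model: the state is the buffer occupancy (0..2 packets),
# the action is the number of packets to send (0 or 1).
actions = [0, 1]
known_cost = lambda s, a: 1.0 * a                     # assumed power cost per packet
known_pds_probs = lambda s, a: {max(s - a, 0): 1.0}   # buffer after transmission
pds_value = {0: 0.0, 1: 0.0, 2: 0.0}

# One experience: PDS = 1 packet left, holding cost 1.0 observed, one arrival -> next state 2.
revised = pds_update(pds_value, pds=1, unknown_cost=1.0, next_state=2,
                     actions=actions, known_cost=known_cost,
                     known_pds_probs=known_pds_probs, alpha=0.5, gamma=0.9)
# revised = 0.5 * 0.0 + 0.5 * (1.0 + 0.9 * 0.0) = 0.5
```

Unlike Q-learning, the table here is indexed by PDS only, so no action exploration is needed to fill it: the action enters only through the known terms inside `state_value`.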

slide-60
SLIDE 60

  • Comparison to Prior Work using Post-Decision States [N. Salodkar, 2010]*

Feature                  Salodkar              Proposed
DPM                      No                    Yes
AMC                      Yes                   Yes
Power-control            Yes                   Yes
Packet losses            No                    Yes
Post-decision state      Deterministic         Stochastic
Costs                    Known only            Known and unknown
State transitions        Known and unknown     Known and unknown
Optimization Criteria    Undiscounted          Discounted
Virtual Experience       No                    Yes

* Differences in the proposed work are highlighted in red

slide-61
SLIDE 61

  • Learning Algorithm Performance Comparison

[Figure: three panels plotted against time slot (n): holding cost, θoff, and power (mW). Curves: PDS + Virtual Experience (update period = 1, 10, 25, and 125), PDS Learning, PDS Learning (No DPM), Q-learning.]

*[Borkar, 2008]

slide-62
SLIDE 62

  • Comparison to Optimal Policy With Imperfect Statistics

[Figure: two panels plotted against time slot (n) on a logarithmic axis: holding cost and power (mW). Curves: PDS + Virtual Experience (update period = 1), Optimal policy (imperfect statistics).]

slide-63
SLIDE 63

  • Non-Stationary Arrivals

[Figure: expected arrival rate (packets/s) vs. time slot (n)]

  • Unobservable 5-state Markov modulated process

States

  • Expected arrival rate for a Poisson arrival process
  • (0, 100, 200, 300, 400) packets/s

Stationary distribution

  • (0.0188, 0.3755, 0.0973, 0.4842, 0.0242).
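As a small worked example, the long-run average arrival rate implied by the five per-state rates and the stationary distribution above can be computed directly. The rates and probabilities are taken from the slide; the function name and the tolerance check are illustrative assumptions.

```python
# Per-state expected arrival rates (packets/s) and stationary distribution from the slide.
rates = (0, 100, 200, 300, 400)
stationary = (0.0188, 0.3755, 0.0973, 0.4842, 0.0242)


def long_run_average(values, distribution, tol=1e-6):
    """Probability-weighted average of per-state values under a stationary distribution."""
    if len(values) != len(distribution):
        raise ValueError("values and distribution must have the same length")
    if abs(sum(distribution) - 1.0) > tol:
        raise ValueError("distribution must sum to 1")
    return sum(v * p for v, p in zip(values, distribution))


average_rate = long_run_average(rates, stationary)
# 37.55 + 19.46 + 145.26 + 9.68 = 211.95 packets/s (up to floating-point rounding)
```

The modulating process spends about 86% of its time in the 100 and 300 packets/s states, so the arrival rate seen by the learner alternates mostly between those two levels around this average.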
slide-64
SLIDE 64

  • Non-Stationary Channel Transitions
  • Channel state transition probabilities vary over time as an AR(1) process.

Self-transition probabilities (White indicates a relatively high self-transition probability)

slide-65
SLIDE 65

  • Information-Theoretic Power Cost

slide-66
SLIDE 66

  • Physical Layer: Adaptive Modulation and Power Control
  • Transmission rate:

bits per symbol:

packet length (bits):

packet rate (packets/s):

  • Bit-error probability (BEP):
  • SNR:

slide-67
SLIDE 67

  • Physical Layer: Adaptive Modulation and Power Control
  • Variables of interest

Packet throughput (packets/time slot):

Bits per symbol:

Transmission power (watts):

Bit-error probability:

[Diagram: decision variable 1 and decision variable 2 drive adaptive modulation and power control, which relate packet throughput, bits per symbol, transmission power, and bit-error probability.]

slide-68
SLIDE 68

  • Initializing the PDS Value Function
  • PDS Value iteration:
  • Initialization:

Define reasonable estimates of the unknown dynamics (packet arrival distribution and channel state transition probabilities)

Perform PDS value iteration with estimates

slide-69
SLIDE 69

  • Impact of Initial Conditions
  • Initial arrival rate is assumed deterministic or uniform
  • Channel state is assumed constant

[Figure: power (mW) vs. holding cost (packets) for different initializations. Legend: Init. Arr. Rate = 100, 200, 300, 400, 500, and 600 packets/s; Initialized Arrival Rate = Uniform.]

slide-70
SLIDE 70

  • Traffic State Transition Illustration

Traffic State
t:   F_t = (1,2,3), b_t = (4,3,2)
t+1: F_{t+1} = (2,3,1,4), b_{t+1} = (3,2,6,1)
t+2: F_{t+2} = (1,4,2,3), b_{t+2} = (4,-1,4,1)

Scheduling Action
t:   y_t = (4,0,0)
t+1: y_{t+1} = (0,0,2,0)
t+2: y_{t+2} = (4,0,1,0)

slide-71
SLIDE 71

Phase I Time Fraction

  • K and Q symbols have to be transmitted in phase I and phase II, respectively
slide-72
SLIDE 72

  • Reformulated Multi-user Markov Decision Process
  • Global state:
  • Decision variables:

Scheduling action:

  • Dynamic programming equation:
  • Constraints:
  • Challenges:

Complexity is proportional to the size of the global state space, which scales exponentially in the number of users

Traffic state information is local to the users
slide-73
SLIDE 73

  • Optimization Decomposition
  • The multi-user optimization can be decomposed into local MDPs satisfying:

  • Requires message exchanges between users and the AP

Users → AP: Discounted infinite horizon resource consumption

AP → Users: Uniform resource price to manage congestion
slide-74
SLIDE 74

Proposed Opportunistic Cooperation Protocol (1/2)

  • When is cooperative transmission better than direct transmission?
  • Candidate cooperative nodes can self-select:
  • AP verifies fulfillment of the following condition:
  • If satisfied, then cooperation is better; otherwise, choose direct transmission
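The condition checked by the AP did not survive extraction. As a hedged sketch only, a common way to compare the two modes is to assume the same payload is sent at the phase I rate and then relayed at the phase II rate, so that the end-to-end cooperative rate is the harmonic combination of the two. That assumption, the function names, and the example rates are illustrative and may differ from the condition on the slide.

```python
def two_phase_rate(phase1_rate, phase2_rate):
    """Effective rate when a payload is sent in phase I and then relayed in phase II.

    Sending B bits takes B / phase1_rate + B / phase2_rate seconds (assumed model),
    so the effective rate is r1 * r2 / (r1 + r2).
    """
    if phase1_rate <= 0 or phase2_rate <= 0:
        raise ValueError("rates must be positive")
    return (phase1_rate * phase2_rate) / (phase1_rate + phase2_rate)


def cooperation_is_better(direct_rate, phase1_rate, phase2_rate):
    """True if the assumed two-phase cooperative rate exceeds the direct rate."""
    return two_phase_rate(phase1_rate, phase2_rate) > direct_rate


# Hypothetical rates in Mb/s.
weak_direct = cooperation_is_better(direct_rate=1.0, phase1_rate=6.0, phase2_rate=3.0)
strong_direct = cooperation_is_better(direct_rate=2.5, phase1_rate=6.0, phase2_rate=3.0)
# two_phase_rate(6.0, 3.0) = 18 / 9 = 2.0, so cooperation wins only in the first case.
```

Under this model cooperation pays off only when the direct link is much weaker than both hops, since the two-phase rate can never exceed the slower of the two phase rates.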
slide-75
SLIDE 75

Proposed Opportunistic Cooperation Protocol (2/2)

  • RTS: Request to send
  • CRS: Cooperative recruitment signal
  • HTS: Help to send
  • CTS: Clear to send
slide-76
SLIDE 76

Computation of Transmission Rates

  • Direct rate
  • Phase I rate
  • Phase II rate
slide-77
SLIDE 77

  • Cooperation Statistics