[PPT] - Designing Autonomic Wireless Multi-hop Networks for Delay-Sensitive PowerPoint Presentation

SLIDE 1

Designing Autonomic Wireless Multi-hop Networks for Delay-Sensitive Applications

Peter Hsien-Po Shiang
Advisor: Prof. Mihaela van der Schaar
Electrical Engineering, UCLA

SLIDE 2

Delay-sensitive applications are booming!

Examples of delay-sensitive applications

1) Hard delay constraints
2) Prioritized multimedia traffic (graceful degradation desired)

Video telephony, surveillance, live audio, live video, vehicular communications, battlefield sensing, games, video conferencing

SLIDE 3

Overall goal

Building efficient multi-hop networks for delay-sensitive applications

Autonomic decision making framework
Gather local information
Learn
Make decisions and interact

Autonomic node = Agent

[Figure: an agent cycles through an information gathering phase (control info., data), a learning phase, and a decision making phase, with info. exchange; labels: channel condition, application layer traffic requirements, network environment]

SLIDE 4

Autonomic network scenarios

Network scenarios | Dynamics | Local information
Multimedia transmission over wireless mesh network | Source traffic, channel condition | Source rate, transmission rate, packet error rate
Distributed resource management over cognitive radio networks | Resource availability (spectrum holes) | Primary users’ loading, other secondary users’ actions
Power control over ad hoc mobile networks | Interference coupling among transmitter-receiver pairs | SINR

[Figure: example network topologies with sources S1, S2, relays r1-r5, and destinations D1, D2]

SLIDE 5

Overview

I. Multimedia transmission over mesh networks

II. Exploiting information over space – information horizon

III. Exploiting information over time – learning

IV. Conclusions

1. Information gathering phase (control info., data packets)
2. Learning phase
3. Decision phase

Multimedia characteristics

SLIDE 6

Focus: multiple multimedia applications over multi-hop wireless networks

V1 V2

Nodes:
Edges:
Applications:
Classes:
Actions:
Utility:
Goal: Maximizing overall efficiency

SLIDE 7

Limitations of prior work (1/2)

Centralized optimization for multimedia transmission

Wu and Chou (2005)
Setton, Yoo, Zhu, Goldsmith, and Girod. (2005)
Jurca, Frossard (2007)
Andreopoulos, Mastronarde, and Van der Schaar (2006,2007)
Capacity constraint

Function of resource, e.g. throughput

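Information overhead: traditional sol. high; proposed sol. low
Adaptation ability: traditional sol. slow; proposed sol. fast
Decision maker: traditional sol. centralized; proposed sol. distributed
Complexity: traditional sol. high; proposed sol. low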

SLIDE 8

Limitations of prior decentralized work (2/2)

Flow-based optimized routing using queue information feedback

Awerbuch and Leighton (1994)
Neely, Modiano, Rohrs (2005)
Gupta, Javidi (2007)
Gupta, Lin, Srikant (2007)

Flow-based optimized routing using link state information

Wei, Zakhor (2002)
Draves, Padhye, Zill (2004)

Delay constraint: traditional sol. implicit; proposed sol. explicit
Application model: traditional sol. flow-based; proposed sol. packet-based
Resource allocation: traditional sol. predetermined; proposed sol. online adaptation

SLIDE 9

Challenges

Heterogeneous characteristics of delay-sensitive applications

Different priorities, hard delay deadlines, and loss tolerance

Time-varying transmission environment

Dynamic network conditions

Informationally-decentralized environment

Cost of information gathering

Coupling among agents’ actions and utilities

SLIDE 10

Required solution features for multimedia transmission

Fully distributed optimization that determines the actions at each node, e.g. how to relay
Dynamic adaptation to the changing network/source conditions at each node and to the coupling between nodes’ actions
From rate-constrained flow-based to explicit delay-constrained packet-based optimization

SLIDE 11

Delay-sensitive application quality model
Quality at the sources:
Quality at the destinations:

[Figure: multimedia packets of each GOP, grouped into classes over the transmission time, flowing from source nodes through relay nodes]

SLIDE 12

Cross-layer transmission strategy (action)

Application: scheduling
Network: relay selection
Data link: retransmission limit
Physical: MCS selection

The priority queuing model provides a unified framework to analyze:
Packet scheduling
Relay selection, which impacts the packet arrival process
ARQ, which can be modeled as a geometric service time distribution
MCS, which provides different physical transmission rates and packet error rates

Per-packet decisions with explicit delay constraints
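The geometric service time remark can be made concrete with a small sketch. It assumes independent packet errors with a fixed packet error rate and a fixed time per attempt; the function name and parameters are illustrative and not taken from the slides.

```python
def arq_service_time_moments(packet_bits, rate_bps, per, max_retx=None):
    """Mean and second moment of the ARQ service time, plus residual loss.

    Assumption (not from the slides): each attempt takes packet_bits / rate_bps
    seconds and fails independently with probability `per`.
    max_retx=None means an unlimited retransmission limit (pure geometric).
    """
    if not 0.0 <= per < 1.0:
        raise ValueError("per must be in [0, 1)")
    slot = packet_bits / rate_bps  # duration of one transmission attempt
    if max_retx is None:
        mean_n = 1.0 / (1.0 - per)                  # E[N] of a geometric variable
        second_n = (1.0 + per) / (1.0 - per) ** 2   # E[N^2] of a geometric variable
        loss = 0.0
    else:
        attempts = max_retx + 1
        # P(N = k) = per^(k-1) * (1 - per) for k < attempts;
        # the remaining probability mass sits on the last allowed attempt.
        probs = [per ** (k - 1) * (1.0 - per) for k in range(1, attempts)]
        probs.append(per ** (attempts - 1))
        mean_n = sum(k * p for k, p in enumerate(probs, start=1))
        second_n = sum(k * k * p for k, p in enumerate(probs, start=1))
        loss = per ** attempts  # packet still in error after the last attempt
    return slot * mean_n, slot ** 2 * second_n, loss
```

The two moments are what a queuing analysis of the node would typically consume; a different MCS changes both `rate_bps` and `per`.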

SLIDE 13

Elementary structure (2-hop case)

V users with distinct sources and destinations.
M intermediate nodes.
Information feedback from all the nodes of the next hop – fully distributed.

SLIDE 14

Decisions at the PHY and MAC layer

We show that in the delay-constrained optimization, it is optimal to transmit the most important packet first with an infinite retransmission limit
Knowing the next relay, the optimal modulation and coding scheme:

Cascade the elementary structure to a multi-hop network

SLIDE 15

Overlay multi-hop network structure

Directed acyclic multi-hop network
Can be applied to any physical network as an overlay network

Classes

Information feedback

SLIDE 16

Decentralized optimization
Advantages:
– No predetermined rate allocation
– Low complexity
– Fully distributed solution
– Fast adaptation to network changes

Centralized cross-layer optimization:
Decentralized optimization:

Delay evaluation
Delay constraints

SLIDE 17

Route selection at the network layer

Our method: a generalization of the Bellman-Ford routing algorithm
Local information: transmission rates, packet error rates, expected delays

Information from the next hop
Information to the previous hop
Relay selecting parameter

Property: automatically avoids the congestion region
Multi-path routing

Proposed solution: self-learning algorithm [H. Shiang JSAC 2007]
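As a rough illustration of Bellman-Ford-style relay selection driven by next-hop feedback, the sketch below lets each node pick the relay that minimizes expected link delay plus the delay advertised by that relay. It is a simplified stand-in, not the self-learning algorithm of the cited paper: queuing delay and probabilistic multi-path selection are ignored, and all names are hypothetical.

```python
def expected_link_delay(packet_bits, rate_bps, per):
    """Expected time to deliver one packet over a link with unlimited retries."""
    return packet_bits / (rate_bps * (1.0 - per))


def select_relays(links, destination, packet_bits=8000, max_rounds=100):
    """Distance-vector style relay selection on an overlay graph.

    links: dict mapping node -> {next_hop: (rate_bps, packet_error_rate)}.
    Returns (expected delay to destination per node, chosen next hop per node).
    """
    delay = {destination: 0.0}
    next_hop = {}
    for _ in range(max_rounds):
        changed = False
        for node, outgoing in links.items():
            for nbr, (rate, per) in outgoing.items():
                if nbr not in delay:
                    continue  # no feedback received from this relay yet
                cand = expected_link_delay(packet_bits, rate, per) + delay[nbr]
                if cand < delay.get(node, float("inf")) - 1e-12:
                    delay[node] = cand
                    next_hop[node] = nbr
                    changed = True
        if not changed:
            break  # every node's estimate is stable
    return delay, next_hop
```

For example, with `links = {"s": {"r1": (1e6, 0.0), "r2": (2e6, 0.5)}, "r1": {"d": (1e6, 0.0)}, "r2": {"d": (4e6, 0.0)}}`, node `s` prefers `r2`: its first link is no faster in expectation, but the advertised downstream delay is lower.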

SLIDE 18

Convergence of the route selection

Proposition 1: The self-learning policy over an H-hop overlay network will converge to a steady state

Key idea:

[Figure: iteration loop with the steps evaluate, get information feedback, learn the relay selecting prob., and priority queuing analysis]

SLIDE 19

Advantages of using a queuing model

Advantages:

Fast adaptation to the dynamics
Sophisticated models at different layers

[Figure: video stream statistics and relay selection feed the input rate analysis; (MAC layer) retransmission, TXOP and (PHY) modulation feed the service time analysis; both feed the priority queuing analysis, which yields delay/packet loss]
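To show how input rates and service time moments turn into per-class delays, here is a sketch using the textbook M/G/1 preemptive-resume priority formula. This is a swapped-in, simpler model: the slides name a preemptive-repeat model, whose expressions are not recoverable here. Function and argument names are illustrative.

```python
def priority_queue_delays(arrival_rates, mean_service, second_moment_service):
    """Mean time in system per class for an M/G/1 preemptive-resume priority queue.

    Classes are listed from highest to lowest priority. Inputs per class:
    Poisson arrival rate, E[X], and E[X^2] of the service time.
    Returns float('inf') for classes that cannot be served stably.
    """
    delays = []
    load_above = 0.0  # utilization of strictly higher-priority classes
    residual = 0.0    # sum of lambda_i * E[X_i^2] / 2 over classes up to this one
    for lam, ex, ex2 in zip(arrival_rates, mean_service, second_moment_service):
        load = load_above + lam * ex
        residual += lam * ex2 / 2.0
        if load >= 1.0:
            delays.append(float("inf"))
        else:
            # Service stretched by higher-priority interruptions, plus waiting.
            stretched_service = ex / (1.0 - load_above)
            waiting = residual / ((1.0 - load_above) * (1.0 - load))
            delays.append(stretched_service + waiting)
        load_above = load
    return delays
```

With a single class and exponential service this reduces to the M/M/1 result, which is a quick sanity check on the formula.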

SLIDE 20

Average queue waiting time analysis

Average queue waiting time (M/G/1 preemptive-repeat model):

Interference affects the average service time

Approximation of packet loss rate at a relay

SLIDE 21

Results of the elementary structure

Tm (Mbps) | 0.3 | 0.4 | 0.5 | 0.6
Analytical, Coastguard PSNR (dB) | 32.49 | 33.93 | 35.34 | 35.61
Analytical, Mobile PSNR (dB) | 30.15 | 30.34 | 31.74 | 33.12
Simulation, Coastguard PSNR (dB) | 32.26 | 34.29 | 35.10 | 35.59
Simulation, Mobile PSNR (dB) | 29.34 | 31.41 | 32.00 | 33.05

[Figure: elementary structure with users v1, v2 and intermediate nodes m1-m4; link rates are multiples of Tm (3Tm, 4Tm, 5Tm)]

Analytical Result

SLIDE 22

Results for a 6-hop network

Tm (Mbps) | 0.3 | 0.4 | 0.5 | 0.6
Analytical, Coastguard PSNR (dB) | 32.48 | 33.92 | 33.93 | 35.61
Analytical, Mobile PSNR (dB) | 28.20 | 30.34 | 31.74 | 33.12
Simulation, Coastguard PSNR (dB) | 31.86 | 33.56 | 33.88 | 35.58
Simulation, Mobile PSNR (dB) | 28.39 | 30.21 | 31.35 | 32.85

Analytical Result

SLIDE 23

Comparisons with state-of-the-art routing solutions

[Figure: network topology with sources S1, S2, relays r1-r5, and destinations D1, D2; the legend distinguishes physical connections from overlay connections]
35.61

33.10 33.27 30.42

Self-learning policy

35.58 32.85 31.86 28.39

MDTMR [Wei, Zakhor 2004]

34.32 31.37 30.67 24.98

AODV [Perkins 1999]

“Coastguard” Y-PSNR (dB) “Mobile” Y-PSNR(dB) “Coastguard” Y-PSNR (dB) “Mobile” Y-PSNR(dB) Tm = 0.6 (Mbps) Tm = 0.3 (Mbps)

Simulated method

SLIDE 24

Overview

I. Multimedia transmission over mesh networks

II. Exploiting information over space – information horizon

III. Exploiting information over time – learning

IV. Conclusions

The need for information feedback

Decentralized decision making
Timely adaptation
Inter-user collaboration

A similar concept can be found in distance vector routing protocols, e.g. AODV [Perkins 1999], DSDV [Perkins 1994]

Information horizon = 1 hop?

SLIDE 25

Larger information horizon

[Figure: nodes n1-n7 arranged over hop1-hop3; video data (with TX strategies) flows forward, and information feedback on TX strategies from multiple agents within the information horizon flows back]

Advantages:
More accurate delay estimation
Faster adaptation

SLIDE 26

Information horizon tradeoff
Tradeoff of a larger information horizon: better decisions versus larger time overhead per packet

Example: risk-aware scheduling

SLIDE 27

Risk-aware scheduling – definition of “risk”

Three categories of the queued packets

“Dropped” packets
“Almost dropped” packets
“Seldom dropped” packets

Definition of risk:

SLIDE 28

Illustrative example

Class should be sent before class during , since it is more “risky”

User 1: Mobile, deadline: 500 ms
User 2: Coastguard, deadline: 300 ms

SLIDE 29

Problem formulation

Priority-based packet scheduling:
Risk-aware packet scheduling:

Number of class packets sent
Instead of using only

SLIDE 30

Optimal information horizon

[Figure: 6-hop network (Hop1-Hop6) with sources S1, S2, destinations D1, D2, and nodes n1-n13; link rates are multiples of Tm (3Tm, 4Tm, 5Tm, 10Tm). Video: Mobile, deadline = 500 ms. Video: Coastguard, deadline = 300 ms]

Analytical average PSNR (dB) for various information horizons

Tm | Priority h=1 | Risk h=2 | Risk h=3 | Risk h=4
300 Kbps | 29.61 | 30.1 | 29.63 | 29.59
400 Kbps | 30.75 | 30.80 | 31.1 | 30.85
500 Kbps | 31.50 | 31.55 | 32.0 | 31.75

SLIDE 31

Overview

I. Multimedia transmission over mesh networks

II. Exploiting information over space – information horizon

III. Exploiting information over time – learning

IV. Conclusions

SLIDE 32

Given limited information feedback, can an agent do better?

Remarks:

Interaction with other agents
Local information is changing over time
Current actions may influence future local information

Answer: Yes!!
Solution: Learn the changing environment and make foresighted decisions

[Figure: agent loop: gather local information (e.g. input rate, SINR, etc.) from the wireless network (other agents), evaluate utility, determine transmission action; the action has future influence on the network]

SLIDE 33

Foresighted decision making

Key ideas:

The agent does not have to wait for something to actually happen and then react!!
The anticipation is not only over space, but also over time
Markov decision process

[Figure: agent loop with a priority queuing model: gather local information (e.g. input rate, SINR, etc.) from the wireless network (other agents), form the state, evaluate future utility using the state transition prob., determine transmission action; the action has future influence on the network]

SLIDE 34

Markov Decision Process (MDP)
Tuple:

– state:
– action:
– transition probability:
– immediate reward:
– discount factor:

Goal: maximize the discounted sum of future rewards

Why discounted???
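For reference, the tuple and the objective can be written in standard textbook notation. The symbols below are an assumption, since the slide's own notation is not recoverable from this transcript.

```latex
\text{MDP: } (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a), \quad R(s, a), \quad 0 \le \gamma < 1
\\[4pt]
\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right],
\qquad a_t = \pi(s_t)
```

One common answer to the question above: with bounded rewards and a discount factor below 1, the infinite sum stays finite, and near-term rewards weigh more than distant ones.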

SLIDE 35

MDP Solution

Policy:
Optimal state-value function (Bellman equation): immediate reward + discounted expected future reward
Optimal policy:
Off-line solution: value iteration
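A minimal sketch of the off-line value iteration solution, assuming a finite MDP given as plain dictionaries. The data layout and names are illustrative, not the presentation's formulation.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Solve a finite MDP by value iteration.

    P[(s, a)] is a dict {next_state: probability}; R[(s, a)] is the immediate
    reward. Returns the state-value function and a greedy policy.
    """
    def q(s, a, V):
        # Immediate reward plus discounted expected future reward.
        return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())

    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_V = {s: max(q(s, a, V) for a in actions) for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break  # Bellman backup changes values by less than tol
    policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
    return V, policy
```

Each sweep applies the Bellman backup to every state. Because the backup is a contraction for a discount factor below 1, the values converge regardless of the starting point.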

SLIDE 36

Reinforcement learning

Applied when the dynamics are partially known or unknown

Model-free reinforcement learning, e.g. Q-learning [Watkins 1992], TD-learning [Sutton 1988]
Cannot take advantage of the queuing model
Converges slowly

Model-based reinforcement learning
Priority queuing model (M/G/1 preemptive-repeat model)
Maximum likelihood state transition probability
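The maximum likelihood estimate of a state transition probability is the observed relative frequency of each next state for a given state-action pair. The sketch below assumes transitions are logged as (state, action, next_state) tuples; that logging format and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectory):
    """Maximum likelihood transition probabilities from observed transitions.

    trajectory: iterable of (state, action, next_state) tuples.
    Returns {(state, action): {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in trajectory:
        counts[(state, action)][next_state] += 1
    return {
        sa: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
        for sa, nxt in counts.items()
    }
```

State-action pairs that were never observed are simply absent from the result, so a planner using these estimates needs a fallback for them, such as a prior or a model-derived value.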

SLIDE 37

Coupling among agents in the multi-hop network

Expected delay evaluation:
Condition to drop a certain priority class:
Required local information:

[Figure: an agent exchanging information feedforward and information feedback with its neighboring hops]

SLIDE 38

Proposed transmission policy

Transmission policy update:
Information exchange update:

Current delay
Future delay

SLIDE 39

Proposed distributed MDP

Step 1: Gather local information
Step 2: Evaluate state transition prob. and queuing delays
Step 3: Update transmission policy
Step 4: Exchange information

Feedback-modified Bellman equation

[Figure: distributed MDP as the decision process of agents: local information and the Markovian state transition define the state, followed by future utility evaluation and determination of the transmission action]

SLIDE 40

Convergence of proposed distributed MDP

Proposition 2: The transmission policy of the distributed MDP will converge if and only if the priority class is not dropped in the networks

[Figure: chain of distributed MDP blocks, one per hop (local information, state, Markovian state transition, future utility evaluation, determine transmission action; decision process of agents), each marked as converging, with the last hop labeled]

SLIDE 41

Comparison with traditional routing solutions

Existing routing solutions

– Throughput optimal [Tassiulas 1996]
– Flow-based optimized routing using queue size backpressure [Neely and Modiano 2006]
– Throughput and delay optimized opportunistic routing [Gupta and Javidi 2007]
– Low complexity distributed joint scheduling-routing algorithms [Gupta, Lin, Srikant 2007]
– Selfish routing based on congestion information [Roughgarden 2002]
– Network utility maximization framework (NUM) [Kelly 1998][Xu 2008]

Decision making: traditional routing uses myopic decision making; proposed autonomic multi-hop routing uses foresighted decision making
Required knowledge: traditional routing assumes an a priori known environment (e.g. given capacity region); proposed autonomic multi-hop routing uses online learning based on local information

SLIDE 42

Simulation results

[Figure: expected delays E[Delay1] to E[Delay4] and packet loss versus time (sec) for model-based learning, self-learning, and Q-learning; delay deadline: 1 sec]

SLIDE 43

Multi-agent interactive learning solutions - required observations and information

Model-free learning

– Q-learning [Watkins 1992]
– TD-learning [Sutton 1988]
– Reinforcement learning

Model-based learning

– Fictitious play [Brown 1951][Shapley 1996]
– Model-based reinforcement learning [Singh 1995][Ok 1998]

[Figure: information overhead axis. Reinforcement learning and model-based reinforcement learning: observation and information about the agent itself. Fictitious play: observation and information about all the other agents]

SLIDE 44

Autonomic network scenarios

Network scenarios | Dynamics | Local information | Suitable learning
Multimedia transmission over wireless mesh network | Source traffic, channel condition | Source rate, transmission rate, packet error rate | Model-based reinforcement learning
Distributed resource management over cognitive radio networks | Resource availability (spectrum holes) | Primary users’ loading, other secondary users’ actions | Fictitious play
Power control over ad hoc mobile networks | Interference coupling among agents | SINR | Reinforcement learning

SLIDE 45

Conclusions

Decentralized decision making is not enough!
Proposed new networking paradigm, where autonomic agents can self-configure and optimize the applications’ performance by

adapting their cross-layer transmission strategies
proactively acquiring information by trading off information overheads vs. performance gains
interactively learning
making foresighted decisions (across time and across hops)

SLIDE 46

Broader impact and future direction

Vision: the foresighted decision making and interactive learning approaches

Managing any decentralized system with both information and delay constraints
Decentralized management by autonomic “cognitive” agents

Future direction

Hierarchies of cognitive agents
Coalition of agents
Solutions for malicious behavior prevention

SLIDE 47

Summary of main contributions

Cross-layer design for multimedia streaming

Multi-user video streaming over multi-hop wireless networks [JSAC 2007, Asilomar 2006, IIH-MSP 2006]
Risk-aware scheduling [TMM 2007, VCIP 2008]

Dynamic resource management in cognitive radio networks

Queuing-based channel selection for multimedia transmission [TMM 2008, ICIP 2008]
Joint route/channel selection in multi-hop cognitive radio networks [TVT 2008, DySPAN 2008]

Learning in games

Adaptive learning in power control game [TVT 2009]
Predictive channel selection [ICC 2008]
Learning in conjecture-based channel selection game [TNet submitted, Gamenets 2009]
Model-based reinforcement learning for distributed MDP [under preparation]

Other

Routing decision for surveillance network under information constraint [TCSVT submitted]

SLIDE 48

Journal publications

Accepted

Hsien-Po Shiang, Mihaela van der Schaar, “Multi-user Video Streaming over Multi-hop Wireless Networks: A Distributed, Cross-layer Approach Based on Priority Queuing,” IEEE Journal on Selected Areas in Communications, vol. 25, no. 4, pp. 770-785, May 2007.

Hsien-Po Shiang, Mihaela van der Schaar, “Informationally Decentralized Video Streaming over Multi-hop Wireless Networks,” IEEE Transactions on Multimedia, vol. 9, no. 6, pp. 1299-1313, Oct. 2007.

Hsien-Po Shiang, Mihaela van der Schaar, “Queuing-Based Dynamic Channel Selection for Heterogeneous Multimedia Applications over Cognitive Radio Networks,” IEEE Transactions on Multimedia, vol. 10, no. 5, pp. 896-909, Aug. 2008.

Hsien-Po Shiang, Mihaela van der Schaar, “Distributed Resource Management in Multi-hop Cognitive Radio Networks for Delay Sensitive Transmission,” IEEE Transactions on Vehicular Technology, vol. 58, no. 2, pp. 941-953, Feb. 2009.

Hsien-Po Shiang, Mihaela van der Schaar, “Feedback-Driven Interactive Learning in Dynamic Wireless Resource Management for Delay Sensitive Users,” IEEE Transactions on Vehicular Technology, accepted, to appear.

Submitted

Hsien-Po Shiang, Mihaela van der Schaar, “Conjecture-Based Channel Selection for Autonomous Delay-Sensitive Users in Multi-Channel Wireless Networks,” submitted to IEEE Transactions on Networking.

Hsien-Po Shiang, Mihaela van der Schaar, “Information-Constrained Resource Allocation in Multi-Camera Wireless Surveillance Networks,” submitted to IEEE Transactions on Circuits and Systems for Video Technology.

SLIDE 49

Conference Papers
Hsien-Po Shiang, Mihaela van der Schaar, “Delay-Sensitive Resource Management in Multi-hop Cognitive Radio Networks,” in IEEE Dynamic Spectrum Access Networks (DySPAN 2008), Oct. 2008.

Hsien-Po Shiang, Mihaela van der Schaar, “Dynamic Channel Selection for Multi-user Video Streaming over Cognitive Radio Networks,” in Proc. Int. Conf. on Image Processing (ICIP 2008), Oct. 2008.

Hsien-Po Shiang, Wenchi Tu, Mihaela van der Schaar, “Dynamic Resource Allocation of Delay Sensitive Users Using Interactive Learning over Multi-carrier Networks,” in Proc. Int. Conf. Commun. (ICC 2008), May 2008.

Hsien-Po Shiang, Mihaela van der Schaar, “Risk-aware scheduling for multi-user video streaming over wireless multi-hop networks,” in IS&T/SPIE Visual Communications and Image Processing (VCIP 2008), San Jose, Jan. 2008.

Hsien-Po Shiang, Mihaela van der Schaar, “Multi-user Video Streaming over Multi-hop Wireless Networks: A Cross-layer Priority Queuing Approach,” in IEEE Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), pp. 255-258, Dec. 2006.

Hsien-Po Shiang, D. Krishnaswamy, and Mihaela van der Schaar, “Quality-aware Video Streaming over Wireless Mesh Networks with Optimal Dynamic Routing and Time Allocation,” in Proceedings of the 40th Asilomar Conference on Signals, Systems, and Computers, Oct. 2006.

D. Krishnaswamy, H.-P. Shiang, J. Vicente, W. S. Conner, S. Rungta, W. Chan and K. Miao, “A Cross-Layer Cross-Overlay Architecture for Proactive Adaptive Processing in Mesh Networks,” in 2nd IEEE Workshop on Wireless Mesh Networks (WiMesh 2006), Sep. 2006.

SLIDE 50

References (1/2)

[WCZ05] Y. Wu, P. A. Chou, Q. Zhang, K. Jain, W. Zhu, S. Y. Kung, “Network Planning in Wireless Ad Hoc Networks: A Cross-Layer Approach,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 136-150, Jan. 2005.

[SYZ05] E. Setton, T. Yoo, X. Zhu, A. Goldsmith, and B. Girod, “Cross-layer design of Ad hoc Networks for real-time video streaming,” IEEE Wireless Communications Mag., pp. 59-65, Aug. 2005.

[JF07] D. Jurca, P. Frossard, “Packet Selection and Scheduling for Multipath video streaming,” IEEE Transactions on Multimedia, vol. 9, no. 2, Apr. 2007.

[AMV06] Y. Andreopoulos, N. Mastronarde, and M. van der Schaar, “Cross-layer Optimized video Streaming over wireless multi-hop Mesh Networks,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 11, pp. 2104-2115, Nov. 2006.

[PR99] C. E. Perkins, E. M. Royer, “Ad hoc on-demand distance vector routing,” in Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, pp. 90-100, Feb. 1999.

[PB94] C. E. Perkins, P. Bhagwat, “Highly Dynamic Destination-Sequenced Distance-Vector Routing (DSDV) for Mobile Computers,” ACM SIGCOMM Computer Communication Review, vol. 24, no. 4, pp. 234-244, Oct. 1994.

[WZ02] W. Wei and A. Zakhor, “Multipath unicast and multicast video communication over wireless ad hoc networks,” Proc. Int. Conf. Broadband Networks, Broadnets, pp. 496-505, 2002.

[DPZ04] R. Draves, J. Padhye, and B. Zill, “Routing in multi-radio, multi-hop wireless mesh networks,” in Proc. ACM Internat. Conf. on Mob. Computing and Networking (MOBICOM), pp. 114-128, 2004.

[AL94] B. Awerbuch and T. Leighton, “Improved Approximation Algorithms for the Multi-commodity Flow Problem and Local Competitive Routing in Dynamic Networks,” Proc. 26th ACM Symposium on Theory of Computing, May 1994.

[NMR05] M. J. Neely, E. Modiano, and C. E. Rohrs, “Dynamic Power Allocation and Routing for Time-Varying Wireless Networks,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 89-103, Jan. 2005.

[GJ07] P. Gupta and T. Javidi, “Towards Throughput and Delay-Optimal Routing for Wireless Ad-Hoc Networks,” Asilomar Conference on Signals, Systems and Computers, Nov. 2007.

SLIDE 51

References (2/2)

[WD92] C. J. C. H. Watkins, P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279-292, May 1992.

[Sut88] R. S. Sutton, “Learning to predict by the method of temporal differences,” Machine Learning, vol. 3, no. 1, pp. 9-44, Aug. 1988.

[TO98] P. Tadepalli and D. Ok, “Model-based average reward reinforcement learning,” Artificial Intelligence, vol. 100, no. 1-2, pp. 177-224, Jan. 1998.

[BBS95] A. G. Barto, S. J. Bradtke and S. P. Singh, “Learning to act using real-time dynamic programming,” Artificial Intelligence, vol. 72, no. 1-2, pp. 81-138, Jan. 1995.

SLIDE 52

Sub-flows separation of Coastguard and Mobile video sequences

SLIDE 53

Agent

Local information Transmission action Delay evaluation

SLIDE 54

Delay-Sensitive Multimedia Applications
Heterogeneous dependencies

– Delay deadlines
– Time-varying complexity

Loss tolerant / adaptable

[Figure: dependency structures: sequential dependencies; typical hybrid coder dependencies (MPEG-2, H.264/AVC); scalable coding dependencies [Chou, 2006]]

[Figure: decoding complexity (Silent sequence): complexity profile over time for decoding four layers, Silent.CIF at 1.5 Mb/s; axes: time (sec) vs. normalized processor ticks]

SLIDE 55

Multimedia transmission over wireless mesh networks

[Figure: mesh network with sources S1, S2, relays r1-r5, and destinations D1, D2]

SLIDE 56

Resource management in cognitive radio networks

SLIDE 57

Power control in ad hoc networks


SLIDE 58

Example: distributed channel/route selection in multi-hop cognitive radio networks [TVT Shiang 2008]

[Figure: positions of nodes 1-15 on a plane, with both axes ranging from 20 to 140]

App: 2 video streams
Users: 15 secondary users (nodes)
Actions: channel/route selection
2 frequency channels
Utilities: reduce packet loss rate of delay-sensitive applications
Transmission range: 40 meters
Primary user around nodes 11, 12
Adopt fictitious play

SLIDE 59

Fictitious Play

– Goal: learn the other agents’ policies
– Count the empirical frequency of the other agents’ actions
– Probabilistic behaviors

[Figure: a user combines its private information with fictitious play over the other users’ actions observed in the wireless network environment, then evaluates and maximizes its utility]

Should an agent monitor all the other agents??
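A toy sketch of fictitious play for a single opponent: the agent counts the empirical frequency of the other agent's actions, treats it as a belief, and best-responds to it. The class, the two-channel example, and the collision-style utility are illustrative assumptions, not the presentation's model.

```python
from collections import Counter


class FictitiousPlayAgent:
    """Best-responds to the empirical frequency of another agent's actions.

    Assumption: both agents share the same action set, and
    utility(own_action, other_action) returns a float.
    """

    def __init__(self, actions, utility):
        self.actions = list(actions)
        self.utility = utility
        self.counts = Counter()

    def observe(self, other_action):
        """Record one observed action of the other agent."""
        self.counts[other_action] += 1

    def belief(self):
        """Empirical action frequencies; uniform before any observation."""
        total = sum(self.counts.values())
        if total == 0:
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: self.counts[a] / total for a in self.actions}

    def best_response(self):
        """Action maximizing expected utility under the current belief."""
        b = self.belief()
        return max(
            self.actions,
            key=lambda own: sum(p * self.utility(own, other) for other, p in b.items()),
        )
```

For instance, with channels `[1, 2]` and a utility of 1 when the two agents pick different channels, observing the other agent on channel 1 twice and on channel 2 once leads to a best response of channel 2. Monitoring every other agent scales the bookkeeping with the number of agents.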

SLIDE 60

Information cell

Benefit of acquiring more information

Build more accurate belief
Avoid “information mismatch problem”

Cost of gathering information

[Figure: panels (a) and (b) show the interference range of a node and its information horizon; a timeline alternates decision making and packet transmission]

SLIDE 61

Adaptive fictitious play in cognitive radio networks

Adaptive fictitious play adapts the information cell that limits the neighbors with which information is exchanged

[Figure: node n learns the available resource from primary users and from secondary users in its horizon via adaptive fictitious play, then performs minimum-delay route/channel selection]

SLIDE 62

Results of the two applications V1 and V2

[Figure: packet loss rate vs. average transmission rate T(e,f) (Mbps) for applications V1 and V2; curves: AODV, AODV/LB, DCS, AFP horizon 1, AFP horizon 2. Annotations: myopic channel selection; learn from fewer neighbors; learn from more neighbors; random channel selection; primary users loading ~ 0]

SLIDE 63

Results regarding the impact of primary users

[Figure: packet loss rate vs. primary user time fraction for V1 and V2 with AFP horizon 1, 2, and 3; primary users around nodes 11, 12, T = 5 Mbps. Annotation: information cost]