Opportunistic Spectrum Access with Multiple Users: Learning under Competition

SLIDE 1

Opportunistic Spectrum Access with Multiple Users: Learning under Competition

Anima Anandkumar (1), Nithin Michael (2), Ao Tang (2)

(1) EECS, Massachusetts Institute of Technology, Cambridge, MA, USA
(2) ECE, Cornell University, Ithaca, NY, USA

IEEE INFOCOM 2010

Anandkumar et al. (MIT,Cornell) Spectrum Access INFOCOM ‘10 1 / 21

SLIDE 2

Introduction: Cognitive Radio Network

Two types of users

Primary Users

◮ Priority for channel access

Secondary or Cognitive Users

◮ Opportunistic access
◮ Channel sensing abilities

[Figure: primary user and secondary user]

Limitations of secondary users

◮ Sensing constraints: sense only part of the spectrum at any time
◮ Lack of coordination: collisions among secondary users
◮ Unknown behavior of primary users: lost opportunities

Goal: maximize total secondary throughput subject to the above constraints

SLIDE 3

Distributed Learning and Access

[Figure: C channels with mean availabilities µ1, µ2, ..., µC; later builds of the slide show them in sorted order µ∗1 > µ∗2 > ... > µ∗C]

Slotted transmission with U cognitive users and C > U channels
Channel availability for cognitive users: mean availability µi for channel i, with µ = [µ1, . . . , µC]
µ unknown to secondary users: learned through sensing samples
No explicit communication or cooperation among cognitive users

Objectives for secondary users

Users ultimately access orthogonal channels with the best availabilities µ
Max. total cognitive system throughput ≡ min. regret

SLIDE 6

Summary of Results

Propose two distributed learning and access policies: ρPRE and ρRAND

◮ ρPRE: under pre-allocated ranks among cognitive users
◮ ρRAND: fully distributed, no prior information

Provable guarantees on sum regret under the two policies

◮ Convergence to the optimal configuration
◮ Regret grows slowly in the no. of access slots: R(n) ∼ O(log n)

Lower bound for any uniformly good policy: also logarithmic in the no. of access slots, R(n) ∼ Ω(log n)

We propose order-optimal distributed learning and allocation policies

SLIDE 7

Related Work

Multi-armed Bandits

Single cognitive user (Lai & Robbins 85)
Multiple users with centralized allocation (Anantharam et al. 87)
Key result: regret R(n) ∼ O(log n), optimal as n → ∞
Auer et al. 02: order optimality for sample-mean policies

Cognitive Medium Access & Learning

Liu et al. 08: explicit communication among users
Li 08: Q-learning, sensing all channels simultaneously
Liu & Zhao 10: learning under time-division access
Gai et al. 10: combinatorial bandits, centralized learning

SLIDE 8

Outline

1. Introduction
2. System Model & Recap of Bandit Results
3. Proposed Algorithms & Lower Bound
4. Simulation Results
5. Conclusion

SLIDE 9

System Model

Primary and Cognitive Networks

Slotted transmission with U cognitive users and C channels
Primary users: IID transmissions in each slot and channel

◮ Channel availability for cognitive users: in each slot, channel i is available IID with prob. µi, where µ = [µ1, . . . , µC]

Perfect sensing: primary user always detected
Collision channel: transmission successful only if sole user
Equal rate among secondary users: throughput ≡ total no. of successful transmissions

[Figure: C channels with mean availabilities µ1, µ2, ..., µC]
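The system model above can be illustrated with a minimal sketch of a single slot. The function name `simulate_slot` and the example availabilities are assumptions made for illustration; only the IID availability and the collision rule come from the slide.

```python
import random

def simulate_slot(mu, choices, rng):
    """Return the number of successful transmissions in one slot.

    mu[i]      -- probability that channel i is free of the primary user
    choices[j] -- channel index selected by secondary user j
    """
    # Each channel is independently available with probability mu[i].
    available = [rng.random() < p for p in mu]
    successes = 0
    for ch in set(choices):
        # Collision channel: success only if the channel is free
        # and exactly one secondary user selected it.
        if available[ch] and choices.count(ch) == 1:
            successes += 1
    return successes

rng = random.Random(0)
mu = [0.1, 0.5, 0.9]  # hypothetical availabilities
# Two users on the same channel always collide, so nothing gets through.
print(simulate_slot(mu, [2, 2], rng))  # prints 0
```

Summing the return value over n slots gives the throughput S(n; µ, U, ρ) of whatever policy produced the choices.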

SLIDE 10

Problem Formulation

Distributed Learning Through Sensing Samples

No information exchange or coordination among secondary users
All secondary users employ the same policy

Throughput under perfect knowledge of µ and coordination

$$S^*(n; \mu, U) := n \sum_{j=1}^{U} \mu(j^*),$$

where j∗ denotes the jth largest entry in µ and n is the no. of access slots

Regret under learning and distributed access policy ρ

Loss in throughput due to learning and collisions:

$$R(n; \mu, U, \rho) := S^*(n; \mu, U) - S(n; \mu, U, \rho)$$

Max. throughput ≡ min. sum regret
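As a small worked instance of these two definitions, the sketch below computes the benchmark throughput and the regret for a given achieved throughput. The function names and the value n = 100 are assumptions for illustration; the availabilities and U = 4 are the ones quoted on the simulation slide.

```python
def genie_throughput(mu, num_users, n):
    """S*(n; mu, U): n times the sum of the U largest entries of mu."""
    top = sorted(mu, reverse=True)[:num_users]
    return n * sum(top)

def regret(mu, num_users, n, achieved_throughput):
    """R(n; mu, U, rho) = S*(n; mu, U) - S(n; mu, U, rho)."""
    return genie_throughput(mu, num_users, n) - achieved_throughput

mu = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# With U = 4 the best channels have availabilities 0.9, 0.8, 0.7, 0.6,
# so the benchmark is about 3 successful transmissions per slot.
print(genie_throughput(mu, 4, 100))
print(regret(mu, 4, 100, achieved_throughput=250.0))
```

Because S∗ does not depend on the policy, maximizing the achieved throughput and minimizing the regret are the same objective.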

SLIDE 11

Single Cognitive User: Multi-armed Bandit

[Figure: C channels with mean availabilities µ1, µ2, ..., µC−1, µC; later builds of the slide show them in sorted order]

Exploration vs. Exploitation Tradeoff

Exploration: channels with good availability are not missed
Exploitation: obtain good throughput
Explore in the beginning and exploit in the long run

SLIDE 14

Single Cognitive User: Multi-armed Bandit (Contd.)

$T_{i,j}(n)$: no. of slots where user j selects channel i
$\bar{X}_{i,j}(T_{i,j}(n))$: sample-mean availability of channel i according to user j

Two Policies Based on Sample Mean (Auer et al. 02)

Deterministic policy: select the channel with the highest g-statistic

$$g_j(i; n) := \bar{X}_{i,j}(T_{i,j}(n)) + \sqrt{\frac{2 \log n}{T_{i,j}(n)}}$$

Randomized greedy policy: select the channel with the highest $\bar{X}_{i,j}(T_{i,j}(n))$ with prob. $1 - \epsilon_n$, and with prob. $\epsilon_n$ select uniformly among the other channels, where

$$\epsilon_n := \min\left[\frac{\beta}{n}, 1\right]$$

Regret under the two policies is O(log n), where n is the no. of access slots
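The two single-user selection rules can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and giving an unsensed channel an infinite g-statistic is an assumption made so that every channel is tried at least once.

```python
import math
import random

def g_statistic(sample_mean, pulls, n):
    """Sample mean plus the exploration bonus sqrt(2 log n / T)."""
    if pulls == 0:
        return float("inf")  # assumption: unsensed channels are tried first
    return sample_mean + math.sqrt(2.0 * math.log(n) / pulls)

def deterministic_choice(means, pulls, n):
    """Pick the channel with the highest g-statistic."""
    scores = [g_statistic(m, t, n) for m, t in zip(means, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)

def epsilon_greedy_choice(means, n, beta, rng):
    """Pick the best sample mean w.p. 1 - eps_n, else another channel uniformly."""
    eps = min(beta / n, 1.0)
    best = max(range(len(means)), key=means.__getitem__)
    if rng.random() < eps:
        others = [i for i in range(len(means)) if i != best]
        return rng.choice(others)
    return best

# Equal sample means: the less-sensed channel gets the larger bonus.
print(deterministic_choice([0.5, 0.5], [10, 1], n=100))  # prints 1
```

The bonus term shrinks as a channel is sensed more often, which is how the deterministic rule shifts from exploration to exploitation without an explicit schedule.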

SLIDE 16

Overview of Two Proposed Algorithms

ρPRE, pre-allocation policy: ranks are pre-assigned

If user j is assigned rank $w_j$, select the channel with the $w_j$th highest $\bar{X}_{i,j}(T_{i,j}(n))$ with prob. $1 - \epsilon_n$, and with prob. $\epsilon_n$ select uniformly among the other channels, where $\epsilon_n := \min[\beta/n, 1]$

ρRAND, random allocation policy: no prior information

User adaptively chooses rank $w_j$ based on feedback for successful transmission
If collision in the previous slot, draw a new $w_j$ uniformly from 1 to U
If no collision, retain the current $w_j$
Select the channel with the $w_j$th highest entry of

$$g_j(i; n) := \bar{X}_{i,j}(T_{i,j}(n)) + \sqrt{\frac{2 \log n}{T_{i,j}(n)}}$$

SLIDE 17

Learning Under Pre-Allocation

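If user j is assigned rank $w_j$, select the channel with the $w_j$th highest $\bar{X}_{i,j}(T_{i,j}(n))$ with prob. $1 - \epsilon_n$, and with prob. $\epsilon_n$ select uniformly among the other channels, where $\epsilon_n := \min[\beta/n, 1]$

Regret: user does not select the channel of its pre-assigned rank

$$\mathbb{E}[T_{i,j}(n)] \le \sum_{t=1}^{n-1} \frac{\epsilon_{t+1}}{C} + \sum_{t=1}^{n-1} (1 - \epsilon_{t+1})\, \mathbb{P}[E_{i,j}(n)], \quad i \ne w_j^*,$$

where $E_{i,j}(n)$ is the error event that the $w_j$th highest entry of $\bar{X}_{i,j}(T_{i,j}(n))$ is not the same as $\mu^*_{w_j}$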
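The per-slot decision of a user under pre-allocation can be sketched as below. The helper names and the tie-breaking by lower channel index are assumptions for illustration; the rank-based selection and the exploration probability follow the rule stated on this slide.

```python
import random

def kth_highest_index(values, k):
    """Index of the k-th largest entry (k = 1 is the largest).

    Ties are broken in favor of the lower index (an assumption).
    """
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    return order[k - 1]

def rho_pre_choice(sample_means, rank, n, beta, rng):
    """Channel chosen in slot n by a user holding the pre-assigned rank."""
    eps = min(beta / n, 1.0)
    target = kth_highest_index(sample_means, rank)
    if rng.random() < eps:
        # Exploration: pick uniformly among the other channels.
        others = [i for i in range(len(sample_means)) if i != target]
        return rng.choice(others)
    return target

means = [0.3, 0.9, 0.5]  # hypothetical sample means of one user
print(kth_highest_index(means, 2))  # prints 2, the channel with mean 0.5
```

Since users hold distinct ranks, they target distinct channels once their sample means order the top channels correctly, so collisions then arise only from exploration.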

SLIDE 18

Regret Under Pre-allocation

Theorem (Regret Under ρPRE Policy)

The no. of slots in which user j accesses a channel $i \ne w_j^*$ other than its pre-allocated channel under ρPRE satisfies

$$\mathbb{E}[T_{i,j}(n)] \le \frac{\beta}{C} \log n + \delta, \quad \forall i = 1, \ldots, C,\ i \ne w_j^*,$$

when $\beta > \max\left[20, \frac{4}{\Delta_{\min}^2}\right]$, where $\Delta_{\min} := \min_{i,j} |\mu_i - \mu_j|$ is the minimum separation.

Logarithmic regret under ρPRE
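The $\frac{\beta}{C} \log n$ term can be traced to the exploration probability. As a short worked step, assuming only $\epsilon_t := \min[\beta/t, 1] \le \beta/t$ from the policy definition and the standard comparison of a harmonic sum with an integral, the exploration part of the bound on $\mathbb{E}[T_{i,j}(n)]$ satisfies:

```latex
\sum_{t=1}^{n-1} \frac{\epsilon_{t+1}}{C}
  \;\le\; \frac{\beta}{C} \sum_{t=2}^{n} \frac{1}{t}
  \;\le\; \frac{\beta}{C} \int_{1}^{n} \frac{dx}{x}
  \;=\; \frac{\beta}{C} \log n .
```

The remaining error-event sum is then what the constant δ has to absorb, which is plausibly where the condition on β enters.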

SLIDE 19

Distributed Learning and Randomized Allocation ρRAND

User adaptively chooses rank $w_j$ based on feedback for successful transmission
If collision in the previous slot, draw a new $w_j$ uniformly from 1 to U
If no collision, retain the current $w_j$
Select the channel with the $w_j$th highest entry of

$$g_j(i; n) := \bar{X}_{i,j}(T_{i,j}(n)) + \sqrt{\frac{2 \log n}{T_{i,j}(n)}}$$

Upper Bound on Regret

$$R(n) \le \frac{1}{U} \sum_{k=1}^{U} \mu(k^*) \left[ \sum_{j=1}^{U} \sum_{i \in U\text{-worst}} \mathbb{E}[T_{i,j}(n)] + \mathbb{E}[M(n)] \right]$$

U-best: top U channels. U-worst: remaining channels
$\sum_{i \in U\text{-worst}} T_{i,j}(n)$: time spent in U-worst channels by user j
M(n): no. of collisions in U-best channels
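A compact simulation sketch of the randomized-rank rule described on this slide is given below. It is an illustration under stated assumptions rather than the authors' implementation: `run_rho_rand` and its helper are hypothetical names, unsensed channels get an infinite g-statistic, ties are broken by lower channel index, and a collision is only registered when the channel is free and more than one user transmits on it.

```python
import math
import random

def kth_highest_index(values, k):
    """Index of the k-th largest entry; ties go to the lower index."""
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    return order[k - 1]

def run_rho_rand(mu, num_users, horizon, seed=0):
    """Return (successful transmissions, per-user collision events)."""
    rng = random.Random(seed)
    num_channels = len(mu)
    means = [[0.0] * num_channels for _ in range(num_users)]
    pulls = [[0] * num_channels for _ in range(num_users)]
    ranks = [rng.randint(1, num_users) for _ in range(num_users)]
    successes = collisions = 0
    for n in range(1, horizon + 1):
        available = [rng.random() < p for p in mu]
        choices = []
        for j in range(num_users):
            # g-statistic per channel; unsensed channels are forced first.
            g = [float("inf") if pulls[j][i] == 0
                 else means[j][i] + math.sqrt(2.0 * math.log(n) / pulls[j][i])
                 for i in range(num_channels)]
            choices.append(kth_highest_index(g, ranks[j]))
        for j, ch in enumerate(choices):
            # The sensing sample updates user j's running mean for channel ch.
            pulls[j][ch] += 1
            means[j][ch] += (available[ch] - means[j][ch]) / pulls[j][ch]
            if available[ch] and choices.count(ch) > 1:
                collisions += 1
                ranks[j] = rng.randint(1, num_users)  # redraw rank after a collision
            elif available[ch]:
                successes += 1  # sole transmitter on a free channel
    return successes, collisions

mu = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(run_rho_rand(mu, num_users=4, horizon=2500, seed=1))
```

Subtracting the first return value from n times the sum of the U largest availabilities gives one sample of the regret; averaging over seeds would approximate its expectation.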

SLIDE 20

Distributed Learning and Randomized Allocation ρRAND

Theorem

Under the ρRAND policy, $\mathbb{E}\left[\sum_{i \in U\text{-worst}} T_{i,j}(n)\right]$ and $\mathbb{E}[M(n)]$ are O(log n), and hence regret is O(log n), where n is the number of access slots.

Proof for E[M(n)]: no. of collisions in U-best channels

Bound E[M(n)] under perfect knowledge of µ as Π(U)
Good state: all users estimate the order of the top-U channels correctly
Transition from bad to good state: Π(U) avg. no. of collisions
Bound on the no. of slots spent in the bad state

SLIDE 21

Lower Bound on Regret

Uniformly good policy ρ

A policy which enables users to ultimately settle down in orthogonal best channels under any channel availabilities µ: user j spends most of its time in a channel i ∈ U-best,

$$\mathbb{E}_\mu[n - T_{i,j}(n)] = o(n^\alpha), \quad \forall \alpha > 0,\ \mu \in (0, 1)^C.$$

Satisfied by the ρPRE and ρRAND policies

Theorem (Lower Bound for Uniformly Good Policy)

The sum regret satisfies

$$\liminf_{n \to \infty} \frac{R(n; \mu, U, \rho)}{\log n} \ge \sum_{i \in U\text{-worst}} \sum_{j=1}^{U} \frac{\Delta(U^*, i)}{D(\mu_i, \mu_{j^*})}.$$

Order-optimal regret under the ρPRE and ρRAND policies
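To get a feel for the size of the constant in this lower bound, the sketch below evaluates the double sum numerically. Two readings are assumed here because the slide does not define the symbols: Δ(U∗, i) is taken as the gap between the Uth best availability and µi, and D(·, ·) as the Kullback-Leibler divergence between Bernoulli distributions. The function names are hypothetical.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), p and q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lower_bound_constant(mu, num_users):
    """Sum over U-worst channels i and U-best channels j* of Delta / D."""
    ordered = sorted(mu, reverse=True)
    best, worst = ordered[:num_users], ordered[num_users:]
    total = 0.0
    for mu_i in worst:
        gap = best[-1] - mu_i  # assumed Delta(U*, i) = mu(U*) - mu_i
        for mu_j in best:
            # Requires mu_i != mu_j, otherwise the divergence is zero.
            total += gap / bernoulli_kl(mu_i, mu_j)
    return total

# Single user, two channels: the constant reduces to 0.6 / D(0.2, 0.8).
print(lower_bound_constant([0.2, 0.8], 1))
```

Multiplying the result by log n gives the asymptotic floor against which simulated regret curves can be compared.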

SLIDE 23

Simulation Results

[Figure, first panel: normalized regret R(n)/log n vs. n slots, for U = 4 users and C = 9 channels. Curves: Random Allocation, Central Allocation, Distributed Lower Bnd, Centralized Lower Bnd.]

[Figure, second panel: normalized regret R(n)/log n vs. U users, for C = 9 channels and n = 2500 slots. Curves: Random Allocation, Central Allocation, Distributed Lower Bnd, Centralized Lower Bnd.]

Probability of availability µ = [0.1, 0.2, . . . , 0.9].

SLIDE 25

Conclusion

Summary

Considered maximizing total throughput of cognitive users under unknown channel availabilities and no coordination
Proposed two algorithms which achieve order optimality

◮ ρPRE policy works under pre-allocated ranks
◮ ρRAND policy does not require prior information

Outlook

Imperfect sensing: logarithmic regret still achievable
No. of cognitive users unknown to the policy: logarithmic regret still achievable
Cognitive users with different rates and objectives
