
Opportunistic Spectrum Access with Multiple Users: Learning under Competition



  1. Opportunistic Spectrum Access with Multiple Users: Learning under Competition
     Anima Anandkumar (1), Nithin Michael (2), Ao Tang (2)
     (1) EECS, Massachusetts Institute of Technology, Cambridge, MA, USA
     (2) ECE, Cornell University, Ithaca, NY, USA
     IEEE INFOCOM 2010
     Anandkumar et al. (MIT, Cornell), Spectrum Access, INFOCOM '10, 1 / 21

  2. Introduction: Cognitive Radio Network
     Two types of users
       ◮ Primary users: priority for channel access
       ◮ Secondary (cognitive) users: opportunistic access, channel sensing abilities
     [Figure: a secondary user and a primary user]
     Limitations of secondary users
       ◮ Sensing constraints: only part of the spectrum can be sensed at any time
       ◮ Lack of coordination: collisions among secondary users
       ◮ Unknown behavior of primary users: lost opportunities
     Goal: maximize total secondary throughput subject to the above constraints.

  3. Distributed Learning and Access
     [Figure: $C$ channels with mean availabilities $\mu_1, \mu_2, \dots, \mu_C$]
     Slotted transmission with $U$ cognitive users and $C > U$ channels.
     Channel availability for cognitive users: mean availability $\mu_i$ for channel $i$, with $\mu = [\mu_1, \dots, \mu_C]$.
     $\mu$ is unknown to secondary users: it is learned through sensing samples.
     No explicit communication or cooperation among cognitive users.
     Objectives for secondary users
       ◮ Users ultimately access orthogonal channels with the best availabilities in $\mu$
       ◮ Max. total cognitive system throughput ≡ min. regret

  4. Distributed Learning and Access (Contd.)
     [Figure: the $C$ channels ordered by mean availability, $\mu^*_1 > \mu^*_2 > \dots > \mu^*_C$]


  6. Summary of Results
     Two distributed learning and access policies are proposed: $\rho^{\text{PRE}}$ and $\rho^{\text{RAND}}$
       ◮ $\rho^{\text{PRE}}$: under pre-allocated ranks among cognitive users
       ◮ $\rho^{\text{RAND}}$: fully distributed, with no prior information
     Provable guarantees on sum regret under the two policies
       ◮ Convergence to the optimal configuration
       ◮ Regret grows slowly in the number of access slots: $R(n) \sim O(\log n)$
     Lower bound for any uniformly good policy, also logarithmic in the number of access slots: $R(n) \sim \Omega(\log n)$
     The proposed distributed learning and allocation policies are order-optimal.

  7. Related Work
     Multi-armed bandits
       ◮ Single cognitive user (Lai & Robbins 85)
       ◮ Multiple users with centralized allocation (Anantharam et al. 87)
       ◮ Key result: regret $R(n) \sim O(\log n)$, optimal as $n \to \infty$
       ◮ Auer et al. 02: order optimality for sample-mean policies
     Cognitive medium access and learning
       ◮ Liu et al. 08: explicit communication among users
       ◮ Li 08: Q-learning, sensing all channels simultaneously
       ◮ Liu & Zhao 10: learning under time-division access
       ◮ Gai et al. 10: combinatorial bandits, centralized learning

  8. Outline
     1 Introduction
     2 System Model & Recap of Bandit Results
     3 Proposed Algorithms & Lower Bound
     4 Simulation Results
     5 Conclusion

  9. System Model: Primary and Cognitive Networks
     [Figure: $C$ channels with mean availabilities $\mu_1, \mu_2, \dots, \mu_C$]
     Slotted transmission with $U$ cognitive users and $C$ channels.
     Primary users: IID transmissions in each slot and channel
       ◮ Channel availability for cognitive users: in each slot, channel $i$ is available IID with probability $\mu_i$, with $\mu = [\mu_1, \dots, \mu_C]$
     Perfect sensing: a primary user is always detected.
     Collision channel: a transmission is successful only if the user is the sole user of the channel.
     Equal rate among secondary users: throughput ≡ total number of successful transmissions.
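The slot model on this slide can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function name `simulate_slot` and its argument layout are assumptions made for illustration, and it encodes only the three stated rules (IID availability with probability $\mu_i$, perfect sensing, success only for a sole user).

```python
import random


def simulate_slot(mu, choices, rng):
    """Simulate one slot of the collision channel (illustrative sketch).

    mu[i]      : probability that channel i is free of the primary user
    choices[j] : index of the channel sensed by cognitive user j
    rng        : a random.Random instance
    Returns a list of booleans, one per user: True on a successful transmission.
    """
    # Each channel is free independently with probability mu[i] in this slot.
    free = [rng.random() < m for m in mu]
    # Count how many cognitive users picked each channel.
    load = {}
    for ch in choices:
        load[ch] = load.get(ch, 0) + 1
    # Perfect sensing: a user transmits only on a free channel.
    # Collision channel: the transmission succeeds only if the user is alone.
    return [free[ch] and load[ch] == 1 for ch in choices]
```

Summing the returned flags over slots and users gives the throughput in the sense used here, that is, the total number of successful transmissions.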

  10. Problem Formulation
      Distributed learning through sensing samples
        ◮ No information exchange or coordination among secondary users
        ◮ All secondary users employ the same policy
      Throughput under perfect knowledge of $\mu$ and coordination:
        $S^*(n; \mu, U) := n \sum_{j=1}^{U} \mu(j^*)$,
      where $j^*$ is the $j$th largest entry in $\mu$ and $n$ is the number of access slots.
      Regret under a learning and distributed access policy $\rho$ (loss in throughput due to learning and collisions):
        $R(n; \mu, U, \rho) := S^*(n; \mu, U) - S(n; \mu, U, \rho)$
      Max. throughput ≡ min. sum regret
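As a small numerical illustration of these two definitions, the following Python sketch computes the ideal throughput $S^*$ and the regret from an observed success count. The function names and the `achieved_successes` argument are hypothetical; the observed throughput $S$ would come from a simulation or measurement that is not shown.

```python
def optimal_throughput(mu, num_users, n):
    """S*(n; mu, U): n times the sum of the U largest mean availabilities."""
    top = sorted(mu, reverse=True)[:num_users]
    return n * sum(top)


def regret(mu, num_users, n, achieved_successes):
    """R(n) = S*(n) - S(n), where S(n) is the achieved number of successes."""
    return optimal_throughput(mu, num_users, n) - achieved_successes
```

For example, with `mu = [0.1, 0.5, 0.9, 0.3]`, two users and 100 slots, the ideal throughput is 100 × (0.9 + 0.5), so a policy that achieved 120 successful transmissions would have a regret of about 20.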

  11. Single Cognitive User: Multi-armed Bandit
      [Figure: $C$ channels with mean availabilities $\mu_1, \mu_2, \dots, \mu_{C-1}, \mu_C$]
      Exploration vs. exploitation tradeoff
        ◮ Exploration: channels with good availability are not missed
        ◮ Exploitation: obtain good throughput
      Explore in the beginning and exploit in the long run.

  12. Single Cognitive User: Multi-armed Bandit (Contd.)
      [Figure: the $C$ channels ordered by mean availability, $\mu^*_1 > \mu^*_2 > \dots > \mu^*_{C-1} > \mu^*_C$]


  14. Single Cognitive User: Multi-armed Bandit (Contd.)
      $T_{i,j}(n)$: number of slots in which user $j$ selects channel $i$
      $\bar{X}_{i,j}(T_{i,j}(n))$: sample-mean availability of channel $i$ according to user $j$
      Two policies based on the sample mean (Auer et al. 02)
        ◮ Deterministic policy: select the channel with the highest $g$-statistic,
          $g_j(i; n) := \bar{X}_{i,j}(T_{i,j}(n)) + \sqrt{\frac{2 \log n}{T_{i,j}(n)}}$
        ◮ Randomized greedy policy: select the channel with the highest $\bar{X}_{i,j}(T_{i,j}(n))$ with probability $1 - \epsilon_n$, and with probability $\epsilon_n$ select uniformly among the other channels, where $\epsilon_n := \min[\beta/n, 1]$
      Regret under the two policies is $O(\log n)$, where $n$ is the number of access slots.
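The two single-user selection rules can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, $\epsilon_n$ is read as $\min[\beta/n, 1]$, and treating a channel with no samples as having an infinite index is an added assumption so that every channel is tried at least once.

```python
import math
import random


def g_statistic(sample_mean, num_samples, n):
    """Index: sample mean plus the exploration bonus sqrt(2 log n / T)."""
    if num_samples == 0:
        return math.inf  # assumption: unsampled channels are tried first
    return sample_mean + math.sqrt(2.0 * math.log(n) / num_samples)


def select_deterministic(means, counts, n):
    """Deterministic policy: pick the channel with the highest g-statistic."""
    scores = [g_statistic(m, t, n) for m, t in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def select_randomized_greedy(means, n, beta, rng):
    """Randomized greedy policy with eps_n = min(beta / n, 1)."""
    eps = min(beta / n, 1.0)
    best = max(range(len(means)), key=means.__getitem__)
    if len(means) > 1 and rng.random() < eps:
        # Explore: choose uniformly among the channels other than the best.
        return rng.choice([i for i in range(len(means)) if i != best])
    return best  # Exploit: channel with the highest sample mean
```

In both rules the exploration pressure decays with $n$: the bonus term shrinks for channels that have been sampled often, and $\epsilon_n$ falls off as $1/n$.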


  16. Overview of Two Proposed Algorithms
      $\rho^{\text{PRE}}$, pre-allocation policy: ranks are pre-assigned
        ◮ If user $j$ is assigned rank $w_j$, select the channel with the $w_j$th highest $\bar{X}_{i,j}(T_{i,j}(n))$ with probability $1 - \epsilon_n$, and with probability $\epsilon_n$ select uniformly among the other channels, where $\epsilon_n := \min[\beta/n, 1]$
      $\rho^{\text{RAND}}$, random allocation policy: no prior information
        ◮ The user adaptively chooses its rank $w_j$ based on feedback for successful transmissions
        ◮ If there was a collision in the previous slot, draw a new $w_j$ uniformly from 1 to $U$
        ◮ If there was no collision, retain the current $w_j$
        ◮ Select the channel with the $w_j$th highest entry of
          $g_j(i; n) := \bar{X}_{i,j}(T_{i,j}(n)) + \sqrt{\frac{2 \log n}{T_{i,j}(n)}}$
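The per-slot decisions of the two policies can be sketched in Python as below. This is a reading of the slide, not the authors' implementation: all function names are hypothetical, $\epsilon_n$ is taken as $\min[\beta/n, 1]$, ties are broken by the lower channel index, and unsampled channels receive an infinite index. Both tie-breaking and the handling of unsampled channels are assumptions.

```python
import math
import random


def g_statistic(sample_mean, num_samples, n):
    """Sample mean plus sqrt(2 log n / T); infinite if never sampled (assumption)."""
    if num_samples == 0:
        return math.inf
    return sample_mean + math.sqrt(2.0 * math.log(n) / num_samples)


def kth_highest_index(scores, k):
    """Index of the k-th highest score (k = 1 is the best); ties go to the lower index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[k - 1]


def pre_policy_choice(means, rank, n, beta, rng):
    """rho_PRE: channel with the rank-th highest sample mean, explored with eps_n."""
    eps = min(beta / n, 1.0)
    target = kth_highest_index(means, rank)
    if len(means) > 1 and rng.random() < eps:
        return rng.choice([i for i in range(len(means)) if i != target])
    return target


def rand_policy_rank(rank, collided, num_users, rng):
    """rho_RAND rank rule: redraw uniformly from 1..U after a collision, else keep."""
    return rng.randint(1, num_users) if collided else rank


def rand_policy_choice(means, counts, rank, n):
    """rho_RAND: channel with the rank-th highest g-statistic."""
    scores = [g_statistic(m, t, n) for m, t in zip(means, counts)]
    return kth_highest_index(scores, rank)
```

Under $\rho^{\text{RAND}}$ each user would call `rand_policy_rank` with the collision feedback from the previous slot and then `rand_policy_choice` with its own sample means and counts, so that no information is exchanged between users.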

  17. Learning Under Pre-Allocation
      If user $j$ is assigned rank $w_j$, select the channel with the $w_j$th highest $\bar{X}_{i,j}(T_{i,j}(n))$ with probability $1 - \epsilon_n$, and with probability $\epsilon_n$ select uniformly among the other channels, where $\epsilon_n := \min[\beta/n, 1]$.
      Regret: the user does not select the channel of its pre-assigned rank.
        $\mathbb{E}[T_{i,j}(n)] \le \sum_{t=1}^{n-1} \frac{\epsilon_{t+1}}{C} + \sum_{t=1}^{n-1} (1 - \epsilon_{t+1}) \, \mathbb{P}[E_{i,j}(n)], \quad i \ne w_j^*$,
      where $E_{i,j}(n)$ is the error event that the $w_j$th highest entry of $\bar{X}_{i,j}(T_{i,j}(n))$ is not the same as $\mu^*_{w_j}$.
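A short worked step, added here as a sketch, indicates why the first (exploration) sum in this bound grows only logarithmically. It assumes $\epsilon_t = \min[\beta/t, 1]$, so that $\epsilon_{t+1} \le \beta/(t+1)$, and uses the standard comparison of a harmonic sum with an integral.

```latex
\sum_{t=1}^{n-1} \frac{\epsilon_{t+1}}{C}
  \;\le\; \frac{\beta}{C} \sum_{t=2}^{n} \frac{1}{t}
  \;\le\; \frac{\beta}{C} \int_{1}^{n} \frac{dx}{x}
  \;=\; \frac{\beta}{C} \log n .
```

This accounts for a term of the form $(\beta/C) \log n$. The second sum, which involves the error event $E_{i,j}$, has to be controlled separately.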

  18. Regret Under Pre-allocation
      Theorem (Regret under the $\rho^{\text{PRE}}$ policy): the number of slots in which user $j$ accesses a channel $i \ne w_j^*$ other than its pre-allocated channel under $\rho^{\text{PRE}}$ satisfies
        $\mathbb{E}[T_{i,j}(n)] \le \frac{\beta}{C} \log n + \delta, \quad \forall i = 1, \dots, C, \; i \ne w_j^*$,
      when $\beta > \max\left[20, \frac{4}{\Delta_{\min}^2}\right]$, where $\Delta_{\min} := \min_{i,j} |\mu_i - \mu_j|$ is the minimum separation.
      Logarithmic regret under $\rho^{\text{PRE}}$.
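To make the condition on $\beta$ concrete, the following Python sketch evaluates $\Delta_{\min}$, the threshold $\max[20, 4/\Delta_{\min}^2]$, and the logarithmic term $(\beta/C) \log n$ for a given availability vector. The function names are hypothetical, the minimum is taken over distinct channel pairs (an assumption, since $i = j$ would give zero), and the constant $\delta$ is not computed because the slide does not specify it.

```python
import math
from itertools import combinations


def min_separation(mu):
    """Delta_min: smallest gap between the availabilities of two distinct channels."""
    return min(abs(a - b) for a, b in combinations(mu, 2))


def beta_threshold(mu):
    """Value that beta must strictly exceed: max(20, 4 / Delta_min^2)."""
    return max(20.0, 4.0 / min_separation(mu) ** 2)


def log_term(beta, num_channels, n):
    """The (beta / C) * log n part of the bound; the constant delta is omitted."""
    return beta / num_channels * math.log(n)
```

For instance, availabilities `[0.0, 0.5, 1.0]` have $\Delta_{\min} = 0.5$, so $4/\Delta_{\min}^2 = 16$ and the constant 20 is the binding constraint, whereas closely spaced availabilities push the threshold, and with it the bound, upward.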
