SLIDE 1

Fast Online Learning of Antijamming and Jamming Strategies

  • Y. Gwon, S. Dastangoo, C. Fossa, H. T. Kung

December 9, 2015 Presented at the 58th IEEE Global Communications Conference, San Diego, CA

This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government. DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited.

SLIDE 2

Outline

  • Introduction
  • Background: Competing Cognitive Radio Network
  • Problem
  • Model
  • Solution approaches
  • Evaluation
  • Conclusion

SLIDE 3

Introduction

  • Competing Cognitive Radio Network (CCRN) models mobile networks under competition
    – Blue Force (ally) vs. Red Force (enemy)
    – Dynamic, open spectrum resource
    – Nodes are cognitive radios
  • Comm nodes and jammers
    – Opportunistic data access
    – Strategic jamming attacks

[Figure: Blue Force (BF) Network vs. Red Force (RF) Network competing over a multi-channel open spectrum; jamming causes collisions; intra-network cooperation, network-wide competition]

SLIDE 4

Background: Competing Cognitive Radio Network

  • Formulation 1: Stochastic MAB
    – Tuple ⟨A_B, A_R, R⟩
    – Blue-force (B) & Red-force (R) action sets: a_B = {a_BC, a_BJ} ∈ A_B, a_R = {a_RC, a_RJ} ∈ A_R
    – Reward: R ~ PD(r | a_B, a_R)
    – Regret Γ = max_{a∈A_B} ∑_{t=1..T} r(a) − ∑_{t=1..T} r(a_t^B)
    – Optimal regret bound in O(log T) [Lai & Robbins '85] (see the sketch below)
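To make this formulation concrete, here is a minimal sketch of UCB1 (Auer et al.'s index policy), one standard algorithm whose regret matches the O(log T) order; UCB1 itself and the stationary Gaussian channel-reward model below are assumptions of this sketch, not taken from the slides.

import numpy as np

def ucb1(pull, n_arms, horizon):
    """UCB1: after trying each arm once, play the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls); regret grows as O(log T)."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            arm = t                                   # initialization: try every arm once
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean update
    return means, counts

# Hypothetical usage: stationary Gaussian per-channel rewards (assumption).
rng = np.random.default_rng(0)
true_means = rng.uniform(size=10)                     # 10 channels
means, counts = ucb1(lambda a: rng.normal(true_means[a], 0.1), 10, 5000)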

  • Formulation 2: Markov Game
    – Tuple ⟨A_B, A_R, S, R, T⟩
    – Stateful model with state set S and probabilistic transition function T
    – Strategy π: S ⟶ PD(A) is a probability distribution over the action space
    – Optimal strategy π* = arg max_π E[∑_t γ^t R(s_t, a_B, a_R)] can be computed by Q-learning via linear programming (see the sketch below)
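The slide does not spell out the linear program; a common choice (an assumption here, in the style of minimax-Q) is the per-state maximin LP: pick the Blue-force mixed strategy π that maximizes the worst-case Q-value over Red-force actions. A sketch with scipy:

import numpy as np
from scipy.optimize import linprog

def maximin_strategy(Q):
    """Solve max over pi of min over o of sum_a pi[a] * Q[a, o]:
    the per-state matrix game at the heart of minimax-Q style learning."""
    n_a, n_o = Q.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                          # maximize v <=> minimize -v
    # One row per Red action o: v - sum_a pi[a] * Q[a, o] <= 0
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    A_eq = np.append(np.ones(n_a), 0.0).reshape(1, -1)  # sum_a pi[a] = 1
    b_eq = [1.0]
    bounds = [(0, None)] * n_a + [(None, None)]         # pi >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_a], res.x[-1]                       # (strategy, game value)

# Hypothetical usage: matching pennies has value 0 and pi = [0.5, 0.5].
pi, v = maximin_strategy(np.array([[1.0, -1.0], [-1.0, 1.0]]))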

SLIDE 5

New Problem Formulation

  • Assume an intelligent adversary
    – The hostile Red-force can learn as efficiently as the Blue-force
    – It also applies cognitive sensing to compute strategies
  • Consequences
    – The well-behaved stochastic channel reward no longer holds ⇒ time-varying channel rewards, which are more difficult to predict or model
    – Nonstationarity in Red-force actions: random, arbitrary changepoints ⇒ introduces dynamic changes

SLIDE 6

Revised Regret Model

  • Stochastic MAB problems model regret Γ using the reward function r(a)
    – Γ = max_{a∈A_B} ∑_{t=1..T} r(a) − ∑_{t=1..T} r(a_t^B)
  • Using a loss function l(a), we revise Γ
    – Revised regret Λ with loss function l(.): Λ = ∑_{t=1..T} l_t(a_t^B) − min_{a∈A_B} ∑_{t=1..T} l_t(a)
  • The loss version is equivalent to the reward version Γ
    – But it provides an adversarial view, as if "the Red-force alters the potential loss for the Blue-force over time, revealing only l_t(a_t^B) at time t" (see the sketch below)
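A small sketch of the revised regret computation; the full (T × |A_B|) loss table is an assumption for illustration, since online only l_t(a_t^B) is revealed at each step.

import numpy as np

def revised_regret(loss, chosen):
    """Lambda = sum_t l_t(a_t^B)  -  min_{a in A_B} sum_t l_t(a).
    loss: (T, |A_B|) array of l_t(a); chosen: length-T indices a_t^B."""
    incurred = loss[np.arange(len(chosen)), chosen].sum()
    best_fixed = loss.sum(axis=0).min()   # best single action in hindsight
    return incurred - best_fixed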

SLIDE 7

New Optimization Goals

  • Find the best Blue-force action that minimizes Λ over time
    – a* = arg min_a ∑_{t=1..T} l_t(a_t^B) − min_{a∈A_B} ∑_{t=1..T} l_t(a)
  • It is critical to estimate l_t(.) accurately for the new optimization
    – l(.) evolves over t, and an intelligent adversary makes it difficult to estimate

SLIDE 8

Our Approach: Online Convex Optimization

  • If l_t(.) belongs to a convex set, an optimal regret bound can be achieved by online convex programming [Zinkevich '03]
    – The underlying idea is gradient descent/ascent
  • What is gradient descent?
    – Find a minimum of the loss by tracing the estimated gradient (slope) of the loss

[Figure: gradient descent on f(x), stepping from the initial guess x0 down to the stopping point xF with f(xF) near the minimum]

initial guess: x0
search direction: −f′(x)
choose step size h > 0
x_next = x_cur − h·f′(x_cur)
stop when |f′(x)| < ε
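A runnable version of the pseudocode above; the quadratic test function in the usage line is an assumption for illustration.

def gradient_descent(f_prime, x0, h=0.1, eps=1e-6, max_iter=10_000):
    """Trace the negative slope until it flattens, as in the sketch above."""
    x = x0
    for _ in range(max_iter):
        g = f_prime(x)
        if abs(g) < eps:            # stop when |f'(x)| < eps
            return x
        x -= h * g                  # x_next = x_cur - h * f'(x_cur)
    return x

# Hypothetical usage: f(x) = (x - 3)^2 with f'(x) = 2(x - 3); minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)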

SLIDE 9

Our New Algorithm: Fast Online Learning

  • Sketch of key ideas (see the sketch after this list)
    – Estimate the expected loss function for the next time step
    – Take the gradient step that leads toward minimum loss, iteratively
    – Test whether the reached minimum is global or local
    – When stuck at an inefficiency (an undesirable local minimum), use an escape mechanism to get out
    – Go back and repeat until convergence
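A minimal sketch of this loop under stated assumptions: the action is treated as continuous, the gradient is estimated by finite differences, and the escape mechanism is a random jump triggered after progress stalls. The paper's actual loss estimator, local-minimum test, and escape rule may differ.

import numpy as np

def fast_online_learning(loss_fn, a0, horizon, h=0.1, patience=20, seed=0):
    """Sketch of the loop above: estimate the loss gradient, step downhill,
    and randomly jump away when stuck (the escape mechanism).
    loss_fn(t, a) returning l_t(a) for nearby a is an assumption."""
    rng = np.random.default_rng(seed)
    a, best, stalled = float(a0), np.inf, 0
    for t in range(horizon):
        l = loss_fn(t, a)
        grad = (loss_fn(t, a + 1e-3) - l) / 1e-3   # finite-difference estimate
        if l < best - 1e-9:
            best, stalled = l, 0
        else:
            stalled += 1
        if stalled > patience:                     # likely a poor local minimum
            a += rng.normal()                      # escape: a_{t+1} = a_t + u
            stalled = 0
        else:
            a -= h * grad                          # gradient step toward lower loss
    return a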

SLIDE 10

New Algorithm Explained (1)

[Figure: loss l (regret) vs. action a with optimum l*; observe l_t(a_t) and take a gradient step to obtain a_{t+1}]

SLIDE 11

New Algorithm Explained (2)

[Figure: loss l (regret) vs. action a with optimum l*; stuck at a local minimum, escape with a random jump: a_{t+1} = a_t + u]

SLIDE 12

New Algorithm Explained (3)

[Figure: loss l (regret) vs. action a; converged with l_t(a_t) ≈ l*, so keep a_{t+1} = a_t]

SLIDE 13

Evaluation

  • Wrote a custom simulator in MATLAB
    – Simulated spectrum with N = 10, 20, 30, 40, 50 channels
    – Varied the number of nodes from M = 10 to 50; the number of jammers among the M total nodes varied from 2 to 10
    – Simulation duration = 5,000 time slots
  • Algorithms evaluated
    1. MAB (Blue-force) vs. random changepoint (Red-force)
    2. Minimax-Q (Blue-force) vs. random changepoint (Red-force)
    3. Proposed online (Blue-force) vs. random changepoint (Red-force)
  • All algorithmic matchups run under centralized control

SLIDE 14

Results: Convergence Time

SLIDE 15

Results: Average Reward Performance (N = 40, M = 20)

The new algorithm finds the optimal strategy much more rapidly than the MAB and Q-learning based algorithms.

SLIDE 16

Summary

  • Extended the Competing Cognitive Radio Network (CCRN) to a harder class of problems under nonstochastic assumptions
    – Random changepoints for enemy channel access & jamming strategies, time-varying channel reward
  • Proposed a new algorithm based on online convex programming
    – Simpler than MAB and Q-learning
    – Achieves a much better convergence property
    – Finds the optimal strategy faster
  • Future work
    – Better channel activity prediction can help estimate a more accurate loss function

SLIDE 17

Support Materials

SLIDE 18

Proposed Algorithm

SLIDE 19

Channel Activity Matrix, Outcome, Reward, State (1/2)

  • Example: there are two comm nodes and two jammers in each of the BF and RF networks
    – BF uses channel 10 for control, RF uses channel 1
  • At time t, the actions are the following
    – A_B^t = {a_B,comm = [7 3], a_B,jam = [1 5]}
      • a_B,comm = [7 3] means BF comm node 1 transmits on channel 7, and comm node 2 on channel 3
    – A_R^t = {a_R,comm = [3 5], a_R,jam = [10 9]}
  • How do we figure out channel outcomes, compute rewards, and determine the state? (see the sketch below)
    – Channel Activity Matrix
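A small sketch that encodes the example actions above and builds a per-channel activity table; the encoding (string tags per channel) is made up for illustration.

# Hypothetical encoding of the example above: tag every transmission that
# lands on each of the N = 10 channels at time t.
a_B_comm, a_B_jam = [7, 3], [1, 5]        # Blue-force actions
a_R_comm, a_R_jam = [3, 5], [10, 9]       # Red-force actions

activity = {ch: [] for ch in range(1, 11)}
for ch in a_B_comm: activity[ch].append("BF-Tx")
for ch in a_B_jam:  activity[ch].append("BF-Jam")
for ch in a_R_comm: activity[ch].append("RF-Tx")
for ch in a_R_jam:  activity[ch].append("RF-Jam")
# activity[3] == ["BF-Tx", "RF-Tx"]   -> BF & RF comms collide
# activity[5] == ["BF-Jam", "RF-Tx"]  -> BF jamming success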

SLIDE 20

Channel Activity Matrix, Outcome, Reward, State (2/2)

CH | BF Comm | BF Jammer | RF Comm | RF Jammer | Outcome               | BF Reward | RF Reward
---+---------+-----------+---------+-----------+-----------------------+-----------+----------
 1 |   –     |   Jam     |   –     |   –       | BF jamming success    |    +1     |    –
 3 |   Tx    |   –       |   Tx    |   –       | BF & RF comms collide |    –      |    –
 5 |   –     |   Jam     |   Tx    |   –       | BF jamming success    |    +1     |    –
 7 |   Tx    |   –       |   –     |   –       | BF comm Tx success    |    +1     |    –
 9 |   –     |   –       |   –     |   Jam     | RF jamming fail       |    –      |    –
10 |   –     |   –       |   –     |   Jam     | RF jamming success    |    –      |    +1
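A sketch of the per-channel outcome/reward logic consistent with the table above; the control-channel handling follows the example on the previous slide, and the zero rewards (and the RF comm success case) are assumptions where the table leaves blanks.

BF_CTRL, RF_CTRL = 10, 1            # control channels from the example

def channel_outcome(ch, acts):
    """Outcome and (BF, RF) reward for one channel, reproducing the table
    rows above; zero rewards fill in where the table leaves blanks."""
    bf_tx, bf_jam = "BF-Tx" in acts, "BF-Jam" in acts
    rf_tx, rf_jam = "RF-Tx" in acts, "RF-Jam" in acts
    if bf_jam and (rf_tx or ch == RF_CTRL):
        return "BF jamming success", (+1, 0)        # CH 1, CH 5
    if rf_jam and (bf_tx or ch == BF_CTRL):
        return "RF jamming success", (0, +1)        # CH 10
    if bf_jam or rf_jam:
        side = "BF" if bf_jam else "RF"
        return side + " jamming fail", (0, 0)       # CH 9
    if bf_tx and rf_tx:
        return "BF & RF comms collide", (0, 0)      # CH 3
    if bf_tx:
        return "BF comm Tx success", (+1, 0)        # CH 7
    if rf_tx:
        return "RF comm Tx success", (0, +1)        # assumption, not in table
    return "idle", (0, 0)

# e.g. channel_outcome(3, ["BF-Tx", "RF-Tx"]) -> ("BF & RF comms collide", (0, 0))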