
Optimal Online Prediction in Adversarial Environments, Peter Bartlett - PowerPoint PPT Presentation



  1. Optimal Online Prediction in Adversarial Environments
   Peter Bartlett
   EECS and Statistics, UC Berkeley
   http://www.cs.berkeley.edu/~bartlett

  2. Online Prediction
   ◮ Probabilistic Model
     ◮ Batch: independent random data.
     ◮ Aim for small expected loss subsequently.
   ◮ Adversarial Model
     ◮ Online: sequence of interactions with an adversary.
     ◮ Aim for small cumulative loss throughout.

  3. Online Learning: Motivations 1. The adversarial model is appropriate for
   ◮ Computer security.
   ◮ Computational finance.

  4. Web Spam Challenge (www.iw3c2.org)

  5. ACM

  6. Online Learning: Motivations 2. Understanding statistical prediction methods.
   ◮ Many statistical methods, based on probabilistic assumptions, can be effective in an adversarial setting.
   ◮ Analyzing their performance in adversarial settings provides perspective on their robustness.
   ◮ We would like violations of the probabilistic assumptions to have a limited impact.

  7. Online Learning: Motivations 3. Online algorithms are also effective in probabilistic settings.
   ◮ Easy to convert an online algorithm to a batch algorithm.
   ◮ Easy to show that good online performance implies good i.i.d. performance, for example.

  8. Prediction in Probabilistic Settings
   ◮ i.i.d. (X, Y), (X_1, Y_1), ..., (X_n, Y_n) from X × Y.
   ◮ Use data (X_1, Y_1), ..., (X_n, Y_n) to choose f_n : X → A with small risk,
       R(f_n) = E ℓ(Y, f_n(X)).

  9. Online Learning
   ◮ Repeated game: Player chooses a_t; Adversary reveals ℓ_t.
   ◮ Example: ℓ_t(a_t) = loss(y_t, a_t(x_t)).
   ◮ Aim: minimize Σ_t ℓ_t(a_t), compared to the best (in retrospect) from some class:
       regret = Σ_t ℓ_t(a_t) − min_{a ∈ A} Σ_t ℓ_t(a).
   ◮ Data can be adversarially chosen.
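The regret defined above can be computed directly; a minimal sketch, where the two-action loss sequence and the player's choices are made-up illustrative data:

```python
# Direct computation of the regret defined on the slide.
losses = [
    [0.0, 1.0],   # round 1: loss of action 0, loss of action 1
    [1.0, 0.0],   # round 2
    [0.0, 1.0],   # round 3
]
played = [1, 1, 0]  # the player's choices a_t

cumulative = sum(losses[t][played[t]] for t in range(len(losses)))
# Best single action in retrospect: smallest total loss over the sequence.
best_fixed = min(sum(row[a] for row in losses) for a in range(2))
regret = cumulative - best_fixed
print(regret)  # 0.0: the player's total loss of 1 matches the best fixed action
```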

  10. Outline 1. An Example from Computational Finance: The Dark Pools Problem. 2. Bounds on Optimal Regret for General Online Prediction Problems.

  11. The Dark Pools Problem
   ◮ Computational finance: adversarial setting is appropriate.
   ◮ Online algorithm improves on best known algorithm for probabilistic setting.
   Joint work with Alekh Agarwal and Max Dama.

  12. Dark Pools
   Instinet, International Securities Exchange, Chi-X, Investment Technology Group (POSIT), Knight Match, ...
   ◮ Crossing networks.
   ◮ Alternative to open exchanges.
   ◮ Avoid market impact by hiding transaction size and traders’ identities.

  13.–16. Dark Pools (figure slides)

  17. Allocations for Dark Pools
   The problem: Allocate orders to several dark pools so as to maximize the volume of transactions.
   ◮ Volume V_t must be allocated across K venues: v_{t1}, ..., v_{tK}, such that Σ_{k=1}^K v_{tk} = V_t.
   ◮ Venue k can accommodate up to s_{tk}, and transacts r_{tk} = min(v_{tk}, s_{tk}).
   ◮ The aim is to maximize Σ_{t=1}^T Σ_{k=1}^K r_{tk}.
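The single-round transaction model above in a few lines; the volumes and capacities are hypothetical numbers for illustration:

```python
# One round of the dark-pools transaction model: venue k transacts
# r_tk = min(v_tk, s_tk).
V_t = 10.0
v_t = [5.0, 3.0, 2.0]   # allocation v_tk across K = 3 venues, sums to V_t
s_t = [4.0, 6.0, 1.0]   # venue capacities s_tk, unknown when allocating

assert sum(v_t) == V_t
r_t = [min(v, s) for v, s in zip(v_t, s_t)]   # transacted volumes r_tk
print(r_t, sum(r_t))  # [4.0, 3.0, 1.0] 8.0
```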

  18. Allocations for Dark Pools: Probabilistic Assumptions
   Previous work: (Ganchev, Kearns, Nevmyvaka and Wortman, 2008)
   ◮ Assume venue volumes are i.i.d.: {s_{tk} : k = 1, ..., K, t = 1, ..., T}.
   ◮ In deciding how to allocate the first unit, choose the venue k where Pr(s_{tk} > 0) is largest.
   ◮ Allocate the second and subsequent units in decreasing order of venue tail probabilities.
   ◮ Algorithm: estimate the tail probabilities (Kaplan-Meier estimator, since the data is censored), and allocate as if the estimates are correct.
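The greedy rule above can be sketched as follows; the input `tail[k][j]`, an estimate of Pr(s_k > j), is a hypothetical argument (in the cited work it would come from a Kaplan-Meier estimator on censored data):

```python
def greedy_allocation(tail, V):
    """Allocate V units one at a time, each to the venue whose estimated
    tail probability at its current allocation level is largest."""
    alloc = [0] * len(tail)
    for _ in range(V):
        k = max(range(len(tail)), key=lambda i: tail[i][alloc[i]])
        alloc[k] += 1
    return alloc

# Venue 0 is likeliest to absorb the 1st and 3rd units, venue 1 the 2nd.
print(greedy_allocation([[0.9, 0.5, 0.1], [0.6, 0.2, 0.0]], 3))  # [2, 1]
```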

  19. Allocations for Dark Pools: Adversarial Assumptions
   Why i.i.d. is questionable:
   ◮ One party’s gain is another’s loss.
   ◮ Volume available now affects volume remaining in the future.
   ◮ Volume available at one venue affects volume available at others.
   In the adversarial setting, we allow an arbitrary sequence of venue capacities (s_{tk}), and of total volume to be allocated (V_t). The aim is to compete with any fixed allocation order.

  20. Continuous Allocations
   We wish to maximize a sum of (unknown) concave functions of the allocations:
       J(v) = Σ_{t=1}^T Σ_{k=1}^K min(v_{tk}, s_{tk}),
   subject to the constraint Σ_{k=1}^K v_{tk} ≤ V_t.
   The allocations are parameterized as distributions over the K venues:
       x^1_t, x^2_t, ... ∈ Δ_{K−1} = (K−1)-simplex.
   Here, x^1_t determines how the first unit is allocated, x^2_t the second, ...
   The algorithm allocates to the k-th venue: v_{tk} = Σ_{v=1}^{V_t} x^v_{t,k}.

  21. Continuous Allocations
   We wish to maximize a sum of (unknown) concave functions of the distributions:
       J = Σ_{t=1}^T Σ_{k=1}^K min(v_{tk}(x^v_{t,k}), s_{tk}).
   We want small regret with respect to an arbitrary distribution x^v, and hence with respect to an arbitrary allocation:
       regret = Σ_{t=1}^T Σ_{k=1}^K min(v_{tk}(x^v_k), s_{tk}) − J.

  22. Continuous Allocations
   We use an exponentiated gradient algorithm:
       Initialize x^v_{1,k} = 1/K for v ∈ {1, ..., V}.
       for t = 1, ..., T do
           Set v_{tk} = Σ_{v=1}^{V_t} x^v_{t,k}.
           Receive r_{tk} = min{v_{tk}, s_{tk}}.
           Set g^v_{t,k} = ∇_{x^v_{t,k}} J.
           Update x^v_{t+1,k} ∝ x^v_{t,k} exp(η g^v_{t,k}).
       end for
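A minimal full-information sketch of the exponentiated-gradient allocator above. The function name `expgrad`, the step size η = 0.1, and the use of the subgradient 1[v_{tk} < s_{tk}] of min(v_{tk}, s_{tk}) as g^v_{t,k} are illustrative assumptions; the actual algorithm works from censored per-venue feedback:

```python
import math

def expgrad(capacities, volumes, eta=0.1):
    """capacities[t][k] is the venue capacity s_tk; volumes[t] is the total
    volume V_t.  x[v][k] is the distribution over venues for unit v+1."""
    K = len(capacities[0])
    V = max(int(v) for v in volumes)
    x = [[1.0 / K] * K for _ in range(V)]
    total_reward = 0.0
    for t in range(len(volumes)):
        Vt = int(volumes[t])
        # v_tk = sum over the first V_t units of x^v_{t,k}.
        v_alloc = [sum(x[v][k] for v in range(Vt)) for k in range(K)]
        # Transacted volume r_tk = min(v_tk, s_tk).
        total_reward += sum(min(v_alloc[k], capacities[t][k]) for k in range(K))
        # Subgradient of min(v_tk, s_tk) in v_tk: 1 while under capacity.
        g = [1.0 if v_alloc[k] < capacities[t][k] else 0.0 for k in range(K)]
        # Multiplicative update, renormalized onto the simplex.
        for v in range(Vt):
            w = [x[v][k] * math.exp(eta * g[k]) for k in range(K)]
            z = sum(w)
            x[v] = [wk / z for wk in w]
    return total_reward
```

With one venue always full (capacity 0) and one unit per round, the update shifts mass toward the open venue, so the cumulative reward over 50 rounds exceeds the uniform baseline of 25: `expgrad([[10.0, 0.0]] * 50, [1.0] * 50)`.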

  23. Continuous Allocations
   Theorem: For all choices of V_t ≤ V and of s_{tk}, ExpGrad has regret no more than 3V√(T ln K).

  24. Continuous Allocations
   Theorem: For all choices of V_t ≤ V and of s_{tk}, ExpGrad has regret no more than 3V√(T ln K).
   Theorem: For every algorithm, there are sequences V_t and s_{tk} such that regret is at least V√(T ln K)/16.

  25. Experimental results
   [Figure: cumulative reward at each round, over 2000 rounds, for Exp3, ExpGrad, OptKM, and ParML.]

  26. Continuous Allocations: i.i.d. data
   ◮ Simple online-to-batch conversions show ExpGrad obtains per-trial utility within O(T^{−1/2}) of optimal.
   ◮ Ganchev et al. bounds: per-trial utility within O(T^{−1/4}) of optimal.

  27. Discrete allocations
   ◮ Trades occur in quantized parcels.
   ◮ Hence, we cannot allocate arbitrary values.
   ◮ This is analogous to a multi-armed bandit problem:
     ◮ We cannot directly obtain the gradient at the current x.
     ◮ But we can estimate it using importance sampling ideas.
   Theorem: There is an algorithm for discrete allocation with expected regret Õ((VTK)^{2/3}). Any algorithm has regret Ω̃((VTK)^{1/2}).
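The importance-sampling idea above can be sketched for a single unit: sample a venue from the current distribution, observe only whether that unit transacted, and divide by the sampling probability so the estimate is unbiased for the full gradient. The function name and the "true" gradient used in the check are hypothetical:

```python
import random

def bandit_gradient_estimate(x, k_sampled, transacted):
    """Only the sampled venue's outcome is observed; dividing by its
    probability keeps the estimate unbiased for the full gradient."""
    g = [0.0] * len(x)
    g[k_sampled] = (1.0 if transacted else 0.0) / x[k_sampled]
    return g

# Monte Carlo check against a made-up full-information gradient.
random.seed(0)
x = [0.2, 0.3, 0.5]
true_g = [1.0, 0.0, 1.0]   # hypothetical: venues 0 and 2 are under capacity
n = 20000
avg = [0.0, 0.0, 0.0]
for _ in range(n):
    k = random.choices(range(3), weights=x)[0]
    est = bandit_gradient_estimate(x, k, transacted=(true_g[k] == 1.0))
    avg = [a + e / n for a, e in zip(avg, est)]
# avg is close to true_g
```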

  28. Dark Pools
   ◮ Allow adversarial choice of volumes and transactions.
   ◮ Per-trial regret rate superior to previous best known bounds for the probabilistic setting.
   ◮ In simulations, performance comparable to the (correct) parametric model’s, and superior to the nonparametric estimate.

  29. Outline 1. An Example from Computational Finance: The Dark Pools Problem. 2. Bounds on Optimal Regret for General Online Prediction Problems.

  30. Optimal Regret for General Online Decision Problems
   ◮ Parallels between probabilistic and online frameworks.
   ◮ Tools for the analysis of probabilistic problems: Rademacher averages.
   ◮ Analogous results in the online setting:
     ◮ Value of dual game.
     ◮ Bounds in terms of Rademacher averages.
   ◮ Open problems.
   Joint work with Jake Abernethy, Alekh Agarwal, Sasha Rakhlin, Karthik Sridharan and Ambuj Tewari.

  31. Prediction in Probabilistic Settings
   ◮ i.i.d. (X, Y), (X_1, Y_1), ..., (X_n, Y_n) from X × Y.
   ◮ Use data (X_1, Y_1), ..., (X_n, Y_n) to choose f_n : X → A with small risk,
       R(f_n) = P ℓ(Y, f_n(X)),
   ideally not much larger than the minimum risk over some comparison class F:
       excess risk = R(f_n) − inf_{f ∈ F} R(f).

  32. Parallels between Probabilistic and Online Settings
   ◮ Prediction with i.i.d. data:
     ◮ Convex F, strictly convex loss, ℓ(y, f(x)) = (y − f(x))^2:
         sup_P (P R(f̂) − inf_{f ∈ F} R(f)) ≈ C(F) log n / n.
     ◮ Nonconvex F, or (not strictly) convex loss, ℓ(y, f(x)) = |y − f(x)|:
         sup_P (P R(f̂) − inf_{f ∈ F} R(f)) ≈ C(F)/√n.
   ◮ Online convex optimization:
     ◮ Convex A, strictly convex ℓ_t: per-trial regret ≈ c log n / n.
     ◮ ℓ_t (not strictly) convex: per-trial regret ≈ c/√n.

  33. Tools for the analysis of probabilistic problems
   For f_n = arg min_{f ∈ F} Σ_{t=1}^n ℓ(Y_t, f(X_t)),
       R(f_n) − inf_{f ∈ F} P ℓ(Y, f(X)) ≤ 2 sup_{f ∈ F} | (1/n) Σ_{t=1}^n ℓ(Y_t, f(X_t)) − P ℓ(Y, f(X)) |.
   So the supremum of the empirical process, indexed by F, gives an upper bound on the excess risk.

  34. Tools for the analysis of probabilistic problems
   Typically, this supremum is concentrated about
       P sup_{f ∈ F} | (1/n) Σ_{t=1}^n (ℓ(Y_t, f(X_t)) − P ℓ(Y, f(X))) |
         = P sup_{f ∈ F} | P′ (1/n) Σ_{t=1}^n (ℓ(Y_t, f(X_t)) − ℓ(Y′_t, f(X′_t))) |
         ≤ E sup_{f ∈ F} | (1/n) Σ_{t=1}^n ε_t (ℓ(Y_t, f(X_t)) − ℓ(Y′_t, f(X′_t))) |
         ≤ 2 E sup_{f ∈ F} | (1/n) Σ_{t=1}^n ε_t ℓ(Y_t, f(X_t)) |,
   where (X′_t, Y′_t) are independent, with the same distribution as (X, Y), and ε_t are independent Rademacher (uniform ±1) random variables.
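For a finite comparison class, the Rademacher average in the last bound can be estimated by Monte Carlo; a minimal sketch, where `loss_matrix[f][t]` holding ℓ(Y_t, f(X_t)) is an assumed input format:

```python
import random

def rademacher_average(loss_matrix, n_draws=2000, seed=0):
    """Monte Carlo estimate of E sup_f |(1/n) Σ_t ε_t ℓ(Y_t, f(X_t))|
    for a finite class, averaging over random sign vectors ε."""
    rng = random.Random(seed)
    n = len(loss_matrix[0])
    total = 0.0
    for _ in range(n_draws):
        eps = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(abs(sum(e * l for e, l in zip(eps, f_losses))) / n
                     for f_losses in loss_matrix)
    return total / n_draws
```

As a sanity check, a class containing a single function with losses identically 1 on n = 4 points has exact value E|(1/4) Σ_t ε_t| = 3/8, and the estimate lands nearby.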
