1. Two Useful Arrows Darts in that Quiver. Clément Canonne, FOCS Workshop – November 9, 2019.

2. Averaging, Bucketing, and Investing Arguments.

3. Suppose you have a : X → [0,1] such that E[a(x)] ≥ ε. (Let's say you already proved that.) We think of a(x) as the quality of x, and "using" x has cost cost(a(x)). For instance: a population of coins, each with its own bias; the expected bias is ε, and for any given coin, checking bias 0 vs. bias α takes 1/α² tosses. Goal: find a biased coin.
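To make the running example concrete, here is a minimal Python sketch of the coin setting, assuming a coin of bias b lands heads with probability (1 + b)/2, so that telling bias 0 from bias α apart takes Θ(1/α²) tosses. The names draw_coin and check_bias are illustrative, not from the talk.

```python
import random

def draw_coin(biases):
    """Sample a coin (represented by its bias) from the population."""
    return random.choice(biases)

def check_bias(bias, alpha, c=16):
    """Test 'bias 0 vs. bias >= alpha' with ~c/alpha^2 tosses.

    A coin of bias b lands heads w.p. (1 + b)/2; accept if the empirical
    heads frequency clears the midpoint 1/2 + alpha/4.
    """
    n = max(1, int(c / alpha**2))        # cost(alpha) ~ 1/alpha^2 tosses
    heads = sum(random.random() < (1 + bias) / 2 for _ in range(n))
    return heads / n >= 0.5 + alpha / 4
```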

4. How... to convert this into a useful thing? How to find an x with small cost? That is, can we get Pr_x[ a(x) ≥ blah(ε) ] ≥ bluh(ε) for some "good" functions blah, bluh?

5. “By a standard averaging argument...” First attempt: Markov.

Lemma (Markov). We have

    Pr_x[ a(x) ≥ ε/2 ] ≥ ε/2.   (1)

Proof. Since Pr_x[ a(x) < ε/2 ] ≤ 1,

    ε ≤ E[a(x)] ≤ (ε/2) · Pr_x[ a(x) < ε/2 ] + 1 · Pr_x[ a(x) ≥ ε/2 ] ≤ ε/2 + Pr_x[ a(x) ≥ ε/2 ],

and rearranging gives (1).

6. “By a standard averaging argument...” First attempt: Markov.

Strategy (a code sketch follows below): sample O(1/ε) x's to find a “good” one; for each, pay cost(ε/2).

Yes, but... the total cost is typically at least quadratic in 1/ε, since cost(α) = Ω(1/α) already gives O(1/ε) · cost(ε/2) = Ω(1/ε²). We should not pay the worst of both worlds.
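A minimal sketch of this first strategy, assuming two hypothetical callbacks (not from the talk): sample_x() draws an x from the population, and quality_test(x, alpha) checks a(x) ≥ α at cost roughly cost(α).

```python
def markov_strategy(sample_x, quality_test, eps):
    """First attempt (Markov): by (1), a ~(2/eps)-sized sample contains
    a 'good' x with constant probability; test every candidate at
    threshold eps/2, paying cost(eps/2) each time."""
    for _ in range(int(4 / eps) + 1):    # O(1/eps) candidates
        x = sample_x()
        if quality_test(x, eps / 2):     # pay cost(eps/2) per candidate
            return x
    return None                          # unlucky run: no good x found
```

With cost(α) ≍ 1/α this already pays Θ(1/ε²) in total: every candidate is tested at the most expensive threshold, even the high-quality ones.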

7. “By a standard bucketing argument...” Second attempt: my bucket list.

Lemma (Bucketing). There exists 1 ≤ j ≤ ⌈log(2/ε)⌉ =: L such that

    Pr_x[ a(x) ≥ 2^{-j} ] ≥ 2^j ε / (4L).   (2)

Proof. Define buckets B_0 := { x : a(x) ≤ ε/2 } and B_j := { x : 2^{-j} ≤ a(x) ≤ 2^{-j+1} } for 1 ≤ j ≤ L. Then

    ε ≤ E[a(x)] ≤ (ε/2) · Pr[x ∈ B_0] + Σ_{j=1}^{L} 2^{-j+1} · Pr[x ∈ B_j] ≤ ε/2 + Σ_{j=1}^{L} 2^{-j+1} · Pr[x ∈ B_j],

so (averaging!) there exists j* such that 2^{-j*+1} · Pr[x ∈ B_{j*}] ≥ ε/(2L); since a(x) ≥ 2^{-j*} on B_{j*}, this is exactly (2).

8. “By a standard bucketing argument...” Second attempt: my bucket list.

Strategy (sketched in code below): for each j ∈ [L], in case it's the good bucket:
- sample O(log(1/ε)/(2^j ε)) x's to find a “good” one in B_j;
- for each such x, pay cost(2^{-j}).

Total cost (examples):

    Σ_{j=1}^{L} [log(1/ε)/(2^j ε)] · cost(2^{-j}) ≍
        log²(1/ε)/ε   if cost(α) ≍ 1/α,
        log(1/ε)/ε²   if cost(α) ≍ 1/α².

Yes, but... we lose log factors. Do we have to lose log factors?
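Under the same hypothetical interface as before, a sketch of the bucketing strategy (the constant in n_j is arbitrary):

```python
import math

def bucketing_strategy(sample_x, quality_test, eps):
    """Second attempt (bucketing): try every bucket j = 1..L. If B_j is
    the good bucket promised by (2), then O(L/(2^j eps)) samples hit it
    with constant probability; test each candidate at threshold 2^{-j}."""
    L = math.ceil(math.log2(2 / eps))
    for j in range(1, L + 1):
        n_j = int(8 * L / (2**j * eps)) + 1   # O(log(1/eps)/(2^j eps))
        for _ in range(n_j):
            x = sample_x()
            if quality_test(x, 2.0**-j):      # pay cost(2^{-j})
                return x
    return None
```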

9. “By a refined averaging argument...” Third (and last) attempt: strategic investment.

Assume that cost(α) is superlinear, e.g., cost(α) = 1/α².

Lemma (Levin's Economical Work Investment Strategy). There exists 1 ≤ j ≤ ⌈log(2/ε)⌉ =: L such that

    Pr_x[ a(x) ≥ 2^{-j} ] ≥ 2^j ε / (8(L+1-j)²).   (3)

Proof. By contradiction: if (3) fails for every j, then with the buckets B_j as before,

    ε ≤ E[a(x)] ≤ ε/2 + Σ_{j=1}^{L} 2^{-j+1} · Pr[ a(x) ≥ 2^{-j} ]
      < ε/2 + Σ_{j=1}^{L} 2^{-j+1} · 2^j ε/(8(L+1-j)²)
      = ε/2 + (ε/4) Σ_{ℓ=1}^{L} 1/ℓ²      (writing ℓ := L+1-j)
      < ε/2 + (ε/4) Σ_{ℓ=1}^{∞} 1/ℓ² < ε.

“Oops.”

10. “By a refined averaging argument...” Third (and last) attempt: strategic investment.

Strategy (sketched in code below): for each j ∈ [L]:
- sample O((L+1-j)²/(2^j ε)) x's to find a “good” one in B_j;
- for each such x, pay cost(2^{-j}) ≍ 2^{2j}.

Total cost:

    Σ_{j=1}^{L} [(L+1-j)²/(2^j ε)] · 2^{2j} = (1/ε) Σ_{j=1}^{L} (L+1-j)² · 2^j = (2^{L+1}/ε) Σ_{ℓ=1}^{L} ℓ² · 2^{-ℓ} ≤ (8/ε²) Σ_{ℓ=1}^{∞} ℓ² · 2^{-ℓ} = O(1/ε²),

since Σ_{ℓ=1}^{∞} ℓ² · 2^{-ℓ} is an absolute constant. (It's 6.)

Yes, but... No, actually, nothing. This works for any cost(α) ≫ 1/α^{1+δ}. For cost(α) ≍ 1/α, not so easy, but some stuff exists.
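A sketch of the work investment strategy under the same hypothetical interface. The point is the allocation: buckets with expensive tests (large j) get few samples, buckets with cheap tests (small j) get many, and the totals telescope to O(1/ε²).

```python
import math

def work_investment_strategy(sample_x, quality_test, eps):
    """Third attempt (Levin): spend O((L+1-j)^2/(2^j eps)) samples on
    bucket j, testing each candidate at threshold 2^{-j}. With
    cost(alpha) ~ 1/alpha^2 the total cost is O(1/eps^2): no log loss."""
    L = math.ceil(math.log2(2 / eps))
    for j in range(1, L + 1):
        n_j = int(16 * (L + 1 - j)**2 / (2**j * eps)) + 1
        for _ in range(n_j):
            x = sample_x()
            if quality_test(x, 2.0**-j):   # pay cost(2^{-j}) ~ 2^{2j}
                return x
    return None
```

For the coin example, sample_x = lambda: draw_coin(biases) and quality_test = check_bias wire this up to the earlier sketch.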

11. Thomas’ Favorite Lemma.

12. Kullback–Leibler Divergence. Recall the definition of the Kullback–Leibler divergence (a.k.a. relative entropy) between two discrete distributions p, q:

    D(p‖q) = Σ_ω p(ω) log( p(ω)/q(ω) ).

It has some issues (no symmetry, no triangle inequality), yes, but it is everywhere (for a reason). It also has many nice properties.
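For concreteness, a small Python helper (my own, for illustration) computing D(p‖q) for distributions given as dicts, with the standard conventions 0·log(0/q) = 0 and D = ∞ when p is not absolutely continuous with respect to q:

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_w p(w) * log(p(w)/q(w)) for discrete p, q
    given as dicts mapping outcome -> probability."""
    d = 0.0
    for w, pw in p.items():
        if pw == 0:
            continue                 # convention: 0 * log(0/q) = 0
        qw = q.get(w, 0.0)
        if qw == 0:
            return math.inf          # p not absolutely continuous w.r.t. q
        d += pw * math.log(pw / qw)
    return d

# Asymmetry in action: D(p||q) != D(q||p) in general.
p = {"a": 0.9, "b": 0.1}
q = {"a": 0.5, "b": 0.5}
print(kl_divergence(p, q), kl_divergence(q, p))  # ~0.368 vs ~0.511
```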

13. Kullback–Leibler Divergence: the dual characterization.

Theorem (First). For every q ≪ p,

    D(p‖q) = sup_f ( E_{x∼p}[f(x)] − log E_{x∼q}[e^{f(x)}] ).   (4)

Theorem (Second). For every p and every λ,

    log E_{x∼p}[e^{λx}] = max_{q≪p} ( λ E_{x∼q}[x] − D(q‖p) ).   (5)

Known as: the Gibbs variational principle (1902?), Donsker–Varadhan (1975), a special case of Fenchel duality, ...
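A quick numeric sanity check of (4), reusing p, q from the snippet above: the supremum is attained at the witness f = log(p/q), and any other f scores lower. The helper dv_objective is illustrative.

```python
import math

def dv_objective(f, p, q):
    """E_{x~p}[f(x)] - log E_{x~q}[e^{f(x)}] for discrete p, q and a
    function f given as a dict over the same outcomes."""
    e_p = sum(p[w] * f[w] for w in p)
    log_mgf = math.log(sum(q[w] * math.exp(f[w]) for w in q))
    return e_p - log_mgf

p = {"a": 0.9, "b": 0.1}
q = {"a": 0.5, "b": 0.5}
f_star = {w: math.log(p[w] / q[w]) for w in p}    # optimal witness
print(dv_objective(f_star, p, q))                 # == D(p||q) ~ 0.368
print(dv_objective({"a": 1.0, "b": -1.0}, p, q))  # strictly smaller
```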

14. An application.

Theorem. Suppose p is subgaussian on R^d. For every function a : R^d → [0,1] with α := E_{x∼p}[a(x)] > 0,

    ‖ E_{x∼p}[ x·a(x) ] ‖₂ ≤ C_p · α · √(log(1/α))   (6)

(the constant C_p depends on the subgaussian parameter, not on d).

The proof that follows was communicated to me by Himanshu Tyagi.
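As a sanity check (my own, not from the talk), one can test (6) numerically in d = 1 with p = N(0,1) and the indicator a(x) = 1{x ≥ t}: then α = Pr[x ≥ t] and E[x·a(x)] = φ(t), the standard normal density at t, so the claim reads φ(t) ≤ C·α·√(log(1/α)).

```python
import math
from statistics import NormalDist

# Standard normal density; E[x * 1{x >= t}] = phi(t) by direct integration.
phi = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

for t in [0.5, 1.0, 2.0, 3.0, 4.0]:
    alpha = 1 - NormalDist().cdf(t)              # alpha = Pr[x >= t]
    lhs = phi(t)                                 # ||E[x a(x)]||_2 in d = 1
    rhs = alpha * math.sqrt(math.log(1 / alpha))
    print(f"t={t}: lhs/rhs = {lhs / rhs:.3f}")   # bounded, ~sqrt(2) as t grows
```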
