  1. Announcement
  Ø Grades for HW2 and the project proposal are released

  2. CS6501: Topics in Learning and Game Theory (Fall 2019)
  Learning from Strategically Transformed Samples
  Instructor: Haifeng Xu
  Part of the slides are provided by Hanrui Zhang

  3. Outline
  Ø Introduction
  Ø The Model and Results

  4–6. Signaling
  Q: Why attend good universities?
  Q: Why publish and present at top conferences?
  Q: Why do internships?
  Ø All in all, these are just signals (directly observable) that indicate "excellence" (not directly observable)
  Ø Asymmetric information between employees and employers: the 2001 Nobel Prize in Economics was awarded to research on asymmetric information

  7–9. Signaling
  Ø A simple example
  • We want to hire an Applied ML (AML) researcher
  • There are only two types of ML researchers in this world, and they are easy to tell apart: the theoretical ML (TML) type produces theoretical ideas, and the AML type produces applied ideas
  • Σ: signals (observable) are the venues COLT, NeurIPS, KDD; T: samples (unobservable) are the ideas; M: hidden types/labels
  Ø But our world is known to be noisy: each type m ∈ M generates a distribution over ideas, and a reporting strategy maps ideas to venues. In the figure, a theoretical idea is sent to COLT with probability 0.8 and to NeurIPS with probability 0.2; an applied idea is sent to NeurIPS with probability 0.2 and to KDD with probability 0.8
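
To make the figure concrete, the reporting strategy can be written down directly (the variable names below are ours, not the lecture's):

```python
# The reporting strategy from the figure: rho[idea][venue] is the
# probability of sending an idea to a venue.
rho = {
    "theoretical": {"COLT": 0.8, "NeurIPS": 0.2},
    "applied":     {"NeurIPS": 0.2, "KDD": 0.8},
}
# In this example each type produces one kind of idea, so the TML type's
# papers land on {COLT: 0.8, NeurIPS: 0.2} and the AML type's on
# {NeurIPS: 0.2, KDD: 0.8} -- the two distributions overlap only at NeurIPS.
```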

  10–11. Signaling
  Ø The agent's problem:
  • How do I distinguish myself from the other types?
  • How many ideas do I need for that?
  Ø The principal's problem:
  • How do I tell AML agents from the others (a classification problem)?
  • How many papers should I expect to read?
  Ø Answers for this particular instance? In general, this is classification with strategically transformed samples

  12. What Instances May Be Difficult?
  [Figure: a variant of the earlier diagram with a third, "middle" idea shared by both types; edge probabilities of 0.4 and 0.2 connect TML and AML through the theoretical, middle, and applied ideas to COLT, NeurIPS, and KDD. Σ: signals (observable); T: samples (unobservable); M: hidden types/labels.]
  Intuitions:
  Ø Agent: try to report as far from the others as possible
  Ø Principal: examine a set of signals that maximally separates AML from TML

  13. Outline
  Ø Introduction
  Ø The Model and Results

  14–15. Model
  Ø Two distribution types/labels: m ∈ {h, c}
  • h should be interpreted as "desired", not necessarily as good or bad
  Ø h, c ∈ Δ(T), where T is the set of samples
  Ø A bipartite graph H = (T ∪ Σ, F) captures the feasible signals for each sample: (t, τ) ∈ F iff τ is a valid signal for sample t
  Ø h, c, H are publicly known; T and Σ are both discrete
  Ø The distribution m ∈ {h, c} generates n samples
  Ø A few special cases:
  • The agent can hide samples, as in the last lecture (captured by adding an "empty signal" that is feasible for every sample)
  • The signal space may be the same as the sample space (i.e., T = Σ); H then captures feasible "lies"
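
A minimal encoding of these ingredients in the running example (all variable names are ours), including the empty-signal trick for hiding samples:

```python
# Samples T are ideas, signals Sigma are venues.
T     = ["theoretical", "applied"]
Sigma = ["COLT", "NeurIPS", "KDD"]
# Feasibility graph H = (T ∪ Sigma, F): (t, tau) in F iff tau is valid for t.
F = {("theoretical", "COLT"), ("theoretical", "NeurIPS"),
     ("applied", "NeurIPS"), ("applied", "KDD")}

# The two publicly known distributions over T; h is the "desired" AML type.
h = {"theoretical": 0.0, "applied": 1.0}
c = {"theoretical": 1.0, "applied": 0.0}

# Special case: letting the agent hide samples is the same as adding an
# "empty signal" that is feasible for every sample.
Sigma_hide = Sigma + ["EMPTY"]
F_hide = F | {(t, "EMPTY") for t in T}
```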

  16–17. The Game
  The agent's reporting strategy ρ transforms n samples into a set S of n signals.
  Ø A reporting strategy is a signaling scheme
  • Fully described by ρ(τ|t) = the probability of sending signal τ for sample t
  • ∑_τ ρ(τ|t) = 1 for all t
  Ø Given n samples, ρ generates n signals (possibly at random) as the agent's report S ∈ Σ^n
  Ø A special case is a deterministic reporting strategy
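
As a minimal sketch (names are ours), drawing n samples from type m and pushing each through ρ:

```python
import random

def sample_report(m, rho, n, rng=random):
    """Draw n samples t ~ m, then for each send a signal tau ~ rho(.|t).
    m: dict sample -> prob;  rho: dict sample -> (dict signal -> prob)."""
    report = []
    for _ in range(n):
        t = rng.choices(list(m), weights=list(m.values()))[0]
        tau = rng.choices(list(rho[t]), weights=list(rho[t].values()))[0]
        report.append(tau)
    return report  # the agent's report S, an element of Sigma^n

rho = {"theoretical": {"COLT": 0.8, "NeurIPS": 0.2},
       "applied":     {"NeurIPS": 0.2, "KDD": 0.8}}
print(sample_report({"applied": 1.0}, rho, n=5))  # e.g. ['KDD', 'KDD', 'NeurIPS', 'KDD', 'KDD']
```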

  18. The Game
  The agent's reporting strategy ρ transforms n samples into a set S of n signals.
  Ø Objective: maximize the probability of being accepted
  The principal's action g: Σ^n → [0,1] maps the agent's report to an acceptance probability.
  Ø Objective: minimize the probability of mistakes (i.e., rejecting h or accepting c)
  Remarks:
  Ø Timeline: the principal announces g first; the agent then best responds
  Ø Type h's [c's] incentive is aligned with [opposite to] the principal's
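
A toy run of this timeline, reusing rho and sample_report from the sketch above; the policy g here is an arbitrary placeholder, not an optimal one:

```python
def g(report):
    """A placeholder acceptance policy: accept iff at least half of the
    reported papers are at KDD. Any map Sigma^n -> [0,1] is allowed."""
    return 1.0 if report.count("KDD") >= len(report) / 2 else 0.0

# Timeline: the principal announces g; the agent (of hidden type m) best
# responds with some rho; the principal sees only the report.
report = sample_report({"applied": 1.0}, rho, n=10)
accept_prob = g(report)
```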

  19. A Simpler Case
  Ø Suppose m ∈ {h, c} generates infinitely many samples (n = ∞)
  Ø Any reporting strategy ρ then generates a distribution over Σ:
  • Pr(τ) = ∑_{t∈T} ρ(τ|t) ⋅ m(t) =: ρ(τ|m) (a slight abuse of notation)
  • ρ(τ|m) is linear in the variables ρ(τ|t)
  Ø Intuitively, type h should make his distribution "far from" the other type's distribution
  • Total variation (TV) distance turns out to be the right measure
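
The formula above, as a general function (it generalizes the induced distributions stated by hand in the earlier sketch; names are ours):

```python
def induced_distribution(rho, m):
    """Pr(tau) = sum_t rho(tau|t) * m(t): the signal distribution rho(.|m)
    induced by reporting strategy rho when the samples come from m."""
    out = {}
    for t, p in m.items():
        for tau, q in rho[t].items():
            out[tau] = out.get(tau, 0.0) + p * q
    return out
# Note the right-hand side is linear in the variables rho(tau|t).
```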

  20–21. Total Variation Distance
  Ø For discrete distributions y, z supported on Σ:
  • Let y(B) = ∑_{τ∈B} y(τ) = Pr_{τ∼y}(τ ∈ B)

      d_TV(y, z) = max_{B⊆Σ} [y(B) − z(B)]
                 = ∑_{τ: y(τ)>z(τ)} [y(τ) − z(τ)]
                 = ½ ∑_{τ: y(τ)>z(τ)} [y(τ) − z(τ)] + ½ ∑_{τ: z(τ)>y(τ)} [z(τ) − y(τ)]   (the two sums are equal)
                 = ½ ∑_τ |y(τ) − z(τ)| = ½ ‖y − z‖₁
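
Both characterizations are easy to implement, and a quick sanity check shows they agree (brute-forcing the max over events B is viable only for small supports):

```python
from itertools import chain, combinations

def tv_half_l1(y, z):
    """d_TV(y, z) = (1/2) * sum_tau |y(tau) - z(tau)|."""
    support = set(y) | set(z)
    return 0.5 * sum(abs(y.get(s, 0.0) - z.get(s, 0.0)) for s in support)

def tv_max_event(y, z):
    """d_TV(y, z) = max_B [y(B) - z(B)], brute-forced over all events B."""
    support = list(set(y) | set(z))
    events = chain.from_iterable(combinations(support, k)
                                 for k in range(len(support) + 1))
    return max(sum(y.get(s, 0.0) - z.get(s, 0.0) for s in B) for B in events)

y = {"COLT": 0.8, "NeurIPS": 0.2}
z = {"NeurIPS": 0.2, "KDD": 0.8}
assert abs(tv_half_l1(y, z) - tv_max_event(y, z)) < 1e-12   # both give 0.8
```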

  22–24. How Can h Distinguish Himself from c?
  Ø Type h uses reporting strategy ρ (and c uses ϱ)
  Ø Type h wants ρ(⋅|h) to be far from ϱ(⋅|c) → what about type c?
  Ø This naturally motivates a zero-sum game between h and c; its game value is the maximin TV distance:

      max_ρ min_ϱ d_TV(ρ(⋅|h), ϱ(⋅|c)) =: d_mTV(h, c)

  Ø Note d_mTV(h, c) ≥ 0. Now, what happens if d_mTV(h, c) > 0?
  Ø h has a strategy ρ* such that d_TV(ρ*(⋅|h), ϱ(⋅|c)) ≥ d_mTV(h, c) > 0 for any ϱ
  Ø Using ρ*, h can distinguish himself from c with constant probability via Θ(1/d_mTV(h, c)²) samples
  • Recall: Θ(1/ϑ²) samples suffice to distinguish y from z when d_TV(y, z) = ϑ
  • The principal only needs to check whether the report S is drawn from ρ*(⋅|h) or not
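
For tiny instances, d_mTV(h, c) can be computed by brute force. The sketch below is our construction, not an algorithm from the lecture: since d_TV is convex in the induced signal distribution, which is linear in ρ, the outer max is attained at a vertex of ρ's polytope, i.e., at some deterministic reporting strategy; the inner min over randomized ϱ is a linear program (solved here with scipy):

```python
from itertools import product
import numpy as np
from scipy.optimize import linprog

def d_mtv(T, Sigma, F, h, c):
    feas = {t: [s for s in Sigma if (t, s) in F] for t in T}
    edges = [(t, s) for t in T for s in feas[t]]   # varrho's variables
    nE, nS = len(edges), len(Sigma)

    def inner_min(x):
        # min over varrho of d_TV(x, varrho(.|c)): variables (varrho, u)
        # with u_s >= |x_s - q_s|, where q_s = sum_t c(t) * varrho(s|t).
        cost = np.concatenate([np.zeros(nE), 0.5 * np.ones(nS)])
        A_eq = np.zeros((len(T), nE + nS))          # each varrho(.|t) sums to 1
        for i, t in enumerate(T):
            for j, (t2, _) in enumerate(edges):
                if t2 == t:
                    A_eq[i, j] = 1.0
        A_ub, b_ub = [], []
        for k, s in enumerate(Sigma):
            row = np.zeros(nE + nS)
            for j, (t, s2) in enumerate(edges):
                if s2 == s:
                    row[j] = c[t]
            row[nE + k] = -1.0
            A_ub.append(row.copy()); b_ub.append(x[k])    # q_s - x_s <= u_s
            row[:nE] *= -1.0
            A_ub.append(row);        b_ub.append(-x[k])   # x_s - q_s <= u_s
        res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      A_eq=A_eq, b_eq=np.ones(len(T)),
                      bounds=[(0, 1)] * nE + [(0, None)] * nS)
        return res.fun

    best = 0.0
    for choice in product(*(feas[t] for t in T)):   # deterministic rho for h
        x = np.zeros(nS)
        for t, s in zip(T, choice):
            x[Sigma.index(s)] += h[t]
        best = max(best, inner_min(x))
    return best

# With the encoding from the earlier sketch, d_mtv(T, Sigma, F, h, c) = 1:
# the h-type (applied) can send everything to KDD, which c cannot reach.
```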

  25–26. How Can h Distinguish Himself from c?
  Ø So d_mTV(h, c) > 0 is sufficient for distinguishing h from c
  Ø It turns out that it is also necessary
  Theorem:
  1. If d_mTV(h, c) = ϑ > 0, then there is a policy g that makes a mistake with probability at most ε whenever #samples n ≥ 2 ln(1/ε)/ϑ².
  2. If d_mTV(h, c) = 0, then no policy g can separate h from c, regardless of how large #samples n is.
  Remarks:
  Ø The probability ε of a mistake can be made arbitrarily small with more samples
  Ø We have shown the first part
  Ø The second part is more difficult to prove; it uses an elegant result from matching theory
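
To make part 1 concrete, here is a minimal sketch of a threshold test in its spirit (the test and the ϑ/2 threshold are our illustrative choices, not necessarily the construction used in the lecture): with ϑ = d_mTV(h, c), accept iff the empirical signal distribution is within ϑ/2 of the target ρ*(⋅|h) in TV distance.

```python
import math, random
from collections import Counter

def principal_policy(report, target, theta):
    """Accept iff the empirical distribution of the report is within
    theta/2 of target = rho*(.|h) in TV distance."""
    n = len(report)
    emp = Counter(report)
    support = set(emp) | set(target)
    tv = 0.5 * sum(abs(emp.get(s, 0) / n - target.get(s, 0.0)) for s in support)
    return 1.0 if tv <= theta / 2 else 0.0

eps = 0.01
theta = 1.0                                          # d_mTV in the running example
n = math.ceil(2 * math.log(1 / eps) / theta ** 2)    # the theorem's sample bound
target = {"KDD": 1.0}                                # rho*(.|h): send everything to KDD
report = random.choices(list(target), weights=list(target.values()), k=n)
print(n, principal_policy(report, target, theta))    # 10 1.0
```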
