SLIDE 1

Announcement

Ø Grades for HW2 and project proposal are released

SLIDE 2

CS6501: Topics in Learning and Game Theory (Fall 2019)

Learning from Strategically Transformed Samples

Instructor: Haifeng Xu

Part of the slides are provided by Hanrui Zhang

SLIDE 3

Outline

Ø Introduction
Ø The Model and Results

SLIDE 4

Signaling

Q: Why attend good universities?
Q: Why publish and present at top conferences?
Q: Why do internships?

SLIDE 5

Signaling

Q: Why attend good universities?
Q: Why publish and present at top conferences?
Q: Why do internships?

Ø All in all, these are just signals (directly observable) to indicate “excellence” (not directly observable)

SLIDE 6

Signaling

Q: Why attend good universities?
Q: Why publish and present at top conferences?
Q: Why do internships?

Ø All in all, these are just signals (directly observable) to indicate “excellence” (not directly observable)
Ø Asymmetric information between employees and employers

  • The 2001 Nobel Prize in Economics was awarded to research on asymmetric information

SLIDE 7

Signaling

Ø A simple example

  • We want to hire an Applied ML researcher
  • Only two types of ML researchers in this world
  • Easy to tell them apart

[Figure: AML and TML (𝑀: hidden types/labels) generate theoretical and applied ideas (𝑇: samples, unobservable), which appear as papers at COLT, NeurIPS, or KDD (Σ: signals, observable).]

SLIDE 8

Signaling

Ø A simple example

  • We want to hire an Applied ML researcher
  • Only two types of ML researchers in this world
  • Easy to tell them apart

[Figure: the same AML/TML diagram (𝑀: hidden types/labels; 𝑇: samples, unobservable; Σ: signals, observable).]

Our world is known to be noisy…

SLIDE 9

Signaling

Ø A simple example

  • We want to hire an Applied ML researcher
  • Only two types of ML researchers in this world

[Figure: AML and TML (𝑀: hidden types/labels) each generate theoretical and applied ideas with probabilities 0.2 and 0.8 (𝑇: samples, unobservable); a reporting strategy maps ideas to papers at COLT, NeurIPS, or KDD (Σ: signals, observable).]

Each 𝑚 ∈ 𝑀 is a distribution over ideas; samples are generated by 𝑚.

SLIDE 10

Signaling

Ø Agent’s problem:

  • How do I distinguish myself from other types?
  • How many ideas do I need for that?

Ø Principal’s problem:

  • How do I tell AML agents from others (a classification problem)?
  • How many papers should I expect to read?

Answers for this particular instance?

SLIDE 11

Signaling

Ø Agent’s problem:

  • How do I distinguish myself from other types?
  • How many ideas do I need for that?

Ø Principal’s problem:

  • How do I tell AML agents from others (a classification problem)?
  • How many papers should I expect to read?

Generally, classification with strategically transformed samples

SLIDE 12

What Instances May Be Difficult?

[Figure: the AML/TML diagram with a third sample, “middle idea”; the two types now mix over theoretical, middle, and applied ideas with probabilities 0.2 and 0.4 (𝑇: samples, unobservable), connected by the reporting strategy to COLT, NeurIPS, and KDD (Σ: signals, observable).]

Intuitions:

Ø Agent: try to report as far from others as possible
Ø Principal: examine a set of signals that maximally separates AML from TML

SLIDE 13

Outline

Ø Introduction
Ø The Model and Results

SLIDE 14

Model

Ø Two distribution types/labels: 𝑚 ∈ {ℎ, 𝑐}

  • ℎ should be interpreted as “desired”, not necessarily good or bad

Ø ℎ, 𝑐 ∈ Δ(𝑇), where 𝑇 is the set of samples
Ø Bipartite graph 𝐻 = (𝑇 ∪ Σ, 𝐹) captures the feasible signals for each sample: (𝑡, 𝜏) ∈ 𝐹 iff 𝜏 is a valid signal for 𝑡
Ø ℎ, 𝑐, 𝐻 are publicly known; 𝑇, Σ are both discrete
Ø Distribution 𝑚 ∈ {ℎ, 𝑐} generates 𝑈 samples

SLIDE 15

Model

Ø Two distribution types/labels: 𝑚 ∈ {ℎ, 𝑐}

  • ℎ should be interpreted as “desired”, not necessarily good or bad

Ø ℎ, 𝑐 ∈ Δ(𝑇), where 𝑇 is the set of samples
Ø Bipartite graph 𝐻 = (𝑇 ∪ Σ, 𝐹) captures the feasible signals for each sample: (𝑡, 𝜏) ∈ 𝐹 iff 𝜏 is a valid signal for 𝑡
Ø ℎ, 𝑐, 𝐻 are publicly known; 𝑇, Σ are both discrete
Ø Distribution 𝑚 ∈ {ℎ, 𝑐} generates 𝑈 samples
Ø A few special cases:

  • The agent can hide samples, as in the last lecture (captured by adding an “empty signal”)
  • The signal space may be the same as the sample space (i.e., 𝑇 = Σ); 𝐻 then captures feasible “lies”

SLIDE 16

The Game

The agent’s reporting strategy 𝜌 transforms 𝑈 samples into a set 𝑆 of 𝑈 signals

Ø A reporting strategy is a signaling scheme

  • Fully described by 𝜌(𝜏|𝑡) = probability of sending signal 𝜏 for sample 𝑡
  • ∑_𝜏 𝜌(𝜏|𝑡) = 1 for all 𝑡

SLIDE 17

The Game

The agent’s reporting strategy 𝜌 transforms 𝑈 samples into a set 𝑆 of 𝑈 signals

Ø A reporting strategy is a signaling scheme

  • Fully described by 𝜌(𝜏|𝑡) = probability of sending signal 𝜏 for sample 𝑡
  • ∑_𝜏 𝜌(𝜏|𝑡) = 1 for all 𝑡

Ø Given 𝑈 samples, 𝜌 generates 𝑈 signals (possibly randomly) as the agent’s report 𝑆 ∈ Σ^𝑈
Ø A special case is a deterministic reporting strategy
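As a toy illustration (my own sketch, not from the slides), a reporting strategy can be stored as a table of probabilities 𝜌(𝜏|𝑡) and applied sample by sample; all names and numbers below are made up.

```python
import random

# Hypothetical feasibility graph H: the valid signals for each sample
H = {"theory_idea": ["COLT", "NeurIPS"],
     "applied_idea": ["NeurIPS", "KDD"]}

# A made-up reporting strategy: rho[t][tau] = prob of sending tau for sample t
rho = {"theory_idea": {"COLT": 0.7, "NeurIPS": 0.3},
       "applied_idea": {"NeurIPS": 0.4, "KDD": 0.6}}

def report(samples, rho, rng):
    """Transform U samples into U signals, drawing one signal per sample."""
    signals = []
    for t in samples:
        taus = list(rho[t])
        weights = [rho[t][tau] for tau in taus]
        signals.append(rng.choices(taus, weights=weights)[0])
    return signals

samples = ["theory_idea", "applied_idea", "applied_idea"]
print(report(samples, rho, random.Random(0)))
```

A deterministic reporting strategy is the special case where every row of `rho` puts probability 1 on a single signal.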

SLIDE 18

The Game

Principal’s action 𝑔: Σ^𝑈 → [0,1] maps the agent’s report to an acceptance probability

Ø Objective: minimize the probability of mistakes (i.e., rejecting ℎ or accepting 𝑐)

The agent’s reporting strategy 𝜌 transforms 𝑈 samples into a set 𝑆 of 𝑈 signals

Ø Objective: maximize the probability of being accepted

Remarks:

Ø Timeline: the principal announces 𝑔 first; the agent then best responds
Ø Type ℎ’s [𝑐’s] incentive is aligned with [opposite to] the principal’s

SLIDE 19

A Simpler Case

Ø Say 𝑚 ∈ {ℎ, 𝑐} generates 𝑈 = ∞ many samples
Ø Any reporting strategy 𝜌 generates a distribution over Σ

  • Pr(𝜏) = ∑_{𝑡∈𝑇} 𝜌(𝜏|𝑡) ⋅ 𝑚(𝑡) = 𝜌(𝜏|𝑚) (slight abuse of notation)
  • 𝜌(𝜏|𝑚) is linear in the variables 𝜌(𝜏|𝑡)

Ø Intuitively, type ℎ should make his 𝜌(⋅|ℎ) “far from” the other type’s distribution

  • Total variation (TV) distance turns out to be the right measure
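The linear formula Pr(𝜏) = ∑_{𝑡∈𝑇} 𝜌(𝜏|𝑡) ⋅ 𝑚(𝑡) is straightforward to compute; a minimal sketch with made-up numbers:

```python
# rho(tau|m) = sum_t rho(tau|t) * m(t): the signal distribution induced by
# reporting strategy rho when samples are drawn from m (numbers made up).
def induced(rho, m):
    dist = {}
    for t, mt in m.items():
        for tau, p in rho[t].items():
            dist[tau] = dist.get(tau, 0.0) + p * mt
    return dist

rho = {"t1": {"a": 0.7, "b": 0.3},
       "t2": {"b": 0.4, "c": 0.6}}
m = {"t1": 0.8, "t2": 0.2}

d = induced(rho, m)
print(d)  # a: 0.56, b: 0.32, c: 0.12 (up to float rounding)
```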

SLIDE 20

Total Variation Distance

Ø Discrete distributions 𝑦, 𝑧 supported on Σ

  • Let 𝑦(𝐵) = ∑_{𝜏∈𝐵} 𝑦(𝜏) = Pr_{𝜏∼𝑦}(𝜏 ∈ 𝐵)

𝑑_TV(𝑦, 𝑧) = max_𝐵 [𝑦(𝐵) − 𝑧(𝐵)]

  = ∑_{𝜏: 𝑦(𝜏)>𝑧(𝜏)} [𝑦(𝜏) − 𝑧(𝜏)]

  = ½ ∑_{𝜏: 𝑦(𝜏)>𝑧(𝜏)} [𝑦(𝜏) − 𝑧(𝜏)] + ½ ∑_{𝜏: 𝑧(𝜏)>𝑦(𝜏)} [𝑧(𝜏) − 𝑦(𝜏)]

(These two terms are equal.)

SLIDE 21

Total Variation Distance

Ø Discrete distributions 𝑦, 𝑧 supported on Σ

  • Let 𝑦(𝐵) = ∑_{𝜏∈𝐵} 𝑦(𝜏) = Pr_{𝜏∼𝑦}(𝜏 ∈ 𝐵)

𝑑_TV(𝑦, 𝑧) = max_𝐵 [𝑦(𝐵) − 𝑧(𝐵)]

  = ∑_{𝜏: 𝑦(𝜏)>𝑧(𝜏)} [𝑦(𝜏) − 𝑧(𝜏)]

  = ½ ∑_{𝜏: 𝑦(𝜏)>𝑧(𝜏)} [𝑦(𝜏) − 𝑧(𝜏)] + ½ ∑_{𝜏: 𝑧(𝜏)>𝑦(𝜏)} [𝑧(𝜏) − 𝑦(𝜏)]  (these two terms are equal)

  = ½ ∑_𝜏 |𝑦(𝜏) − 𝑧(𝜏)|

  = ½ ‖𝑦 − 𝑧‖₁
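The identities above are easy to sanity-check numerically. The sketch below (distributions made up) compares the max-over-events definition with the half-L1 form by enumerating all events 𝐵 ⊆ Σ:

```python
from itertools import chain, combinations

def tv(y, z):
    """Total variation distance as half the L1 distance."""
    support = set(y) | set(z)
    return 0.5 * sum(abs(y.get(s, 0.0) - z.get(s, 0.0)) for s in support)

def tv_max_event(y, z):
    """Total variation distance as max_B [y(B) - z(B)] over all events B."""
    support = list(set(y) | set(z))
    subsets = chain.from_iterable(combinations(support, r)
                                  for r in range(len(support) + 1))
    return max(sum(y.get(s, 0.0) for s in B) - sum(z.get(s, 0.0) for s in B)
               for B in subsets)

y = {"COLT": 0.5, "NeurIPS": 0.3, "KDD": 0.2}
z = {"COLT": 0.1, "NeurIPS": 0.3, "KDD": 0.6}
print(tv(y, z), tv_max_event(y, z))  # both equal 0.4 (up to float rounding)
```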

SLIDE 22

How Can ℎ Distinguish Himself from 𝑐?

Ø Type ℎ uses reporting strategy 𝜌 (and 𝑐 uses 𝜚)
Ø Type ℎ wants 𝜌(⋅|ℎ) to be far from 𝜚(⋅|𝑐)
Ø This naturally motivates a zero-sum game between ℎ and 𝑐:

max_𝜌 min_𝜚 𝑑_TV(𝜌(⋅|ℎ), 𝜚(⋅|𝑐)) = d̃_TV(ℎ, 𝑐)

(d̃_TV(ℎ, 𝑐) is the game value of this zero-sum game)

→ What about type 𝑐?

SLIDE 23

How Can ℎ Distinguish Himself from 𝑐?

Ø Type ℎ uses reporting strategy 𝜌 (and 𝑐 uses 𝜚)
Ø Type ℎ wants 𝜌(⋅|ℎ) to be far from 𝜚(⋅|𝑐)
Ø This naturally motivates a zero-sum game between ℎ and 𝑐:

max_𝜌 min_𝜚 𝑑_TV(𝜌(⋅|ℎ), 𝜚(⋅|𝑐)) = d̃_TV(ℎ, 𝑐)

Note d̃_TV(ℎ, 𝑐) ≥ 0… Now, what happens if d̃_TV(ℎ, 𝑐) > 0?

→ What about type 𝑐?

SLIDE 24

How Can ℎ Distinguish Himself from 𝑐?

Ø Type ℎ uses reporting strategy 𝜌 (and 𝑐 uses 𝜚)
Ø Type ℎ wants 𝜌(⋅|ℎ) to be far from 𝜚(⋅|𝑐)
Ø This naturally motivates a zero-sum game between ℎ and 𝑐:

max_𝜌 min_𝜚 𝑑_TV(𝜌(⋅|ℎ), 𝜚(⋅|𝑐)) = d̃_TV(ℎ, 𝑐)

Note d̃_TV(ℎ, 𝑐) ≥ 0… Now, what happens if d̃_TV(ℎ, 𝑐) > 0?

Ø ℎ has a strategy 𝜌* such that 𝑑_TV(𝜌*(⋅|ℎ), 𝜚(⋅|𝑐)) ≥ d̃_TV(ℎ, 𝑐) > 0 for any 𝜚
Ø Using 𝜌*, ℎ can distinguish himself from 𝑐 with constant probability via Θ(1 / d̃_TV(ℎ, 𝑐)²) samples

  • Recall: Θ(1/𝜗²) samples suffice to distinguish 𝑦, 𝑧 with 𝑑_TV(𝑦, 𝑧) = 𝜗
  • The principal only needs to check whether the report 𝑆 is drawn from 𝜌*(⋅|ℎ) or not

→ What about type 𝑐?
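For intuition only, here is a brute-force sketch of the max-min value on a tiny made-up instance (the computation is NP-hard in general, so nothing like this scales). The outer max can be restricted to deterministic 𝜌, since 𝑑_TV is convex in 𝜌 and a convex function is maximized at a vertex of the strategy polytope; the inner min is merely approximated by a grid.

```python
from itertools import product

# Tiny made-up instance: two samples, three signals, feasibility graph H
T = ["t1", "t2"]
Sigma = ["a", "b", "c"]
H = {"t1": ["a", "b"], "t2": ["b", "c"]}
h = {"t1": 0.8, "t2": 0.2}   # "desired" type
c = {"t1": 0.2, "t2": 0.8}   # other type

def induced(strategy, m):
    """Signal distribution induced by a reporting strategy under type m."""
    d = {s: 0.0 for s in Sigma}
    for t in T:
        for tau, p in strategy[t].items():
            d[tau] += p * m[t]
    return d

def tv(y, z):
    return 0.5 * sum(abs(y[s] - z[s]) for s in Sigma)

def det_strategies():
    """All deterministic strategies: each sample mapped to one feasible signal."""
    for choice in product(*(H[t] for t in T)):
        yield {t: {tau: 1.0} for t, tau in zip(T, choice)}

def grid_strategies(k=20):
    """Coarse grid over randomized strategies (each H[t] has two signals here)."""
    for i, j in product(range(k + 1), repeat=2):
        yield {"t1": {"a": i / k, "b": 1 - i / k},
               "t2": {"b": 1 - j / k, "c": j / k}}

# Outer max over deterministic rho is exact (vertices of the polytope);
# the inner min over varrho is only grid-approximated.
val = max(min(tv(induced(r, h), induced(q, c)) for q in grid_strategies())
          for r in det_strategies())
print(round(val, 3))  # 0.6 on this instance
```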

SLIDE 25

How Can ℎ Distinguish Himself from 𝑐?

Ø So d̃_TV(ℎ, 𝑐) > 0 is sufficient for distinguishing ℎ from 𝑐
Ø It turns out that it is also necessary

Theorem:
1. If d̃_TV(ℎ, 𝑐) = 𝜗 > 0, then there is a policy 𝑔 that makes mistakes with probability at most 𝜀 when #samples 𝑈 ≥ 2 ln(1/𝜀)/𝜗².
2. If d̃_TV(ℎ, 𝑐) = 0, then no policy 𝑔 can separate ℎ from 𝑐, regardless of how large #samples 𝑈 is.

SLIDE 26

How Can ℎ Distinguish Himself from 𝑐?

Ø So d̃_TV(ℎ, 𝑐) > 0 is sufficient for distinguishing ℎ from 𝑐
Ø It turns out that it is also necessary

Theorem:
1. If d̃_TV(ℎ, 𝑐) = 𝜗 > 0, then there is a policy 𝑔 that makes mistakes with probability at most 𝜀 when #samples 𝑈 ≥ 2 ln(1/𝜀)/𝜗².
2. If d̃_TV(ℎ, 𝑐) = 0, then no policy 𝑔 can separate ℎ from 𝑐, regardless of how large #samples 𝑈 is.

Remarks:

Ø The probability of mistake 𝜀 can be made arbitrarily small with more samples
Ø We have shown the first part
Ø The second part is more difficult to prove; it uses an elegant result from matching theory
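The first part of the theorem can be simulated: fix the signal distribution 𝜌*(⋅|ℎ) and an adversarial signal distribution for 𝑐 at TV distance 𝜗, take 𝑈 = ⌈2 ln(1/𝜀)/𝜗²⌉ samples, and accept iff the empirical distribution is within 𝜗/2 of 𝜌*(⋅|ℎ). The distributions below are made up, and the acceptance rule is one natural reading of “check whether the report is drawn from 𝜌*(⋅|ℎ)”, not necessarily the exact policy used in the proof.

```python
import math, random

p = {"a": 0.8, "b": 0.2}    # rho*(.|h): signal distribution of type h (made up)
q = {"a": 0.2, "b": 0.8}    # an adversarial signal distribution for type c
theta = 0.6                  # tv(p, q) = 0.6 here

def tv(y, z):
    return 0.5 * sum(abs(y.get(s, 0.0) - z.get(s, 0.0)) for s in set(y) | set(z))

def empirical(dist, U, rng):
    """Draw U i.i.d. signals from dist and return their empirical distribution."""
    sigs = rng.choices(list(dist), weights=list(dist.values()), k=U)
    return {s: sigs.count(s) / U for s in set(sigs)}

def accept(report_dist):
    # accept iff the empirical report looks like rho*(.|h)
    return tv(report_dist, p) <= theta / 2

eps = 0.05
U = math.ceil(2 * math.log(1 / eps) / theta ** 2)   # the bound from the theorem
rng = random.Random(0)
trials = 2000
false_reject = sum(not accept(empirical(p, U, rng)) for _ in range(trials)) / trials
false_accept = sum(accept(empirical(q, U, rng)) for _ in range(trials)) / trials
print(U, false_reject, false_accept)   # U = 17; both error rates well below eps
```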

SLIDE 27

But… Deciding Whether d̃_TV(ℎ, 𝑐) > 0 is Hard

Ø Recall d̃_TV(ℎ, 𝑐) = max_𝜌 min_𝜚 𝑑_TV(𝜌(⋅|ℎ), 𝜚(⋅|𝑐))

Theorem: it is NP-hard to check whether d̃_TV(ℎ, 𝑐) = 0 or not.

SLIDE 28

But… Deciding Whether d̃_TV(ℎ, 𝑐) > 0 is Hard

Ø Recall d̃_TV(ℎ, 𝑐) = max_𝜌 min_𝜚 𝑑_TV(𝜌(⋅|ℎ), 𝜚(⋅|𝑐))
Ø Wait… this is a zero-sum game; can’t we solve it in poly time?

Theorem: it is NP-hard to check whether d̃_TV(ℎ, 𝑐) = 0 or not.

Q: What goes wrong?

SLIDE 29

But… Deciding Whether d̃_TV(ℎ, 𝑐) > 0 is Hard

Ø Recall d̃_TV(ℎ, 𝑐) = max_𝜌 min_𝜚 𝑑_TV(𝜌(⋅|ℎ), 𝜚(⋅|𝑐))
Ø Wait… this is a zero-sum game; can’t we solve it in poly time?

Theorem: it is NP-hard to check whether d̃_TV(ℎ, 𝑐) = 0 or not.

Q: What goes wrong?

Ø We can only solve normal-form zero-sum games in poly time
Ø In that case, the utility function is linear in both players’ strategies

  • This generalizes to concave-convex utility functions
  • But here, the utility function is convex in both players’ strategies
SLIDE 30

But… Deciding Whether d̃_TV(ℎ, 𝑐) > 0 is Hard

Ø Recall d̃_TV(ℎ, 𝑐) = max_𝜌 min_𝜚 𝑑_TV(𝜌(⋅|ℎ), 𝜚(⋅|𝑐))

Theorem: it is NP-hard to check whether d̃_TV(ℎ, 𝑐) = 0 or not.

Proof:

Ø We will argue that if we can compute 𝜌*, then we can check whether d̃_TV(ℎ, 𝑐) = 0 or not

  • Thus computing 𝜌* must be hard (actually “harder” than checking whether d̃_TV(ℎ, 𝑐) = 0)

Ø If we have computed 𝜌*, then to compute d̃_TV(ℎ, 𝑐) we only need to solve min_𝜚 𝑑_TV(𝜌*(⋅|ℎ), 𝜚(⋅|𝑐)), which is convex in 𝜚

  • Minimizing a convex function can be done efficiently in poly time (well known)

Ø This is the first example of a reduction in this class

Corollary: it is NP-hard to compute ℎ’s best strategy 𝜌*.
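To make the last step concrete, a toy sketch of the inner problem on a made-up instance: with 𝜌* fixed, min_𝜚 𝑑_TV(𝜌*(⋅|ℎ), 𝜚(⋅|𝑐)) is convex in 𝜚, so even naive methods find the minimum (a real implementation would use an LP or convex solver; the grid below is just for illustration).

```python
# Made-up running example: rho* sends t1 -> a and t2 -> c, so under
# h = (0.8, 0.2) it induces p = rho*(.|h) below; varrho has two free
# parameters alpha, beta.
c = {"t1": 0.2, "t2": 0.8}
p = {"a": 0.8, "b": 0.0, "c": 0.2}

def tv(y, z):
    return 0.5 * sum(abs(y.get(s, 0.0) - z.get(s, 0.0)) for s in set(y) | set(z))

def varrho_dist(alpha, beta):
    """Signal distribution of type c when varrho sends t1 -> a w.p. alpha
    (else b) and t2 -> c w.p. beta (else b)."""
    return {"a": c["t1"] * alpha,
            "b": c["t1"] * (1 - alpha) + c["t2"] * (1 - beta),
            "c": c["t2"] * beta}

# The objective is convex (piecewise linear) in (alpha, beta), so a naive
# grid search already finds the inner minimum for this rho*.
k = 100
best = min(tv(p, varrho_dist(i / k, j / k))
           for i in range(k + 1) for j in range(k + 1))
print(round(best, 3))  # 0.6
```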

SLIDE 31

Some Remarks

Ø Separability is determined by some “distance” between ℎ and 𝑐

  • A generalization of TV distance to the strategic setting
  • The principal’s policy is relatively simple
  • It is more our own job to distinguish ourselves from others than the employer’s

Ø The model can be generalized to many “good” (ℎᵢ) and “bad” (𝑐ⱼ) distributions

  • The principal wants to accept any ℎᵢ and reject any 𝑐ⱼ
  • Separability is determined by min_{i,j} d̃_TV(ℎᵢ, 𝑐ⱼ)

Ø The agent’s reporting strategy can even be adaptive

  • i.e., 𝜌 can differ across samples and may depend on past signals
  • The results do not change
SLIDE 32

Next lecture: how to utilize strategic manipulations to induce desirable social outcomes.

SLIDE 33

Thank You

Haifeng Xu

University of Virginia hx4ad@virginia.edu