Sample Amplification: Increasing Dataset Size even when Learning is Impossible - PowerPoint PPT Presentation




SLIDE 1

Sample Amplification: Increasing Dataset Size even when Learning is Impossible

Brian Axelrod, Shivam Garg, Vatsal Sharan, Greg Valiant

SLIDE 2

What does it mean that a GAN made this image?

(Does it mean that GANs “know” the distribution of renaissance portraits?)

[Image: GAN-generated renaissance-style portrait]

SLIDE 3

When can you make more data? Could you generate new samples from a distribution, without even “learning” it?

SLIDE 4

New Problem: Sample Amplification

Amplifier
  • Input: n i.i.d. samples from D
  • Output: m > n “samples”

Verifier
  • Input: m samples, distribution D
  • Output: ACCEPT or REJECT
  • Promise: If the input is m i.i.d. draws from D, then w. prob > ¾, must ACCEPT.

Verifier: 1. Knows D 2. Is computationally unbounded 3. Does not know the training set

SLIDE 5

Sample Amplification

Definition: A class of distributions C admits (n,m)-amplification if there is an (n,m) Amplifier s.t. for all D ∈ C, any Verifier will ACCEPT with prob > 2/3.

Verifier: knows D, is computationally unbounded

SLIDE 6

Sample Amplification

Definition: A class of distributions C admits (n,m)-amplification if there is an (n,m) Amplifier s.t. for all D ∈ C, any Verifier will ACCEPT with prob > 2/3.

  • Every class C admits (n,n)-amplification (why?)
  • The Verifier does not see the Amplifier’s n input samples. (Otherwise the problem is equivalent to learning.)
  • Up to constant factors, equivalent to asking whether the Amplifier can output m samples whose T.V. distance to m i.i.d. samples from D is small.
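The “(why?)” in the first bullet has a one-line answer: the Amplifier can simply output its input unchanged. Those n points genuinely are n i.i.d. draws from D, so every valid Verifier must ACCEPT with the promised probability. A minimal sketch (the function name is ours, not the paper’s):

```python
def identity_amplifier(samples):
    """Trivial (n, n)-amplifier: the n input points already are i.i.d.
    draws from D, so returning them unchanged satisfies any Verifier."""
    return list(samples)
```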

SLIDE 7

Sample Amplification

Definition: A class of distributions C admits (n,m)-amplification if there is an (n,m) Amplifier s.t. for all D ∈ C, any Verifier will ACCEPT with prob > 2/3.

Connection to GANs: Amplifier -> Generator, Verifier -> Discriminator? Not quite... but there are similarities in how samples are used and evaluated.

SLIDE 8

RESULTS

SLIDE 9

Sample Amplification

Thm 1: Let C be the class of discrete distributions supported on ≤ k elements. (n, n + n/sqrt(k))-amplification is possible (and optimal, up to constant factors).

  • Nontrivial amplification is possible as soon as n > sqrt(k).
  • Learning to nontrivial accuracy requires n = Θ(k) samples.
  • Even with n >> k, one can never amplify by an arbitrary amount.

Thm 2: Let C be the class of Gaussians in d dimensions, with fixed covariance (e.g. “isotropic”) and unknown mean. (n, n + n/sqrt(d))-amplification is possible (and optimal, up to constant factors).

  • Nontrivial amplification is possible as soon as n > sqrt(d).
  • Learning to nontrivial accuracy requires n = Θ(d) samples.
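For intuition on the discrete case, one natural amplifier at the Thm 1 rate appends roughly n/sqrt(k) extra draws from the empirical distribution of the input. The sketch below is illustrative only: the names and the exact count are ours, and the analysis showing a Verifier cannot detect the reused randomness is not reproduced here.

```python
import math
import random

def amplify_discrete(samples, k, rng=None):
    """Sketch of an (n, n + n/sqrt(k)) amplifier for distributions on
    <= k elements: return the input plus ~n/sqrt(k) fresh draws from
    the *empirical* distribution of the input."""
    rng = rng or random.Random(0)
    n = len(samples)
    extra = max(1, int(n / math.sqrt(k)))  # number of appended "samples"
    return list(samples) + [rng.choice(samples) for _ in range(extra)]
```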

SLIDE 10

GAUSSIAN DISTRIBUTION

SLIDE 11

Thm 2: For Gaussians in d dimensions, with fixed covariance and unknown mean, (n, n + n/sqrt(d))-amplification is possible (and optimal, up to constant factors).

  • Learning requires n = Θ(d).
  • Amplification is possible starting at n = sqrt(d).

Algorithm:
1) Draw x_{n+1} … x_m using the empirical mean u* of the input samples.
2) “Decorrelate” each input sample x_i from u*.
3) Return x_{n+1} … x_m along with the “decorrelated” original samples.

Intuitively, the issue is that returning the originals unchanged would leave the new “samples” too correlated with them.

Thm 3: If output ⊃ input samples, then n > d/log d is required for nontrivial amplification.
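The three steps above can be sketched as follows. This is schematic: the shrinkage coefficient `alpha` is a placeholder for the paper’s actual decorrelation step, whose exact form is not reproduced here.

```python
import random

def amplify_gaussian(samples, m, alpha=0.1, rng=None):
    """Schematic (n, m) amplifier for N(mu, I_d) with unknown mean,
    following the slide's three steps. `alpha` stands in for the
    paper's decorrelation coefficient (a hypothetical choice here)."""
    rng = rng or random.Random(0)
    n, d = len(samples), len(samples[0])
    # Empirical mean u* of the input samples.
    u_star = [sum(x[j] for x in samples) / n for j in range(d)]
    # Step 1: draw x_{n+1} ... x_m from N(u*, I_d).
    new = [[rng.gauss(u_star[j], 1.0) for j in range(d)] for _ in range(m - n)]
    # Step 2: "decorrelate" each input sample by shrinking it toward u*.
    decorrelated = [[x[j] - alpha * (x[j] - u_star[j]) for j in range(d)]
                    for x in samples]
    # Step 3: return the decorrelated originals plus the fresh draws.
    return decorrelated + new
```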

SLIDE 12

IS AMPLIFICATION USEFUL?

SLIDE 13

SLIDE 14

Amplification does not add new information, but it could make the original information more easily accessible.

Can widely used statistical tools do better on amplified samples?

[Diagram: Data → Statistical estimator]

SLIDE 15

Amplification does not add new information, but it could make the original information more easily accessible.

Can widely used statistical tools do better on amplified samples?

[Diagram: Data → Amplifier → Amplified Data → Statistical estimator]

SLIDE 16

Amplification: Maybe Useful?

Given examples (x, y) ~ D, estimate the error of the best linear model.

Standard unbiased estimator: the error of the least-squares model, scaled down.

Experiment: x ~ Gaussian(d = 50), y = θᵀx + Gaussian noise.

[Plot: error of the classical estimator vs. the same estimator on (n, n + 2)-amplified samples.]
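The classical estimator on the slide can be made concrete: in the standard linear-model setup, the unbiased estimate of the best linear model’s error (the noise variance) is the least-squares residual sum of squares scaled by 1/(n − d). A pure-stdlib sketch (the helper names and the small-dense solver are ours; the slide’s experiment uses d = 50):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (adequate for the small dense systems in this sketch)."""
    d = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * d
    for i in range(d - 1, -1, -1):
        x[i] = (M[i][d] - sum(M[i][j] * x[j] for j in range(i + 1, d))) / M[i][i]
    return x

def unbiased_error_estimate(X, y):
    """Classical unbiased estimator of the best linear model's error:
    least-squares residual sum of squares, scaled by 1/(n - d)."""
    n, d = len(X), len(X[0])
    # Normal equations: (X^T X) theta = X^T y.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(d)]
         for a in range(d)]
    b = [sum(X[i][a] * y[i] for i in range(n)) for a in range(d)]
    theta = solve(A, b)
    rss = sum((y[i] - sum(theta[j] * X[i][j] for j in range(d))) ** 2
              for i in range(n))
    return rss / (n - d)
```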

SLIDE 17

Amplification: Maybe Useful?

[Diagram: Data → Amplifier → Amplified Data → Statistical estimator]

SLIDE 18

FUTURE DIRECTIONS

SLIDE 19

What property of a class of distributions determines the threshold at which non-trivial amplification is possible? More general amplification schemes?

How much does the Verifier need to know about the n input samples to preclude amplification without learning? [How much do we need to know about a GAN’s input, to evaluate its output?]

What if the Verifier doesn’t know D, and only gets sample access? A MORE powerful Verifier? A LESS powerful Verifier?

SLIDE 20

THANK YOU!
