Doubly-Competitive Distribution Estimation, Yi Hao and Alon Orlitsky

SLIDE 1

Doubly-Competitive Distribution Estimation

Yi Hao and Alon Orlitsky

Department of Electrical and Computer Engineering University of California, San Diego

ICML, June 11, 2019

Yi Hao and Alon Orlitsky (UCSD) Doubly-Competitive Distribution Estimation ICML, June 11, 2019 1 / 7

SLIDE 2

Distribution Estimation

$p$: unknown distribution over $\{1, 2, \ldots, k\}$
$X^n := X_1, X_2, \ldots, X_n \sim p$ independently
$q_{X^n}$: estimate based on $X^n$
Loss: Kullback-Leibler divergence

$$\ell(p, q_{X^n}) := \sum_{x=1}^{k} p(x) \log \frac{p(x)}{q_{X^n}(x)}$$
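As a small illustration of the loss above, the sketch below evaluates the KL divergence between a known distribution and an estimate. The function names `kl_loss` and `empirical_estimate`, the toy distribution, and the use of the natural logarithm are assumptions made for this example; the slides do not specify a log base or an implementation.

```python
import math

def kl_loss(p, q):
    """KL divergence sum_x p(x) * log(p(x) / q(x)), using the natural log.

    Terms with p(x) == 0 contribute 0; the loss is infinite if q(x) == 0
    for some x with p(x) > 0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf
            total += px * math.log(px / qx)
    return total

def empirical_estimate(sample, k):
    """Empirical frequencies of the symbols 1..k in the sample."""
    counts = [0] * k
    for x in sample:
        counts[x - 1] += 1
    n = len(sample)
    return [c / n for c in counts]

# Toy example (hypothetical numbers): k = 3 symbols, n = 4 samples.
p = [0.5, 0.25, 0.25]
sample = [1, 1, 2, 3]
q = empirical_estimate(sample, k=3)
print(kl_loss(p, q))            # the estimate matches p on this sample
print(kl_loss(p, [1.0, 0, 0]))  # infinite: q misses symbols that p supports
```

The second call shows why the empirical estimate is risky under this loss: any symbol that is unseen in the sample but has positive probability makes the loss infinite.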

SLIDE 3

Competitive Distribution Estimation

All reasonable estimators are natural

Same probability to symbols appearing the same # of times, e.g. $q_{abbc}(a) = q_{abbc}(c)$

Goal: estimate every $p$ as well as the best natural estimator

Genie estimator: knows $p$, but is natural, hence incurs a loss

$$\mathrm{Opt}(p, X^n) := \min_{q \text{ natural}} \ell(p, q_{X^n})$$

(Orlitsky & Suresh, 2015) Good-Turing variation $q^{GT}$

For every $p$, with high probability

$$\ell(p, q^{GT}_{X^n}) \le \mathrm{Opt}(p, X^n) + O\left(\frac{1}{\sqrt{n}} \wedge \frac{k}{n}\right)$$
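To make the genie benchmark concrete, here is a minimal sketch of a natural estimator that knows $p$. It relies on a standard fact that the slides do not spell out: among estimators that are constant on groups of symbols with equal counts, the KL loss is minimized by giving each symbol the average true probability of its group. The function names and the toy distribution are hypothetical.

```python
from collections import Counter
import math

def genie_natural_estimate(p, sample):
    """Best natural estimate when p is known.

    p: dict mapping symbol -> true probability.
    sample: sequence of observed symbols.
    Symbols with the same count (including count 0) share one value:
    the average true probability of that group.
    """
    counts = Counter(sample)
    groups = {}
    for x in p:
        groups.setdefault(counts[x], []).append(x)
    q = {}
    for symbols in groups.values():
        avg = sum(p[x] for x in symbols) / len(symbols)
        for x in symbols:
            q[x] = avg
    return q

def kl_loss(p, q):
    """KL divergence between dict-valued distributions (natural log)."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Toy example (hypothetical numbers) using the sample "abbc" from the slide.
p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
q = genie_natural_estimate(p, "abbc")
print(q["a"], q["c"])   # equal, since a and c both appear once
print(kl_loss(p, q))    # the genie's loss on this sample; positive despite knowing p
```

Even with full knowledge of $p$, the genie cannot tell a from c here, which is why its loss is positive and why it is a fair benchmark for estimators that see only the sample.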

SLIDE 4

Doubly-Competitive Distribution Estimation

$D_\Phi$ := # of distinct frequencies of symbols in $X^n$

$X^n = a\,b\,a\,c\,d\,e$ ⟹ a appeared twice; b, c, d, e appeared once ⟹ $D_\Phi = 2$

Single estimator $q^\star$ achieving (w.h.p.)

$$\ell(p, q^\star_{X^n}) \le \mathrm{Opt}(p, X^n) + O\left(\frac{D_\Phi}{n}\right)$$

Uniform bound: $D_\Phi \le \sqrt{2n} \wedge k$ ⟹ (Orlitsky & Suresh, 2015)

Better bounds for many distribution classes:

T-step: $D_\Phi \lesssim T \cdot n^{1/3}$; Uniform: $D_\Phi \lesssim n^{1/3}$

Log-concave with SD ≈ σ: $D_\Phi \lesssim \sigma \wedge \left(\frac{n^2}{\sigma}\right)^{1/3}$

Enveloped power-law $\{p : p(x) \lesssim x^{-\alpha}\}$: $D_\Phi \lesssim n^{-\frac{\alpha}{\alpha+1}}$

Log-convex distribution families, etc.
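The quantity $D_\Phi$ is easy to compute from a sample. The sketch below reproduces the slide's example and checks the $\sqrt{2n}$ part of the uniform bound. The function name `distinct_frequencies` is hypothetical, and the code assumes, consistently with the slide's example, that only symbols that actually appear are counted.

```python
from collections import Counter
import math

def distinct_frequencies(sample):
    """D_Phi: number of distinct positive multiplicities in the sample."""
    counts = Counter(sample)          # symbol -> number of appearances
    return len(set(counts.values()))  # distinct values among those counts

sample = "abacde"                     # the example from the slide
d_phi = distinct_frequencies(sample)
n = len(sample)
print(d_phi)                          # a appears twice, the rest once
print(d_phi <= math.sqrt(2 * n))      # sqrt(2n) part of the uniform bound
```

The $\sqrt{2n}$ bound follows from a counting argument: if there are $D$ distinct positive frequencies, they are at least $1, 2, \ldots, D$, each carried by at least one symbol, so $n \ge D(D+1)/2 > D^2/2$.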

SLIDE 5

Estimator Construction

$\Phi(t)$ := # of symbols appearing $t$ times

Good-Turing estimator: for $x$ appearing $t$ times,

$$q^{GT}(x) := \frac{t+1}{n} \cdot \frac{\Phi(t+1)}{\Phi(t)}$$

Observation: for $x$ appearing $t \gtrsim \log n$ times with $\Phi(t) \lesssim \log^2 n$, $q^{GT}$ has sub-optimal variance in estimating $p(x)$

Averaging unbiased estimators reduces the variance

$D(t)$ := weighted average of $\Phi(t')$ over $t'$ in a window around $t$ whose width is set by $t / \log n$

$$q^\star(x) := \frac{t+1}{n} \cdot \frac{D(t+1)}{D(t)}$$

For other $x$, use Good-Turing or empirical
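The construction above can be sketched in a few lines. The first two functions follow the Good-Turing formula as stated on the slide. The smoothed variant is only an illustration: the uniform weights, the fixed `half_width` window, and all function names are assumptions for this example, since the slides do not give the actual weights or the exact window, and the sketch applies the smoothing to every symbol rather than only to the regime the slide singles out.

```python
from collections import Counter

def prevalences(sample):
    """Phi[t] = number of distinct symbols appearing exactly t times (t >= 1)."""
    return Counter(Counter(sample).values())

def good_turing(sample):
    """Raw Good-Turing estimate (t+1)/n * Phi(t+1)/Phi(t) per observed symbol."""
    n = len(sample)
    phi = prevalences(sample)
    return {x: (t + 1) / n * phi[t + 1] / phi[t]
            for x, t in Counter(sample).items()}

def smoothed_prevalence(phi, t, half_width):
    # ASSUMPTION: uniform weights over |t' - t| <= half_width (t' >= 1).
    # The slides only say "weighted average"; this is not the authors' choice.
    window = range(max(1, t - half_width), t + half_width + 1)
    return sum(phi[u] for u in window) / len(window)

def smoothed_estimate(sample, half_width):
    """Good-Turing form with Phi replaced by the smoothed D (illustrative only)."""
    n = len(sample)
    phi = prevalences(sample)
    return {x: (t + 1) / n
               * smoothed_prevalence(phi, t + 1, half_width)
               / smoothed_prevalence(phi, t, half_width)
            for x, t in Counter(sample).items()}

sample = "abacde"
gt = good_turing(sample)
sm = smoothed_estimate(sample, half_width=1)
print(gt["a"], gt["b"])  # a gets 0 because no symbol appears three times
print(sm["a"])           # smoothing borrows from neighboring counts
```

On this toy sample the raw Good-Turing value for a collapses to zero because $\Phi(3) = 0$, a small-scale version of the instability that averaging over nearby counts is meant to reduce. Neither estimate is normalized here.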

SLIDE 6

Experimental Results

Two-step distribution

SLIDE 7

Thank You
