Minimax Statistical Learning with Wasserstein distances
Jaeho Lee & Maxim Raginsky
NeurIPS 2018, Poster #86
“Minimax” learning
Goal: find the hypothesis minimizing the worst-case risk
- domain drift (mismatch between training and test distributions)
- adversarial attack (enhancing robustness of the hypothesis)
… is an ambiguity set representing uncertainty, e.g. Γ(P, ϱ)
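In symbols, the goal can be sketched as below. This is a reading under assumed notation: the hypothesis class \mathcal{F} and the random sample Z drawn from Q are not named on the slide, and f(Z) stands for the loss of hypothesis f on Z.

```latex
\min_{f \in \mathcal{F}} \; \sup_{Q \in \Gamma(P, \varrho)} \mathbb{E}_{Q}\big[ f(Z) \big]
```

Here the inner supremum is the worst-case risk of f over all distributions Q in the ambiguity set around P.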
Approach: find the hypothesis minimizing the empirical risk
Question: what is the speed of convergence?
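One way to make the approach and the question concrete, assuming the empirical risk is the worst-case risk taken around the empirical distribution P_n of n samples Z_1, ..., Z_n (the symbols \mathcal{F}, \hat{f}, and P_n are assumed notation, not taken from the slide):

```latex
\hat{f} \in \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \sup_{Q \in \Gamma(P_n, \varrho)} \mathbb{E}_{Q}\big[ f(Z) \big],
\qquad P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{Z_i}

\text{excess worst-case risk:} \quad
\sup_{Q \in \Gamma(P, \varrho)} \mathbb{E}_{Q}\big[ \hat{f}(Z) \big]
\; - \; \inf_{f \in \mathcal{F}} \sup_{Q \in \Gamma(P, \varrho)} \mathbb{E}_{Q}\big[ f(Z) \big]
```

Under this reading, the speed of convergence is how fast the second quantity shrinks as n grows.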
Focus on the 1-Wasserstein ambiguity ball! (We have results for p-Wasserstein balls, too! See Poster #86.)
[Figure: ambiguity ball, labeled P and ϱ]
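To illustrate what membership in a 1-Wasserstein ball means, here is a minimal sketch for the simplest case: two one-dimensional empirical distributions with the same number of equally weighted samples, where the 1-Wasserstein distance reduces to the mean absolute difference of the sorted samples. The function names `wasserstein_1` and `in_ambiguity_ball` are hypothetical and chosen only for this example.

```python
def wasserstein_1(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical distributions.

    With uniform weights and cost |x - y|, the optimal coupling matches
    sorted samples to sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples on both sides")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def in_ambiguity_ball(center, candidate, rho):
    """Check whether `candidate` lies in the 1-Wasserstein ball of radius rho around `center`."""
    return wasserstein_1(center, candidate) <= rho


# Toy domain drift: the test samples are the training samples shifted by 0.5.
train = [0.0, 1.0, 2.0, 3.0]
test = [x + 0.5 for x in train]

print(wasserstein_1(train, test))           # distance between the two samples
print(in_ambiguity_ball(train, test, 0.6))  # the shift fits inside radius 0.6
print(in_ambiguity_ball(train, test, 0.4))  # but not inside radius 0.4
```

In this toy case a uniform shift of size 0.5 is covered by any ball of radius at least 0.5, which is the sense in which the radius ϱ controls how much drift the learner guards against.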
Taming the supremum
Main challenge is to handle the supremum.
Trick:
(1) write down the dual form
(2) empirical risk minimization is now joint minimization
(3) gauge the complexity of the “set of all possible …”
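The dual-form trick can be sketched numerically. For a 1-Wasserstein ball, a standard duality result rewrites the worst-case risk as min over λ ≥ 0 of λϱ + E_P[φ_λ(Z)], with φ_λ(z) = sup over z' of f(z') − λ·d(z, z'). The code below evaluates this on a toy problem; the finite grid standing in for the sample space [0, 1], the grid search over λ, and the name `worst_case_risk_dual` are all assumptions made for illustration, not the authors' procedure.

```python
def worst_case_risk_dual(loss, grid, samples, rho, lambdas):
    """Approximate  min_{lam >= 0} { lam * rho + mean_i phi_lam(z_i) }.

    phi_lam(z) = max over z' in `grid` of  loss(z') - lam * |z - z'|,
    so the supremum over distributions becomes a minimization over lam.
    """
    best = float("inf")
    for lam in lambdas:
        # Average of phi_lam over the empirical sample.
        phi_mean = sum(
            max(loss(zp) - lam * abs(z - zp) for zp in grid) for z in samples
        ) / len(samples)
        best = min(best, lam * rho + phi_mean)
    return best


# Toy setup (all values hypothetical): sample space [0, 1], loss f(z) = z.
grid = [i / 100 for i in range(101)]
samples = [0.2, 0.4, 0.6]
lambdas = [k / 100 for k in range(501)]


def identity_loss(z):
    return z


value = worst_case_risk_dual(identity_loss, grid, samples, rho=0.1, lambdas=lambdas)
print(value)
```

For this toy loss the worst case simply moves mass to the right until the budget ϱ is spent, so the value is approximately the empirical mean 0.4 plus ϱ = 0.1. Minimizing the same expression over hypotheses as well as over λ is what turns empirical risk minimization into a joint minimization.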
Result
Theorem) Under mild assumptions, with high probability,
- vanishes to 0 as the sample size grows.
- does not require Lipschitz-type assumptions on f
- a similar procedure could be applied to any ambiguity set with a suitable dual form
Come to poster #86 for…
- applications to domain adaptation
- complementary generalization bound recovering the classic bound as …
- results on p-Wasserstein balls