Robust Estimation and Generative Adversarial Networks Weizhi ZHU - PowerPoint PPT Presentation

Robust Estimation and Generative Adversarial Networks Weizhi ZHU Hong Kong University of Science and Technology wzhuai@ust.hk April 3, 2019 Robust Estimation and Generative Adversarial Nets [GLYZ18] Generative Adversarial Nets for Robust Scatter Estimation: A Proper Scoring Rule Perspective [GYZ19] Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 1 / 24

Huber’s Contamination Model Huber’s contamination model [Huber, 1964] , P = (1 − ǫ ) P θ + ǫ Q . Strong contamination model [Diakonikolas et al., 2016a] , TV ( P , P θ ) ≤ ǫ. Can we recover θ by data drawn from P with arbitrary unknown contamination ( ǫ, Q )? Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 2 / 24

Example: Robust Mean Estimation Let’s firstly consider the robust estimation of location parameter θ in normal distribution, X 1 , . . . , X n ∼ (1 − ǫ ) N ( θ, I p ) + ǫ Q Coordinate-wise median. Tukey median [Tukey, 1978] . � � � � n n � � � u T X i > u T η u T X i ≤ u T η θ = argmax min 1 ∧ 1 � u � 2 =1 η ∈ R p i =1 i =1 Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 3 / 24

Comparison Median Tukey Median statistical convergence rate p p n n (no contamination) statistical convergence rate p p n ∨ p ǫ 2 n ∨ ǫ 2 , [minimax] (Huber’s ǫ contamination) computational complexity Polynomial NP-Hard Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 4 / 24

Example: Robust Covariance Estimation We can also estimate the covariance matrix Σ in normal distribution, X 1 , . . . , X n ∼ (1 − ǫ ) N (0 , Σ) + ǫ Q Covariance depth [Chen-Gao-Ren, 2017] . � n � n � � � � | u T X i | 2 > u T Γ u | u T X i | 2 ≤ u T Γ u � ∧ Γ = argmax min 1 1 , � u � 2 =1 Γ > 0 i =1 i =1 (1) � � � � Γ = 3 � Σ = β , P N (0 , 1) < β 4 . � � Σ − Σ � op ≤ C ( p n + ǫ 2 ) with high probability uniformly over Σ and Q . Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 5 / 24

Computational Complexity Polynomial algorithms are proposed [Lai et al., 2016; Diakonikolas et al., 2018] of nearly minimax optimal statistical precision. - Prior knowledge on ǫ . - Needs some moment constraints. Advantages of the depth estimation. - Does not need prior knowledge on ǫ . - Adaptive to any elliptical distributions. - A well defined objective function. - Any feasible algorithms in practice? Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 6 / 24

f -divergence Given a convex function f satisfying f (1) = 0, the f -divergence of P from Q is defined as, � � dP � D f ( P � Q ) = f dQ (2) dQ Let f ∗ be the convex conjugate of f , then a variational lower bound of (2) is given by, � � � t p ( x ) q ( x ) − f ∗ ( t ) D f ( P � Q ) = q ( x ) sup dx , t ∈ dom f ∗ E x ∼ P [ T ( x )] − E x ∼ Q [ f ∗ ( T ( x ))] . ≥ sup (3) T ∈T The equality holds in (3) if f ′ � � p ∈ T . q � ˜ � � � � ˜ �� n � 1 q ( X i ) q ( X i ) f ′ f ∗ f ′ D f ( P � Q ) ≥ max − E X ∼ Q . n q ( X i ) q ( X i ) Q ∈ ˜ ˜ Q i =1 (4) Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 7 / 24

f -GAN and f -Learning f -Learning. Let ˜ Q be a distribution family, � ˜ � � � � ˜ �� n 1 q ( X i ) q ( X i ) � f ′ f ∗ f ′ P = argmin max − E X ∼ Q . n q ( X i ) q ( X i ) Q ∈ ˜ ˜ Q ∈Q Q i =1 f -GAN [Nowozin et al., 2016] , n � 1 � T ( X i ) − E X ∼ Q [ f ∗ ( T ( x ))] , P = argmin max n T ∈T Q ∈Q i =1 where T is usually parametrized by a neural network. - f -GAN can smooth f -Learning’s objective function. - f-divergence is robust. - There exist practical efficient algorithms to solve. Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 8 / 24

Example f(x) = x log x (KL-divergence), p ∈ ˜ Q (or f ′ ( p / q ) ∈ T ), then KL-Learning (or KL-GAN) becomes maximal likelihood estimate. f(x) = x log x − ( x + 1) log 1+ x (JS-divergence), which leads to the 2 original JS-GAN [Goodfellow et al., 2014] , � n 1 � P = argmin max log (sigmoid ( T ( X i )))+ E x ∼ Q log (1 − sigmoid ( T ( x ))) . n T ∈T Q ∈Q i =1 Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 9 / 24

Example (Continued) f(x) = ( x − 1) + (TV-divergence) and f ∗ ( t ) = t , 0 ≤ t ≤ 1 . - When taking Q = {N ( θ, I p ) : θ ∈ R p } , Q ( θ, r ) = {N (˜ ˜ θ, I p ) : � ˜ θ − θ � 2 ≤ r } . TV-Learning is defined as, � ˜ � � ˜ � n � 1 q ( X i ) q min max 1 q ( X i ) ≥ 1 − Q q ≥ 1 n Q ∈Q Q ∈ ˜ ˜ Q ( θ, r ) i =1 r → 0 - TV-Learning → Tukey median, � � � n u T X i > u T η max η ∈ R p min � u � 2 =1 i =1 1 . - With T parameterized by the class of neural networks, TV-GAN is defined as, � n 1 � P = argmin max sigmoid ( T ( X i )) − E x ∼ Q [sigmoid ( T ( x ))] . n T ∈T Q ∈Q i =1 Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 10 / 24

Proper Scoring Rule { S ( · , 1) , S ( · , 0) } is the forecaster’s reward if a player quotes t when event 1 or 0 occurs. S ( t ; p ) = pS ( t , 1) + (1 − p ) S ( t , 0) is the expected reward when the event occurs with probability p . { S ( · , 1) , S ( · , 0) } is a proper scoring rule if S ( p ; p ) ≥ S ( t ; p ) , ∀ t ∈ [0 , 1] . (Savage representation) S is proper iff there exists a convex function G ( · ) such that, � S ( t , 1) = G ( t ) + (1 − t ) G ′ ( t ) , S ( t , 0) = G ( t ) − tG ′ ( t ) . Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 11 / 24

Proper Scoring Rule and f-divergence We consider a natural cost function with assumption X | y = 1 ∼ P and X | y = 0 ∼ Q with prior P ( y = 1) = 1 / 2, that is, 1 1 2 S ( T ( X ) , 1) + E X ∼ Q 2 S ( T ( X ) , 0) . E X ∼ P Then one can find a good classification rule T ( · ) by maximizing the above objective over T ∈ T , � 1 � 2 E X ∼ P S ( T ( X ) , 1) + 1 2 E X ∼ Q S ( T ( X ) , 0) − G (1 D T ( P , Q ) = max 2) T ∈T Log Score (JS-divergence). S ( t , 1) = log t , S ( t , 0) = log(1 − t ) Zero-One Score (TV-divergence). S ( t , 1) = I { t ≥ 1 / 2 } , S ( t , 0) = I { t < 1 / 2 } . Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 12 / 24

(Multi-layers) JS-GAN is Statistical Optimal � � n � 1 � θ = argmin max log T ( X i ) + E N ( η, I p ) log(1 − T ( X i )) + log 4 , n T ∈T η ∈ R p i =1 Theorem (Gao-Liu-Yao-Zhu’ 2018) With i.i.d. observations X 1 , ..., X n ∼ (1 − ǫ ) N ( θ, I p ) + ǫ Q and some regularizations on weight matrix, we have � p n ∨ ǫ 2 , at least one bounded activation θ − θ � 2 � � � p log p ∨ ǫ 2 , ReLU n with high probability uniformly over all θ ∈ R p and all Q. It can be generalized to elliptical distribution µ + Σ 1 / 2 ξ U and the strong contamination model. Covariance and mean can be estimated simultaneously. Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 13 / 24

Proof Sketch �� p log(1 /δ ) sup D ∈D | E P n D ( X ) − E P D ( X ) | ≤ C n + . n �� log(1 /δ ) p sup D ∈D | E P θ D ( X ) − E P ˆ θ ( D ( X )) | ≤ 2 C n + + 2 ǫ. n | f ( t ) − f (0) | ≥ c ′ | t | , | t | < τ for some τ > 0, where f ( t ) = E N (0 , 1) (sigmoid( z − t )) satisfies, � w � 2 =1 , b = − w T θ θ D ( X ) = f ( w T ( θ − ˆ E P θ D ( X ) = = = = = = = = = = = = f (0) , E P ˆ θ )) . Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 14 / 24

Covariance Matrix Estimation: Improper Network Structure       � � w j sigmoid( u T  : | w j | ≤ κ, u j ∈ R p T 1 =  T ( x ) = sigmoid j x )  . j ≥ 1 j ≥ 1       � � w j ReLU( u T  : T 2 =  T ( x ) = sigmoid j x ) | w j | ≤ κ, � u j � ≤ 1  . j ≥ 1 j ≥ 1 Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 15 / 24

Covariance Matrix Estimation: Proper Network Structure   � � w j sigmoid( u T  : T 3 = T ( x ) = sigmoid j x + b j ) j ≥ 1 � � | w j | ≤ κ, u j ∈ R p , b j ∈ R . j ≥ 1  � � � H � � v jl ReLU( u T  : T 4 = T ( x ) = sigmoid w j sigmoid l x ) j ≥ 1 l =1 � H � � | w j | ≤ κ 1 , | v jl | ≤ κ 2 , � u l � ≤ 1 . j ≥ 1 l =1 Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 16 / 24

� � � n 1 � Σ = argmin max S ( T ( X i ) , 1) + E X ∼ N (0 , Γ) S ( T ( X ) , 0) n T ∈T Γ ∈E p ( M ) i =1 Theorem (Gao-Yao-Zhu’ 2019) With i.i.d. observations X 1 , ..., X n ∼ (1 − ǫ ) N (0 , Σ) + ǫ Q and some regularizations on network weight matrix, we have op � p � � Σ − Σ � 2 n ∨ ǫ 2 with high probability uniformly over all � Σ � op ≤ M = O (1) and all Q. Weizhi ZHU (HKUST) Robust Estimation and GANs April 3, 2019 17 / 24

Robust Estimation and Generative Adversarial Networks Weizhi ZHU - PowerPoint PPT Presentation

Robust Estimation and Generative Adversarial Networks Weizhi ZHU Hong Kong University of Science and Technology wzhuai@ust.hk April 3, 2019 Robust Estimation and Generative Adversarial Nets [GLYZ18] Generative Adversarial Nets for Robust

Generative Adversarial Networks Benjamin Striner CMU 11-785 March 21, 2018 Benjamin Striner

CSC421/2516 Lecture 18: Generative Adversarial Networks Roger Grosse and Jimmy Ba Roger Grosse

Generative Adversarial Nets(GANs) Troy Cary and Chenzhi Zhao A generative adversarial net is

GAN-based Photo Video Synthesis Summary of Generative Adversarial Nets Lei Zhang What is

Robust Statistics and Generative Adversarial Networks Yuan YAO HKUST Chao Gao (Chicago) Jiyu

Generative networks part 2: GANs 23 / 54 Recap on generative networks Generative networks provide

CSC321 Lecture 19: Generative Adversarial Networks Roger Grosse Roger Grosse CSC321 Lecture 19:

Generative Adversarial Networks presented by Ian Goodfellow presentation co-developed with Aaron

Robust Statistics and Generative Adversarial Networks Yuan YAO HKUST 1 Chao GAO Jiyi LIU

Adversarial Training Attacks on Deep Networks and Generative Adversarial Networks Erkut Erdem

Generative Adversarial Networks Aaron Mishkin UBC MLRG 2018W2 1 Generative Adversial Networks

Applications of GANs Photo-Realistic Single Image Super-Resolution Using a Generative

Applications of GANs Photo-Realistic Single Image Super-Resolution Using a Generative

Bregman and Wasserstein, with Applications to Generative Adversarial Networks (GANs) and beyond

LAB MEETING: A Connection Between Generative Adversarial Networks, Inverse Reinforcement Learning

Generative Adversarial Networks Sahin Olut Department of Computer Engineering Istanbul Technical

Subjectivity of Autonomous August 28, 2012 Agents. Some Philosophical and Legal Remarks

Intelligent Vehicle Systems Southwest Research Ins6tute Mapviz 2D

Be A Developer Experience Super Hero: Robust Dev Scripts For Peace and Joy 1 Hi, Im Dustin

CENG4480 Lecture 08: Kalman Filter Bei Yu byu@cse.cuhk.edu.hk (Latest update: November 18,

Learning based Robust Control: Guaranteeing Stability while Improving Performance Felix

COMPSTAT 2010 Robust forecasting of non-stationary time series Koen Mahieu K.U.Leuven

Robust Scene Categorization by Learning Image Statistics in Context for BBC rushes Jan van

Robust Monopoly Regulation Yingni Guo, Eran Shmaya Northwestern University CCET, Sep 2019