SLIDE 1

Performance-Aligned Learning Algorithms with Statistical Guarantees

Rizal Zaini Ahmad Fathony

Committee: Prof. Brian Ziebart (Chair)

  • Prof. Bhaskar DasGupta
  • Prof. Xinhua Zhang
  • Prof. Lev Reyzin
  • Prof. Simon Lacoste-Julien
SLIDE 2

Outline

1. Introduction & Motivation
2. General Multiclass Classification
3. Graphical Models
4. Bipartite Matching in Graphs
5. Conclusion & Future Directions

“New learning algorithms that align with performance/loss metrics and provide the statistical guarantees of Fisher consistency.”

SLIDE 3

Introduction and Motivation

SLIDE 4

Supervised Learning

Training data (𝒚_1, 𝑧_1), (𝒚_2, 𝑧_2), …, (𝒚_n, 𝑧_n) are drawn from a data distribution Q(𝒚, 𝑧). At test time, the learned model predicts ẑ_{n+1}, ẑ_{n+2}, … for new inputs 𝒚_{n+1}, 𝒚_{n+2}, …. Prediction quality is measured by a loss metric loss(ẑ, 𝑧) or a performance metric score(ẑ, 𝑧).

Examples of loss/performance metrics:

Multiclass Classification
  • Zero-one loss / accuracy metric
  • Absolute loss (for ordinal regression)

Multivariate Performance
  • F1-score
  • Precision@k

Structured Prediction
  • Hamming loss (sum of zero-one losses)
SLIDE 5

Empirical Risk Minimization (ERM) (Vapnik, 1992)

  • Assume a family of parametric hypothesis functions g (e.g., linear discriminators)
  • Find the hypothesis g* that minimizes the empirical risk

The loss metrics are non-convex and non-continuous, which makes the optimization intractable, so a convex surrogate loss needs to be employed instead.

Fisher Consistency

A desirable property of convex surrogates: under ideal conditions (given the true distribution and a fully expressive model), optimizing the surrogate also minimizes the original loss metric.
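The ERM objective itself is an image on the original slide; the following is a standard reconstruction from the surrounding definitions, not the slide's exact notation:

```latex
% Empirical risk minimization over the hypothesis family (reconstruction)
g^{*} = \operatorname*{argmin}_{g}\; \frac{1}{n}\sum_{i=1}^{n}
        \mathrm{loss}\big(g(\boldsymbol{y}_i),\, z_i\big)
```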

SLIDE 6

Two Main Approaches

1. Probabilistic Approach
  • Construct a probabilistic prediction model
  • Employ the logistic loss surrogate
  • Examples: Logistic Regression, Conditional Random Fields (CRF)

2. Large-Margin Approach
  • Maximize the margin that separates the correct prediction from incorrect ones
  • Employ the hinge loss surrogate
  • Examples: Support Vector Machine (SVM), Structured SVM

* Pictures are taken from the MLPP book (Kevin Murphy)

SLIDE 7

Multiclass Classification | Logistic Regression vs SVM

1. Multiclass Logistic Regression
  • Statistical guarantee of Fisher consistency (minimizes the zero-one loss metric in the limit)
  • No dual parameter sparsity

2. Multiclass SVM
  • Computational efficiency (via the kernel trick & dual parameter sparsity)
  • However, current multiclass SVM formulations either lack the Fisher consistency property or do not perform well in practice
SLIDE 8

Structured Prediction | CRF vs Structured SVM

1. Conditional Random Fields (CRF)
  • Statistical guarantee of Fisher consistency
  • No easy mechanism to incorporate customized loss/performance metrics
  • Computation of the normalization term may be intractable

2. Structured SVM
  • Flexibility to incorporate customized loss/performance metrics
  • Relatively more efficient in computation
  • No Fisher consistency guarantee

SLIDE 9

New Learning Algorithms?

Goals:
  • Provide a Fisher consistency guarantee
  • Align better with the loss/performance metric (by incorporating the metric into the learning objective)
  • Be computationally efficient
  • Perform well in practice

How? Via a robust adversarial learning approach:

“What predictor best maximizes the performance metric (or minimizes the loss metric) in the worst case given the statistical summaries of the empirical distributions?”

SLIDE 10

Performance-Aligned Surrogate Losses for General Multiclass Classification

Based on:
  • Fathony, R., Asif, K., Liu, A., Bashiri, M. A., Xing, W., Behpour, S., Zhang, X., and Ziebart, B. D.: Consistent robust adversarial prediction for general multiclass classification. arXiv preprint arXiv:1812.07526, 2018. (Submitted to JMLR.)
  • Fathony, R., Liu, A., Asif, K., and Ziebart, B.: Adversarial multiclass classification: A risk minimization perspective. NIPS 2016.
  • Fathony, R., Bashiri, M. A., and Ziebart, B.: Adversarial surrogate losses for ordinal regression. NIPS 2017.

SLIDE 11

Supervised Learning | Multiclass Classification

The same setting as before: training pairs (𝒚_i, 𝑧_i) drawn from Q(𝒚, 𝑧), with predictions ẑ evaluated by loss(ẑ, 𝑧) or score(ẑ, 𝑧). In multiclass classification, the label 𝑧 takes values in a finite set {1, 2, 3, …, l}.

SLIDE 12

Multiclass Classification | Zero-One Loss

Example: digit recognition, with labels {1, 2, 3, …}.

Loss metric (zero-one loss): loss(ẑ, 𝑧) = I(ẑ ≠ 𝑧), shown on the slide as a loss matrix L.
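The matrix itself is lost in extraction; by the definition above, for four classes it is simply the complement of the identity (rows index the prediction ẑ, columns the true label 𝑧):

```latex
L_{\hat{z},z} = I(\hat{z}\neq z), \qquad
L = \begin{pmatrix}
0 & 1 & 1 & 1\\
1 & 0 & 1 & 1\\
1 & 1 & 0 & 1\\
1 & 1 & 1 & 0
\end{pmatrix}
```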

SLIDE 13

Multiclass Classification | Ordinal Classification

Example: movie rating prediction, with labels {1, 2, …, 5}. The loss grows with the distance between the predicted and the actual label.

Loss metric (absolute loss): loss(ẑ, 𝑧) = |ẑ − 𝑧|, shown on the slide as a loss matrix L.
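Again the matrix image is lost; for the five rating labels, the absolute loss fills it with the label distances:

```latex
L_{\hat{z},z} = |\hat{z}-z|, \qquad
L = \begin{pmatrix}
0 & 1 & 2 & 3 & 4\\
1 & 0 & 1 & 2 & 3\\
2 & 1 & 0 & 1 & 2\\
3 & 2 & 1 & 0 & 1\\
4 & 3 & 2 & 1 & 0
\end{pmatrix}
```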

SLIDE 14

Multiclass Classification | Classification with Abstention

The predictor may answer “abstain” instead of committing to one of the labels (e.g., {1, 2, 3}).

Loss metric (abstention loss): loss(ẑ, 𝑧) = β if the predictor abstains, and I(ẑ ≠ 𝑧) otherwise; shown on the slide as a loss matrix L with an extra “abstain” prediction row.
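Reconstructing the slide's matrix image from the definition, for three classes (rows: predictions, with the last row the abstain option; columns: true labels):

```latex
L = \begin{pmatrix}
0 & 1 & 1\\
1 & 0 & 1\\
1 & 1 & 0\\
\beta & \beta & \beta
\end{pmatrix}
```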

SLIDE 15

Multiclass Classification | Other Loss Metrics

  • Squared loss metric: loss(ẑ, 𝑧) = (ẑ − 𝑧)²
  • Cost-sensitive loss metric: loss(ẑ, 𝑧) = D_{ẑ,𝑧}, the entry of a given cost matrix D
  • Taxonomy-based loss metric: loss(ẑ, 𝑧) = h − w(ẑ, 𝑧) + 1, based on the labels' positions in a class taxonomy

SLIDE 16

Robust Adversarial Learning

SLIDE 17

Robust Adversarial Learning (Grünwald & Dawid, 2004; Delage & Ye, 2010; Asif et al., 2015)

Empirical Risk Minimization approximates the original loss metric, which is non-convex and non-continuous, with convex surrogates.

Robust adversarial learning instead keeps the original loss metric and changes what it is evaluated against:
  • The predictor makes a probabilistic prediction
  • The prediction is evaluated against an adversary's probabilistic prediction, instead of against the empirical data
  • The statistics of the adversary's distribution are constrained to match the empirical statistics
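In symbols, the primal game takes the following form (reconstructed from the papers cited on slide 10, since the slide's own formula is lost; P̃ denotes the empirical distribution and φ the feature function):

```latex
\min_{\hat{P}(\hat{z}\mid\boldsymbol{y})}\;
\max_{\check{Q}(\check{z}\mid\boldsymbol{y})}\;
\mathbb{E}_{\boldsymbol{y}\sim\tilde{P};\;\hat{z}\sim\hat{P};\;\check{z}\sim\check{Q}}
\big[\mathrm{loss}(\hat{z},\check{z})\big]
\;\;\text{s.t.}\;\;
\mathbb{E}_{\boldsymbol{y}\sim\tilde{P};\;\check{z}\sim\check{Q}}
\big[\phi(\boldsymbol{y},\check{z})\big]
=
\mathbb{E}_{(\boldsymbol{y},z)\sim\tilde{P}}
\big[\phi(\boldsymbol{y},z)\big]
```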

SLIDE 18

Robust Adversarial Dual Formulation

Applying Lagrange multipliers and minimax duality transforms the primal game into its dual: ERM with the adversarial surrogate loss (AL), which is convex in the parameter vector θ. The slide also introduces a simplified potential notation for the inner game.
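A reconstruction of the dual (same caveat as for the primal above); θ is the Lagrange multiplier of the moment-matching constraint, and the simplified potential notation is assumed to be g_j = θᵀφ(𝒚, j):

```latex
\min_{\theta}\;
\mathbb{E}_{(\boldsymbol{y},z)\sim\tilde{P}}
\Big[\,
\underbrace{\max_{\check{Q}}\,\min_{\hat{P}}\;
\mathbb{E}_{\hat{z}\sim\hat{P};\,\check{z}\sim\check{Q}}
\big[\mathrm{loss}(\hat{z},\check{z})
 + \theta^{\top}\big(\phi(\boldsymbol{y},\check{z})-\phi(\boldsymbol{y},z)\big)\big]
}_{\mathrm{AL}(\boldsymbol{y},\,z;\,\theta)}
\,\Big]
```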

SLIDE 19

Adversarial Surrogate Loss

Evaluating the adversarial surrogate loss means solving the inner game, which can be converted to a linear program. The constraints form a convex polytope (illustrated on the slide for a four-class classification problem), and an LP always has an optimal solution at an extreme point of its (bounded) feasible polytope. Computing AL therefore amounts to finding the best extreme point. A generic LP solver costs O(l^3.5); exploiting the structure of specific loss metrics is much faster.

SLIDE 20

Zero-One Loss: AL0-1 | Convex Polytope

For the zero-one loss metric, the extreme points of the polytope are built from the vectors 𝒇_j, each with a single 1 at the j-th index and 0 elsewhere; they define the adversarial surrogate loss for zero-one loss metrics (AL0-1).

Computation of AL0-1 (sketched below):
  • Sort the potentials g_j in non-increasing order
  • Incrementally add potentials to the set T until adding more potentials decreases the loss value

Runtime: O(l log l), where l is the number of classes.
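A minimal runnable sketch of this procedure, assuming the max-over-sets form of AL0-1 from the NIPS 2016 paper, AL0-1(g, z) = max over non-empty sets S of (Σ_{j∈S}(g_j − g_z) + |S| − 1)/|S|; the function name is illustrative:

```python
import numpy as np

def al01(g, z):
    """Evaluate AL0-1 for potentials g and true label index z (sketch).

    For each size k, the best set S is the top-k potentials, so
    scanning prefixes of the sorted potential differences finds the
    overall maximum in O(l log l).
    """
    psi = g - g[z]                       # potential differences
    psi = np.sort(psi)[::-1]             # non-increasing order
    best, running = 0.0, 0.0
    for k, value in enumerate(psi, start=1):
        running += value
        best = max(best, (running + k - 1) / k)
    return best

# Example: four classes, true label z = 0, margin of 0.5 -> loss 0.25
print(al01(np.array([1.0, 0.5, -0.2, -1.0]), 0))
```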

SLIDE 21

AL0-1 | Loss Surface

[Plots of the AL0-1 loss surface for binary and three-class classification]
  • Plotted over the space of potential differences ψ_j = g_j − g_z
  • The true label is 𝑧 = 1
SLIDE 22

Other Multiclass Loss Metrics

Ordinal regression with the absolute loss metric: the extreme points of the polytope are again built from the vectors 𝒇_j (a single 1 at the j-th index, 0 elsewhere), yielding the adversarial surrogate loss ALord. Computation cost: O(l), where l is the number of classes.
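The closed form behind the O(l) cost, restated here as a reconstruction from the NIPS 2017 ordinal-regression paper:

```latex
\mathrm{AL}^{\mathrm{ord}}(g, z)
= \max_{i,j} \frac{g_i + g_j + |i-j|}{2} - g_z
= \max_{i} \frac{g_i - i}{2} \;+\; \max_{j} \frac{g_j + j}{2} \;-\; g_z
```

The second equality is what enables the O(l) evaluation: the two maximizations decouple into independent scans over the classes.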

SLIDE 23

Other Multiclass Loss Metrics

Classification with abstention (0 ≤ β ≤ 0.5): the extreme points of the polytope have the same 𝒇_j structure, yielding the adversarial surrogate loss ALabstain. Computation cost: O(l), where l is the number of classes.

SLIDE 24

Fisher Consistency

The Fisher consistency requirement in multiclass classification is stated with respect to:
  • Q(Z|𝒚), the true conditional distribution
  • g optimized over all measurable functions

A Fisher-consistent surrogate recovers the Bayes risk minimizer: under the surrogate's minimizer g*, the prediction coincides with the Bayes optimal predictor z⋄ for the true conditional distribution.
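Written out (a standard reconstruction of the requirement on the slide):

```latex
g^{*} \in \operatorname*{argmin}_{g\;\text{measurable}}\;
\mathbb{E}_{Z\sim Q(\cdot\mid\boldsymbol{y})}\!\big[\mathrm{AL}(g(\boldsymbol{y}),Z)\big]
\;\;\Longrightarrow\;\;
\operatorname*{argmax}_{j}\, g^{*}_{j}(\boldsymbol{y})
\;\subseteq\;
\operatorname*{argmin}_{\hat{z}}\;
\mathbb{E}_{Z\sim Q(\cdot\mid\boldsymbol{y})}\!\big[\mathrm{loss}(\hat{z},Z)\big]
```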

SLIDE 25

Optimization

Two schemes for optimizing the ERM objective:
  1. Dual optimization via sub-gradient descent (benefit: dual parameter sparsity)
  2. Primal optimization (via PEGASOS (Shalev-Shwartz, 2010))

Rich feature spaces are incorporated via the kernel trick: map each input 𝒚_j into a rich feature space φ(𝒚_j) and compute only the dot products.

Example for AL0-1: the (sub)gradient is determined by T*, the set that maximizes AL0-1 (see the sketch below).

SLIDE 26

Experiments | Multiclass Classification (Zero-One Loss)

SLIDE 27

Multiclass Classification | Related Works

Multiclass Support Vector Machines:
  1. The WW model (Weston et al., 2002) — relative margin model
  2. The CS model (Crammer and Singer, 1999) — relative margin model
  3. The LLW model (Lee et al., 2004) — absolute margin model

The slide compares the models on two questions: Fisher consistent? (Tewari and Bartlett, 2007; Liu, 2007) — only the LLW model is. Performs well with low-dimensional features? (Dogan et al., 2016) — the LLW model does not.

SLIDE 28

AL0-1 | Experiments

Setup: 12 datasets. The table on the slide lists the dataset properties, the number of AL0-1 constraints, and the dual parameter sparsity.

SLIDE 29

AL0-1 | Experiments | Results

Results for the linear kernel and the Gaussian kernel. The tables report the mean (standard deviation) of the accuracy; bold numbers mark the best result or results not significantly worse than the best.

  • Linear kernel: AL0-1 shows a slight benefit; LLW performs poorly.
  • Gaussian kernel: LLW's performance improves; AL0-1 maintains its benefit.

SLIDE 30

Multiclass Zero-One Classification

  1. The SVM WW model (Weston et al., 2002) — relative margin model
  2. The SVM CS model (Crammer and Singer, 1999) — relative margin model
  3. The SVM LLW model (Lee et al., 2004) — absolute margin model
  4. AL0-1 (adversarial surrogate loss) — relative margin model

Among these, AL0-1 is both Fisher consistent and performs well with low-dimensional features.

SLIDE 31

Other Results | General Multiclass Classification

Adversarial surrogate losses are derived for:
  1. The zero-one loss metric
  2. Ordinal classification with the absolute loss metric
  3. Ordinal classification with the squared loss metric
  4. Weighted multiclass loss metrics
  5. Classification with abstention / reject option

SLIDE 32

Performance-Aligned Graphical Models

Based on: Rizal Fathony, Ashkan Rezaei, Mohammad Bashiri, Xinhua Zhang, Brian D. Ziebart. “Distributionally Robust Graphical Models”. Advances in Neural Information Processing Systems 31 (NeurIPS), 2018.

SLIDE 33

Conditional Graphical Models

Popular graphical structures in structured prediction:
  • Chain structure — activity prediction, sequence tagging, and NLP tasks such as named entity recognition
  • Tree structure — parse-tree-based NLP tasks: semantic role labeling and sentiment analysis
  • Lattice structure — computer vision tasks such as image segmentation

SLIDE 34

Previous Approaches for Conditional Graphical Models

1. Conditional Random Fields (CRF) (Lafferty et al., 2001)
  • Fisher consistent: produces the Bayes optimal prediction in the ideal case
  • No easy mechanism to incorporate customized loss/performance metrics: the algorithm optimizes the conditional likelihood, and loss/performance-metric-based prediction can only be performed after the learning process

2. Structured SVM (SSVM) (Tsochantaridis et al., 2005)
  • Aligns with the loss/performance metric: the algorithm accepts a customized loss/performance metric in its optimization objective
  • No Fisher consistency guarantee: based on the multiclass SVM-CS; not consistent for distributions with no majority label

SLIDE 35

Adversarial Graphical Models (AGM)

Primal formulation assumptions:
  • The feature function Φ(𝒚, 𝒛) decomposes additively over cliques: Φ(𝒚, 𝒛) = Σ_c φ(𝒚, 𝒛_c)
  • The loss metric decomposes additively over the individual variables: loss(ẑ, ž) = Σ_{i=1}^n loss(ẑ_i, ž_i)
  • Focus on pairwise graphical models: interactions between labels correspond to edges of the graph

Dual formulation:
  • θ_f: Lagrange multipliers for the constraints with edge features
  • θ_w: Lagrange multipliers for the constraints with node features

The inner game matrix has size l^n × l^n, which is intractable even for modestly sized n.

SLIDE 36

AGM | Marginal Formulation

In the dual, the objective depends on the predictor's distribution Q̂(ẑ|𝒚) only through its node marginal probabilities Q̂(ẑ_j|𝒚), and on the adversary's distribution Q̌(ž|𝒚) only through its node and edge marginal probabilities Q̌(ž_j|𝒚) and Q̌(ž_j, ž_k|𝒚).

For general graphical models the optimization remains intractable, just as for CRF and SSVM. Focusing on graphs with low treewidth (e.g., chains, trees, simple loops) yields a tractable optimization.

SLIDE 37

AGM | Optimization

Optimization techniques (tree-structured AGM, in matrix notation):
  • Stochastic (sub)gradient descent (outer optimization over θ_f and θ_w)
  • Dual decomposition (inner R optimization)
  • Discrete optimal transport solver (recovering R)
  • Closed-form solution (inner q optimization)

Runtime for a single subgradient update (depends on the loss metric; for the additive zero-one loss, i.e., the Hamming loss):
  • AGM: O(nml log l + nl²), where l = # classes, n = # nodes, m = # iterations of dual decomposition
  • CRF: O(nl²); SSVM: O(nl²)
  • General low-treewidth graphs: O(nmx l^(x+1) log l + n l^(2(x+1))), where n = # cliques and x = the treewidth of the graph

SLIDE 38

AGM | Consistency

AGM is Fisher consistent when g is optimized over all measurable functions on the input space.

If the loss metric is additive, AGM is also consistent when g is optimized over a restricted set of functions: all measurable functions that are additive over the edge and node potentials.

SLIDE 39

AGM | Experiments (1)

Facial emotion intensity prediction (chain structure; labels with ordinal categories):
  • Each node: 3-class classification with ordered labels: neutral = 1 < increasing = 2 < apex = 3
  • 167 sequences
  • Ordinal loss metrics: zero-one loss, absolute loss, and squared loss
  • Weighted and unweighted variants; weights reflect the focus of prediction (e.g., focusing more on the latest nodes)

Results: the table reports the mean (standard deviation) of the average loss metrics; bold numbers mark the best result or results not significantly worse than the best.

SLIDE 40

AGM | Experiments (2)

Semantic role labeling (tree structure):
  • Predict the label of each node given a known parse tree
  • CoNLL 2005 dataset
  • A cost-sensitive loss metric is used to reflect the importance of each label

Results: [table on the slide]

SLIDE 41

Conditional Graphical Models

Comparison (performance-aligned? / consistent?):
  • Conditional Random Field (CRF) (Lafferty et al., 2001): not performance-aligned; consistent
  • Structured SVM (Tsochantaridis et al., 2005): performance-aligned; not consistent
  • Adversarial Graphical Models (our approach): performance-aligned and consistent

SLIDE 42

Bipartite Matching in Graphs

Based on: Rizal Fathony*, Sima Behpour*, Xinhua Zhang, Brian D. Ziebart. “Efficient and Consistent Adversarial Bipartite Matching”. International Conference on Machine Learning (ICML), 2018.

SLIDE 43

Bipartite Matching Task

Maximum weighted bipartite matching: match each node of set A = {1, 2, 3, 4} to a node of set B = {1, 2, 3, 4}; the example on the slide is the permutation ρ = [4, 3, 1, 2].

Machine learning task: learn the appropriate edge weights ψ_j(·). Objective: minimize a loss metric, e.g., the Hamming loss. (A sketch of the prediction step follows.)
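For context, prediction with fixed learned weights is the classic assignment problem; a sketch using SciPy's Hungarian-style solver (the weight matrix below is made up for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical learned edge weights: psi[i, j] = weight of matching
# node i of set A to node j of set B.
psi = np.array([[0.1, 0.4, 0.2, 0.9],
                [0.3, 0.8, 0.1, 0.2],
                [0.7, 0.2, 0.1, 0.3],
                [0.2, 0.1, 0.9, 0.4]])

rows, cols = linear_sum_assignment(psi, maximize=True)
rho = cols + 1      # 1-indexed permutation, matching the slide's notation
print(rho)          # [4 2 1 3]: the maximum weighted matching
```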

SLIDE 44

Learning Bipartite Matching | Applications

  1. Word alignment (Taskar et al., 2005; Padó & Lapata, 2006; MacCartney et al., 2008) — e.g., aligning “natürlich ist das haus klein” with “of course the house is small”
  2. Correspondence between images (Belongie et al., 2002; Dellaert et al., 2003)
  3. Learning to rank documents (Dwork et al., 2001; Le & Smola, 2007)

A non-bipartite matching task can be converted to a bipartite matching problem.

SLIDE 45

Previous Approaches for Bipartite Matching

1. CRF (Petterson et al., 2009; Volkovs & Zemel, 2012)
  • Fisher consistent: produces the Bayes optimal prediction in the ideal case
  • Computationally intractable: the normalization term requires computing a matrix permanent (a #P-hard problem); approximation is needed even for modestly sized problems

2. Structured SVM (Tsochantaridis et al., 2005)
  • Computationally efficient: solved using constraint generation, with the Hungarian algorithm computing the most violated constraints
  • No Fisher consistency guarantee: based on the multiclass SVM-CS; not consistent for distributions with no majority label

SLIDE 46

Adversarial Bipartite Matching (our approach)

Primal: the adversarial game over permutations with the Hamming loss. Dual: the Hamming loss matrix augmented with a Lagrangian potential term (shown on the slide for n = 3 permutations). The game matrix has size n! × n!, which is intractable even for modestly sized n.

SLIDE 47

Polytope of the Permutation Mixtures

Represent the predictor's and the adversary's mixtures over permutations by their marginal distribution matrices:
  • Predictor: Q with q_{j,k} = Q̂(ρ̂_j = k)
  • Adversary: R with r_{j,k} = Q̌(ρ̌_j = k)

By the Birkhoff–von Neumann theorem, mixtures over the n! permutations (123, 132, 213, 231, 312, 321 for n = 3) form a convex polytope whose points are the doubly stochastic matrices. This reduces the space of optimization from O(n!) to O(n²), as the check below illustrates.
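A small numerical check of this reduction (illustrative code, not part of the method): any mixture over permutations yields a doubly stochastic marginal matrix, so the n² marginals capture everything the n!-dimensional mixture contributes to the objective.

```python
import numpy as np
from itertools import permutations

n = 3
perms = list(permutations(range(n)))                 # all n! permutations
weights = np.random.dirichlet(np.ones(len(perms)))   # a random mixture

# Marginal matrix: Q[j, k] = Prob(rho_j = k) under the mixture.
Q = np.zeros((n, n))
for w, perm in zip(weights, perms):
    for j, k in enumerate(perm):
        Q[j, k] += w

# Birkhoff-von Neumann: Q is doubly stochastic (rows/columns sum to 1).
assert np.allclose(Q.sum(axis=0), 1.0) and np.allclose(Q.sum(axis=1), 1.0)
```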

SLIDE 48

Marginal Distribution Formulation

Rearranging the optimization order and adding regularization and smoothing penalties yields the marginal formulation of the dual.

Optimization techniques used:
  • Outer (Q): projected quasi-Newton (Schmidt et al., 2009)
  • Inner (θ): closed-form solution
  • Inner (P): projection onto a doubly stochastic matrix
  • The projection onto a doubly stochastic matrix is computed with ADMM

SLIDE 49

Consistency

Empirical risk perspective of adversarial bipartite matching:

ALperm is Fisher consistent when g is optimized over all measurable functions on the input space (𝑦, ρ).

ALperm is also consistent when g is optimized over a restricted set of additive functions g(𝑦, ρ) = Σ_j h_j(𝑦, ρ_j), provided each h_j may be optimized over all measurable functions on the individual input space (𝑦, ρ_j).

SLIDE 50

Experiments

Application: video tracking on public benchmark datasets.

Empirical runtime (until convergence): the runtime of the adversarial marginal formulation grows (roughly) quadratically in n, while CRF-based matching is impractical even for n = 20 (Petterson et al., 2009). [The slide plots relative runtimes across problem sizes.]

SLIDE 51

Experiment Results

  • On 6 dataset pairs: significantly outperforms SSVM
  • On 2 dataset pairs: competitive with SSVM

SLIDE 52

Bipartite Matching in Graphs

Comparison (consistent? / efficient? / performs well?):
  • Conditional Random Field (CRF) (Petterson et al., 2009; Volkovs & Zemel, 2012): consistent; not efficient; performance unknown (?)
  • Structured SVM (Tsochantaridis et al., 2005): not consistent; efficient; performs well
  • Adversarial Bipartite Matching (our approach): consistent, efficient, and performs well

SLIDE 53

Conclusion

SLIDE 54

Robust Adversarial Learning Algorithms

The proposed algorithms:
  • Provide the Fisher consistency guarantee
  • Align better with the loss/performance metric (by incorporating the metric into the learning objective)
  • Are computationally efficient
  • Perform well in practice

SLIDE 55

Future Directions

SLIDE 56

Future Directions (1)

1. Fairness in Machine Learning
Fairness is an important issue in automated decision-making with ML algorithms, requiring algorithms to produce fair predictions. Our formulation only enforces constraints on the adversary; can we add fairness constraints to the predictor?

2. Statistical Theory of Loss Functions
In multiclass classification, both AL0-1 and SVM-LLW are Fisher consistent, yet their empirical performances differ considerably. Is there a stronger statistical guarantee that can separate high-performing Fisher-consistent algorithms from low-performing ones?

SLIDE 57

Future Directions (2)

3. Structured Prediction & Graphical Models
Can we develop learning algorithms for general graphical models? More complex graphical structures are popular in some applications (e.g., computer vision), but exact learning algorithms for AGM in these cases may be intractable. What kinds of approximation algorithms are applicable?

4. Deep Learning
Deep learning has been successfully applied to many prediction problems, yet most deep learning architectures are not designed to optimize customized loss metrics. How can the robust adversarial learning approach help in designing deep learning architectures?

SLIDE 58

Collaborators

  • Mohammad Bashiri
  • Ashkan Rezaei
  • Kaiser Asif
  • Anqi Liu
  • Sima Behpour
  • Prof. Xinhua Zhang
  • Prof. Brian Ziebart
  • Wei Xing

SLIDE 59

Publications

  • Consistent Robust Adversarial Prediction for General Multiclass Classification. Rizal Fathony, Kaiser Asif, Anqi Liu, Mohammad Bashiri, Wei Xing, Sima Behpour, Xinhua Zhang, Brian D. Ziebart. Submitted to JMLR.
  • Distributionally Robust Graphical Models. Rizal Fathony, Ashkan Rezaei, Mohammad Bashiri, Xinhua Zhang, Brian D. Ziebart. Advances in Neural Information Processing Systems 31 (NeurIPS), 2018.
  • Efficient and Consistent Adversarial Bipartite Matching. Rizal Fathony*, Sima Behpour*, Xinhua Zhang, Brian D. Ziebart. International Conference on Machine Learning (ICML), 2018.
  • Adversarial Surrogate Losses for Ordinal Regression. Rizal Fathony, Mohammad Bashiri, Brian D. Ziebart. Advances in Neural Information Processing Systems 30 (NIPS), 2017.
  • Adversarial Multiclass Classification: A Risk Minimization Perspective. Rizal Fathony, Anqi Liu, Kaiser Asif, Brian D. Ziebart. Advances in Neural Information Processing Systems 29 (NIPS), 2016.
  • Kernel Robust Bias-Aware Prediction under Covariate Shift. Anqi Liu, Rizal Fathony, Brian D. Ziebart. arXiv preprint, 2016.

SLIDE 60

Thank You