Adversarial Bipartite Matching



slide-1
SLIDE 1

Efficient and Consistent Adversarial Bipartite Matching

Rizal Fathony*,#, Sima Behpour*, Xinhua Zhang, Brian D. Ziebart

*) equal contribution

#) presenter

slide-2
SLIDE 2

Bipartite Matching Tasks

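[Figure: bipartite graph between node sets A and B, each with nodes 1 to 4]

Matching: $\pi = [4, 3, 1, 2]$

Maximum weighted bipartite matching: select the matching with the largest total weight.

Machine learning task: learn the appropriate weights $\psi_i(\cdot)$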
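A minimal sketch of maximum weighted bipartite matching, assuming the weights are stored as a matrix `weights[i][j]` standing in for $\psi_i(j)$ (the matrix layout, the function name, and the example values are illustrative, not from the slides). It enumerates all $n!$ matchings, so it only suits tiny $n$; a polynomial-time solver such as the Hungarian algorithm would be used in practice.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (best_perm, best_score) by brute force over all n! matchings.

    weights[i][j] is the assumed weight for matching node i+1 in A to node j+1 in B.
    The returned permutation is 1-based, like pi = [4, 3, 1, 2] on the slide.
    """
    n = len(weights)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(weights[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = [j + 1 for j in perm], score
    return best_perm, best_score


# Illustrative weights chosen so that the best matching is [4, 3, 1, 2].
example_weights = [
    [0, 0, 0, 5],
    [0, 0, 4, 0],
    [3, 0, 0, 0],
    [0, 2, 0, 0],
]
best_perm, best_score = max_weight_matching(example_weights)
print(best_perm, best_score)
```

The learning task is then to choose the weights so that this arg-max agrees with the training matchings.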

slide-3
SLIDE 3

Learning Bipartite Matching | Applications

  1. Word alignment (Taskar et al., 2005; Padó & Lapata, 2006; MacCartney et al., 2008)

     natürlich ist das haus klein

     of course the house is small

  2. Correspondence between images (Belongie et al., 2002; Dellaert et al., 2003)

  3. Learning to rank documents (Dwork et al., 2001; Le & Smola, 2007)

slide-4
SLIDE 4

Desiderata for a Predictor

Learning objective:

seek pairwise potentials that are most compatible with the training data

Challenge:

loss functions (e.g., Hamming loss) are non-continuous and non-convex

Desiderata for a predictor:

  1. Efficiency: runtime is (low-degree) polynomial

  2. Consistency: must also minimize the Hamming loss under ideal conditions (given the true distribution and fully expressive model parameters)
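For reference, a small sketch of the Hamming loss between two matchings, assuming the unnormalized form (a count of mismatched positions); the function name is illustrative. Because the value is an integer count, it jumps when a prediction changes, which is why the loss is non-continuous and non-convex in the model parameters.

```python
def hamming_loss(pred, truth):
    """Number of positions where two permutations disagree."""
    if len(pred) != len(truth):
        raise ValueError("permutations must have the same length")
    return sum(1 for p, t in zip(pred, truth) if p != t)


loss_same = hamming_loss([4, 3, 1, 2], [4, 3, 1, 2])  # identical matchings
loss_swap = hamming_loss([4, 3, 2, 1], [4, 3, 1, 2])  # last two nodes swapped
print(loss_same, loss_swap)
```

Two distinct permutations always differ in at least two positions, so a loss of exactly 1 never occurs.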

slide-5
SLIDE 5

Exponential Family Random Field Approach

(Petterson et al., 2009; Volkovs & Zemel, 2012)

Probabilistic model:

Consistent? Produces the Bayes optimal prediction under ideal conditions.

Efficient? The normalization term $Z_\psi$ involves computing a matrix permanent, which is #P-hard (Valiant, 1979). Impractical even for a modestly sized $n = 20$.
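A sketch of why the normalization term is a matrix permanent, assuming a model of the form $P(\pi) \propto \exp(\sum_i \psi_i(\pi_i))$ (this form, the function names, and the potential values are assumptions for illustration). Summing over all permutations directly gives the same number as the permanent of the elementwise exponential of the potential matrix, and both take $n!$ terms when computed this way.

```python
from itertools import permutations
from math import exp, isclose, prod


def permanent(matrix):
    """Permanent by direct expansion: sum over permutations of row-wise products."""
    n = len(matrix)
    return sum(
        prod(matrix[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )


def normalizer(potentials):
    """Z = sum over permutations of exp(sum_i potentials[i][perm[i]])."""
    n = len(potentials)
    return sum(
        exp(sum(potentials[i][perm[i]] for i in range(n)))
        for perm in permutations(range(n))
    )


# Hypothetical potentials for n = 3.
potentials = [[0.1, 0.5, -0.2], [0.0, 0.3, 0.7], [-0.4, 0.2, 0.6]]
exp_matrix = [[exp(v) for v in row] for row in potentials]
z_direct = normalizer(potentials)
z_permanent = permanent(exp_matrix)
print(isclose(z_direct, z_permanent))
```

At $n = 20$ the expansion already has about $2.4 \times 10^{18}$ terms, which illustrates the scale of the problem.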

slide-6
SLIDE 6

Maximum Margin Approach

(Tsochantaridis et al., 2005)

Max-margin model:

Efficient? Polynomial-time algorithm for computing the maximum violated constraint (Hungarian algorithm).

Consistent?

  • based on the Crammer & Singer multiclass SVM formulation
  • is not consistent for distributions with no majority label (Liu, 2007)
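A sketch of how the maximum violated constraint can reduce to an ordinary matching problem, assuming margin rescaling with the Hamming loss (this setup, the function names, and the numbers are illustrative). Since the Hamming loss decomposes over positions, adding 1 to every weight that disagrees with the true matching yields a new weight matrix whose best matching is the most violating one. The brute-force search below is only a stand-in for the Hungarian algorithm.

```python
from itertools import permutations


def loss_augmented_weights(weights, truth):
    """Add 1 to weights[i][j] wherever j differs from the true match truth[i] (0-based)."""
    return [
        [w + (0 if j == truth[i] else 1) for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


def best_matching(weights):
    """Brute-force stand-in for the Hungarian algorithm (fine only for tiny n)."""
    n = len(weights)
    return max(
        permutations(range(n)),
        key=lambda perm: sum(weights[i][perm[i]] for i in range(n)),
    )


# Hypothetical learned weights and true matching (0-based indices).
weights = [[2.0, 1.5, 0.0], [0.0, 2.0, 1.8], [1.0, 0.0, 2.0]]
truth = (0, 1, 2)
plain = best_matching(weights)
violator = best_matching(loss_augmented_weights(weights, truth))
print(plain, violator)
```

Here the plain prediction matches the truth, while the loss-augmented search returns a different matching whose score plus loss exceeds the score of the truth.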

slide-7
SLIDE 7

Adversarial Bipartite Matching

(our approach)

Seek a predictor that robustly minimizes the Hamming loss against the worst-case permutation mixture probability.

Predictor:

  • makes a probabilistic prediction $\hat{P}(\hat{\pi}|x)$
  • aims to minimize the loss
  • is pitted against an adversary instead of the empirical distribution

Adversary:

  • makes a probabilistic prediction $\check{P}(\check{\pi}|x)$
  • aims to maximize the loss
  • is constrained to select a probability distribution that matches the statistics of the empirical distribution ($\tilde{P}$) via moment matching on the features $\phi(x, \pi) = \sum_{i=1}^{n} \phi_i(x, \pi_i)$

slide-8
SLIDE 8

Adversarial Bipartite Matching | Dual

Dual Formulation of the Adversarial Bipartite Matching

(method of Lagrange multipliers; von Neumann and Sion minimax duality)

where $\theta$ is the dual variable for the moment-matching constraints

Augmented Hamming loss matrix for $n = 3$ permutations: each entry is a Hamming loss plus a Lagrangian term

  • size: $n! \times n!$
  • intractable for modestly sized $n$
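A sketch of the augmented Hamming loss matrix for $n = 3$, assuming rows index the predictor's permutations, columns index the adversary's, and the Lagrangian term depends only on the adversary's permutation (the `delta` values are placeholders; in the method they would come from $\theta$ and the features).

```python
from itertools import permutations


def hamming_loss(a, b):
    """Number of positions where two permutations disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


def augmented_loss_matrix(n, delta):
    """Rows: predictor permutations; columns: adversary permutations.

    Entry = Hamming loss + delta[adversary permutation], where delta holds an
    assumed, illustrative Lagrangian term for each adversary permutation.
    """
    perms = list(permutations(range(1, n + 1)))
    matrix = [
        [hamming_loss(p_hat, p_check) + delta[p_check] for p_check in perms]
        for p_hat in perms
    ]
    return perms, matrix


n = 3
# Placeholder Lagrangian terms, all zero, so only the Hamming part is visible.
delta = {perm: 0.0 for perm in permutations(range(1, n + 1))}
perms, matrix = augmented_loss_matrix(n, delta)
for perm, row in zip(perms, matrix):
    print("".join(map(str, perm)), row)
```

Even for $n = 3$ the matrix is $6 \times 6$; the $n! \times n!$ growth is what makes the direct dual intractable.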

slide-9
SLIDE 9

Efficient Algorithms

  1. Double Oracle Method (constraint generation)
  2. Marginal Distribution Formulation

slide-10
SLIDE 10

Double Oracle Method

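Based on the observation: the equilibrium is usually supported by a small number of permutations.

Iterative procedure (each game-matrix entry is a Hamming loss plus a Lagrangian term $\delta$):

  1. Start with $\hat{\pi} = 123$ and $\check{\pi} = 123$: entry $0 + \delta_{123}$
  2. Adversary's best response $\check{\pi} = 213$: adds entry $2 + \delta_{213}$
  3. Predictor's best response $\hat{\pi} = 213$: adds entries $2 + \delta_{123}$ and $0 + \delta_{213}$
  4. Adversary's best response $\check{\pi} = 312$: adds entries $3 + \delta_{312}$ and $2 + \delta_{312}$
  5. Predictor's best response $\hat{\pi} = 312$: adds entries $3 + \delta_{123}$, $2 + \delta_{213}$, and $0 + \delta_{312}$

Runtime:

  • no formal polynomial bound is known
  • cannot be characterized as polynomial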
slide-11
SLIDE 11

Marginal Distribution Formulation

Marginal distribution matrices:

  • Predictor: $\mathbf{P}$ with entries $p_{i,j} = \hat{P}(\hat{\pi}_i = j)$
  • Adversary: $\mathbf{Q}$ with entries $q_{i,j} = \check{P}(\check{\pi}_i = j)$

Birkhoff–von Neumann theorem:

[Figure: polytope labeled with the permutations 123, 132, 213, 231, 312, 321]

  • the marginal matrices form a convex polytope whose points are doubly stochastic matrices
  • reduces the space of optimization from $O(n!)$ to $O(n^2)$
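A sketch of why marginals suffice for the Hamming loss, assuming the predictor and adversary draw their permutations independently (the mixtures and names below are illustrative). The marginal matrix of any permutation mixture is doubly stochastic, and the expected Hamming loss equals $n$ minus the elementwise inner product of the two marginal matrices, so the $n!$-sized mixtures never need to be stored.

```python
def marginal_matrix(mixture, n):
    """Entry [i][j] is the probability that position i+1 is matched to node j+1.

    mixture maps 1-based permutation tuples to probabilities summing to 1.
    """
    marg = [[0.0] * n for _ in range(n)]
    for perm, prob in mixture.items():
        for i, j in enumerate(perm):
            marg[i][j - 1] += prob
    return marg


def hamming_loss(a, b):
    """Number of positions where two permutations disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


n = 3
# Illustrative mixtures over permutations (not from the slides).
predictor = {(1, 2, 3): 0.5, (2, 1, 3): 0.5}
adversary = {(2, 1, 3): 0.25, (3, 1, 2): 0.75}
P = marginal_matrix(predictor, n)
Q = marginal_matrix(adversary, n)

# Expected Hamming loss computed from the full mixtures...
expected_direct = sum(
    p * q * hamming_loss(a, b)
    for a, p in predictor.items()
    for b, q in adversary.items()
)
# ...and from the marginals alone: n minus the elementwise inner product of P and Q.
expected_marginal = n - sum(P[i][j] * Q[i][j] for i in range(n) for j in range(n))
print(expected_direct, expected_marginal)
```

Both computations agree, while the marginal form uses only $n^2$ numbers per player.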

slide-12
SLIDE 12

Marginal Formulation | Optimization

Optimization:

add regularization and smoothing penalty

Techniques:

  • Outer (Q):
      * projected quasi-Newton (Schmidt et al., 2009)
      * projection onto the set of doubly stochastic matrices
  • Inner ($\theta$): closed-form solution
  • Inner (P): projection onto the set of doubly stochastic matrices
  • Projection onto the set of doubly stochastic matrices: ADMM

slide-13
SLIDE 13

Consistency

Empirical Risk Perspective of Adversarial Bipartite Matching

Consistency: the arg-max of $f$ is in the set of Bayes optimal responses.

Our method also minimizes the Hamming loss in the ideal case.

slide-14
SLIDE 14

Experiment Setup

Application: Video Tracking

Empirical runtime (until convergence):

[Figure: two series of relative values: 1.0, 1.3, 1.5, 2.5, 2.8 (relative: 12 = 1.0) and 1.0, 1.2, 1.4, 4.2, 5.0 (relative: 1.96 = 1.0)]

  • Adv. Marginal Form.: grows (roughly) quadratically in $n$
  • CRF: impractical even for $n = 20$ (Petterson et al., 2009)

slide-15
SLIDE 15

Experiment Results

  • 6 pairs of datasets: significantly outperforms SSVM
  • 2 pairs of datasets: competitive with SSVM
  • Adv. Double Oracle: small number of permutations

slide-16
SLIDE 16

Conclusions

Approaches compared on three questions (Consistent? Efficient? Perform well?):

  • Exponential Family Random Field (Petterson et al., 2009; Volkovs & Zemel, 2012)
  • Maximum Margin (Tsochantaridis et al., 2005)
  • Adversarial Bipartite Matching (our approach)

slide-17
SLIDE 17

THANK YOU
