Adversarial Bipartite Matching


Oct 05, 2023


SLIDE 1

Efficient and Consistent

Adversarial Bipartite Matching

Rizal Fathony*,#, Sima Behpour*, Xinhua Zhang, Brian D. Ziebart

*) equal contribution

#) presenter

SLIDE 2

Bipartite Matching Tasks

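[Figure: bipartite graph between node sets A and B, each with nodes 1 to 4]

Example matching: $\pi = [4, 3, 1, 2]$

Maximum weighted bipartite matching: choose the permutation $\pi$ that maximizes the total weight, $\max_{\pi} \sum_{i=1}^{n} \psi_i(\pi_i)$

Machine learning task: learn the appropriate weights $\psi_i(\cdot)$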
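To make the matching task concrete, here is a minimal sketch that finds the maximum-weight permutation by brute force. The weight table `psi` and the function name are invented for illustration and are not from the slides; enumeration is only feasible for tiny n, and a practical system would use a polynomial-time assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations

def max_weight_matching(psi):
    """Return the 1-indexed permutation pi maximizing sum_i psi[i][pi_i - 1].

    Brute force over all n! permutations: illustration only.
    """
    n = len(psi)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(psi[i][p[i]] for i in range(n)),
    )
    return [j + 1 for j in best]

# Hypothetical weights: row i holds the weight of matching node i in A
# to each node in B. Chosen so the best matching is [4, 3, 1, 2].
psi = [
    [0.1, 0.2, 0.3, 0.9],
    [0.2, 0.1, 0.8, 0.3],
    [0.7, 0.2, 0.1, 0.1],
    [0.3, 0.6, 0.2, 0.1],
]
print(max_weight_matching(psi))  # [4, 3, 1, 2]
```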

SLIDE 3

Learning Bipartite Matching | Applications

1. Word alignment (Taskar et al., 2005; Padó & Lapata, 2006; MacCartney et al., 2008)

Example: "natürlich ist das haus klein" aligned with "of course the house is small"

2. Correspondence between images (Belongie et al., 2002; Dellaert et al., 2003)

3. Learning to rank documents (Dwork et al., 2001; Le & Smola, 2007)

SLIDE 4

Desiderata for a Predictor

Learning objective:

seek pairwise potentials that are most compatible with the training data

Challenge:

loss functions (e.g. Hamming loss) are non-continuous and non-convex

Desiderata for predictor:

1. Efficiency

runtime: (low-degree) polynomial time

2. Consistency

must also minimize the Hamming loss under ideal conditions

(given the true distribution and fully expressive model parameters)
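For reference, a minimal sketch of the Hamming loss between two matchings. It assumes the unnormalized form (a count of mismatched positions); the function name is illustrative. Because the value is an integer count, it jumps as the prediction changes, which is why the loss is non-continuous and non-convex as a training objective.

```python
def hamming_loss(pi_hat, pi):
    """Count the positions where the predicted matching differs from the truth."""
    if len(pi_hat) != len(pi):
        raise ValueError("permutations must have the same length")
    return sum(1 for a, b in zip(pi_hat, pi) if a != b)

# Swapping the last two assignments of [4, 3, 1, 2] changes two positions.
print(hamming_loss([4, 3, 2, 1], [4, 3, 1, 2]))  # 2
```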

SLIDE 5

Exponential Family Random Field Approach

(Petterson et al., 2009; Volkovs & Zemel, 2012)

Probabilistic model:

an exponential family distribution over permutations with normalization term $Z_\psi$

Consistent?

produces the Bayes optimal prediction under ideal conditions

Efficient?

the normalization term $Z_\psi$ involves computing a matrix permanent, which is #P-hard (Valiant, 1979)

impractical even for a modestly sized $n = 20$
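The link between the normalization term and the matrix permanent can be sketched as follows. This assumes the model has the form P(π) ∝ exp(Σ_i ψ_i(π_i)), which is an assumption based on the slide's description rather than a formula shown on it. Under that assumption the normalizer is the permanent of the matrix with entries exp(ψ_i(j)). The brute-force permanent below sums n! terms, which hints at why n = 20 is already out of reach for naive evaluation.

```python
from itertools import permutations
from math import exp, factorial, prod

def permanent(m):
    """Permanent of a square matrix by direct enumeration (n! terms)."""
    n = len(m)
    return sum(prod(m[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def normalizer(psi):
    """Sum over all permutations of exp(sum_i psi[i][pi_i])."""
    n = len(psi)
    return sum(exp(sum(psi[i][p[i]] for i in range(n))) for p in permutations(range(n)))

# Hypothetical potentials for n = 3.
psi = [[0.2, 1.0, -0.5], [0.0, 0.3, 0.7], [1.1, -0.2, 0.4]]
exp_psi = [[exp(v) for v in row] for row in psi]

# The two quantities agree up to floating-point error.
print(abs(normalizer(psi) - permanent(exp_psi)) < 1e-9)

# Number of terms a brute-force sum would need for n = 20.
print(factorial(20))
```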

SLIDE 6

Maximum Margin Approach

(Tsochantaridis et al., 2005)

Max-margin model:

Efficient?

polynomial-time algorithm for computing the most violated constraint (Hungarian algorithm)

Consistent?

based on the Crammer & Singer multiclass SVM formulation
which is not consistent for distributions with no majority label (Liu, 2007)
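A minimal sketch of why the most violated constraint reduces to an assignment problem: the Hamming loss decomposes over positions, so it can be folded into the weight matrix before solving. The names and the example weights are hypothetical, and brute-force enumeration stands in for the Hungarian algorithm to keep the example dependency-free.

```python
from itertools import permutations

def most_violated(psi, pi_true):
    """Return the 0-indexed permutation maximizing Hamming loss plus total weight.

    Adding 1 to every cell (i, j) with j != pi_true[i] turns loss-augmented
    inference into an ordinary maximum-weight matching on `aug`.
    """
    n = len(psi)
    aug = [
        [psi[i][j] + (0.0 if j == pi_true[i] else 1.0) for j in range(n)]
        for i in range(n)
    ]
    return max(
        permutations(range(n)),
        key=lambda p: sum(aug[i][p[i]] for i in range(n)),
    )

# Strong weights on the true matching: no other permutation violates the margin.
strong = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
print(most_violated(strong, (0, 1, 2)))  # (0, 1, 2)

# Uninformative weights: the maximizer disagrees with the truth everywhere.
flat = [[0.0] * 3 for _ in range(3)]
worst = most_violated(flat, (0, 1, 2))
print(sum(a != b for a, b in zip(worst, (0, 1, 2))))  # 3
```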

SLIDE 7

Adversarial Bipartite Matching

(our approach)

Seek a predictor that robustly minimizes the Hamming loss

against the worst-case permutation mixture probability

Predictor:

makes a probabilistic prediction $\hat{P}(\hat{\pi} \mid x)$
aims to minimize the loss
is pitted against an adversary instead of the empirical distribution

Adversary:

makes a probabilistic prediction $\check{P}(\check{\pi} \mid x)$
aims to maximize the loss
is constrained to select probabilities that match the statistics of the empirical distribution ($\tilde{P}$) via moment matching on the features $\phi(x, \pi) = \sum_{i=1}^{n} \phi_i(x, \pi_i)$
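Written as a single optimization problem, the game described above might look like the following. This is a reconstruction from the descriptions of the two players, not a formula copied from the slide, so the exact notation is an assumption.

```latex
\min_{\hat{P}(\hat{\pi} \mid x)} \; \max_{\check{P}(\check{\pi} \mid x)} \;
  \mathbb{E}_{x \sim \tilde{P};\; \hat{\pi} \mid x \sim \hat{P};\; \check{\pi} \mid x \sim \check{P}}
  \left[ \operatorname{loss}(\hat{\pi}, \check{\pi}) \right]
\quad \text{s.t.} \quad
  \mathbb{E}_{x \sim \tilde{P};\; \check{\pi} \mid x \sim \check{P}}
  \left[ \sum_{i=1}^{n} \phi_i(x, \check{\pi}_i) \right]
  =
  \mathbb{E}_{(x, \pi) \sim \tilde{P}}
  \left[ \sum_{i=1}^{n} \phi_i(x, \pi_i) \right]
```

The constraint is the moment-matching condition: the adversary may pick any distribution whose expected features agree with those of the training data.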

SLIDE 8

Adversarial Bipartite Matching | Dual

Dual Formulation of the Adversarial Bipartite Matching

(method of Lagrange multipliers, von Neumann & Sion minimax duality), where $\theta$ is the dual variable for the moment-matching constraints

Augmented Hamming loss matrix for $n = 3$ permutations

each entry: Hamming loss + Lagrangian term

size: $n! \times n!$

Intractable for modestly sized $n$
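A minimal sketch of the augmented loss matrix for n = 3. It assumes each entry is the unnormalized Hamming loss between the predictor's and the adversary's permutation plus a Lagrangian term that depends only on the adversary's permutation; the `delta` values are placeholders, since in the method they would come from the dual variable and the features.

```python
from itertools import permutations
from math import factorial

def hamming(a, b):
    """Number of positions where two permutations differ."""
    return sum(x != y for x, y in zip(a, b))

def augmented_loss_matrix(delta, n=3):
    """Rows index the predictor's permutation, columns the adversary's."""
    perms = list(permutations(range(1, n + 1)))
    matrix = [[hamming(ph, pc) + delta[pc] for pc in perms] for ph in perms]
    return perms, matrix

# Placeholder Lagrangian terms (all zero) for the six permutations of {1, 2, 3}.
delta = {p: 0.0 for p in permutations((1, 2, 3))}
perms, matrix = augmented_loss_matrix(delta)

print(len(matrix), len(matrix[0]))  # 6 6
print(factorial(10) ** 2)           # number of entries for n = 10
```

Even at n = 10 the matrix would have more than 10^13 entries, which is the intractability noted above.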

SLIDE 9

Efficient Algorithms

1. Double Oracle Method (Constraint Generation)

2. Marginal Distribution Formulation

SLIDE 10

Double Oracle Method

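Based on the observation:

the equilibrium is usually supported by a small number of permutations

Iterative procedure (rows: predictor $\hat{\pi}$; columns: adversary $\check{\pi}$; entries: Hamming loss plus the Lagrangian term $\delta$ of the adversary's permutation):

Start with $\hat{\pi} = 123$ and $\check{\pi} = 123$:

$\hat{\pi} = 123$: $0 + \delta_{123}$

Adversary's best response: $\check{\pi} = 213$. Predictor's best response: $\hat{\pi} = 213$.

$\hat{\pi} = 123$: $0 + \delta_{123}$, $2 + \delta_{213}$
$\hat{\pi} = 213$: $2 + \delta_{123}$, $0 + \delta_{213}$

Adversary's best response: $\check{\pi} = 312$. Predictor's best response: $\hat{\pi} = 312$.

$\hat{\pi} = 123$: $0 + \delta_{123}$, $2 + \delta_{213}$, $3 + \delta_{312}$
$\hat{\pi} = 213$: $2 + \delta_{123}$, $0 + \delta_{213}$, $2 + \delta_{312}$
$\hat{\pi} = 312$: $3 + \delta_{123}$, $2 + \delta_{213}$, $0 + \delta_{312}$

runtime: cannot be characterized as polynomial
no formal polynomial bound is known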
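The two oracles in the iterative procedure can be sketched as best-response functions. This is a simplified illustration for n = 3, not the authors' implementation: the Lagrangian terms in `delta` are made-up values, the oracles enumerate all permutations instead of solving an assignment problem, and the step that re-solves the restricted game for its equilibrium (typically a small linear program) is omitted.

```python
from itertools import permutations

PERMS = list(permutations((1, 2, 3)))

def hamming(a, b):
    """Number of positions where two permutations differ."""
    return sum(x != y for x, y in zip(a, b))

def adversary_best_response(pred_mix, delta):
    """Permutation maximizing expected loss against pred_mix plus its Lagrangian term."""
    def value(pc):
        return sum(p * hamming(ph, pc) for ph, p in pred_mix.items()) + delta[pc]
    return max(PERMS, key=value)

def predictor_best_response(adv_mix):
    """Permutation minimizing expected loss against adv_mix.

    The Lagrangian terms do not depend on the predictor, so they are dropped.
    """
    def value(ph):
        return sum(q * hamming(ph, pc) for pc, q in adv_mix.items())
    return min(PERMS, key=value)

# Hypothetical Lagrangian terms that favor the permutation 213.
delta = {p: 0.0 for p in PERMS}
delta[(2, 1, 3)] = 1.5

adv = adversary_best_response({(1, 2, 3): 1.0}, delta)
pred = predictor_best_response({adv: 1.0})
print(adv, pred)  # (2, 1, 3) (2, 1, 3)
```

Each new best response adds a row or column to the restricted game; the loop stops when neither oracle can improve on the current equilibrium.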

SLIDE 11

Marginal Distribution Formulation

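Marginal Distribution Matrices:

Predictor: $\mathbf{P}$, with entries $p_{i,j} = \hat{P}(\hat{\pi}_i = j)$

Adversary: $\mathbf{Q}$, with entries $q_{i,j} = \check{P}(\check{\pi}_i = j)$

Birkhoff-von Neumann theorem:

[Figure: polytope with vertices at the six permutations 123, 132, 213, 231, 312, 321]

convex polytope whose points are doubly stochastic matrices

reduces the space of optimization from $O(n!)$ to $O(n^2)$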
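A minimal sketch of how a mixture over permutations collapses into an n × n marginal matrix, and how the expected Hamming loss can then be computed from the two marginal matrices alone. The mixtures are made-up examples, and the loss identity assumes the predictor and adversary draw their permutations independently.

```python
def marginals(mixture, n):
    """Entry [i][j] is the probability that position i is matched to item j + 1."""
    m = [[0.0] * n for _ in range(n)]
    for perm, prob in mixture.items():
        for i, j in enumerate(perm):
            m[i][j - 1] += prob
    return m

def expected_hamming(p, q):
    """Expected number of mismatches: n minus the inner product of the marginals."""
    n = len(p)
    return n - sum(p[i][j] * q[i][j] for i in range(n) for j in range(n))

# Hypothetical predictor mixture and a deterministic adversary.
P = marginals({(1, 2, 3): 0.5, (2, 1, 3): 0.5}, 3)
Q = marginals({(1, 2, 3): 1.0}, 3)

print(P)                       # [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(expected_hamming(P, Q))  # 1.0
```

Every row and column of `P` sums to 1, so it is a doubly stochastic matrix: the optimization can work with n² numbers instead of a distribution over n! permutations.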

SLIDE 12

Marginal Formulation | Optimization

Optimization:

add regularization and smoothing penalty

Techniques:

Outer (Q):
* projected Quasi-Newton (Schmidt et al., 2009)
* projection to doubly stochastic matrix

Inner ($\theta$): closed-form solution
Inner (P): projection to doubly stochastic matrix
Projection to doubly stochastic matrix: ADMM
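One way the projection step could be set up is sketched below. This is a generic ADMM splitting (rows on the simplex in one block, columns on the simplex in the other), chosen for illustration; the slides do not show the authors' exact formulation, and the penalty `rho` and iteration count are arbitrary defaults.

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, tau = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            tau = t  # keep the threshold of the last index that passes the test
    return [max(x - tau, 0.0) for x in v]

def transpose(m):
    return [list(col) for col in zip(*m)]

def project_doubly_stochastic(a, rho=1.0, iters=2000):
    """Approximate the nearest doubly stochastic matrix to `a` in Frobenius norm."""
    n = len(a)
    z = [row[:] for row in a]
    u = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # X-update: project each row of a blend of `a` and (z - u) onto the simplex.
        x = [
            project_simplex(
                [(a[i][j] + rho * (z[i][j] - u[i][j])) / (1.0 + rho) for j in range(n)]
            )
            for i in range(n)
        ]
        # Z-update: project each column of (x + u) onto the simplex.
        shifted = [[x[i][j] + u[i][j] for j in range(n)] for i in range(n)]
        z = transpose([project_simplex(col) for col in transpose(shifted)])
        # Scaled dual update on the consensus constraint x = z.
        u = [[u[i][j] + x[i][j] - z[i][j] for j in range(n)] for i in range(n)]
    return z

result = project_doubly_stochastic([[2.0, 0.0], [0.0, 0.5]])
print([[round(v, 3) for v in row] for row in result])
```

The columns of the returned matrix sum to 1 by construction, while the row sums reach 1 only as the iterations converge, so the iteration count trades accuracy for time.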

SLIDE 13

Consistency

Empirical Risk Perspective of Adversarial Bipartite Matching

Consistency:

Our method also minimizes the Hamming loss in the ideal case.

the arg-max of $f$ is in the set of Bayes optimal responses

SLIDE 14

Experiment Setup

Application: video tracking

Empirical runtime (until convergence):

[Figure: two series of relative runtimes, 1.0, 1.3, 1.5, 2.5, 2.8 and 1.0, 1.2, 1.4, 4.2, 5.0; baselines: 12 = 1.0 and 1.96 = 1.0]

Adv. Marginal Form.:

grows (roughly) quadratically in $n$

CRF: impractical even for $n = 20$

(Petterson et al., 2009)

SLIDE 15

Experiment Results

6 pairs of datasets: significantly outperforms SSVM

2 pairs of datasets: competitive with SSVM

Adv. Double Oracle:

small number of permutations

SLIDE 16

Conclusions

Approaches compared on three criteria (Efficient? Consistent? Perform well?):

1. Exponential Family Random Field (Petterson et al., 2009; Volkovs & Zemel, 2012)

2. Maximum Margin (Tsochantaridis et al., 2005)

3. Adversarial Bipartite Matching (our approach)

SLIDE 17


THANK YOU