Efficient and Consistent
Adversarial Bipartite Matching
Rizal Fathony*,#, Sima Behpour*, Xinhua Zhang, Brian D. Ziebart
*) equal contribution
#) presenter
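
Bipartite Matching Tasks
[Figure: bipartite graph matching nodes 1-4 of set A to nodes 1-4 of set B, with π = [4, 3, 1, 2]]
Maximum weighted bipartite matching: find the permutation π with the largest total weight Σ_i ψ_i(π_i)
Machine learning task: learn the appropriate weights ψ_i(·)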
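
To make the matching objective concrete, here is a minimal sketch that finds the maximum weighted matching by brute-force enumeration. The function name `max_weight_matching` and the toy weight values are assumptions made for illustration (they are not from the slides), and enumeration is only workable for very small n.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (best_perm, best_score) for a square weight matrix.

    weights[i][j] plays the role of psi_i(j): the weight of matching
    node i of set A to node j of set B (0-based indices).
    Brute force over all n! permutations, so only for tiny n.
    """
    n = len(weights)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(weights[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


# Hypothetical weights, chosen so the best matching is pi = [4, 3, 1, 2].
weights = [
    [0.1, 0.2, 0.3, 0.9],
    [0.2, 0.1, 0.8, 0.3],
    [0.7, 0.2, 0.1, 0.3],
    [0.1, 0.6, 0.2, 0.2],
]
perm, score = max_weight_matching(weights)
pi = [j + 1 for j in perm]  # convert to the 1-based notation of the slide
print(pi, round(score, 6))
```

In practice a polynomial-time assignment solver such as the Hungarian algorithm would replace the enumeration.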

1. Word alignment (Taskar et al., 2005; Padó & Lapata, 2006; MacCartney et al., 2008)
[Figure: word alignment example for the German sentence "natürlich ist das haus klein"]
2. Correspondence between images (Belongie et al., 2002; Dellaert et al., 2003)
3. Learning to rank documents (Dwork et al., 2001; Le & Smola, 2007)

Learning objective:
seek pairwise potentials that are most compatible with the training data
Challenge:
loss functions (e.g. Hamming loss) are non-continuous and non-convex
Desiderata for predictor:
1. Efficiency: runtime is (low-degree) polynomial
2. Consistency: must also minimize Hamming loss under ideal conditions
(given the true distribution and fully expressive model parameters)

Exponential Family Random Field (Petterson et al., 2009; Volkovs & Zemel, 2012)
Probabilistic model:
the normalization term Z_ψ involves computing a matrix permanent, which is #P-hard (Valiant, 1979)
impractical even for a modest size of n = 20
produces the Bayes optimal prediction under ideal conditions
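
The cost of the normalization term can be illustrated with a short sketch. It assumes the usual exponential-family form, in which the normalizer sums exp(Σ_i ψ_i(π_i)) over all permutations, which equals the permanent of the matrix with entries exp(ψ_i(j)). The function name `normalization_term` and the all-zero potentials are illustrative assumptions. Faster exact permanent algorithms exist, but they are still exponential in n.

```python
import math
from itertools import permutations


def normalization_term(psi):
    """Z = sum over all permutations pi of exp(sum_i psi[i][pi[i]]).

    This is the permanent of the matrix with entries exp(psi[i][j]).
    The sum has n! terms, so this brute force is only usable for tiny n.
    """
    n = len(psi)
    return sum(
        math.exp(sum(psi[i][pi[i]] for i in range(n)))
        for pi in permutations(range(n))
    )


# Hypothetical all-zero potentials: every term is exp(0) = 1, so Z = 3! = 6.
psi_zero = [[0.0] * 3 for _ in range(3)]
z = normalization_term(psi_zero)

# Number of terms the same enumeration would need for n = 20.
terms_for_n20 = math.factorial(20)
print(z, terms_for_n20)
```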

Maximum Margin (Tsochantaridis et al., 2005)
Max-margin model:
polynomial-time algorithm for computing the most violated constraint (Hungarian algorithm)
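
One reason the most violated constraint can be found with an assignment solver is that the Hamming loss decomposes per position, so it can be folded into the weights. The sketch below assumes a margin-rescaling formulation, where the most violated constraint maximizes Hamming loss plus total potential; the slides do not spell this out. The potentials `psi` are made up, and brute-force enumeration stands in for the Hungarian algorithm.

```python
from itertools import permutations


def hamming_loss(pred, truth):
    """Number of positions where the two permutations disagree."""
    return sum(p != t for p, t in zip(pred, truth))


def loss_augmented_weights(psi, truth):
    """Add 1 to every entry (i, j) that disagrees with the true match of i."""
    n = len(psi)
    return [
        [psi[i][j] + (0 if j == truth[i] else 1) for j in range(n)]
        for i in range(n)
    ]


def best_assignment(weights):
    """Brute-force stand-in for a polynomial-time assignment solver."""
    n = len(weights)
    return max(
        permutations(range(n)),
        key=lambda pi: sum(weights[i][pi[i]] for i in range(n)),
    )


# Hypothetical potentials and ground-truth permutation (0-based).
psi = [[2.0, 0.5, 0.0], [0.0, 1.5, 1.0], [0.3, 0.0, 0.4]]
truth = (0, 1, 2)

# Solve the assignment problem on the loss-augmented weights ...
most_violated = best_assignment(loss_augmented_weights(psi, truth))

# ... and compare with a direct search over loss + potential.
direct = max(
    permutations(range(3)),
    key=lambda pi: hamming_loss(pi, truth) + sum(psi[i][pi[i]] for i in range(3)),
)
print(most_violated, direct)
```

Both searches return the same permutation, which is the point of the decomposition: a single call to an assignment solver on the augmented weights is enough.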

Adversarial Bipartite Matching (our approach)
Seek a predictor that robustly minimizes the Hamming loss
against the worst-case permutation mixture probability
Predictor: P̂(π̂ | x)
Adversary: P̌(π̌ | x)
The adversary must match the statistics of the empirical distribution (P̃) via moment matching on the features φ(x, π) = Σ_{i=1}^{n} φ_i(x, π_i)

Dual Formulation of the Adversarial Bipartite Matching
(method of Lagrange multipliers; Von Neumann and Sion minimax duality)
[Equation: dual objective, consisting of a Hamming loss term and a Lagrangian term]
where θ is the dual variable for the moment matching constraints
Augmented Hamming loss matrix for n = 3 permutations
size: n! × n!, intractable for modestly sized n
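
As an illustration of the augmented Hamming loss matrix for n = 3, the sketch below builds the 6 × 6 game matrix whose entry for predictor permutation π̂ and adversary permutation π̌ is the Hamming loss plus a Lagrangian term that depends only on π̌. The dictionary `delta` holds made-up placeholder values for that Lagrangian term; in the actual method it would be computed from θ and the features.

```python
from itertools import permutations


def hamming_loss(a, b):
    """Number of positions where the two permutations disagree."""
    return sum(x != y for x, y in zip(a, b))


def augmented_loss_matrix(n, delta):
    """Rows: predictor permutations; columns: adversary permutations.

    Entry = Hamming loss(row, column) + delta[column].
    The matrix has n! rows and n! columns.
    """
    perms = list(permutations(range(1, n + 1)))
    matrix = [
        [hamming_loss(p_hat, p_check) + delta[p_check] for p_check in perms]
        for p_hat in perms
    ]
    return perms, matrix


# Hypothetical Lagrangian terms, one per adversary permutation.
perms3 = list(permutations((1, 2, 3)))
delta = {p: 0.25 * k for k, p in enumerate(perms3)}

perms, matrix = augmented_loss_matrix(3, delta)
for p_hat, row in zip(perms, matrix):
    print(p_hat, row)
```

Already for n = 3 the matrix has 36 entries. The row and column counts grow as n!, which is why the later slides avoid building it explicitly.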

1. Double Oracle Method (Constraint Generation)
2. Marginal Distribution Formulation

Double Oracle Method
Based on the observation:
the equilibrium is usually supported by a small number of permutations
Iterative procedure:
[Figure: the augmented loss matrix grows one best response at a time. It starts with π̂ = 123 and π̌ = 123 (entry 0 + δ_123), then adds the adversary's best response π̌ = 213 and the predictor's best response π̂ = 213, then the adversary's best response π̌ = 312 and the predictor's best response π̂ = 312]

Marginal Distribution Formulation
Marginal distribution matrices:
Predictor: P, with p_{i,j} = P̂(π̂_i = j)
Adversary: Q, with q_{i,j} = P̌(π̌_i = j)
Birkhoff-von Neumann theorem:
the convex hull of the permutation matrices (123, 132, 213, 231, 312, 321 for n = 3) is a convex polytope whose points are the doubly stochastic matrices
reduces the space of optimization from O(n!) to O(n²)
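
The reduction to marginals works because the expected Hamming loss between two independently drawn permutations depends only on the marginal matrices: it equals n minus the entrywise inner product of P and Q. The sketch below checks this on a small example; the two mixtures and the function names are illustrative assumptions.

```python
from itertools import product


def marginal_matrix(mixture, n):
    """mixture maps 0-based permutation tuples to probabilities.

    Entry (i, j) is the probability that position i is matched to j.
    """
    m = [[0.0] * n for _ in range(n)]
    for perm, prob in mixture.items():
        for i, j in enumerate(perm):
            m[i][j] += prob
    return m


def hamming_loss(a, b):
    """Number of positions where the two permutations disagree."""
    return sum(x != y for x, y in zip(a, b))


n = 3
# Hypothetical mixed strategies over permutations.
pred_mix = {(0, 1, 2): 0.5, (1, 0, 2): 0.5}
adv_mix = {(0, 1, 2): 0.25, (2, 0, 1): 0.75}

P = marginal_matrix(pred_mix, n)
Q = marginal_matrix(adv_mix, n)

# Expected Hamming loss by enumerating pairs of permutations ...
via_enumeration = sum(
    p * q * hamming_loss(a, b)
    for (a, p), (b, q) in product(pred_mix.items(), adv_mix.items())
)

# ... and from the n x n marginal matrices alone.
inner = sum(P[i][j] * Q[i][j] for i in range(n) for j in range(n))
via_marginals = n - inner
print(via_enumeration, via_marginals)
```

Both `P` and `Q` are doubly stochastic here (rows and columns sum to 1), as the Birkhoff-von Neumann theorem requires of any mixture of permutations.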

Optimization:
add regularization and smoothing penalty
Techniques:
* projection onto the set of doubly stochastic matrices

Consistency:
Empirical risk perspective of Adversarial Bipartite Matching
the arg-max of f is in the set of Bayes optimal responses

Application: Video Tracking
Empirical runtime (until convergence)
[Figure: relative runtimes for two series, 1.0, 1.3, 1.5, 2.5, 2.8 and 1.0, 1.2, 1.4, 4.2, 5.0; baselines: 12 = 1.0 and 1.96 = 1.0]
Runtime grows (roughly) quadratically in n
CRF: impractical even for n = 20 (Petterson et al., 2009)
significantly
competitive with SSVM
small number of permutations

Efficient? Consistent? Performs well?
Exponential Family Random Field (Petterson et al., 2009; Volkovs & Zemel, 2012)
Maximum Margin (Tsochantaridis et al., 2005)
Adversarial Bipartite Matching (our approach)