Optimal Learning of Joint Alignments with a Faulty Oracle (PowerPoint presentation)

SLIDE 1

Optimal Learning of Joint Alignments with a Faulty Oracle

Charalampos E. Tsourakakis ctsourak@bu.edu Boston University ISIT 2020

Optimal Learning of Joint Alignments with a Faulty Oracle 1 / 35

SLIDE 2

Joint work with:

Kasper Green Larsen Michael Mitzenmacher

SLIDE 3

Datasets modeled as graphs

  • World Wide Web
  • Internet
  • Social network
  • Connectome
  • Airline network
  • Images

SLIDE 4

Graphs from probing/testing pairs of items

(a) Humans in the loop for entity resolution. (b) Protein-protein interactions.

SLIDE 5

Joint alignment from pairwise differences

[Next four slides use material from Y. Chen’s slides]

  • n unknown variables g(0), . . . , g(n − 1)
  • k possible states
  • described by the latent function g : [n] → [k]

e.g., g(0) = 5, g(1) = 7, . . .

  • Think of [n] as a set of nodes, and g(u) as the cluster id that corresponds to node u ∈ [n]

SLIDE 6

Joint alignment from pairwise differences

  • Goal: learn latent function g : [n] → [k]
  • We obtain a noisy measurement of the pairwise difference

f̃(i, j) := (g(i) − g(j) + some i.i.d. noise) mod k.

SLIDE 7

Joint alignment from pairwise differences

Typical input to a multi-image alignment problem. We may compute pairwise noisy estimates of relative angles of rotation.

SLIDE 8

Joint alignment from noisy pairwise differences

Desired output

SLIDE 9

Joint alignment from pairwise differences

  • Clusters: k groups, numbered {0, 1, . . . , k − 1}, which we think of as arranged modulo k
  • Cluster ids: g(u) is the cluster number associated with vertex u
  • Query/measurement: when we query an edge e = (x, y), we obtain

    f̃(x, y) = (g(x) − g(y) + η_xy) mod k,    (1)

    where the additive noise values η_xy are i.i.d. random variables supported on {0, 1, . . . , k − 1}.

  • Problem: Recover g (up to a cyclic offset) with high probability, using as few measurements as possible and as fast as possible.
SLIDE 10

Noise probability distribution

  • When we query an edge e = (x, y), we obtain

    f̃(x, y) = (g(x) − g(y) + η_xy) mod k,

    where the additive noise values η_xy are i.i.d. random variables supported on {0, 1, . . . , k − 1}, with

    Pr[η_xy = i] = 1/k + δ           if i = 0;
    Pr[η_xy = i] = 1/k − δ/(k − 1)   for each i ≠ 0.    (2)

  • We choose which pairs to query in a non-adaptive way.
  • We obtain a set of noisy measurements {f̃(i, j) = (g(i) − g(j) + noise) mod k}_(i,j)∈Ω, where Ω ⊆ [n] × [n] is a symmetric index set; wlog, a set of pairs {i, j} with i < j.
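As a sanity check (not part of the slides), the noise model in (2) is straightforward to simulate. The function name `noisy_query` and all parameter values below are illustrative assumptions:

```python
import random

def noisy_query(g, x, y, k, delta, rng):
    """Simulate one faulty-oracle measurement f~(x, y) as in Eq. (2).

    With probability 1/k + delta the additive noise eta is 0 (the answer
    is exact); each nonzero value in {1, ..., k-1} occurs with
    probability 1/k - delta/(k-1).
    """
    if rng.random() < 1.0 / k + delta:
        eta = 0
    else:
        eta = rng.randrange(1, k)  # nonzero noise values are uniform
    return (g[x] - g[y] + eta) % k

# Example: k = 4 states, bias delta = 0.2, a fixed latent assignment g.
rng = random.Random(0)
g = [0, 3, 1, 2]
samples = [noisy_query(g, 0, 1, 4, 0.2, rng) for _ in range(20000)]
truth = (g[0] - g[1]) % 4
freq_correct = samples.count(truth) / len(samples)  # close to 1/4 + 0.2 = 0.45
```

The empirical frequency of the correct answer concentrates around 1/k + δ, matching the definition of the bias.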

SLIDE 11

Remark

Our MLE problem is a discrete, non-convex problem.

SLIDE 12

Joint alignment - k = 2

  • Let V = [n] be the set of items
  • (Unknown) g : V → {−1, +1}
  • Red (R = {v ∈ V (G) : g(v) = −1})
  • Blue (B = {v ∈ V (G) : g(v) = +1})
  • Observation: Define τ(u, v) = g(u)g(v) ∈ {±1} for any u, v ∈ V. Then τ(u, v) = −1 if and only if u and v lie in different clusters.
SLIDE 13

Joint alignment - k = 2

  • Model: We can query any pair of nodes {u, v} once to get a noisy measurement of τ(u, v). The oracle returns

    τ̃(u, v) = g(u)g(v)η_uv, where

    • η_uv ∈ {±1} is i.i.d. noise in the edge observations
    • E[η_uv] = δ for all pairs u, v ∈ V

  • Equivalently, for each query we receive the correct answer with probability 1 − q = 1/2 + δ/2, where q > 0 is the corruption probability.
  • Problem (k = 2): Recover g whp with as few queries to the oracle as possible.
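To see the equivalence concretely: η ∈ {±1} with E[η] = δ forces Pr[η = +1] = (1 + δ)/2, so each query is correct with probability 1/2 + δ/2. A minimal sketch (function name and parameter values are illustrative, not from the slides):

```python
import random

def binary_oracle(g, u, v, delta, rng):
    """Return tau~(u, v) = g(u) * g(v) * eta with eta in {+1, -1}.

    E[eta] = delta forces Pr[eta = +1] = (1 + delta) / 2, so each query
    is answered correctly with probability 1/2 + delta/2.
    """
    eta = 1 if rng.random() < (1 + delta) / 2 else -1
    return g[u] * g[v] * eta

rng = random.Random(1)
g = [+1, -1, -1, +1]   # latent +/-1 labels (illustrative)
delta = 0.4            # correct-answer probability 1/2 + 0.4/2 = 0.7
answers = [binary_oracle(g, 0, 1, delta, rng) for _ in range(20000)]
frac_correct = answers.count(g[0] * g[1]) / len(answers)  # close to 0.7
```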

SLIDE 14

Related work – Overview

SLIDE 15

Related Work – k = 2, Correlation Clustering

  • Correlation Clustering: given an undirected signed graph, partition the nodes into clusters so that the total number of disagreements is minimized [Bansal et al., 2004, Shamir et al., 2004] (NP-hard)
  • Excellent survey by Bonchi et al. [Bonchi et al., 2014]
  • Mathieu and Schudy initiated the study of noisy correlation clustering [Mathieu and Schudy, 2010]
    • complete information (all (n choose 2) signs)
    • cardinality constraints on clusters (Ω(√n))
SLIDE 16

Related Work – k = 2, Planted Partition

Planted Partition Model

  • Two groups (clusters) of nodes
  • A graph is generated as follows: the edge probability is p within each cluster, and q < p across the clusters.
  • Problem: Recover the two clusters given such a graph.

Results

  • If the two clusters are balanced, i.e., each cluster has Θ(n) nodes, then one can recover the clusters whp; see [McSherry, 2001, Vu, 2014, Abbe et al., 2016, Hajek et al., 2016].
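The generative model above can be sketched in a few lines. `planted_partition` is an illustrative name, and the densities p = 0.6, q = 0.1 are arbitrary choices for the demonstration (this is the sampler only, not a recovery algorithm):

```python
import random

def planted_partition(n, p, q, rng):
    """Sample a planted-partition graph: the first n//2 nodes form one
    cluster and the rest the other; same-cluster pairs get an edge with
    probability p, cross-cluster pairs with probability q < p."""
    half = n // 2
    cluster = [0] * half + [1] * (n - half)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < (p if cluster[u] == cluster[v] else q):
                edges.add((u, v))
    return cluster, edges

rng = random.Random(2)
cluster, edges = planted_partition(200, 0.6, 0.1, rng)
within = sum(1 for u, v in edges if cluster[u] == cluster[v])
across = len(edges) - within  # within-cluster edges dominate when p > q
```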

SLIDE 17

Related Work – k = 2, # Queries as a function of the imbalance

  • Matrix completion techniques [Candès et al., 2006] can be used to predict signs of edges [Chiang et al., 2014]
  • Define the imbalance γ = max over clusters C of n/|C|
  • The number of queries needed for exact recovery is O(γ⁴ n log² n)
  • Finally, Mazumdar and Saha study the case k = 2 and achieve recovery in poly-time using O(n log n/δ⁴) queries [Mazumdar and Saha, 2016]
  • State-of-the-art is due to [Tsourakakis, Mitzenmacher, Larsen, WebConf 2020]
SLIDE 18

Related Work – k ≥ 3

  • Joint alignment: Chen and Candès consider a similar setting to ours, and propose a projected power method to solve the non-convex maximum likelihood estimation problem [Chen and Candes, 2016].
SLIDE 19

Related Work – k ≥ 3

  • Chen and Candès formulate the problem as a constrained PCA problem, and show that a non-convex projected power method solves the problem with high probability when the random queries form an Erdős–Rényi graph.

SLIDE 20

Related Work – k ≥ 3

  • The Chen–Candès algorithm is non-adaptive, and the underlying queries form a random binomial graph
  • They show that, in the setting where queries form a random binomial graph, the minimax probability of error tends to 1 if the number of queries is less than Ω(n log n/(kδ²))
  • The query complexity matches the lower bound
  • Previously, weaker results were obtained by Mitzenmacher and Tsourakakis.

SLIDE 21

Older result (2018) – Mitzenmacher-T.

We prove the following result. Our proof uses BFS as a subroutine.

Theorem. There exists a polynomial-time algorithm that performs O(n^(1+o(1))) queries and recovers g (up to some global offset) whp for any 1 − q = (1 + δ)/2, where 0 < δ ≤ 1 is any positive constant.

  • The o(1) term in the exponent is 1/log log n.

SLIDE 22

Upper bound – Larsen- Mitzenmacher-T. (2019)

Theorem 1 (extremely small bias). If (lg n/(nk))^(1/4) ≤ δ ≤ 1/(2k) and k ≤ n^(o(1)), then there is a non-adaptive and deterministic query algorithm that makes O(n log n/(δ²k)) queries, runs in O(n log n/(δ²k)) time, and is correct whp.

Theorem 2 (larger bias). If 1/(2k) ≤ δ ≤ 1/4 and k ≤ n^(o(1)), then there is a non-adaptive and deterministic query algorithm that makes O(n log n/δ) queries, runs in O(n log n/δ) time, and is correct whp.

SLIDE 23

Proposed algorithm – Step 1: O(n log n/(kδ²)) queries

SLIDE 24

Proposed algorithm – Step 2 “grounding”

SLIDE 25

Proposed algorithm – Learn {g(x)}x∈S up to cyclic offset

SLIDE 26

Proposed algorithm – Learn {g(x)}x∈V \S up to (the same) cyclic offset

SLIDE 27

Learning Joint Alignment with a Faulty Oracle

1. Choose S ⊆ V such that |S| = O(log n/(kδ²)) if 0 ≤ δ ≤ 1/(2k), and |S| = O(log n/δ) if 1/(2k) ≤ δ ≤ 1/4.
2. Perform all queries between S and V \ S.
3. Fix a node s ∈ S and assign it the label ĝ(s) = 0.
4. For each s′ ∈ S \ {s}, compute an estimate μ_s′ of (g(s′) − g(s)) mod k using the plurality vote among the queries {f̃(s′, b) − f̃(s, b)}_b∈V\S, and assign s′ the label ĝ(s′) = μ_s′.
5. For each v ∈ V \ S, assign it the label given by the plurality vote among {ĝ(s) + f̃(v, s)}_s∈S.
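The five steps above can be simulated end to end. This is a simplified sketch under stated assumptions (a fixed |S| = 24 rather than the theorem's constants, and a seeded random instance), not the authors' reference implementation:

```python
import random
from collections import Counter

def noisy_query(g, x, y, k, delta, rng):
    """One oracle call: f~(x, y) = (g(x) - g(y) + eta) mod k, noise as in Eq. (2)."""
    eta = 0 if rng.random() < 1.0 / k + delta else rng.randrange(1, k)
    return (g[x] - g[y] + eta) % k

def align(n, k, g, s_size, delta, rng):
    """Steps 1-5: recover g up to a cyclic offset using only S x (V \\ S) queries."""
    S = list(range(s_size))          # step 1 (w.l.o.g. the first nodes)
    rest = list(range(s_size, n))
    # Step 2: query every pair between S and V \ S exactly once.
    F = {(s, b): noisy_query(g, s, b, k, delta, rng) for s in S for b in rest}
    ghat = {S[0]: 0}                 # step 3: ground s = S[0]
    for s2 in S[1:]:                 # step 4: plurality vote over b in V \ S
        votes = Counter((F[(s2, b)] - F[(S[0], b)]) % k for b in rest)
        ghat[s2] = votes.most_common(1)[0][0]
    for v in rest:                   # step 5: use f~(v, s) = -f~(s, v) mod k
        votes = Counter((ghat[s] - F[(s, v)]) % k for s in S)
        ghat[v] = votes.most_common(1)[0][0]
    return ghat

rng = random.Random(3)
n, k, delta = 120, 3, 0.5
g = [rng.randrange(k) for _ in range(n)]
ghat = align(n, k, g, s_size=24, delta=delta, rng=rng)
# Recovery up to a cyclic offset: all residues (ghat[v] - g[v]) mod k coincide.
offsets = {(ghat[v] - g[v]) % k for v in range(n)}
```

Note that step 5 reuses the step-2 measurements: since f̃ is defined on unordered pairs, f̃(v, s) is taken as −f̃(s, v) mod k, whose noise has the same distribution.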

SLIDE 28

Optimality (Lower Bound)

We also prove a matching lower bound.

Theorem 3 (extremely small bias). If 1/n^(1/4) ≤ δ ≤ 1/(2k) and k ≤ n^(o(1)), then any non-adaptive, possibly randomized query algorithm making o(n log n/(δ²k)) queries has success probability at most exp(−n^(Ω(1))).

Theorem 4 (larger bias). If 1/(2k) ≤ δ ≤ 1/4 and k ≤ n^(o(1)), then any non-adaptive, possibly randomized query algorithm making o(n log n/δ) queries has success probability at most exp(−n^(Ω(1))).

SLIDE 29

Anti-concentration – small δ

Lemma. Let k ≥ 2 be an integer, let 0 ≤ δ ≤ 1/(2k), and let X_1, . . . , X_n be i.i.d. random variables such that each X_i takes the value 1 with probability 1/k + δ, the value −1 with probability 1/k − δ/(k − 1), and the value 0 otherwise. There exist constants c_1, c_2 > 0 such that:

Pr[∑_i X_i ≤ 0] ≤ c_1 exp(−δ²nk/c_1)   and   Pr[∑_i X_i ≤ 0] ≥ c_2⁻¹ exp(−δ²nk · c_2).
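A quick Monte Carlo sanity check of the lemma's qualitative message (parameters chosen arbitrarily for illustration): Pr[∑_i X_i ≤ 0] is of constant order for small n but shrinks rapidly as n grows, consistent with decay at rate exp(−Θ(δ²nk)):

```python
import random

def sample_sum(n, k, delta, rng):
    """Sum of n i.i.d. X_i with Pr[X_i = 1] = 1/k + delta,
    Pr[X_i = -1] = 1/k - delta/(k - 1), and Pr[X_i = 0] otherwise."""
    p_plus = 1.0 / k + delta
    p_minus = 1.0 / k - delta / (k - 1)
    total = 0
    for _ in range(n):
        u = rng.random()
        if u < p_plus:
            total += 1
        elif u < p_plus + p_minus:
            total -= 1
    return total

rng = random.Random(4)
k, delta, trials = 4, 0.05, 4000   # delta <= 1/(2k) = 0.125: small-bias regime
p_small_n = sum(sample_sum(50, k, delta, rng) <= 0 for _ in range(trials)) / trials
p_large_n = sum(sample_sum(800, k, delta, rng) <= 0 for _ in range(trials)) / trials
# p_small_n stays of constant order, while p_large_n is far smaller.
```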

SLIDE 30

Anti-concentration – large δ

Lemma. Let k ≥ 2 be an integer, let 1/(2k) < δ ≤ 1/4, and let X_1, . . . , X_n be i.i.d. random variables such that each X_i takes the value 1 with probability 1/k + δ, the value −1 with probability 1/k − δ/(k − 1), and the value 0 otherwise. There exist constants c_1, c_2 > 0 such that:

Pr[∑_i X_i ≤ 0] ≤ c_1 exp(−δn/c_1)   and   Pr[∑_i X_i ≤ 0] ≥ c_2⁻¹ exp(−δn · c_2).

SLIDE 31

Open Questions

  • Does there exist an adaptive algorithm with better query complexity?
  • Can we characterize the performance of existing joint alignment algorithms if one is satisfied with approximate solutions?

Thank you!

SLIDE 32

References I

Abbe, E., Bandeira, A. S., and Hall, G. (2016). Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487. Bansal, N., Blum, A., and Chawla, S. (2004). Correlation clustering. Machine Learning, 56(1-3):89–113. Bonchi, F., Garcia-Soriano, D., and Liberty, E. (2014). Correlation clustering: from theory to practice. In KDD, page 1972.

SLIDE 33

References II

Candès, E. J., Romberg, J., and Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509. Chen, Y. and Candès, E. (2016). The projected power method: An efficient algorithm for joint alignment from pairwise differences. arXiv preprint arXiv:1609.05820. Chiang, K.-Y., Hsieh, C.-J., Natarajan, N., Dhillon, I. S., and Tewari, A. (2014). Prediction and clustering in signed networks: a local to global perspective. Journal of Machine Learning Research, 15(1):1177–1213.

SLIDE 34

References III

Hajek, B., Wu, Y., and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefinite programming. IEEE Transactions on Information Theory, 62(5):2788–2797. Mathieu, C. and Schudy, W. (2010). Correlation clustering with noisy input. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 712–728. Society for Industrial and Applied Mathematics. Mazumdar, A. and Saha, B. (2016). Clustering via crowdsourcing. arXiv preprint arXiv:1604.01839.

SLIDE 35

References IV

McSherry, F. (2001). Spectral partitioning of random graphs. In Proceedings. 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 529–537. IEEE. Shamir, R., Sharan, R., and Tsur, D. (2004). Cluster graph modification problems. Discrete Applied Mathematics, 144(1):173–182. Vu, V. (2014). A simple svd algorithm for finding hidden partitions. arXiv preprint arXiv:1404.3918.
