Optimal Learning of Joint Alignments with a Faulty Oracle (PowerPoint presentation)

SLIDE 1

Optimal Learning of Joint Alignments with a Faulty Oracle

Charalampos E. Tsourakakis ctsourak@bu.edu Boston University ISIT 2020

Optimal Learning of Joint Alignments with a Faulty Oracle 1 / 35

SLIDE 2

Joint work with:

Kasper Green Larsen Michael Mitzenmacher

SLIDE 3

Datasets modeled as graphs

  • World Wide Web
  • Internet
  • Social network
  • Connectome
  • Airline network
  • Images

SLIDE 4

Graphs from probing/testing pairs of items

(a) Humans in the loop for entity resolution. (b) Protein-protein interactions.

SLIDE 5

Joint alignment from pairwise differences

[Next four slides use material from Y. Chen’s slides]

  • n unknown variables g(0), . . . , g(n − 1)
  • k possible states
  • described by the latent function g : [n] → [k]

e.g., g(0) = 5, g(1) = 7, . . .

  • Think of [n] as a set of nodes, and g(u) as the cluster id that corresponds to node u ∈ [n]

SLIDE 6

Joint alignment from pairwise differences

  • Goal: learn latent function g : [n] → [k]
  • We obtain a noisy measurement of the pairwise difference

f̃(i, j) := (g(i) − g(j) + some i.i.d. noise) mod k.

SLIDE 7

Joint alignment from pairwise differences

Typical input to a multi-image alignment problem. We may compute pairwise noisy estimates of relative angles of rotation.

SLIDE 8

Joint alignment from noisy pairwise differences

Desired output

SLIDE 9

Joint alignment from pairwise differences

  • Clusters: k groups, numbered {0, 1, . . . , k − 1}, which we think of as arranged modulo k
  • Cluster ids: g(u) is the cluster number associated with vertex u
  • Query/measurement: when we query an edge e = (x, y), we obtain

    f̃(x, y) = (g(x) − g(y) + η_xy) mod k,    (1)

    where the additive noise values η_xy are i.i.d. random variables supported on {0, 1, . . . , k − 1}.

  • Problem: Recover g (up to a cyclic offset) with high probability, using as few measurements as possible and as fast as possible.
SLIDE 10

Noise probability distribution

  • When we query an edge e = (x, y), we obtain

    f̃(x, y) = (g(x) − g(y) + η_xy) mod k,

    where the additive noise values η_xy are i.i.d. random variables supported on {0, 1, . . . , k − 1}, with

    Pr[η_xy = i] = 1/k + δ           if i = 0;
    Pr[η_xy = i] = 1/k − δ/(k − 1)   for each i ≠ 0.    (2)

  • We choose which pairs to query in a non-adaptive way.
  • We obtain a set of noisy measurements {f̃(i, j) = (g(i) − g(j) + noise) mod k}_(i,j)∈Ω, where Ω ⊆ [n] × [n] is a symmetric index set; wlog, a set of pairs {i, j} with i < j.
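As a sanity check (not part of the slides), the noise model in (2) is straightforward to simulate. The function name `noisy_query` and all parameter values below are illustrative assumptions:

```python
import random

def noisy_query(g, x, y, k, delta, rng):
    """Simulate one faulty-oracle measurement f~(x, y) as in Eq. (2).

    With probability 1/k + delta the additive noise eta is 0 (the answer
    is exact); each nonzero value in {1, ..., k-1} occurs with
    probability 1/k - delta/(k-1).
    """
    if rng.random() < 1.0 / k + delta:
        eta = 0
    else:
        eta = rng.randrange(1, k)  # nonzero noise values are uniform
    return (g[x] - g[y] + eta) % k

# Example: k = 4 states, bias delta = 0.2, a fixed latent assignment g.
rng = random.Random(0)
g = [0, 3, 1, 2]
samples = [noisy_query(g, 0, 1, 4, 0.2, rng) for _ in range(20000)]
truth = (g[0] - g[1]) % 4
freq_correct = samples.count(truth) / len(samples)  # close to 1/4 + 0.2 = 0.45
```

The empirical frequency of the correct answer concentrates around 1/k + δ, matching the definition of the bias.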

SLIDE 11

Remark

Our MLE problem is a discrete, non-convex problem.

SLIDE 12

Joint alignment - k = 2

  • Let V = [n] be the set of items
  • (Unknown) g : V → {−1, +1}
  • Red (R = {v ∈ V (G) : g(v) = −1})
  • Blue (B = {v ∈ V (G) : g(v) = +1})
  • Observation: Define τ(u, v) = g(u)g(v) ∈ {±1} for any u, v ∈ V. Then τ(u, v) = −1 if and only if u and v lie in different clusters.
SLIDE 13

Joint alignment - k = 2

  • Model: We can query any pair of nodes {u, v} once to get a noisy measurement of τ(u, v). The oracle returns

    τ̃(u, v) = g(u)g(v)η_uv, where

    • η_uv ∈ {±1} is i.i.d. noise in the edge observations
    • E[η_uv] = δ for all pairs u, v ∈ V

  • Equivalently, for each query we receive the correct answer with probability 1 − q = 1/2 + δ/2, where q > 0 is the corruption probability.
  • Problem (k = 2): Recover g whp with as few queries to the oracle as possible.
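To see the equivalence concretely: η ∈ {±1} with E[η] = δ forces Pr[η = +1] = (1 + δ)/2, so each query is correct with probability 1/2 + δ/2. A minimal sketch (function name and parameter values are illustrative, not from the slides):

```python
import random

def binary_oracle(g, u, v, delta, rng):
    """Return tau~(u, v) = g(u) * g(v) * eta with eta in {+1, -1}.

    E[eta] = delta forces Pr[eta = +1] = (1 + delta) / 2, so each query
    is answered correctly with probability 1/2 + delta/2.
    """
    eta = 1 if rng.random() < (1 + delta) / 2 else -1
    return g[u] * g[v] * eta

rng = random.Random(1)
g = [+1, -1, -1, +1]   # latent +/-1 labels (illustrative)
delta = 0.4            # correct-answer probability 1/2 + 0.4/2 = 0.7
answers = [binary_oracle(g, 0, 1, delta, rng) for _ in range(20000)]
frac_correct = answers.count(g[0] * g[1]) / len(answers)  # close to 0.7
```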

SLIDE 14

Related work – Overview

SLIDE 15

Related Work – k = 2, Correlation Clustering

  • Correlation Clustering: given an undirected signed graph, partition the nodes into clusters so that the total number of disagreements is minimized [Bansal et al., 2004, Shamir et al., 2004] (NP-hard)
  • Excellent survey by Bonchi et al. [Bonchi et al., 2014]
  • Mathieu and Schudy initiated the study of noisy correlation clustering [Mathieu and Schudy, 2010]
    • complete information (all (n choose 2) signs)
    • cardinality constraints on clusters (Ω(√n))
SLIDE 16

Related Work – k = 2, Planted Partition

Planted Partition Model

  • Two groups (clusters) of nodes
  • A graph is generated as follows: the edge probability is p within each cluster, and q < p across the clusters.
  • Problem: Recover the two clusters given such a graph.

Results

  • If the two clusters are balanced, i.e., each cluster has Θ(n) nodes, then one can recover the clusters whp; see [McSherry, 2001, Vu, 2014, Abbe et al., 2016, Hajek et al., 2016].
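The generative model above can be sketched in a few lines. `planted_partition` is an illustrative name, and the densities p = 0.6, q = 0.1 are arbitrary choices for the demonstration (this is the sampler only, not a recovery algorithm):

```python
import random

def planted_partition(n, p, q, rng):
    """Sample a planted-partition graph: the first n//2 nodes form one
    cluster and the rest the other; same-cluster pairs get an edge with
    probability p, cross-cluster pairs with probability q < p."""
    half = n // 2
    cluster = [0] * half + [1] * (n - half)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < (p if cluster[u] == cluster[v] else q):
                edges.add((u, v))
    return cluster, edges

rng = random.Random(2)
cluster, edges = planted_partition(200, 0.6, 0.1, rng)
within = sum(1 for u, v in edges if cluster[u] == cluster[v])
across = len(edges) - within  # within-cluster edges dominate when p > q
```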

SLIDE 17

Related Work – k = 2, # Queries as a function of the imbalance

  • Matrix completion techniques [Candès et al., 2006] can be used to predict signs of edges [Chiang et al., 2014]
  • Define the imbalance γ = max over clusters C of n/|C|
  • The number of queries needed for exact recovery is O(γ⁴ n log² n)
  • Finally, Mazumdar and Saha study the case k = 2 and achieve recovery in poly-time using O(n log n/δ⁴) queries [Mazumdar and Saha, 2016]
  • State-of-the-art is due to [Tsourakakis, Mitzenmacher, Larsen, WebConf 2020]
SLIDE 18

Related Work – k ≥ 3

  • Joint alignment: Chen and Candès consider a similar setting to ours, and propose a projected power method to solve the non-convex maximum likelihood estimation problem [Chen and Candes, 2016].
SLIDE 19

Related Work – k ≥ 3

  • Chen and Candès formulate the problem as a constrained PCA problem, and show that a non-convex projected power method solves the problem with high probability when the random queries form an Erdős–Rényi graph.

SLIDE 20

Related Work – k ≥ 3

  • The Chen–Candès algorithm is non-adaptive, and the underlying queries form a random binomial graph
  • They show that, in the setting where queries form a random binomial graph, the minimax probability of error tends to 1 if the number of queries is less than Ω(n log n/(kδ²))
  • The query complexity matches the lower bound
  • Previously, weaker results were obtained by Mitzenmacher and Tsourakakis.

SLIDE 21

Older result (2018) – Mitzenmacher-T.

We prove the following result. Our proof uses BFS as a subroutine.

Theorem. There exists a polynomial-time algorithm that performs O(n^(1+o(1))) queries and recovers g (up to some global offset) whp for any 1 − q = (1 + δ)/2, where 0 < δ ≤ 1 is any positive constant.

  • The o(1) term in the exponent is 1/log log n.

SLIDE 22

Upper bound – Larsen- Mitzenmacher-T. (2019)

Theorem 1 (extremely small bias). If (lg n/(nk))^(1/4) ≤ δ ≤ 1/(2k) and k ≤ n^(o(1)), then there is a non-adaptive and deterministic query algorithm that makes O(n log n/(δ²k)) queries, runs in O(n log n/(δ²k)) time, and is correct whp.

Theorem 2 (larger bias). If 1/(2k) ≤ δ ≤ 1/4 and k ≤ n^(o(1)), then there is a non-adaptive and deterministic query algorithm that makes O(n log n/δ) queries, runs in O(n log n/δ) time, and is correct whp.

SLIDE 23

Proposed algorithm – Step 1: O(n log n/(kδ²)) queries

SLIDE 24

Proposed algorithm – Step 2 “grounding”

SLIDE 25

Proposed algorithm – Learn {g(x)}x∈S up to cyclic offset

SLIDE 26

Proposed algorithm – Learn {g(x)}x∈V \S up to (the same) cyclic offset

SLIDE 27

Learning Joint Alignment with a Faulty Oracle

1. Choose S ⊆ V such that |S| = O(log n/(kδ²)) if 0 ≤ δ ≤ 1/(2k), and |S| = O(log n/δ) if 1/(2k) ≤ δ ≤ 1/4.
2. Perform all queries between S and V \ S.
3. Fix a node s ∈ S and assign it the label ĝ(s) = 0.
4. For each s′ ∈ S \ {s}, compute an estimate μ_s′ of (g(s′) − g(s)) mod k using the plurality vote among the queries {f̃(s′, b) − f̃(s, b)}_b∈V\S, and assign s′ the label ĝ(s′) = μ_s′.
5. For each v ∈ V \ S, assign it the label given by the plurality vote among {ĝ(s) + f̃(v, s)}_s∈S.
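The five steps above can be simulated end to end. This is a simplified sketch under stated assumptions (a fixed |S| = 24 rather than the theorem's constants, and a seeded random instance), not the authors' reference implementation:

```python
import random
from collections import Counter

def noisy_query(g, x, y, k, delta, rng):
    """One oracle call: f~(x, y) = (g(x) - g(y) + eta) mod k, noise as in Eq. (2)."""
    eta = 0 if rng.random() < 1.0 / k + delta else rng.randrange(1, k)
    return (g[x] - g[y] + eta) % k

def align(n, k, g, s_size, delta, rng):
    """Steps 1-5: recover g up to a cyclic offset using only S x (V \\ S) queries."""
    S = list(range(s_size))          # step 1 (w.l.o.g. the first nodes)
    rest = list(range(s_size, n))
    # Step 2: query every pair between S and V \ S exactly once.
    F = {(s, b): noisy_query(g, s, b, k, delta, rng) for s in S for b in rest}
    ghat = {S[0]: 0}                 # step 3: ground s = S[0]
    for s2 in S[1:]:                 # step 4: plurality vote over b in V \ S
        votes = Counter((F[(s2, b)] - F[(S[0], b)]) % k for b in rest)
        ghat[s2] = votes.most_common(1)[0][0]
    for v in rest:                   # step 5: use f~(v, s) = -f~(s, v) mod k
        votes = Counter((ghat[s] - F[(s, v)]) % k for s in S)
        ghat[v] = votes.most_common(1)[0][0]
    return ghat

rng = random.Random(3)
n, k, delta = 120, 3, 0.5
g = [rng.randrange(k) for _ in range(n)]
ghat = align(n, k, g, s_size=24, delta=delta, rng=rng)
# Recovery up to a cyclic offset: all residues (ghat[v] - g[v]) mod k coincide.
offsets = {(ghat[v] - g[v]) % k for v in range(n)}
```

Note that step 5 reuses the step-2 measurements: since f̃ is defined on unordered pairs, f̃(v, s) is taken as −f̃(s, v) mod k, whose noise has the same distribution.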

SLIDE 28

Optimality (Lower Bound)

We also prove a matching lower bound.

Theorem 3 (extremely small bias). If 1/n^(1/4) ≤ δ ≤ 1/(2k) and k ≤ n^(o(1)), then any non-adaptive, possibly randomized query algorithm making o(n log n/(δ²k)) queries has success probability at most exp(−n^(Ω(1))).

Theorem 4 (larger bias). If 1/(2k) ≤ δ ≤ 1/4 and k ≤ n^(o(1)), then any non-adaptive, possibly randomized query algorithm making o(n log n/δ) queries has success probability at most exp(−n^(Ω(1))).

SLIDE 29

Anti-concentration – small δ

Lemma. Let k ≥ 2 be an integer, let 0 ≤ δ ≤ 1/(2k), and let X_1, . . . , X_n be i.i.d. random variables such that each X_i takes the value 1 with probability 1/k + δ, the value −1 with probability 1/k − δ/(k − 1), and the value 0 otherwise. There exist constants c_1, c_2 > 0 such that:

Pr[∑_i X_i ≤ 0] ≤ c_1 exp(−δ²nk/c_1)   and   Pr[∑_i X_i ≤ 0] ≥ c_2⁻¹ exp(−δ²nk · c_2).
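A quick Monte Carlo sanity check of the lemma's qualitative message (parameters chosen arbitrarily for illustration): Pr[∑_i X_i ≤ 0] is of constant order for small n but shrinks rapidly as n grows, consistent with decay at rate exp(−Θ(δ²nk)):

```python
import random

def sample_sum(n, k, delta, rng):
    """Sum of n i.i.d. X_i with Pr[X_i = 1] = 1/k + delta,
    Pr[X_i = -1] = 1/k - delta/(k - 1), and Pr[X_i = 0] otherwise."""
    p_plus = 1.0 / k + delta
    p_minus = 1.0 / k - delta / (k - 1)
    total = 0
    for _ in range(n):
        u = rng.random()
        if u < p_plus:
            total += 1
        elif u < p_plus + p_minus:
            total -= 1
    return total

rng = random.Random(4)
k, delta, trials = 4, 0.05, 4000   # delta <= 1/(2k) = 0.125: small-bias regime
p_small_n = sum(sample_sum(50, k, delta, rng) <= 0 for _ in range(trials)) / trials
p_large_n = sum(sample_sum(800, k, delta, rng) <= 0 for _ in range(trials)) / trials
# p_small_n stays of constant order, while p_large_n is far smaller.
```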

SLIDE 30

Anti-concentration – large δ

Lemma. Let k ≥ 2 be an integer, let 1/(2k) < δ ≤ 1/4, and let X_1, . . . , X_n be i.i.d. random variables such that each X_i takes the value 1 with probability 1/k + δ, the value −1 with probability 1/k − δ/(k − 1), and the value 0 otherwise. There exist constants c_1, c_2 > 0 such that:

Pr[∑_i X_i ≤ 0] ≤ c_1 exp(−δn/c_1)   and   Pr[∑_i X_i ≤ 0] ≥ c_2⁻¹ exp(−δn · c_2).

SLIDE 31

Open Questions

  • Does there exist an adaptive algorithm with better query complexity?
  • Can we characterize the performance of existing joint alignment algorithms if one is satisfied with approximate solutions?

Thank you!

SLIDE 32

References I

Abbe, E., Bandeira, A. S., and Hall, G. (2016). Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487. Bansal, N., Blum, A., and Chawla, S. (2004). Correlation clustering. Machine Learning, 56(1-3):89–113. Bonchi, F., Garcia-Soriano, D., and Liberty, E. (2014). Correlation clustering: from theory to practice. In KDD, page 1972.

SLIDE 33

References II

Candès, E. J., Romberg, J., and Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509. Chen, Y. and Candès, E. (2016). The projected power method: An efficient algorithm for joint alignment from pairwise differences. arXiv preprint arXiv:1609.05820. Chiang, K.-Y., Hsieh, C.-J., Natarajan, N., Dhillon, I. S., and Tewari, A. (2014). Prediction and clustering in signed networks: a local to global perspective. Journal of Machine Learning Research, 15(1):1177–1213.

SLIDE 34

References III

Hajek, B., Wu, Y., and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefinite programming. IEEE Transactions on Information Theory, 62(5):2788–2797. Mathieu, C. and Schudy, W. (2010). Correlation clustering with noisy input. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 712–728. Society for Industrial and Applied Mathematics. Mazumdar, A. and Saha, B. (2016). Clustering via crowdsourcing. arXiv preprint arXiv:1604.01839.

SLIDE 35

References IV

McSherry, F. (2001). Spectral partitioning of random graphs. In Proceedings. 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 529–537. IEEE. Shamir, R., Sharan, R., and Tsur, D. (2004). Cluster graph modification problems. Discrete Applied Mathematics, 144(1):173–182. Vu, V. (2014). A simple svd algorithm for finding hidden partitions. arXiv preprint arXiv:1404.3918.
