SLIDE 1

Graph Resistance and Learning from Pairwise Comparisons

Alex Olshevsky

Department of ECE, Boston University Joint work with Julien Hendrickx (UC Louvain) and Venkatesh Saligrama (BU)

SLIDE 2

Problem Statement

  • Given a collection of items with unknown qualities w1, . . . , wn, we want to compute w = (w1, . . . , wn) up to scaling from pairwise comparisons of items.
  • In many contexts, comparisons are the right way to model the available data:
  • A patient compares how painful or helpful two treatments have been.
  • A customer purchases one of several items recommended by an e-commerce site.
  • A user clicks on one of the items suggested by a search engine.
  • A user chooses one of several movies recommended by a streaming site.


SLIDE 8

The Simplest Possible Model: BTL over a graph

  • Items are compared according to the Bradley-Terry-Luce (BTL) model: the probability that item i wins against item j is $\frac{w_i}{w_i + w_j}$ (see the simulation sketch below).
  • There are a number of models for item comparisons, and the BTL model is arguably the simplest.
  • We assume that there is an underlying “comparison graph” G, and if (i, j) is an edge in this graph, items i and j are compared k times.
  • We do not choose the comparison graph.
  • Goal: understand how fast the error decays with k and G.
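As a quick illustration of the data-generating process, here is a minimal sketch of BTL comparisons over a comparison graph. It assumes numpy; the function name simulate_btl and the example graph are our own illustration, not part of the talk.

```python
import numpy as np

def simulate_btl(w, edges, k, rng=None):
    """For each edge (i, j), draw k independent BTL comparisons and record
    how many times i beats j, using P(i beats j) = w[i] / (w[i] + w[j])."""
    rng = rng or np.random.default_rng(0)
    wins = {}
    for (i, j) in edges:
        p = w[i] / (w[i] + w[j])
        wins[(i, j)] = rng.binomial(k, p)
    return wins

# Example: four items compared along a star graph centered at item 0.
w = np.array([1.0, 2.0, 0.5, 4.0])
edges = [(0, 1), (0, 2), (0, 3)]
print(simulate_btl(w, edges, k=100))  # roughly {(0, 1): 33, (0, 2): 67, (0, 3): 20}
```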


SLIDE 13

Example

[Figure: a small comparison graph on items 1, 2, 3, 4, with each edge labeled by comparison outcomes.]

  • Each edge label represents the outcomes of noisy comparisons.
  • Need to compute (scaled versions of) w1, w2, w3, w4 from these measurements.


SLIDE 14

Previous Work – I

  • The dominant approach has been to construct a Markov chain based on the data whose stationary distribution is an estimate of the true weights.
  • First proposed by [Dwork, Kumar, Naor, Sivakumar, WWW 2001] and first analyzed by [Negahban, Oh, Shah, NeurIPS 2012]. Under the assumption $\max_{i,j} w_i/w_j \le b$, the estimate $\hat{W}$ satisfies
$$\frac{\left\| \frac{w}{\|w\|_1} - \hat{W} \right\|_2^2}{\left\| \frac{w}{\|w\|_1} \right\|_2^2} \;\le\; O\!\left(\frac{1}{k}\right) \frac{b^5 \log n}{\lambda_2^2} \cdot \frac{d_{\max}}{d_{\min}^2}.$$
  • Worst case scaling is $O(n^7/k)$.
  • Scaling with degrees recently improved by [Agarwal, Patil, Agarwal, ICML 2018].


SLIDE 18

Previous Work and Motivation

  • Computing the maximum likelihood estimator (which can be done in polynomial time) was considered in [Shah, Balakrishnan, Bradley, Parekh, Ramchandran, Wainwright, JMLR 16].
  • The error bound was
$$O_b\!\left(\frac{1}{m}\right) \frac{n}{\lambda_2(L)} \;\ge\; \mathbb{E}\left[ \big\| \hat{W} - \log w \big\|_2^2 \right] \;\ge\; \Omega_b\!\left(\frac{1}{m}\right) \max\left\{ n^2,\; \max_{l=2,\ldots,n} \sum_{i=\lceil 0.99\,l \rceil}^{l} \frac{1}{\lambda_i(L)} \right\}$$
after m samples, where L is the Laplacian of the comparison graph, and $O_b(\cdot)$, $\Omega_b(\cdot)$ denote that the constants within the $O(\cdot)$ notation depend on b.
  • Our concern I: we want matching upper and lower bounds.
  • Our concern II: what is the relevant graph-theoretic quantity?


SLIDE 22

Our results - I

  • We give satisfactory answers to these concerns, but only when k is large.
  • The standard way to measure the distance between subspaces is through the sine of the angle:
$$|\sin(\hat{W}, w)| = \inf_{\alpha} \frac{\|\alpha \hat{W} - w\|_2}{\|w\|_2}.$$
This is the same as the measures considered above, up to factors of b.
  • First main result: we give a method such that when $k \ge \Omega\left(|E| \log^2(n/\delta)\right)$, then with probability $1 - \delta$,
$$\sin^2(\hat{W}, w) = O\!\left( \frac{b^2 R_{\max}\,(1 + \log(1/\delta))}{k} \right), \qquad \sin^2(\hat{W}, w) = O\!\left( \frac{b^4 R_{\mathrm{avg}}\,(1 + \log(1/\delta))}{k} \right),$$
where $R_{\max}$, $R_{\mathrm{avg}}$ are, respectively, the maximum and average resistance of the comparison graph.


SLIDE 25

Our results - II

  • Second main result: when $k \ge \sqrt{d_{\max}\, n\, R_{\mathrm{avg}}}$,
$$\mathbb{E}\left[ \sin^2(\hat{W}, w) \right] \ge \frac{R_{\mathrm{avg}}}{k}.$$
  • Punchline: the relevant graph-theoretic quantity is the graph resistance.
  • Worst-case for $\sin^2(\hat{W}, w)$ (or other notions of squared distance) is actually $O(n/k)$ when $b = O(1)$.


SLIDE 28

Our method

  • We do the simplest possible thing.
  • On edge (i, j), let $F_{ij}$ be the fraction of times i wins against j.
  • Observe that
$$\frac{\mathbb{E}[F_{ij}]}{\mathbb{E}[F_{ji}]} = \frac{w_i/(w_i + w_j)}{w_j/(w_i + w_j)} = \frac{w_i}{w_j}.$$
  • Our approach: solve the linear system of equations $\log(F_{ij}/F_{ji}) = z_i - z_j$ in the least-squares sense, and set $\hat{W}_i = e^{z_i}$ (a sketch follows below).
  • Can be done in nearly linear time due to work by [Spielman, Teng, 2004].
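The following is a minimal sketch of this least-squares step, assuming numpy and the simulate_btl sketch above; estimate_weights is our own name. For simplicity it uses a dense solver rather than the nearly-linear-time Laplacian solvers of Spielman and Teng.

```python
import numpy as np

def estimate_weights(wins, k, n):
    """Solve log(F_ij / F_ji) = z_i - z_j in the least-squares sense
    over the edges of the comparison graph, then set W_i = exp(z_i)."""
    edges = list(wins.keys())
    B = np.zeros((len(edges), n))   # incidence matrix of the comparison graph
    y = np.zeros(len(edges))        # empirical log-odds on each edge
    for row, (i, j) in enumerate(edges):
        f_ij = wins[(i, j)] / k
        # Assumes k is large enough that 0 < f_ij < 1 on every edge;
        # a real implementation should guard against degenerate edges.
        y[row] = np.log(f_ij / (1.0 - f_ij))
        B[row, i], B[row, j] = 1.0, -1.0
    z, *_ = np.linalg.lstsq(B, y, rcond=None)   # minimum-norm least-squares solution
    W = np.exp(z)
    return W / W.sum()              # weights are only defined up to scaling

# Usage with the earlier sketch:
# W_hat = estimate_weights(simulate_btl(w, edges, k=10000), k=10000, n=4)
```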


SLIDE 33

Why Resistance? The upper bound

  • As a toy example, imagine that the comparison graph is a line.
  • Our method learns something about the ratios $w_1/w_2, w_2/w_3, \ldots, w_{n-1}/w_n$. The squared error in estimating each of these will decay like 1/k.
  • Relative errors multiply, e.g.
$$\frac{w_3}{w_1} = \frac{w_2}{w_1} \cdot \frac{w_3}{w_2},$$
so if the two quantities on the right are known to some error, those errors will multiply.
  • But $(1 + \epsilon)^n \approx 1 + n\epsilon$ when errors are small, so the total squared error will scale linearly with n.
  • Now imagine an arbitrary graph. For any two nodes i and j, we can think about the error over all paths from i to j.
  • The error for each path will scale with its length, but will decrease when you get to average over more paths.
  • Clear parallel to resistance: see the sketch below.
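To make the parallel concrete, here is a minimal sketch (our own illustration, assuming numpy) that computes effective resistances from the Laplacian pseudoinverse; on a line graph the maximum resistance grows linearly with n, matching the error accumulation described above.

```python
import numpy as np

def effective_resistances(A):
    """All-pairs effective resistance matrix of a connected graph with
    adjacency matrix A, via the Moore-Penrose pseudoinverse of the
    Laplacian: R_ij = L+_ii + L+_jj - 2 L+_ij."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp

def line_adjacency(n):
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A

R = effective_resistances(line_adjacency(10))
print(R.max())   # -> 9.0: the endpoints of a 10-node line sit at resistance n - 1
```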


SLIDE 40

Why Resistance? The lower bound

  • What sort of argument might yield a lower bound of resistance?
  • There is a natural way resistance comes up:
$$R_{\mathrm{avg}} = \frac{\operatorname{Tr}(L^{\dagger})}{n},$$
where L is the graph Laplacian and $L^{\dagger}$ is the Moore-Penrose pseudoinverse.
  • One can prove a lower bound by exhibiting $w_1 \ne w_2$ and demonstrating that the expected (total variation) distance between the two distributions on $k|E|$ outcomes is small.


SLIDE 43

Why Resistance? The lower bound - II

  • Choose
$$w = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \frac{1}{\sqrt{k}} \sum_{i=2}^{n} Z_i \frac{v_i}{\sqrt{\lambda_i}},$$
where $v_i$ are the eigenvectors of the Laplacian of the comparison graph (normalized so that $\|v_i\|_2 = 1$), with $\lambda_i$ the corresponding eigenvalues, and $Z_i \in \{-1, 1\}$ is a Bernoulli random variable (see the sketch below).
  • Suppose the error in estimating each $Z_i$ is C, i.e., for any $Z_i$, the error in estimating $Z_i$ satisfies $\mathbb{E}\left[ (\hat{Z}_i - Z_i)^2 \right] \ge C$. Then for any $\hat{W}$,
$$\frac{\mathbb{E}\| \hat{W} - w \|_2^2}{\|w\|_2^2} \;\ge\; \frac{C\,(1/k) \sum_{i=2}^{n} 1/\lambda_i}{n} \;=\; \Omega\!\left( \frac{C \operatorname{Tr}(L^{\dagger})}{nk} \right) \;=\; \Omega\!\left( \frac{C R_{\mathrm{avg}}}{k} \right).$$
  • Key lemma: C is constant.
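A minimal sketch of the hard instance above, assuming numpy (the function name perturbed_weights is ours): it perturbs the all-ones vector along the Laplacian eigenvectors, with each perturbation shrinking as the corresponding eigenvalue and k grow.

```python
import numpy as np

def perturbed_weights(A, k, rng=None):
    """Construct w = 1 + (1/sqrt(k)) * sum_{i>=2} Z_i v_i / sqrt(lambda_i),
    where (lambda_i, v_i) are the Laplacian eigenpairs of the comparison
    graph and the Z_i are independent uniform +/-1 signs."""
    rng = rng or np.random.default_rng(1)
    L = np.diag(A.sum(axis=1)) - A
    lam, V = np.linalg.eigh(L)     # ascending; lam[0] = 0 for a connected graph
    w = np.ones(A.shape[0])
    for i in range(1, len(lam)):   # skip the zero eigenvalue
        Z = rng.choice([-1.0, 1.0])
        w += Z * V[:, i] / np.sqrt(lam[i] * k)
    return w
```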


SLIDE 46

Simulations

The figures (omitted here) showed, respectively, the error evolution on the 2D grid (left, where resistance grows as O(log n)) and on the 3D grid (right, where resistance is constant); a simulation sketch reproducing this comparison follows below.
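A minimal end-to-end sketch of that comparison, reusing the simulate_btl and estimate_weights sketches above (grid_edges and sin2 are our own helpers); the 3D grid, with its bounded resistance, should show the smaller error.

```python
import numpy as np

def grid_edges(dims):
    """Edges of an axis-aligned grid graph with side lengths dims."""
    index = {p: t for t, p in enumerate(np.ndindex(*dims))}
    edges = []
    for p in index:
        for d in range(len(dims)):
            q = list(p)
            q[d] += 1
            if q[d] < dims[d]:
                edges.append((index[p], index[tuple(q)]))
    return edges, len(index)

def sin2(W_hat, w):
    """Squared sine of the angle: inf over scalings a of ||a W_hat - w||^2 / ||w||^2."""
    alpha = np.dot(W_hat, w) / np.dot(W_hat, W_hat)
    return np.linalg.norm(alpha * W_hat - w) ** 2 / np.linalg.norm(w) ** 2

rng = np.random.default_rng(0)
k = 2000
for dims in [(8, 8), (4, 4, 4)]:           # 2D vs 3D grid, both with n = 64 items
    edges, n = grid_edges(dims)
    w = np.exp(rng.uniform(-1.0, 1.0, n))  # true qualities, so b <= e^2
    W_hat = estimate_weights(simulate_btl(w, edges, k, rng), k, n)
    print(dims, sin2(W_hat, w))
```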


SLIDE 47

Conclusion and Future Work

  • Our results prove that the squared error decays as $O(R_{\mathrm{avg}}/k)$ for k large enough. Simulations show that this actually seems to be true for all k.
  • Conjecture: $R_{\mathrm{avg}}$ is also the sample complexity of learning in the Bradley-Terry-Luce model.
  • Simulations show that our method performs similarly to Markov chain methods, suggesting that resistance is the right scaling for those methods as well.
  • Getting the correct scaling is still open, as the upper and lower bounds do not match in factors of b, as well as in the gap between maximum and average resistance.
