Reconstructing Patterns of Information Diffusion from Incomplete Observations

SLIDE 1

Reconstructing Patterns of Information Diffusion from Incomplete Observations

Flavio Chierichetti (Sapienza University), Jon Kleinberg (Cornell University), David Liben-Nowell (Carleton College)

SLIDE 2

Internet Activism

  • Very important phenomenon.
  • Incomplete Traces.

How to study partially visible viral phenomena?

  • Chain Letter Petitions: how to estimate the reach?

SLIDE 3

NPR Chain Letter

PBS, NPR (National Public Radio), and the arts are facing major cutbacks in funding. In spite of the efforts of each station to reduce spending costs and streamline their services, the government officials believe that the funding currently going to these programs is too large a portion of funding for something which is seen as "unworthwhile." [...] When this issue comes up in 1996, the funding will be determined for fiscal years 1996-1998. The only way that our representatives can be aware of the base of support or PBS and funding for these types of programs is by making our voices heard. Please add your name to this list if you believe in what we stand for. This list will be forwarded to the President of the United States, the Vice President of the United States, the House of Representatives and Congress. If you happen to be the 50th, 100th, 150th, etc. signer of this petition, please forward to: kubi7975@blue.univnorthco.edu . This way we can keep track of the lists and organize them. Forward this to everyone you know, and help us to keep these programs alive. Thank you.

  • 1. Elizabeth Weinert, student, University of Northern Colorado, Greeley, Colorado.
  • 2. Robert M. Penn; San Francisco, CA
  • 3. Gregory S. Williamson, San Francisco, CA
  • 4. Daniel C. Knightly, Austin, TX
  • 5. Andrew H. Knightly, Los Angeles, CA
  • 6. Aaron C. Yeater, Somerville, MA
  • 7. Tobie M. Cornejo, Washington, DC
  • 8. John T. Mason, Dalton, MA
  • 9. Eric W. Fish, Williamstown, MA
  • 10. Courtney E. Estill, Hamilton College, NY
  • 11. Vanessa Moore, Northfield, MN
  • 12. Lynne Raschke, Haverford College, PA (originally Minnesota)
  • 13. Deborah Bielak, Haverford, PA
  • 14. Morgan Lloyd, Haverford, PA 19041
  • 15. Galen Lloyd, Goucher College, MD
  • 16. Brian Eastwood, University of Vermont, VT
  • 17. Elif Batuman, Harvard University, MA
  • 18. Kohar Jones, Yale University, CT
  • 19. Claudia Brittenham, Yale University, CT
  • 20. Alexandra Block, Yale University, CT
  • 21. Susanna Chu, Yale University, CT
  • 22. Michelle Chen, Harvard University, MA
  • 23. Jessica Hammer, Harvard University, MA
  • 24. Ann Pettigrew, Haverford College, PA
  • 25. Kirstin Knox, Swarthmore College, PA
  • 26. Jason Adler, Swarthmore College, PA
  • 27. Daniel Gottlieb, Swarthmore College (but truly from Lawrence, KS)
  • 28. Josh Feltman, Tufts University, MA
  • 29. Louise Forrest, Massachusetts Institute of Technology, MA
  • 30. HongSup Park, Massachusetts Institute of Technology, MA (originally from Portage, Wisconsin)
  • 31. Ana Sandoval,Massachusetts Institute of Technology

[...]

SLIDE 9

Chain Letters

Aaron Betty Charles David Earl Fran George Hilary

A B E H

Dear all, an important cause demands your attention. […] If you care about this, add your name and forward this letter. […] The signers, Aaron Betty Earl Hilary

SLIDE 10

Chain Letters

Aaron Betty Charles David Earl Fran George Hilary

A B E H A D

SLIDE 13

Chain Letters

Aaron Betty Charles David Earl Fran Hilary

George’s Blog

Here is something that I sent to my friends today:

Dear all, an important cause demands your attention. […] If you care about this, add your name and forward this letter. […] The signers, Aaron David George

SLIDE 17

Chain Letters

Aaron Betty Charles David Earl Fran Hilary George

George and Hilary, by exposing their emails, revealed a subtree of the Chain Letter tree.

SLIDE 18

Real-World Chain Letters’ Tree

  • [Liben-Nowell, Kleinberg, PNAS’08] mined web-accessible mailing lists and blog posts.
  • They obtained some “exposed” nodes of two Chain Letters’ trees, and produced two “revealed” trees.

SLIDE 22

NPR revealed tree

Liben-Nowell, Kleinberg, PNAS’08

  • Non-exponential growth.
  • 13K nodes.

SLIDE 23

Iraq Chain Letter

Dear all: The US Congress has just authorized the President of the US to go to war against Iraq. The UN is gathering signatures in an effort to avoid this tragic world event. Please consider this an urgent request: UN Petition for Peace - Stand for Peace. Islam is not the Enemy. War is NOT the Answer. Today we are at a point of imbalance in the world and are moving toward what may be the beginning of a THIRD WORLD WAR.

Please COPY (rather than Forward) this e-mail in a new message, sign at the end of the list, and send it to all the people whom you know. If you receive this list with more than 500 names signed, please send a copy of the message to: usa@un.int president@whitehouse.gov Even if you decide not to sign, please consider forwarding the petition on instead of deleting it.

1) Suzanne Dathe, Grenoble, France
2) Laurence COMPARAT, Grenoble, France
3) Philippe MOTTE, Grenoble, France
4) Jok FERRAND, Mont St. Martin, France
5) Emmanuelle PIGNOL, St Martin d'Heres, FRANCE
6) Marie GAUTHIER, Grenoble, FRANCE
7) Laurent VESCALO, Grenoble, FRANCE
8) Mathieu MOY, St Egreve, FRANCE
9) Bernard BLANCHET, Mont St Martin,FRANCE
10) Tassadite FAVRIE, Grenoble, FRANCE
11) Loic GODARD, St Ismier, FRANCE
12) Benedicte PASCAL, Grenoble, FRANCE
13) Khedaidja BENATIA, Grenoble, FRANCE
14) Marie-Therese LLORET, Grenoble,FRANCE
15) Benoit THEAU, Poitiers, FRANCE
16) Bruno CONSTANTIN, Poitiers, FRANCE
17) Christian COGNARD, Poitiers, FRANCE
18) Robert GARDETTE, Paris, FRANCE
19) Claude CHEVILLARD, Montpellier, FRANCE
20) Gilles FREISS, Montpellier, FRANCE
21) Patrick AUGEREAU, Montpellier, FRANCE
22) Jean IMBERT, Marseille, FRANCE
23) Jean-Claude MURAT, Toulouse, France
24) Anna BASSOLS, Barcelona, Catalonia
25) Mireia DUNACH, Barcelona, Catalonia
26) Michel VILLAZ, Grenoble, France
27) Pages Frederique, Dijon, France
28) Rodolphe FISCHMEISTER,Chatenay-Malabry, France
29) Francois BOUTEAU, Paris, France
30) Patrick PETER, Paris, France
31) Lorenza RADICI, Paris, France
32) Monika Siegenthaler, Bern, Switzerland
33) Mark Philp,Glasgow,Scotland
34) Tomas Andersson, Stockholm, Sweden
35) Jonas Eriksson, Stockholm, Sweden
36) Karin Eriksson, Stockholm, Sweden
...

SLIDE 29

IRAQ revealed tree

Liben-Nowell, Kleinberg, PNAS’08

  • 18,119 nodes
  • 17,079 nodes with one child (94%)
  • 620 exposed nodes
  • 557 (exposed) leaves

Why is this fraction so high? What can we infer about the original, unknown, Chain Letter Tree?

SLIDE 31

Tree-Revealing Process

Liben-Nowell, Kleinberg, PNAS’08

Aaron Betty Charles David Earl Fran George Hilary

Each node is exposed independently with probability δ > 0.

SLIDE 36

Tree-Revealing Process

Liben-Nowell, Kleinberg, PNAS’08

Aaron Betty Charles David Earl Fran Hilary George

Ancestors of exposed nodes are revealed.
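The two rules of the tree-revealing process (each node exposed independently with probability δ, then every ancestor of an exposed node revealed) can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' code: the parent-array encoding of the tree and the helper names `expose`, `reveal`, and `single_child_fraction` are hypothetical.

```python
import random
from collections import Counter


def expose(n, delta, rng):
    """Expose each node 0..n-1 independently with probability delta."""
    return {v for v in range(n) if rng.random() < delta}


def reveal(parent, exposed):
    """Return the revealed node set: all exposed nodes plus their ancestors.

    parent[v] is the parent of node v; the root's parent is -1.
    """
    revealed = set()
    for v in exposed:
        u = v
        # Climb toward the root. Stop early at an already-revealed node,
        # because its whole root path was revealed when it was added.
        while u != -1 and u not in revealed:
            revealed.add(u)
            u = parent[u]
    return revealed


def single_child_fraction(parent, revealed):
    """Fraction of revealed nodes with exactly one child in the revealed tree."""
    if not revealed:
        return 0.0
    # The revealed set contains the parent of every revealed non-root node,
    # so counting parents gives each node's number of revealed children.
    children = Counter(parent[v] for v in revealed if parent[v] != -1)
    return sum(1 for v in revealed if children[v] == 1) / len(revealed)


if __name__ == "__main__":
    # Hypothetical unknown tree: a complete binary tree with 2**15 - 1 nodes.
    n = 2**15 - 1
    parent = [-1] + [(v - 1) // 2 for v in range(1, n)]
    revealed = reveal(parent, expose(n, 0.01, random.Random(0)))
    print(len(revealed), round(single_child_fraction(parent, revealed), 3))
```

For instance, with `parent = [-1, 0, 0, 1, 1, 2]` (node 0 has children 1 and 2, node 1 has children 3 and 4), exposing only node 3 reveals the path {0, 1, 3}, and two of those three revealed nodes have exactly one revealed child.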

SLIDE 37

Previous Work

  • Golub, Jackson, PNAS’10 perform simulations using branching-process trees near the critical threshold as the Chain Letter Trees, and exposing nodes as in Liben-Nowell, Kleinberg, PNAS’08.
  • They observe that the revealed tree has a high fraction of nodes with only one child (and some other properties).
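A simulation in the spirit of the one described above can be sketched as follows. The offspring law is our assumption, not taken from Golub and Jackson: every node independently gets Binomial(k, p) children, so the mean offspring is k·p (critical at k·p = 1) and no node has more than k children. The function names and parameter values are hypothetical.

```python
import random
from collections import Counter, deque


def branching_tree(k, p, max_nodes, rng):
    """Grow a Galton-Watson tree breadth-first; each node gets Binomial(k, p) children.

    Returns a parent array (the root's parent is -1); growth stops at max_nodes.
    """
    parent = [-1]
    frontier = deque([0])
    while frontier and len(parent) < max_nodes:
        v = frontier.popleft()
        n_children = sum(rng.random() < p for _ in range(k))
        for _ in range(min(n_children, max_nodes - len(parent))):
            parent.append(v)
            frontier.append(len(parent) - 1)
    return parent


def reveal_and_measure(parent, delta, rng):
    """Expose nodes with probability delta, reveal their ancestors, and return
    (number of revealed nodes, fraction of them with exactly one revealed child)."""
    revealed = set()
    for v in range(len(parent)):
        if rng.random() < delta:
            u = v
            while u != -1 and u not in revealed:
                revealed.add(u)
                u = parent[u]
    if not revealed:
        return 0, 0.0
    children = Counter(parent[v] for v in revealed if parent[v] != -1)
    return len(revealed), sum(children[v] == 1 for v in revealed) / len(revealed)


if __name__ == "__main__":
    rng = random.Random(42)
    k, p = 3, 0.35                    # mean offspring 1.05: slightly supercritical
    parent = [-1]
    while len(parent) < 20_000:       # resample until a run survives long enough
        parent = branching_tree(k, p, 20_000, rng)
    size, frac = reveal_and_measure(parent, 0.01, rng)
    print(f"{len(parent)} nodes, {size} revealed, single-child fraction {frac:.2f}")
```

Near criticality most runs die out quickly, which is why the sketch resamples until one tree reaches the size cap; the cap itself is an arbitrary illustrative choice.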

SLIDE 38

Our Contribution

  • Our 1st result, informally, states that the tree-revealing process is enough to explain the high fraction of single-child nodes, assuming only a degree bound on the unknown chain letter tree.

SLIDE 42

Revealed vs. Unknown

We see a “revealed” tree...

Aaron Betty David Earl Hilary George

...we would like to study the “unknown” tree!

Aaron Betty David Earl Charles Kurt Hilary George Fran Ian Jason Larry

Size? Width? Height? Degree Distribution? ...

SLIDE 44

Our Contribution

  • Our 2nd result, informally, states that (under reasonable assumptions) it is possible to estimate the size of the unknown chain letter tree with a small error, with high probability. Observe that we do not know the exposing probability δ.
  • We use this theorem to estimate that ~173k people signed the IRAQ chain letter. This estimate is backed by a probability bound (on the probability space induced by the revealing process).
  • The chain letter generated ~3.5M emails.

SLIDE 48

Single-Child Fraction

  • Nodes are exposed with probability $\delta > 0$.
  • We assume that the unknown tree’s maximum degree is at most $k$.

SLIDE 52

Single-Child Fraction

We partition the tree into subforests in such a way that each subforest $F$ has $\simeq \delta^{-1}$ nodes and the median height in the subforest is $\Omega(\log_{k-1} \delta^{-1})$.

(Figure: a subforest $F$, annotated with $\delta^{-1}/2$, $\delta^{-1}/2$, and $\log_{k-1} \delta^{-1}$.)

  • $\Pr[\text{some node is exposed in } F\text{'s lower half}] = \Theta(1)$.
  • If this happens, $\Omega(\log_{k-1} \delta^{-1})$ nodes will be revealed in $F$.
  • Number of subforests: $\simeq n \cdot \delta$.
  • Hence $\Pr[\Omega(n \cdot \delta \cdot \log_{k-1} \delta^{-1}) \text{ nodes will be revealed}] = 1 - o(1)$.
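One way to fill in the $\Theta(1)$ step, assuming (as the slide's figure suggests) that the lower half of a subforest $F$ contains about $\delta^{-1}/2$ nodes, each exposed independently with probability $\delta$, and using $1 - x \le e^{-x}$ for the lower bound and a union bound for the upper bound:

```latex
\Pr[\text{no node in } F\text{'s lower half is exposed}]
  = (1-\delta)^{\delta^{-1}/2} \le e^{-1/2}
\quad\Longrightarrow\quad
1 - e^{-1/2} \approx 0.39
  \;\le\; \Pr[\text{some node in } F\text{'s lower half is exposed}]
  \;\le\; \delta \cdot \tfrac{\delta^{-1}}{2} = \tfrac{1}{2}.
```

Because the subforests are disjoint and exposures are independent, these events are independent across the $\simeq n\delta$ subforests; when $n\delta \to \infty$, a Chernoff bound then gives $\Theta(n\delta)$ successful subforests with probability $1 - o(1)$, each contributing $\Omega(\log_{k-1}\delta^{-1})$ distinct revealed nodes.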

SLIDE 63

Single-Child Fraction

  • $\Pr[\Omega(n \cdot \delta \cdot \log_{k-1} \delta^{-1}) \text{ nodes will be revealed}] = 1 - o(1)$.
  • $\Pr[\text{at most } 2 \cdot n \cdot \delta \text{ nodes will be exposed}] = 1 - o(1)$.
  • Each leaf in the revealed tree is an exposed node, so $\Pr[\text{the revealed tree will have at most } 2 \cdot n \cdot \delta \text{ leaves}] = 1 - o(1)$.
  • In an arbitrary tree, the number of internal nodes with more than one child is upper-bounded by the number of leaves, so $\Pr[\text{the revealed tree has} \le 4 n \delta \text{ non-single-child nodes}] = 1 - o(1)$.
  • Since $n \cdot \delta \cdot \log_{k-1} \delta^{-1} \gg n \cdot \delta$, the non-single-child nodes are only an $O(1/\log_{k-1} \delta^{-1})$ fraction of the revealed set:

$\Pr\left[\text{the fraction of single-child nodes in the revealed tree is} \ge 1 - O\left(\frac{1}{\log_{k-1} \delta^{-1}}\right)\right] = 1 - o(1)$.

The high single-child fraction can be explained by assuming just a degree bound on the unknown tree.
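The last step combines the two high-probability events, which by a union bound hold together with probability $1 - o(1)$. Writing $c > 0$ for the constant hidden in the $\Omega(\cdot)$ (our notation), a sketch of the arithmetic is:

```latex
\frac{\#\{\text{non-single-child revealed nodes}\}}{\#\{\text{revealed nodes}\}}
  \le \frac{4 n \delta}{c \, n \delta \log_{k-1} \delta^{-1}}
  = \frac{4/c}{\log_{k-1} \delta^{-1}}
  = O\!\left(\frac{1}{\log_{k-1} \delta^{-1}}\right).
```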

SLIDE 77

How to guess the size of the unknown tree?

Aaron Betty David Earl Hilary George

Revealed Tree

Aaron Betty David Earl Charles

Unknown Tree

Kurt Hilary George Fran Ian Jason Larry

Number of Signers

SLIDE 78

Unknown Tree Exposure

SLIDE 89

Revealed Tree

Node exposures are IID here!

SLIDE 90

Size Estimation

Node exposures are IID here!

  • 1. Estimate $\delta$: $\delta \simeq 3/10$.
  • 2. Estimate $n \cdot \delta$ using the number of exposed nodes in the revealed tree: $n \cdot \delta \simeq 7$.
  • 3. Take the ratio: $n \simeq 7 / (3/10) = 23.\overline{3}$.

What can go wrong? The “yellow area” could contain too few nodes for the estimation of $\delta$ to be successful.
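The three steps above can be sketched in Python under one reading of the slides, which we state as an assumption: the “yellow area” is the set of internal nodes of the revealed tree. Such a node is revealed because some proper descendant was exposed, which does not depend on its own coin flip, so its exposure behaves like a fresh Bernoulli(δ) sample. The function name `estimate_size`, the parent-array encoding, and the random test tree are hypothetical; this is not presented as the authors' exact procedure.

```python
import random


def estimate_size(parent, exposed):
    """Estimate the size n of the unknown tree from its revealed tree.

    parent[v] is the parent of node v (root: -1); only parents of revealed
    nodes are read. exposed is the set of exposed node ids.
    Assumption: the "yellow area" is the set of internal revealed nodes.
    """
    # Internal revealed nodes = proper ancestors of at least one exposed node.
    internal = set()
    for v in exposed:
        u = parent[v]
        while u != -1 and u not in internal:
            internal.add(u)
            u = parent[u]
    exposed_internal = len(internal & exposed)
    if exposed_internal == 0:
        raise ValueError("no exposed internal nodes: cannot estimate delta")
    delta_hat = exposed_internal / len(internal)   # step 1: estimate delta
    n_delta_hat = len(exposed)                     # step 2: every exposed node is revealed
    return n_delta_hat / delta_hat                 # step 3: take the ratio


if __name__ == "__main__":
    # Toy revealed tree with the slides' numbers: internal nodes 0..9 form a path,
    # leaves 10..13 hang off node 9; nodes 2, 5, 8 and the four leaves are exposed.
    toy = [-1] + list(range(9)) + [9, 9, 9, 9]
    print(estimate_size(toy, {2, 5, 8, 10, 11, 12, 13}))   # 7 / (3/10)

    # Hypothetical unknown tree: a random recursive tree on n nodes.
    rng = random.Random(7)
    n, delta = 20_000, 0.1
    tree = [-1] + [rng.randrange(v) for v in range(1, n)]
    exposed = {v for v in range(n) if rng.random() < delta}
    print(n, round(estimate_size(tree, exposed)))
```

The `ValueError` branch corresponds to the failure mode named on the slide: if the yellow area holds too few (here, zero exposed) nodes, δ cannot be estimated at all.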

SLIDE 100

Theorem

  • The previous algorithm can guess the size $n$ with high probability if $n > \tilde{\Omega}\left(\max\left(\delta^{-2},\ \delta^{-1} \cdot k\right)\right)$, where $k$ is the maximum number of children in the unknown tree and $\delta$ is the exposing probability.
  • No algorithm can do it if $n$ is smaller.
  • For example, $k < \tilde{O}(\sqrt{n})$ and $\delta > \tilde{\Omega}\left(\sqrt{1/n}\right)$ satisfy the requirement.
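A quick check that the example regime meets the requirement, ignoring the polylogarithmic factors hidden by the tildes:

```latex
\delta > \sqrt{1/n} \;\Rightarrow\; \delta^{-2} < n,
\qquad
\delta^{-1} < \sqrt{n} \text{ and } k < \sqrt{n} \;\Rightarrow\; \delta^{-1} \cdot k < n,
\qquad\text{hence}\qquad
\max\left(\delta^{-2},\ \delta^{-1} \cdot k\right) < n .
```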

SLIDE 104

IRAQ Tree Size

  • We refined our asymptotic theorem for the IRAQ revealed tree (18k nodes).
  • Assuming the tree-revealing model, we estimate that the number of signers of the IRAQ petition is within a factor of 2 of 173k with probability ≥ 95%.
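Spelled out, the factor-of-2 statement corresponds to the following range for the number of signers $n$ (plain arithmetic on the slide's numbers):

```latex
\frac{173\text{k}}{2} = 86.5\text{k} \;\le\; n \;\le\; 2 \cdot 173\text{k} = 346\text{k}
\qquad\text{with probability} \ge 95\%.
```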

SLIDE 105

Conclusion

  • We gave a mathematical explanation of some odd properties observed in real-world revealed trees.
  • We used the available revealed trees to guess properties of unknown chain-letter trees.
  • We applied our technique to a real-world dataset, giving the first estimate of the number of signers of the IRAQ chain letter.

http://petitions.cs.cornell.edu/