Principles of Software Construction: Objects, Design, and - - PowerPoint PPT Presentation



SLIDE 1

Spring 2014

School of Computer Science

Principles of Software Construction: Objects, Design, and Concurrency Case Studies in Data Consistency and Google's PageRank

Charlie Garrod Christian Kästner

SLIDE 2

15-214

Administrivia

  • Homework 6, homework 6, homework 6…

§ Due Thursday, 11:59 p.m.
§ May turn in as late as Saturday, 11:59 p.m.

  • Final exam review session

§ Saturday, May 10th, 6 – 8 p.m., PH 100

  • Final exam

§ Monday, May 12th, 5:30 – 8:30 p.m., UC McConomy

  • Faculty course evaluations

§ https://cmu.smartevals.com/

  • TA feedback(?)

§ Email from Greg Kesden coming soon(?)

SLIDE 3

Last time…

SLIDE 4

Data consistency

  • Suppose D is the database for some application and ϕ is a function from database states to {true, false}

§ We call ϕ an integrity constraint for the application if ϕ(D) is true exactly when the state D is "good"
§ We say a database state D is consistent if ϕ(D) is true for all integrity constraints ϕ
§ We say D is inconsistent if ϕ(D) is false for at least one integrity constraint ϕ

  • Transaction ACID properties:

§ Atomicity: All or nothing
§ Consistency: Application-dependent as before
§ Isolation: Each transaction runs as if alone
§ Durability: Database will not abort or undo work of a transaction after it confirms the commit
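The definitions on this slide can be made concrete with a small sketch. It assumes a toy "database" that is just a dict of account balances; the account names, the two constraints, and the helper names (`is_consistent`, `run_transaction`, `transfer`) are hypothetical and chosen only for illustration. It shows integrity constraints as boolean functions of the state, and an all-or-nothing update that commits only if the resulting state is consistent. Isolation and durability are not modelled.

```python
from typing import Callable, Dict, List

# A database state D: here simply account name -> balance (hypothetical schema).
State = Dict[str, int]
# An integrity constraint phi: a function from database states to {True, False}.
Constraint = Callable[[State], bool]


def no_negative_balances(state: State) -> bool:
    return all(balance >= 0 for balance in state.values())


def total_is_100(state: State) -> bool:
    return sum(state.values()) == 100


def is_consistent(state: State, constraints: List[Constraint]) -> bool:
    """D is consistent iff phi(D) is true for all integrity constraints phi."""
    return all(phi(state) for phi in constraints)


def run_transaction(state: State, update, constraints: List[Constraint]) -> State:
    """Apply `update` to a copy; commit only if the result is consistent."""
    candidate = dict(state)
    update(candidate)
    if is_consistent(candidate, constraints):
        return candidate  # commit: the whole update takes effect
    return state          # abort: none of it does


def transfer(amount: int):
    def update(state: State) -> None:
        state["alice"] -= amount
        state["bob"] += amount
    return update


CONSTRAINTS = [no_negative_balances, total_is_100]
db = {"alice": 60, "bob": 40}
db = run_transaction(db, transfer(50), CONSTRAINTS)              # commits
db_after_abort = run_transaction(db, transfer(50), CONSTRAINTS)  # would overdraw alice
```

The second transfer leaves the state untouched because the candidate state violates `no_negative_balances`.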

SLIDE 5

The CAP theorem for distributed systems

  • For any distributed system you want…

§ Consistency § Availability § tolerance of network Partitions

  • …but you can support at most two of the three

SLIDE 6

Today: Case study in consistency, and PageRank

  • Google's PageRank algorithm
  • Ruminations on data consistency

SLIDE 7

A "university" search, circa 1997

From Page et al., “The PageRank Citation Ranking: Bringing Order to the Web”

SLIDE 8

Traditional information retrieval

  • 1997’s http://www.net.cmu.edu:

<TITLE>Carnegie Mellon University - Computing Services - Network Group</TITLE>
<CENTER><IMG ALT="Carnegie Mellon University - Computing Services - Network Group" SRC="http:/icons/campnet.jpg"></CENTER><P>
<H2>Departments</H2>
<DL>
<DD> <IMG SRC="http://www.net.cmu.edu/icons/greenball.gif">
<A HREF="http://www.net.cmu.edu/datacomm/home.html"> <B> Data Communications</B></A>
…

SLIDE 9

Improving IR with citation counts

  • If a page is important, other pages link to it
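A minimal sketch of ranking by citation count, assuming the Web is given as a dict mapping each page to the pages it links to. The four-page `LINKS` graph and the function name `citation_counts` are hypothetical. Each linking page is counted once and self-links are ignored, which is one reasonable reading of "other pages link to it".

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def citation_counts(links):
    """Count inbound links per page, counting each linking page at most once."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # a page citing itself does not count
                counts[target] += 1
    return counts


counts = citation_counts(LINKS)
# Most-cited first; ties broken alphabetically.
ranking = sorted(counts, key=lambda page: (-counts[page], page))
```

Every citation carries the same weight here, regardless of how important the citing page is.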

SLIDE 10

PageRank: weighted citations

  • If a page is important, other important pages link to it

SLIDE 11

PageRank: weighted citations

  • If a page is important, other important pages link to it

§ e.g., [Figure: graph on v1, v2, v3, v4 labelled with scores 6, 4, 3, 1]

SLIDE 12

PageRank: weighted citations

  • If a page is important, other important pages link to it

§ e.g., [Figure: graph on v1, v2, v3, v4 labelled with scores 6/14, 4/14, 3/14, 1/14]

SLIDE 13

PageRank: weighted citations

  • If a page is important, other important pages link to it

§ e.g., [Figure: graph on v1, v2, v3, v4 labelled with scores .4286, .2857, .2143, .0714]
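The step from raw scores to fractions to decimals across these builds is a normalization: divide each score by the total so the scores sum to 1. A minimal sketch, assuming the raw scores 6, 4, 3, 1 from the earlier build of this slide (the variable names are illustrative):

```python
from fractions import Fraction

raw_scores = [6, 4, 3, 1]  # raw scores from the example
total = sum(raw_scores)    # 14

# Exact normalized scores; Fraction reduces 6/14 to 3/7 and so on.
normalized = [Fraction(score, total) for score in raw_scores]

# Four-decimal approximations of the same values.
decimals = [round(float(value), 4) for value in normalized]
```

Rounded to four places, 1/14 is 0.0714.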

SLIDE 14

PageRank: weighted citations

  • If a page is important, other important pages link to it

§ Is this well-defined?
§ How do we compute it?
§ How do we compute it efficiently?

SLIDE 15

The WWW as a graph as a matrix

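[Figure: link graph on v1, v2, v3, v4 with edge weights 1/2, 1/2, 1/3, 1/3, 1/3, 1, 1, next to its transition matrix W, whose nonzero entries are 1, 1/2, 1/2, 1, 1/3, 1/3, 1/3]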

SLIDE 16

The WWW as a graph as a matrix

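  • PageRanks R = [r1, r2, …, rn] solve the linear equation R = R * W

§ R is an eigenvector of the Web: a left eigenvector of W with eigenvalue 1

[Figure: link graph on v1, v2, v3, v4 with edge weights 1/2, 1/2, 1/3, 1/3, 1/3, 1, 1, next to its transition matrix W, whose nonzero entries are 1, 1/2, 1/2, 1, 1/3, 1/3, 1/3]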
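To see what R = R * W means in practice, here is a small check on a hypothetical three-page web (not the four-page graph from the slide, whose exact edges are not recoverable from this transcript). Each row of `W` spreads a page's rank evenly over its out-links, and `times` is an illustrative helper for row-vector-times-matrix multiplication.

```python
from fractions import Fraction as F

# Hypothetical 3-page web: v0 -> v1, v0 -> v2, v1 -> v2, v2 -> v0.
# W[i][j] is the share of page i's rank that flows to page j.
W = [
    [F(0), F(1, 2), F(1, 2)],
    [F(0), F(0),    F(1)],
    [F(1), F(0),    F(0)],
]


def times(R, W):
    """Row vector times matrix: (R * W)[j] = sum over i of R[i] * W[i][j]."""
    n = len(W)
    return [sum(R[i] * W[i][j] for i in range(n)) for j in range(n)]


# Candidate PageRanks for this graph, worked out by hand.
R = [F(2, 5), F(1, 5), F(2, 5)]

is_fixed_point = times(R, W) == R  # True: R solves R = R * W
```

Exact fractions are used so the equality test is not affected by floating-point rounding.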

SLIDE 17

The power method

  • (Under some conditions) To find an eigenvector v of a matrix M

§ Start with some approximation of v: v0
§ Compute repeatedly:
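A minimal sketch of the power method in its common textbook form, assuming the row-vector convention used for R = R * W in these slides: multiply the current approximation by M, rescale it, and repeat until it stops changing. The function name, the L1 rescaling, and the tolerance parameters are assumptions for illustration, and the sketch only applies when the iteration actually converges (for example, a matrix with positive entries and a positive starting vector).

```python
def power_method(M, v0, tolerance=1e-12, max_iterations=10_000):
    """Approximate a dominant left eigenvector of M, scaled to L1 norm 1."""
    n = len(M)
    v = list(v0)
    for _ in range(max_iterations):
        # One step: nxt = v * M (row vector times matrix).
        nxt = [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]
        # Rescale so the entries neither blow up nor vanish.
        scale = sum(abs(x) for x in nxt)
        nxt = [x / scale for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tolerance:
            return nxt
        v = nxt
    return v


# [[2, 1], [1, 2]] has dominant eigenvalue 3 with eigenvector proportional to [1, 1].
v = power_method([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0])
```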

SLIDE 18

The power method for PageRank

  • Assign some initial PageRank R
  • While R hasn't converged, compute “next” PageRanks from the previous PageRanks

PageRank(G, delta)
    Initialize R = something, R' = 0
    while (|R – R'| > delta)
        R' = R
        R = 0
        for each edge (u,v) in G
            R[v] += (R'[u] / out-deg(u))
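The pseudocode above translates fairly directly into runnable code. The following is a sketch under stated assumptions, not the course's reference implementation: vertices are numbered 0..n-1, the graph is an edge list, every vertex has at least one out-edge, the initial ranks are uniform, and convergence is measured by the L1 distance between successive rank vectors. The example graph and the `max_iterations` guard are illustrative additions.

```python
def pagerank(edges, num_vertices, delta=1e-10, max_iterations=10_000):
    """Power method over an edge list; assumes every vertex has an out-edge."""
    out_degree = [0] * num_vertices
    for u, _ in edges:
        out_degree[u] += 1

    rank = [1.0 / num_vertices] * num_vertices  # "something": uniform start
    for _ in range(max_iterations):
        previous = rank
        rank = [0.0] * num_vertices
        # Each page splits its previous rank evenly among its out-links.
        for u, v in edges:
            rank[v] += previous[u] / out_degree[u]
        # Stop once successive iterates differ by at most delta (L1 distance).
        if sum(abs(a - b) for a, b in zip(rank, previous)) <= delta:
            break
    return rank


# Hypothetical 3-page web: v0 -> v1, v0 -> v2, v1 -> v2, v2 -> v0.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 0)]
ranks = pagerank(EDGES, 3)
```

On this graph the ranks approach 0.4, 0.2, 0.4. The `max_iterations` guard matters because the loop need not converge on graphs that are periodic or not strongly connected.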

SLIDE 19

A PageRank example

[Figure: link graph on v1, v2, v3, v4 with edge weights 1/2, 1/2, 1/3, 1/3, 1/3, 1, 1]

SLIDE 20

Convergence of the power method

Theorem: For any initial PageRanks summing to 1, the power method will converge to a well-defined, unique solution if the transition matrix W is stochastic, aperiodic, and irreducible

SLIDE 21

A stochastic transition matrix

  • A transition matrix is stochastic if all rows sum to 1

[Figure: link graph on v1, v2, v3, v4 with edge weights 1/2, 1/2, 1/3, 1/3, 1/3, 1, 1, next to W with nonzero entries 1, 1/2, 1/2, 1, 1/3, 1/3, 1/3]

SLIDE 22

A stochastic transition matrix

  • A transition matrix is stochastic if all rows sum to 1

[Figure: link graph on v1, v2, v3, v4 with edge weights 1/2, 1/2, 1/3, 1/3, 1/3, 1, next to W with nonzero entries 1/2, 1/2, 1, 1/3, 1/3, 1/3]

SLIDE 23

A stochastic transition matrix

  • A transition matrix is stochastic if all rows sum to 1

[Figure: link graph on v1, v2, v3, v4 with edge weights 1/2, 1/2, 1/3, 1/3, 1/3, 1, next to W with nonzero entries 1/3, 1/3, 1/3, 1/2, 1/2, 1, 1/3, 1/3, 1/3]
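A sketch of the stochasticity check and of one way to repair a page with no out-links. The four-page matrix below is hypothetical, and the repair rule (give a dangling page uniform links to every other page) is an assumption that appears to match the 1/3, 1/3, 1/3 row in the last build of this slide; other formulations link a dangling page to all pages, itself included.

```python
from fractions import Fraction as F


def is_stochastic(W):
    """True if every entry is non-negative and every row sums to exactly 1."""
    return all(min(row) >= 0 and sum(row) == 1 for row in W)


def patch_dangling_rows(W):
    """Replace each all-zero row with uniform links to every other vertex."""
    n = len(W)
    patched = []
    for i, row in enumerate(W):
        if any(row):
            patched.append(list(row))
        else:
            patched.append([F(0) if j == i else F(1, n - 1) for j in range(n)])
    return patched


# Hypothetical 4-page matrix whose first row is empty (a page with no out-links).
W = [
    [F(0),    F(0),    F(0),    F(0)],
    [F(1, 2), F(0),    F(1, 2), F(0)],
    [F(0),    F(0),    F(0),    F(1)],
    [F(1, 3), F(1, 3), F(1, 3), F(0)],
]
fixed = patch_dangling_rows(W)
```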

SLIDE 24

An aperiodic transition matrix

  • A transition matrix is periodic if there is an integer k > 1 such that the interval between visits of a vertex is always a multiple of k

[Figure: graph on v1, v2, v3 with three edges, each of weight 1]

SLIDE 25

An aperiodic transition matrix

  • A transition matrix is periodic if there is an integer k > 1 such that the interval between visits of a vertex is always a multiple of k

[Figure: graph on v1, v2, v3 with three edges of weight 1 − ε and three edges of weight ε]
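The period of a strongly connected graph can be computed with one breadth-first search. The sketch below uses a standard technique: for each edge (u, v), depth[u] + 1 - depth[v] is a multiple of the period, and the gcd of these values over all edges is the period. Only the edge structure matters, not the weights. The two example graphs are hypothetical: a plain 3-cycle, and the same cycle with self-loops added as a stand-in for extra low-probability edges (the slide's extra edges may be drawn differently).

```python
from collections import deque
from functools import reduce
from math import gcd


def period(successors, root=0):
    """Period of a strongly connected directed graph given as adjacency lists."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in successors[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    # gcd over all edges of the "level mismatch" depth[u] + 1 - depth[v].
    return reduce(
        gcd,
        (abs(depth[u] + 1 - depth[v]) for u in successors for v in successors[u]),
        0,
    )


CYCLE = {0: [1], 1: [2], 2: [0]}                    # returns only every 3 steps
WITH_SELF_LOOPS = {0: [0, 1], 1: [1, 2], 2: [2, 0]}  # can also stay in place
```

A result of 1 means the graph is aperiodic; the plain cycle has period 3.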

SLIDE 26

An irreducible transition matrix

  • The transition matrix is irreducible if it’s possible to (eventually) reach each state from any other state

[Figure: graph on v1, v2, v3, v4]

SLIDE 27

An irreducible transition matrix

  • The transition matrix is irreducible if it’s possible to (eventually) reach each state from any other state

[Figure: graph on v1, v2, v3, v4]
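Irreducibility is the same as strong connectivity of the link graph, which can be checked by searching from every vertex. A minimal sketch, assuming adjacency lists in which every link target is also a key; both example graphs are hypothetical, and for large graphs a linear-time strongly-connected-components algorithm would be the better choice.

```python
def reachable_from(successors, start):
    """Set of vertices reachable from `start`, including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in successors[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_irreducible(successors):
    """True if every vertex can eventually reach every other vertex."""
    vertices = set(successors)
    return all(reachable_from(successors, v) == vertices for v in vertices)


# Once a surfer enters {1, 2} there is no way back to 0 or 3.
REDUCIBLE = {0: [1], 1: [2], 2: [1], 3: [0]}
# A single 4-cycle: every vertex reaches every other.
IRREDUCIBLE = {0: [1], 1: [2], 2: [3], 3: [0]}
```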

SLIDE 28

Computing PageRank efficiently

  • Can keep Web graph on disk

§ PageRanks in RAM
§ Do not store modifications that made W stochastic, aperiodic, and irreducible
§ Use smart initial PageRanks

  • Can partition Web graph between computers
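One way to avoid storing the modifications is to apply them arithmetically during each iteration, so only the real, sparse links are ever read. The sketch below uses the standard damping-factor ("random surfer") formulation, which is an assumption about what the modifications are rather than the slide's exact construction: with probability `damping` the surfer follows a link, otherwise jumps to a uniformly random page, and rank sitting on pages without out-links is spread uniformly over all pages. The function name, parameter values, and example graph are illustrative.

```python
def pagerank_implicit(edges, n, damping=0.85, delta=1e-12, max_iterations=1000):
    """PageRank over sparse edges; the dense modified matrix is never built."""
    out_degree = [0] * n
    for u, _ in edges:
        out_degree[u] += 1
    dangling = [u for u in range(n) if out_degree[u] == 0]

    rank = [1.0 / n] * n
    for _ in range(max_iterations):
        # Rank that the modified matrix would spread uniformly: random jumps
        # plus everything currently on pages without out-links.
        dangling_mass = sum(rank[u] for u in dangling)
        base = (1.0 - damping) / n + damping * dangling_mass / n
        nxt = [base] * n
        for u, v in edges:  # only the real links are stored and read
            nxt[v] += damping * rank[u] / out_degree[u]
        converged = sum(abs(a - b) for a, b in zip(nxt, rank)) <= delta
        rank = nxt
        if converged:
            break
    return rank


# Hypothetical 5-page web; page 4 has no links in or out.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
ranks = pagerank_implicit(EDGES, 5)
```

Memory use is one float per page plus the edge list, which can be streamed from disk one edge at a time.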

SLIDE 29

Aside: Problems with PageRank

SLIDE 30

Problem with PageRank computation…

  • In spring 2000, Google's web-crawling system failed too frequently to update their web index

§ Their solution: Google File System and MapReduce

SLIDE 31

Problem with PageRank computation…

  • In spring 2000, Google's web-crawling system failed too frequently to update their web index

§ Their solution: Google File System and MapReduce

  • How bad is this web service outage?

§ …in terms of data consistency

SLIDE 32

Data consistency at Facebook

  • Replication for scalability:

§ Read-any, write-all
§ Palo Alto, CA is primary replica
§ Aside: A 2010 conversation:

Academic researcher: What would happen if X occurred?
Facebook engineer: We don't know. X hasn't happened yet but it would be bad.
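A toy illustration of the read-any, write-all idea, assuming in-memory replicas and no failures. The class name, replica names, and method names are hypothetical and do not describe Facebook's actual system; in particular, the role of a primary replica and any failure handling are left out.

```python
import random


class ReplicatedRegister:
    """Toy read-any / write-all replication over in-memory replicas."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}

    def write(self, key, value):
        # Write-all: the write finishes only after every replica has applied it.
        for store in self.replicas.values():
            store[key] = value

    def read(self, key, rng=random):
        # Read-any: any single replica may answer, so reads scale with replicas.
        name = rng.choice(sorted(self.replicas))
        return self.replicas[name].get(key)


store = ReplicatedRegister(["palo_alto", "replica_b", "replica_c"])
store.write("status", "hello")
value = store.read("status")
```

Because every write reaches every replica before it returns, any replica can serve a read with the latest value; the cost is that a write cannot complete while a replica is unreachable.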

SLIDE 33

Data consistency at Amazon

  • Strict data consistency increases real costs

Amazon engineer: "'Usually ships in 2-3 days'? What does that mean? Absolutely nothing."

SLIDE 34

A common reality: Relaxed data consistency

  • Relaxed in time

§ E.g., Time-to-live in a data cache

  • Relaxed in value

§ I.e., within some error bound from the correct value

  • Other consistency guarantees

§ E.g., Causal consistency
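Consistency relaxed in time can be sketched with a time-to-live cache: a cached value may be stale, but never for longer than the TTL. The class below is a minimal, single-threaded sketch; the names `TTLCache`, `load`, and `clock` are hypothetical, and the clock is injectable so the behaviour can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Cache whose entries may be stale for at most `ttl_seconds`."""

    def __init__(self, load, ttl_seconds, clock=time.monotonic):
        self._load = load      # function that fetches the authoritative value
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}     # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]    # possibly stale, but within the time bound
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value


# Demonstration with a fake clock and a dict standing in for the database.
fake_now = [0.0]
source = {"x": 1}
cache = TTLCache(source.get, ttl_seconds=10, clock=lambda: fake_now[0])

first = cache.get("x")   # loads 1 from the source
source["x"] = 2          # the authoritative value changes
fake_now[0] = 5.0
stale = cache.get("x")   # still 1: inside the 10-second window
fake_now[0] = 10.0
fresh = cache.get("x")   # entry expired, reloads 2
```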

SLIDE 35

Summary

  • Google makes $billions by treating us all like random surfers

§ PageRank as iterative, weighted citation rankings

  • WWW graph modifications needed to compute PageRank

  • Data consistency can be more than a boolean function

SLIDE 36

Thursday…

  • Guest lecture by Claire Le Goues