Advanced Distributed Systems: Introduction & RFC #677, Wyatt Lloyd



slide-1
SLIDE 1

Advanced Distributed Systems

  • Introduction & RFC #677

Wyatt Lloyd

  • Some slides adapted from Minlan Yu
slide-2
SLIDE 2

Welcome!

slide-3
SLIDE 3

Rules

  • No laptops, no phones: pay attention!
  • Sit towards the front: participate!
slide-4
SLIDE 4

Today

  • Me
  • My research
  • Distributed systems, what and why?
  • Topic overview
  • Class structure
  • Vishal Misra colloquium
slide-5
SLIDE 5

Introducing Me

  • Wyatt Lloyd

– Please call me Wyatt – Phonetically “why” “it”

  • New Assistant Professor

– Started Fall 2014 – First class!

  • Penn State → Princeton → Facebook → USC

– Pennsylvania → New Jersey → New York → LA

slide-6
SLIDE 6

My Research

  • Distributed Systems!
  • 1: Geo-replicated storage

– Mar 24: COPS, Eiger – Mar 31: Rococo

  • 2: Improving photo storage and delivery (at FB)

– April 23: FB Photo Caching, f4, RIPQ

slide-7
SLIDE 7

Distributed Systems, What?

1) Multiple computers
2) Connected by a network
3) Doing something together

slide-8
SLIDE 8

Distributed Systems, Why?

  • Or, why not 1 computer to rule them all?
  • Failure
  • Limited computation/storage/…
  • Physical location
slide-9
SLIDE 9

Topic Overview

  • Introduction to Distributed Systems

– Building systems
– MapReduce, a case study
– Logical time

  • Ordering events in distributed systems
  • Turing Award!

– Safety/Liveness

  • Safety: Don’t break stuff
  • Liveness: Keep doing stuff
slide-10
SLIDE 10

Fault Tolerance

  • Local fault tolerance

– Or, restarting a machine breaks everything

  • Distributed fault tolerance

– What if you can’t restart?
– What if you want to do stuff while restarting?

  • Byzantine fault tolerance

– What if computers are evil?

slide-11
SLIDE 11

Group Communication

  • Impossibility of consensus (FLP)

– You’ll never agree!

  • Atomic commit

– Either we all do it or no one does

  • Fancier group communication

– Atomic broadcast, atomic multicast, …

slide-12
SLIDE 12

Consensus

  • It’s not impossible, but it’s hard!

– Paxos – Viewstamped Replication – Two Turing Awards!

  • We’re going to talk about Paxos a lot
slide-13
SLIDE 13

Consistency and Transactions

  • Strong consistency

– The data is the same everywhere

  • Weak consistency

– The data *isn’t* the same everywhere, but the systems are faster and/or cheaper

  • Transactions

– Updating/reading lots of data together

  • Distributed Transactions

– That data is on different machines

slide-14
SLIDE 14

Distribute Everything

  • Distributed Logs
  • Distributed File Systems
  • Distributed Debugging
  • Distributed Hash Tables
  • Peer-to-peer Systems
slide-15
SLIDE 15

Modern Marvels

  • Google stack day

– Google is awesome (and publishes)

  • Facebook stack day

– Facebook is awesome (and publishes)

  • Pushing systems to their limits

– Academics are awesome

slide-16
SLIDE 16

Probably!

  • I will likely move stuff around
  • I might even remove/add topics
  • Email/chat with me if you think there is a topic I’m missing :)

slide-17
SLIDE 17

Class Structure

  • Classes
  • Paper Readings
  • Paper Summaries
  • Paper Presentations
  • Programming Projects
  • Exams
slide-18
SLIDE 18

Classes

  • Normal + Seminar
  • 50 min lecture
  • 5 min break
  • 20 min supplemental presentations
  • 30 min paper discussion
  • Colloquium afterwards sometimes

– Vishal Misra colloquium today, but at 4

slide-19
SLIDE 19

Paper Reading

  • One paper per class required

– Must read before class

  • Multiple papers per class supplemental

– Do not have to read
– Other students will present
– Recommended by me

  • Papers available on HotCRP site
slide-20
SLIDE 20

You Spend a Lot of Time Reading

  • Reading papers for grad classes (like this one!)
  • Reviewing papers for conferences
  • Giving colleagues feedback on their papers
  • Keeping up with work related to your research
  • Staying broadly educated about the field
  • Transitioning into a new research area
  • Learning how to write better papers :)
  • So, it is worthwhile to learn to read effectively

slide-21
SLIDE 21

Keshav’s 3-Pass Approach: Pass 1

  • A ten-minute scan to get the general idea

– Title, abstract, and introduction
– Headings
– Conclusion
– Bibliography

  • What to learn: the five C’s

– Category: What type of paper is it?
– Context: What body of work does it relate to?
– Correctness: Do the assumptions seem valid?
– Contributions: What are the main research contributions?
– Clarity: Is the paper well written?

slide-22
SLIDE 22

Keshav’s 3-Pass Approach: Pass 2

  • A more careful, one-hour reading

– Read with greater care, but ignore details like proofs
– Figures, diagrams, and illustrations
– Mark relevant references for later reading

  • Grasp the content of the paper

– Be able to summarize the main thrust to others
– Identify whether you can/should fully understand

  • Decide whether to

– Abandon reading the paper in any greater depth
– Read background material before proceeding further
– Persevere and continue on to the third pass

slide-23
SLIDE 23

Keshav’s 3-Pass Approach: Pass 3

  • Several-hour virtual re-implementation of the work

– Making the same assumptions, recreate the work
– Identify the paper’s innovations and its failings
– Identify and challenge every assumption
– Think how you would present the ideas yourself
– Jot down ideas for future work

  • When should you read this carefully?

– Reviewing for a conference or journal
– Giving colleagues feedback on a paper
– Understanding a paper closely related to your research

slide-24
SLIDE 24

Paper Summaries

  • What problem the paper is addressing (1-2 sentences).
  • The core novel ideas or technical contributions

– What's the 30-second elevator pitch?
– What should one remember about this paper?

  • A longer description (3-5 sentences) that summarizes the paper's approach, mechanisms, and findings.

– Longer for supplemental papers (3-5 paragraphs)

  • A novel response…
slide-25
SLIDE 25

Paper Summaries

  • Novel Response

– Issues with problem or assumptions?
– What problems do you see with methodology that the paper does not address?

  • (Precision, accuracy, misconception, representativeness)

– How would the results differ today? Why?
– What study should we do as followup work?
– Should we adapt the approach to a new setting?

slide-26
SLIDE 26

Paper Summaries

  • Submit via HotCRP

– http://nsl.cs.usc.edu/599s15reviews/

  • Required papers

– Due at 11:59pm night before class

  • 14 hours before class

– Can submit up to 4 summaries up to 1 week late

  • After this, get a 0
  • Supplemental papers you present

– Due at the same time as your initial presentation – Will be viewable and a resource for the whole class

slide-27
SLIDE 27

Paper Presentations

  • 5-7 minute presentations to the class
  • You will be an expert on these papers!
  • Expect to do 2-3

– Currently 47 supplemental papers + 18

  • Due 1 week before the class

– Email me and Bailan the slides

  • Format/details out soon
  • Sign up sheet out soon
slide-28
SLIDE 28

Programming Projects

  • In go!

– https://golang.org/
– Do the tour: https://tour.golang.org/welcome/1
– Play in the playground: https://play.golang.org/

  • This is a useful place to test code snippets as you work on your projects

– Watch Rob Pike’s concurrency talk: http://youtu.be/f6kdp27TYZs

  • Why go?

– Cool new elegant language!

  • Especially concurrency, but syntax is clean too

– Amount of time to learn go < amount of time go will save you vs. C++ for just these projects!

slide-29
SLIDE 29

4 Programming Projects

  • 1: Local MapReduce + Go Intro
  • 2: Primary/Backup Key/Value Service

– Add some fault tolerance

  • 3: Paxos-based Key/Value Service

– Add real fault tolerance

  • 4: Sharded Key/Value Service

– Add scalability

slide-30
SLIDE 30

7 Programming Project Deadlines

  • All on Fridays at 11:59pm
  • 1 Jan 30
  • 2a Feb 13
  • 2b Feb 20
  • 3a Mar 6
  • 3b Mar 13
  • 4a Apr 3
  • 4b Apr 10
slide-31
SLIDE 31

Submit/Develop with Git

  • Distributed Version Control System

– Use to track your changes – Makes it easy to go back to old versions

  • “Shoot, X used to work …”

– Makes it easy to collaborate

  • You won’t use this though

– You will use it to track your time in these projects

  • Not an intended use…
  • Check out the docs: http://git-scm.com/
  • Learn git in 15 minutes: https://try.github.io/
slide-32
SLIDE 32

Rules

  • No collaboration on assignments: learn!

– Do not share any code with other students
– Do not post your code anywhere
– Write all of your own code
– Do not write pseudo-code on a whiteboard with other students
– Default to asking me if you’re not sure if something is okay
– (We will run anti-cheating software)

slide-33
SLIDE 33

2 Exams

  • In Class

– Feb 26 – Apr 30 (last class)

  • Test material from:

– Lectures
– Required papers
– High level ideas from supplemental papers

slide-34
SLIDE 34

Grading

  • Paper Summaries: 10%
  • Paper Presentation: 10%
  • Participation: 5%

– Participate every class!

  • Exams: 35%
  • Programming Assignments: 40%
slide-35
SLIDE 35

Intermission

slide-36
SLIDE 36

RFC #677

The Maintenance of Duplicate Databases

  • 1975!
  • “User Identification Database for the TIP user authentication and accounting system.”

slide-37
SLIDE 37

A History Lesson

  • Terminals connect to computers

– “monitor = video terminal”, keyboard

  • ARPANET
  • TIP

– Terminal IMP: Interface Message Processor
– Telephone number -> ARPANET connection

  • Before the User Identification Database anyone with a terminal and the phone number could log in

– Send email
– Transfer files
– Do computations?

slide-38
SLIDE 38

ARPANET March 1972

slide-39
SLIDE 39

ARPANET July 1976

slide-40
SLIDE 40

ARPANET July 1976

slide-41
SLIDE 41

Access the ARPANET

Terminal + Phone # of a TIP + Authenticate = ARPANET

slide-42
SLIDE 42

User ID DB: What, Where, Why

  • What is in the database?

– User ID, Password, Accounting

  • Where is the database?

– All 25 of the red squares

  • Why is the database in each place?

– “to increase reliability of data access”
– “to insure efficiency of data access”

slide-43
SLIDE 43

Simple Scenarios

  • User logs in in Hawaii
  • User changes password in Hawaii
  • User logs in in LA
  • What if Hawaii-to-mainland link is down?
slide-44
SLIDE 44

Simple Scenarios

  • New user added in Hawaii
  • User logs in in Boston
  • User deleted in Pentagon
  • User logs in in Utah
slide-45
SLIDE 45

Fun Scenario 1

  • What if user changes password in Hawaii and then USC?

  • Conflict resolution strategies:

– Apply all updates in order received.

  • What goes wrong?

– Apply all updates in a total, global order

  • Where does this order come from?
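
A minimal sketch of the two strategies above, assuming each update carries a hypothetical `Seq` number that stands in for whatever global order the sites agree on. The passwords and site names are made up for illustration. Applying updates in arrival order lets two replicas end up with different passwords; applying them in a global order does not.

```go
package main

import "fmt"

// Update is a hypothetical password change. Seq stands in for whatever
// global order the system agrees on; a larger Seq means a later update.
type Update struct {
	Seq      int
	Password string
}

// applyInArrivalOrder keeps whichever update arrived last.
func applyInArrivalOrder(arrivals []Update) string {
	var current string
	for _, u := range arrivals {
		current = u.Password
	}
	return current
}

// applyInGlobalOrder keeps the update with the largest Seq, regardless of
// the order in which the updates arrived.
func applyInGlobalOrder(arrivals []Update) string {
	var current Update
	for i, u := range arrivals {
		if i == 0 || u.Seq > current.Seq {
			current = u
		}
	}
	return current.Password
}

func main() {
	hawaii := Update{Seq: 1, Password: "aloha"}
	usc := Update{Seq: 2, Password: "trojans"}
	utah := []Update{hawaii, usc}   // Utah hears Hawaii first
	boston := []Update{usc, hawaii} // Boston hears USC first

	fmt.Println(applyInArrivalOrder(utah), applyInArrivalOrder(boston)) // trojans aloha
	fmt.Println(applyInGlobalOrder(utah), applyInGlobalOrder(boston))   // trojans trojans
}
```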
slide-46
SLIDE 46

RFC #677’s Total Order

  • What is a total order?

– Orders everything!
– Formally:

  • Binary relation on a set
  • Transitive

(a ≤ b & b ≤ c => a ≤ c)

  • Antisymmetric

(a ≤ b & b ≤ a => a = b)

  • Total

(a ≤ b OR b ≤ a)
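
The three properties can be checked mechanically on a small finite set. This is a sketch only; `isTotalOrder` and the example relations are illustrative names, not anything from RFC #677. The usual ≤ on integers passes, while "divides" fails because some pairs are incomparable.

```go
package main

import "fmt"

// isTotalOrder reports whether leq is transitive, antisymmetric, and total
// over the finite set elems (assumed to contain distinct values).
func isTotalOrder(elems []int, leq func(a, b int) bool) bool {
	for _, a := range elems {
		for _, b := range elems {
			// Total: every pair must be comparable one way or the other.
			if !leq(a, b) && !leq(b, a) {
				return false
			}
			// Antisymmetric: a ≤ b and b ≤ a only when a = b.
			if leq(a, b) && leq(b, a) && a != b {
				return false
			}
			for _, c := range elems {
				// Transitive: a ≤ b and b ≤ c implies a ≤ c.
				if leq(a, b) && leq(b, c) && !leq(a, c) {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	elems := []int{1, 2, 3, 4, 6}
	lessEq := func(a, b int) bool { return a <= b }
	divides := func(a, b int) bool { return b%a == 0 }

	fmt.Println(isTotalOrder(elems, lessEq))  // true
	fmt.Println(isTotalOrder(elems, divides)) // false: e.g. 2 and 3 are incomparable
}
```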
slide-47
SLIDE 47

RFC #677’s Total Order

  • Where does it come from?

– “Each timestamp is a pair (T,D): T is a time, D is a DBMP identifier”

  • Why does this totally order DB updates?
  • Why not rand128() + dbid?
  • What can still go wrong with this?

– Inspired Lamport’s time paper (3rd class)
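
A sketch of how a (T, D) pair can be compared, assuming integer field types and assuming no DBMP ever issues the same T twice (both are assumptions of this example, not details taken from the quote). Time decides first; the DBMP identifier only breaks ties, so two distinct timestamps are never equal. A random number in place of T would also break ties, but it would say nothing about which update happened later.

```go
package main

import "fmt"

// Timestamp mirrors the (T, D) pair from the quote: T is a time and D is a
// DBMP identifier. The field types are assumptions for illustration.
type Timestamp struct {
	T int64 // local clock reading at the DBMP that made the update
	D int   // identifier of that DBMP, unique across the system
}

// Less orders by time first and breaks ties with the DBMP identifier.
func (a Timestamp) Less(b Timestamp) bool {
	if a.T != b.T {
		return a.T < b.T
	}
	return a.D < b.D
}

func main() {
	hawaii := Timestamp{T: 100, D: 1}
	usc := Timestamp{T: 100, D: 2}

	// Same clock reading at both sites: the identifier decides.
	fmt.Println(hawaii.Less(usc), usc.Less(hawaii)) // true false
}
```

Note that the order is only as good as the clocks: a site whose clock runs behind will keep losing to sites whose clocks run ahead.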

slide-48
SLIDE 48

Fun Scenario 2

  • What if?

– Al is deleted at the Pentagon
– Al is modified in Boston

  • In Utah:

– Update(Al) then Delete(Al)
– Delete(Al) then Update(Al)

  • Use “tombstone” to mark deletion
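
A minimal sketch of the tombstone idea, assuming the delete is the later of the two operations and that every operation carries a (T, D) style stamp. The `Entry`, `DB`, and `Apply` names are hypothetical. Because the delete leaves a stamped marker behind instead of removing the record, an older update that arrives afterwards can be recognized as stale, and Utah ends in the same state for both arrival orders.

```go
package main

import "fmt"

// Stamp is a (T, D) timestamp: a time plus the ID of the site that issued it.
type Stamp struct{ T, D int }

// Less orders stamps by time, breaking ties by site ID.
func (a Stamp) Less(b Stamp) bool {
	return a.T < b.T || (a.T == b.T && a.D < b.D)
}

// Entry is one user record. Deleted marks a tombstone: the record is gone,
// but the stamp of the delete is remembered.
type Entry struct {
	Value   string
	Deleted bool
	Stamp   Stamp
}

// DB is one replica's copy of the database, keyed by user name.
type DB map[string]Entry

// Apply installs op only when it is newer than what the replica already holds.
func (db DB) Apply(user string, op Entry) {
	cur, ok := db[user]
	if !ok || cur.Stamp.Less(op.Stamp) {
		db[user] = op
	}
}

func main() {
	update := Entry{Value: "al-v2", Stamp: Stamp{T: 10, D: 2}} // Boston
	del := Entry{Deleted: true, Stamp: Stamp{T: 20, D: 1}}     // Pentagon

	utahA := DB{"Al": {Value: "al-v1", Stamp: Stamp{T: 1, D: 1}}}
	utahA.Apply("Al", update)
	utahA.Apply("Al", del)

	utahB := DB{"Al": {Value: "al-v1", Stamp: Stamp{T: 1, D: 1}}}
	utahB.Apply("Al", del)
	utahB.Apply("Al", update) // older than the tombstone, so ignored

	fmt.Println(utahA["Al"].Deleted, utahB["Al"].Deleted) // true true
}
```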
slide-49
SLIDE 49

Fun Scenario 3

  • What if?

– Al is deleted in the Pentagon
– Al is re-added in Boston
– Al is modified in USC

  • In Utah:

– Delete, Add, Modify
– Delete, Modify, Add

  • “Tombstone” cancels the modify, so the modify is lost
  • How do you fix this?

– Include creation time, later creation time wins
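
One way to read "later creation time wins", as a hedged sketch: every operation names the creation time of the entry it refers to, and an operation that names a later creation time than the local entry starts a fresh entry. The `Op`, `Entry`, and `Apply` names, the integer times, and the sample values are all assumptions of this example. With that rule, both Utah arrival orders end in the same state and the modify survives.

```go
package main

import "fmt"

type Kind int

const (
	Add Kind = iota
	Modify
	Delete
)

// Op is a hypothetical update message. Create names the incarnation of the
// entry the op refers to; Time is when the op itself was issued.
type Op struct {
	Kind   Kind
	Create int
	Time   int
	Value  string
}

// Entry is one replica's view of a user.
type Entry struct {
	Create  int
	ModTime int
	Value   string
	Deleted bool
}

// Apply merges op into e and returns the result.
func Apply(e Entry, op Op) Entry {
	if op.Create < e.Create {
		return e // op refers to an older incarnation: drop it
	}
	if op.Create > e.Create {
		e = Entry{Create: op.Create} // later creation time wins: start fresh
	}
	if op.Kind == Delete {
		e.Deleted = true // tombstone for this incarnation only
		return e
	}
	if op.Time > e.ModTime {
		e.Value, e.ModTime = op.Value, op.Time
	}
	return e
}

func main() {
	del := Op{Kind: Delete, Create: 1, Time: 10}                    // Pentagon
	add := Op{Kind: Add, Create: 11, Time: 11, Value: "al@boston"} // Boston
	mod := Op{Kind: Modify, Create: 11, Time: 12, Value: "al@usc"} // USC
	start := Entry{Create: 1, ModTime: 1, Value: "al@pentagon"}

	a, b := start, start
	for _, op := range []Op{del, add, mod} {
		a = Apply(a, op)
	}
	for _, op := range []Op{del, mod, add} {
		b = Apply(b, op)
	}
	fmt.Println(a == b, a.Value, a.Deleted) // true al@usc false
}
```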

slide-50
SLIDE 50

Lots of Users?

  • What if we add and delete a lot of users?

– E.g., Al was “deleted” in 1975

  • Garbage Collection!

– When can we delete something?

  • “[Do not delete an] entry until it will never receive any assignments with the same selector (S) and the same or older create time (CT).”

– All DBs have deleted it (scenario 2)
– No more modifications are pending (scenario 3)
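
A rough sketch of the "all DBs have deleted it" condition, using a hypothetical acknowledgement scheme rather than the RFC's actual mechanism. It assumes each peer delivers its updates in order, so that once a peer's acknowledgement of the delete arrives, nothing older from that peer can still be in flight. The `Tombstone` type and the peer names are made up for illustration.

```go
package main

import "fmt"

// Tombstone remembers which peers have confirmed that they applied the delete.
type Tombstone struct {
	Selector   string          // which entry was deleted, e.g. a user name
	CreateTime int             // creation time of the deleted incarnation
	Acked      map[string]bool // peers that have confirmed the delete
}

// NewTombstone returns a tombstone that no peer has acknowledged yet.
func NewTombstone(selector string, createTime int) *Tombstone {
	return &Tombstone{Selector: selector, CreateTime: createTime, Acked: map[string]bool{}}
}

// Ack records that peer has applied the delete.
func (t *Tombstone) Ack(peer string) { t.Acked[peer] = true }

// CanCollect reports whether every peer has acknowledged the delete.
// Under the in-order delivery assumption, no assignment for this
// incarnation can still arrive once this returns true.
func (t *Tombstone) CanCollect(peers []string) bool {
	for _, p := range peers {
		if !t.Acked[p] {
			return false
		}
	}
	return true
}

func main() {
	peers := []string{"hawaii", "boston", "utah"}
	ts := NewTombstone("Al", 1)

	ts.Ack("hawaii")
	ts.Ack("boston")
	fmt.Println(ts.CanCollect(peers)) // false: utah has not confirmed

	ts.Ack("utah")
	fmt.Println(ts.CanCollect(peers)) // true
}
```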

slide-51
SLIDE 51

Eventual Consistency Defined in 1975

“The extent to which the copies of the database can be kept "identical" must be examined. Because of the inherent delay in communications between DBMPs, it is impossible to guarantee that the data bases are identical at all times. Rather, our goal is to guarantee that the copies are "consistent" with each other. By this we mean that given a cessation of update activity to any entry, and enough time for each DBMP to communicate with all other DBMPs, then the state of that entry (its existence and value) will be identical in all copies of the database.”

slide-52
SLIDE 52

What Did You Learn Today?

  • Distributed Systems, What and Why
  • Class Topics
  • Class Structure
  • RFC #677: A really old distributed system

– Why distributed?
– Totally order updates
– Tombstones for safe deletion
– Garbage collection for removing tombstones

slide-53
SLIDE 53

Next Time

  • Remote Procedure Calls (RPCs)

– Getting machines to talk to each other

  • Map Reduce

– Your project
– Computing at scale
– Incredibly influential paper and system

slide-54
SLIDE 54

Vishal Misra Colloquium

  • Professor at Columbia
  • Focus on Distributed Systems and Networking

– Infinio: Distributed systems startup

  • The Network Neutrality Debate: An Engineering Perspective

  • Let’s go!