Introduction to Distributed * Systems

Outline
• about the course
• relationship to other courses
• the challenges of distributed systems
• distributed services
• *ility for distributed services

What is CPS 212 about?
What do I mean by “distributed information systems”?
• Distributed: a bunch of “computers” connected by “wires”
• Nodes are (at least) semi-autonomous... but run software to coordinate and share resources.
• Information systems: focus on systems to store/access/share data and operations on data.
  Move {data, computation} around the network and deliver it to the right places at the right times, safely and securely.
• Focus on Internet information services and their building blocks.
  The Web, Web Services, name services, resource sharing (Grid)
  Clustering, network storage, file sharing

Why are you here?
• You are a second-year (or later) CPS graduate student.
• You have taken CPS 210 and 214 and/or 216 and you want more.
  familiarity with TCP/IP networking, threads, and file systems
• Or: we have talked and we agreed that you should take the class.
• You are comfortable with concurrent programming in Java. (You want to do some Java programming labs.)
• You want to prepare for R/D in this exciting and important area. (You want to read about 15 papers and take some exams.)
• You want to get started... (Semester group project.)

Continuum of Distributed Systems
Issues: naming and sharing, performance and scale, resource management
[Figure: a spectrum running from small, fast systems to big, slow ones: multiprocessors, clusters, LAN, global Internet. Parallel Architectures (CPS 221) and Networks (CPS 214) are marked along it.]
Multiprocessor end: fast network, low latency, high bandwidth, trusting hosts, coordinated resources, secure and reliable interconnect, no independent failures
Global Internet end: slow network, high latency, low bandwidth, untrusting hosts, autonomous nodes, unreliable network, independent failures, decentralized administration, fear and distrust

The Challenges of Distributed Systems
• private communication over public networks
  who sent it (authentication), did anyone change it, did anyone see it
• building reliable systems from unreliable components
  nodes fail independently; a distributed system can “partly fail”
  Lamport: “A distributed system is one in which the failure of a machine I’ve never heard of can prevent me from doing my work.”
• location, location, location
  Placing data and computation for effective resource sharing, and finding it again once you put it somewhere.
• coordination and shared state
  What should we (the system components) do and when should we do it?
  Once we’ve all done it, can we all agree on what we did and when?
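
The first challenge above asks “who sent it” and “did anyone change it”. One common way to answer both between two parties that already share a secret is a message authentication code. The sketch below uses HMAC-SHA256 from the Python standard library; the key value and the function names `sign` and `verify` are illustrative assumptions, not part of the course material. It does not address “did anyone see it”, which would need encryption.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real system would never hard-code a key.
SHARED_KEY = b"example-key-known-to-both-ends"


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return an authentication tag that only a holder of the key can compute."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)
```

A receiver that gets `(message, tag)` calls `verify(message, tag)`: a changed message or a tag made with a different key fails the check.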

Information Systems vs. Databases
“Information systems” is more general than “relational databases”.
• Overlap: We study distributed concurrency control and recovery, but not the relational model.
  The issues are related, but we’ll consider a wider range of data models and service models.
In this course, we view databases as:
• local components of larger distributed systems, or
• distributed systems in themselves.
Focus: scale and robustness of large-scale Internet services.

September 11, 2001
The 9/11 load spike at CNN.com:
• complete collapse
• scramble to manually deploy new servers
How can we handle “flash crowds”?
• Buy/install enough hardware for worst-case load?
• Block traffic?
• Adaptive provisioning?
• Steal resources from less critical services?

That Other September 11
This is a graph of request traffic to download the Starr Report on Pres. Clinton’s extracurricular pursuits, released on 9/11/98.

Broader Importance of Distributed Software Technology
Today, the global community depends increasingly on distributed information systems technologies.
There are many recent examples of high-profile meltdowns of systems for distributed information exchange.
• Code Red worm: July 2001
• denial-of-service attacks against Yahoo etc. (spring 00)
• stored credit card numbers stolen from CDNow.com (spring 00)
• Network Solutions DNS root server failure (fall 00)
• MCI trunk drop interrupts Chicago Board of Exchange (summer 99)
People were afraid to buy over the net at all just a few years ago!
These reflect the reshaping of business, government, and society brought by the global Internet and related software.
We have to “get it right”!

The Importance of Authentication
This is a picture of a $2.5B move in the value of Emulex Corporation, in response to a fraudulent press release by short-sellers through InternetWire in 2000.
The release was widely disseminated by news media as a statement from Emulex management, but media failed to authenticate it.
[Figure: EMLX stock chart, reproduced from clearstation.com]

Challenges for Services: *ility
We want our distributed applications to be useful, correct, and secure.
We also want reliability. Broadly, that means:
• recoverability
  Don’t lose data if a failure occurs (also durability).
• availability
  Don’t interrupt service if a failure occurs.
• scalability
  Grow effectively with the workload. See also: manageability.
• survivability
  Murphy’s Law says it’s a dangerous world. Can systems protect themselves?
• See also: security, adaptability, agility, dependability, performability, etc.

The Meaning of Scalability
Scalability is now part of the “enhanced standard litany” [Fox]; everybody claims their system is “scalable”. What does it really mean?
Pay as you go: expand capacity by spending more money, in proportion to the new capacity.
[Figure: total cost of capacity and marginal cost of capacity plotted against capacity, for a scalable and an unscalable system.]
Note: watch out for “hockey sticks”!
How do we measure or validate claims of scalability?
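
One rough way to check a scalability claim against the “pay as you go” idea is to compute the marginal cost of each added block of capacity and look for the point where it jumps. The sketch below assumes measurements arrive as `(capacity, total_cost)` pairs with strictly increasing capacity. The function names and the `factor=2.0` threshold are illustrative assumptions, not a standard definition of a “hockey stick”.

```python
def marginal_costs(points):
    """Given (capacity, total_cost) pairs sorted by capacity, return the
    cost per added unit of capacity between consecutive points."""
    return [
        (c1 - c0) / (q1 - q0)
        for (q0, c0), (q1, c1) in zip(points, points[1:])
    ]


def first_hockey_stick(points, factor=2.0):
    """Return the capacity at which marginal cost first exceeds `factor`
    times the initial marginal cost, or None if it never does."""
    costs = marginal_costs(points)
    if not costs:
        return None
    baseline = costs[0]
    # costs[i] is the marginal cost of growing to points[i + 1].
    for (capacity, _), cost in zip(points[1:], costs):
        if cost > factor * baseline:
            return capacity
    return None
```

For a system whose cost grows in proportion to capacity, `first_hockey_stick` returns `None`; for one whose last upgrade costs several times more per unit than the first, it returns the capacity where the bend appears.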

Scalability II: Manageability
Today, “cost” has a broader meaning than it once did:
• growth in administrative overhead with capacity
• no interruption of service to upgrade capacity
  “24 * 7 * 365 * .9999”
Where does the money go? [Borrowed from Jim Gray]
[Figure: two pie charts, “Old World” and “New World”, splitting cost among vendor, staff, and facility.]
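
The “.9999” in the slogan above is an availability target. A small calculation makes it concrete: at 99.99% availability, a service that runs all year may be down for roughly 52.6 minutes in total. The helper below is an illustrative sketch; its name is an assumption, and it ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime budget implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR
```

Calling `downtime_minutes_per_year(0.9999)` gives the budget for “four nines”; each extra nine divides the budget by ten.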

Self-Managing Systems
IBM’s Autonomic Computing Challenge

How to Build Self-Managing Systems?
[Figure: a feedback loop. A Monitor sends observations to an Adaptation Policy, which sends directives through an Actuator to the “Servers in the Mist” that serve clients.]
Where are the humans in the loop?
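
The monitor, policy, and actuator loop on this slide can be sketched in a few lines. Everything here is an illustrative assumption: the policy (keep per-server load at or below a fixed target), the `target_per_server` value, and the directive names are not taken from the slide or from IBM’s work.

```python
import math


def adaptation_policy(load, target_per_server=100.0):
    """Policy: choose the fewest servers that keep per-server load <= target."""
    return max(1, math.ceil(load / target_per_server))


def run_loop(observations, servers=1):
    """Feed each monitored load reading through the policy and record the
    directive an actuator would carry out: ("add" | "remove" | "hold", count)."""
    directives = []
    for load in observations:            # monitor: one observation per tick
        wanted = adaptation_policy(load)
        if wanted > servers:
            directives.append(("add", wanted - servers))
        elif wanted < servers:
            directives.append(("remove", servers - wanted))
        else:
            directives.append(("hold", 0))
        servers = wanted                 # actuator: assume the directive succeeds
    return directives
```

The loop has no human in it at all, which is the question the slide raises: a real system would need somewhere for an operator to set the target, override a directive, or stop the loop.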

Availability
The basic technique for achieving availability is replication.
• replicate hardware components
• replicate functions
• replicate data
• replicate servers
  e.g., primary/backup, hot standby, process pairs, etc.
  e.g., RAID parity for available storage
Build decentralized systems that eliminate single points of failure.
• If a component fails, select a replica and redirect requests there (fail over).
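
Fail-over as described above can be sketched as “try the primary, then each backup in turn”. In this minimal sketch a replica is any callable that either returns a response or raises `ReplicaDown`; both names are hypothetical, and the sketch ignores the hard parts (detecting failure reliably, keeping replicas consistent).

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as err:
            last_error = err             # remember the failure, try the next one
    raise ReplicaDown("all replicas failed") from last_error
```

With a list such as `[primary, backup]`, a client sees an answer as long as at least one replica is up, so no single replica is a single point of failure for reads.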

Recoverability
Some basic assumptions:
• Nodes have volatile and (optional) nonvolatile storage.
• Volatile storage is fast, but its contents are discarded in a failure.
  OS crash/restart, power failure, untimely process death
• Nonvolatile (stable) storage is slow, but its contents survive failures of components other than the storage device itself.
  E.g., disk: high latency but also high bandwidth (if sequential)
  Low-latency nonvolatile storage exists. It is expensive but getting cheaper: NVRAM, Uninterruptible Power Supply (UPS), flash memory, MRAM, etc... these help keep things interesting.
• Stability is never absolute: it is determined by the probability of device failure, often measured by “mean time between failures” (MTBF).
How about backing up data in remote memory?
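
The split between volatile and stable storage shows up directly in code: data handed to `write` may still sit in memory buffers, and is only expected to survive a crash once it has been forced to the device. The sketch below appends records to a log file and forces each one out with `os.fsync`. The function names and the one-record-per-line format are assumptions, records are assumed not to contain newlines, and how much `fsync` really guarantees depends on the operating system and the device’s own write cache.

```python
import os


def append_durably(path, record: bytes):
    """Append one record and force it to nonvolatile storage before returning."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()                # process buffer -> operating system
        os.fsync(f.fileno())     # operating system -> storage device


def recover(path):
    """After a restart, read back every record that reached the log."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return f.read().splitlines()
```

The cost of the `fsync` on every append is the “slow” side of the trade-off in the list above; batching several records per `fsync` trades a little durability for throughput.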

Another View

The Course
These challenges affect how/where we place functions and data in the network.
It turns out that there are many common problems and techniques that can be (mostly) “factored out” of applications and services.
That is (mostly) what this course is about.
• Web operating systems
• Large-scale information system: the Web
• Distributed services: the next-generation Web
• Internet service infrastructure and Internet information systems
• Building blocks for scalable services: storage services, file services, cluster management
• Core distributed systems material
