Verteilte Systeme (Distributed Systems), Karl M. Göschka


SLIDE 1

Verteilte Systeme (Distributed Systems)

Karl M. Göschka Karl.Goeschka@tuwien.ac.at

http://www.infosys.tuwien.ac.at/teaching/courses/VerteilteSysteme/

SLIDE 2

Some slides are based on material from this book (Prentice Hall, 2005)

SLIDE 3

Concepts, Paradigms, Technologies

Concepts: Communication; Processes & Concurrency; Naming and Discovery; Coordination & Agreement; Replication & Consistency; Dependability and FT; Transactions; Security; (Persistency, Durability)

Technologies: CORBA, J2EE, COM+, .NET, Web Services, GRID, P2P, Pervasive, Multimedia, WWW, Databases, VoIP, ...

How is naming and discovery realized in COM+ technology?

Systems Engineering

SLIDE 4

Paradigms and Characteristics (1)

• Classifications are mixed and not orthogonal!
• Focused entity: data item, tuple, file, document, object, component, service, resource, stream, ...
• Communication mechanisms: transparency vs. coordination-based
  - Request/reply
  - Message passing (document exchange)
  - DSM and associative tuple spaces
  - Event and notification systems: publish/subscribe; subject-based (interrupt signaling/exception handling)

SLIDE 5

Paradigms and Characteristics (2)

• Communication patterns:
  - synchronous (blocking) vs. asynchronous (non-blocking)
  - transient vs. persistent
• Time: real-time vs. non-real-time, sync vs. async, ...
• Coupling: tight vs. loose (temporal, referential, ...)
• Scale and granularity: from client/server to pervasive
• Mobility and topology: structured, unstructured, ad-hoc, overlay, ...
• Technologies, standards, and big players: OMG, J2EE, Microsoft, IBM, SAP, IETF, ITU, ...

SLIDE 6

Communication coupling

A taxonomy of coordination models:

• e.g. "ad-hoc" communication; event and subject-based; publish/subscribe; TIB/Rendezvous
• e.g. tuple spaces; persistent DSM; associative; JavaSpaces
• e.g. transient message passing; RPC; RMI
• e.g. persistent message passing

SLIDE 7

Technology Overview

• Distributed object-based systems
• Components and EAI
• Coordination-based systems
• WWW, SOA, Grid, P2P
• Lecture summary

SLIDE 8

Distributed object-based systems

• CORBA
  - ORB
  - IDL (language mapping)
• DCOM
  - Builds ORPC on top of DCE RPC
  - Integration of binary components from different languages (e.g. Visual Basic, Java, C++)
• Java
  - RMI
  - Single-language middleware

SLIDE 9

CORBA Object Model

The general organization of a CORBA system.

SLIDE 10

CORBA Language Mapping

[Figure: the IDL compiler reads the IDL and generates stub code and skeleton code. Client code and stub code are in programming language "A" and go through the language "A" compiler and linker; skeleton code and object implementation code are in programming language "B" and go through the language "B" compiler and linker. At run time the client reaches the server object via stub, Object Request Broker, and skeleton.]
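To make the stub and skeleton roles from the figure more concrete, here is a minimal sketch in Python of what an IDL compiler would normally generate. Everything in it is an assumption for illustration: the `Calculator` interface, JSON as the wire format, and the in-process `orb_transport` function that stands in for the ORB. None of it is CORBA API.

```python
import json


class CalculatorServant:
    """Object implementation (would be written in language "B")."""

    def add(self, a, b):
        return a + b


class CalculatorSkeleton:
    """Server side: unmarshals a request and dispatches it to the servant."""

    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self._servant, request["operation"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class CalculatorStub:
    """Client side: offers the servant's interface but marshals each call."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> reply bytes

    def add(self, a, b):
        request = json.dumps({"operation": "add", "args": [a, b]}).encode("utf-8")
        reply = self._transport(request)
        return json.loads(reply.decode("utf-8"))["result"]


# Hypothetical stand-in for the ORB: hands the bytes straight to the skeleton.
skeleton = CalculatorSkeleton(CalculatorServant())


def orb_transport(request_bytes):
    return skeleton.dispatch(request_bytes)


calculator = CalculatorStub(orb_transport)
total = calculator.add(2, 3)
```

The client only ever sees `CalculatorStub`, so the servant could live in another process or be written in another language as long as both sides agree on the marshaled format.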

SLIDE 11

Portable Object Adapter

Mapping of CORBA object identifiers to servants. (a) The POA supports multiple servants. (b) The POA supports a single servant.

• OID: POA-assigned or user-assigned
• Activation: explicit or on demand
• Mapping variants:
  1. Multiple OIDs → single servant
  2. One servant for all objects
  3. Single servant, many objects and types (DSI)
• Thread per request, connection, object, ...

Policy separated from mechanism!
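The separation of policy from mechanism can be sketched as follows. This is a toy object adapter in Python, not the CORBA POA API: the class `ObjectAdapter`, its constructor parameters, and `EchoServant` are hypothetical names chosen for illustration. The dispatch mechanism stays the same while the policies (explicit activation, activation on demand, default servant) are plugged in.

```python
class EchoServant:
    """A trivial servant; `name` only makes instances distinguishable."""

    def __init__(self, name):
        self.name = name

    def invoke(self, oid, operation):
        return f"{self.name} handled {operation} for {oid}"


class ObjectAdapter:
    """Maps object identifiers (OIDs) to servants according to simple policies."""

    def __init__(self, default_servant=None, activator=None):
        self._active = {}                # active object map: OID -> servant
        self._default = default_servant  # policy: one servant for all objects
        self._activator = activator      # policy: activation on demand

    def activate(self, oid, servant):
        """Explicit activation: several OIDs may share one servant."""
        self._active[oid] = servant

    def dispatch(self, oid, operation):
        servant = self._active.get(oid)
        if servant is None and self._activator is not None:
            servant = self._activator(oid)  # create the servant on first use
            self._active[oid] = servant
        if servant is None:
            servant = self._default
        if servant is None:
            raise KeyError(f"no servant for object {oid!r}")
        return servant.invoke(oid, operation)


# Multiple OIDs mapped to a single servant.
shared = EchoServant("shared")
adapter = ObjectAdapter()
adapter.activate("account-1", shared)
adapter.activate("account-2", shared)

# Activation on demand: a servant is created when an OID is first used.
lazy_adapter = ObjectAdapter(activator=lambda oid: EchoServant(f"servant-for-{oid}"))
```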

SLIDE 12

Object References

The organization of an IOR with specific information for IIOP → binding!

SLIDE 13

CORBA Services

Overview of CORBA services (service: description).

• Collection: Facilities for grouping objects into lists, queues, sets, etc.
• Query: Facilities for querying collections of objects in a declarative manner
• Concurrency: Facilities to allow concurrent access to shared objects
• Transaction: Flat and nested transactions on method calls over multiple objects
• Event: Facilities for asynchronous communication through events
• Notification: Advanced facilities for event-based asynchronous communication
• Externalization: Facilities for marshaling and unmarshaling of objects
• Life cycle: Facilities for creation, deletion, copying, and moving of objects
• Licensing: Facilities for attaching a license to an object
• Naming: Facilities for systemwide naming of objects
• Property: Facilities for associating (attribute, value) pairs with objects
• Trading: Facilities to publish and find the services an object has to offer
• Persistence: Facilities for persistently storing objects
• Relationship: Facilities for expressing relationships between objects
• Security: Mechanisms for secure channels, authorization, and auditing
• Time: Provides the current time within specified error margins

A key to success for each technology is the set of services and development support it provides!

SLIDE 14

COM Object Model

DCOM CORBA

The difference between language-defined and binary interfaces.

SLIDE 15

Type Library and Registry

The overall architecture of DCOM.

• Type library ~ CORBA interface repository
• Registry ~ CORBA implementation repository

SLIDE 16

Java RMI

• Integrated into the language (homogeneous)
  - interface construct used to define remote objects
  - remote interface indicates remotely callable object
  - Objects are passed between JVMs
  - Marshaling = serialization (homogeneous)
• rmic: stub compiler (counterpart of the IDL compiler)
• rmid: daemon listening for incoming RMI calls
• rmiregistry ~ directory service (supports binding)
• Similar to CORBA but supports only one language
• Subtle differences local/remote: cloning (server only), synchronize (proxy only)
• Remote object passed by reference (refers to proxy implementation) (implementation handle) (only one language + class loader facility)

SLIDE 17

Technology Overview

• Distributed object-based systems
• Components and EAI
• Coordination-based systems
• WWW, SOA, Grid, P2P
• Lecture summary

SLIDE 18

Enterprise Application Integration

• Today we are developing tomorrow's legacy systems, which have to be integrated the next day
• Approaches to replace N technologies by one new technology usually end up with N+1 technologies
→ Enterprise Application Integration (EAI) is the creation of business solutions by combining applications using common middleware, interfaces, standards, and toolchains.
→ Integration at the presentation, data, or functional layer

SLIDE 19

Legacy Systems

• proven functionality
• tested code
• historical investment
• often highly efficient and robust
• they work

SLIDE 20

Where can we agree?

• There will be no consensus on hardware.
• There will be no consensus on operating system.
• There will be no consensus on network protocols.
• There will be no consensus on programming language.
• There must be consensus on interfaces and interoperability (interaction and composition)!
→ Standards, virtualization, components, services

SLIDE 21

Virtualization in Distributed Systems

(a) A process virtual machine, with multiple instances of (application, runtime) combinations. E.g., JVM.
(b) A virtual machine monitor, with multiple instances of (applications, OS) combinations. E.g., VMware, Xen.

SLIDE 22

Why use components?

• Concentration on the core competence
• Encapsulation of solutions in components
• Open for COTS products
• Reuse of components

"Avoid developing software – use components" (OMG meeting, Vienna 2001)

"Components are for composition" (C. Szyperski)

Components are well-known and proven in other domains.
SLIDE 23

Component-based Software Engineering

Components: CBSE and Product Lines

"Buy before build. Reuse before buy." (Fred Brooks, 1975!)

SLIDE 24

Product Line

[Figure labels: Application A, Application B]

• Components of Mercedes E-class cars are 70% identical.
• Components of the Boeing 757 and 767 are 60% identical.
→ Most effort is integration instead of development!
• Quality, time to market, but complexity → reuse

SLIDE 25

Component vs. Object

• Component
  - Independent development
  - Third-party development
  - Binary integration
  - Component + Component == Component
  - Connection through composition
  - Heterogeneous
  - Black-box reuse
• Object
  - Banana-gorilla problem
  - Local development
  - Source integration
  - Object + Object != Object
  - Connection through references
  - (Homogeneous)
  - White-box reuse through inheritance

[Figure labels: Class, Component]

SLIDE 26

Component Granularity

[Figure: horizontal axis "Proportion of application addressed by component" (up to 100%), with the granularity levels routine/operation, program/class, coarse-grained component, and application package; plotted against it are "cost-effectiveness of use and reuse" and "match to requirements, flexibility/speed of change".]

SLIDE 27

Component lifecycle

[Figure: component-based architecture for field devices. Labels: component repository, composition environment, run-time environment, component model.]

SLIDE 28

Enterprise Java Beans: Goals

• EJB as integration technology
• The following concepts are most relevant:
  - EJB – Enterprise Java Beans
  - RMI – Remote Method Invocation
  - JNDI – Java Naming and Directory Interface
  - JMS – Java Message Service
  - JDBC – access to databases
  - SQLJ – static embedded SQL for Java
  - JDO – Java Data Objects
  - J2EE Connector (for legacy integration)
  - JTS/JTA – Java Transaction Service and Java Transaction API

SLIDE 29

EJB Architecture

[Figure: client (tier 1); EJB server with an EJB container (enterprise beans) and a web container (JSP files, servlets) (tier 2); database server (tier 3).]

• EJB server: run-time environment for containers, thread management, OS resources, load balancing, directory service, ...
• EJB container: run-time environment for beans, life-cycle management, instance pooling, distribution, service interfaces (standard and proprietary), e.g. user administration, ...
• Services: JNDI, JTS, Persistence, JMS, Security Policy, ...

SLIDE 30

EJB Roles

[Figure: the roles EJB provider, application assembler, EJB deployer, EJB container provider, and EJB server provider, arranged around client and server side (client, business logic/EJB, EJB container, EJB server, DB). Annotations: "uses client contracts" and "installing EJB using server tools".]

SLIDE 31

Technology Overview

• Distributed object-based systems
• Components and EAI
• Coordination-based systems
• WWW, SOA, Grid, P2P
• Lecture summary

SLIDE 32

Coordination-based systems

• To achieve highly scalable and open distributed systems, components must be loosely coupled
• Explicit communication and coordination of components/activities (instead of transparency)
• Event-based systems (also called publish/subscribe) attempt to achieve loose coupling through event communication (notifications)
• Key concept: loose coupling, i.e. separation between computation and coordination
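A minimal in-process sketch of subject-based publish/subscribe may help to see the loose coupling. The `EventBus` class and the subject names are hypothetical and only serve as illustration. Publisher and subscribers never reference each other (referential decoupling), but in this simple form a subscriber must already be registered when the event is published, so there is no temporal decoupling.

```python
from collections import defaultdict


class EventBus:
    """Subject-based publish/subscribe: both sides only share a subject name."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self._subscribers[subject].append(callback)

    def publish(self, subject, event):
        """Deliver the event to every current subscriber; return how many were notified."""
        callbacks = list(self._subscribers.get(subject, []))
        for callback in callbacks:
            callback(event)
        return len(callbacks)


bus = EventBus()
received = []
bus.subscribe("stock/ACME", received.append)
bus.subscribe("stock/ACME", lambda event: received.append(("audit", event)))

notified = bus.publish("stock/ACME", {"price": 42})
ignored = bus.publish("stock/OTHER", {"price": 1})
```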

SLIDE 33

Principles of JavaSpaces™

• Temporal and referential uncoupling
• Tuples with serialized Java objects → rich typing (type-safe), methods, subtypes
• Entries are leased
• Transactions
• Persistent or transient (durability)
• Operations:
  - write
  - read (blocking), readIfExists (non-blocking)
  - take (blocking), takeIfExists (non-blocking)
  - notify

read() is based on template matching.
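The operations and the template matching can be sketched with a small in-memory tuple space. This is a Python illustration under simplifying assumptions, not the JavaSpaces API: entries are plain tuples rather than typed Java `Entry` objects, `None` acts as the wildcard in a template, and leases, transactions, and `notify` are left out. The class and method names are hypothetical.

```python
import threading


class TupleSpace:
    """In-memory sketch of write/read/take with template matching (None = wildcard)."""

    def __init__(self):
        self._entries = []
        self._changed = threading.Condition()

    @staticmethod
    def _matches(template, entry):
        return len(template) == len(entry) and all(
            t is None or t == e for t, e in zip(template, entry)
        )

    def _find(self, template):
        for entry in self._entries:
            if self._matches(template, entry):
                return entry
        return None

    def write(self, entry):
        with self._changed:
            self._entries.append(tuple(entry))
            self._changed.notify_all()

    def read_if_exists(self, template):
        """Non-blocking read: return a matching entry or None; the entry stays in the space."""
        with self._changed:
            return self._find(template)

    def take_if_exists(self, template):
        """Non-blocking take: remove and return a matching entry, or None."""
        with self._changed:
            entry = self._find(template)
            if entry is not None:
                self._entries.remove(entry)
            return entry

    def take(self, template, timeout=None):
        """Blocking take: wait until a matching entry appears (or the timeout expires)."""
        with self._changed:
            found = self._changed.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                return None
            entry = self._find(template)
            self._entries.remove(entry)
            return entry


space = TupleSpace()
space.write(("job", 1, "resize image"))
space.write(("job", 2, "send mail"))

peeked = space.read_if_exists(("job", None, None))
taken = space.take(("job", 2, None), timeout=1.0)
missing = space.take_if_exists(("result", None, None))
```

Writers and readers share only the space and the shape of the entries, which is where the temporal and referential uncoupling comes from: a worker can take a job that was written before the worker even started.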

SLIDE 34

Overview of JavaSpaces

The general organization of a JavaSpace in Jini.

SLIDE 35

Distributed Shared Memory

• Abstraction for sharing data (without actually sharing physical memory)
• Effective for parallel applications
• Less effective for client/server applications
• DSM runtime support performs message passing and replication/caching
• Processors actually sharing memory: groups of 4, practical limit is 10, 64 with hierarchy
• Distributed memory multiprocessors and clusters (high-speed network) typically scale better
• Scalability of DSM is a well-known problem (similar to replication → consistency relaxed)

SLIDE 36

Message passing vs. DSM

• Message passing: marshalling, process protection, heterogeneous, synchronization through messages (agreement problem), programmer is aware of communication costs
• DSM: direct sharing (even pointers!), processes may interfere, homogeneous, synchronization through locks and semaphores, programmer may be unaware of communication costs
• Efficiency depends on patterns of data sharing
• Message passing cannot be avoided altogether in a distributed system!

SLIDE 37

Technology Overview

• Distributed object-based systems
• Components and EAI
• Coordination-based systems
• WWW, SOA, Grid, P2P
• Lecture summary

SLIDE 38

The World Wide Web

Key: Document (A), HTTP, URL, loose coupling

SLIDE 39

Architectural Overview

The principle of using server-side CGI programs.

SLIDE 40

Web Services and SOA – motivation

• EAI – Enterprise Application Integration (MoM) (note: was an argument for CBSE as well)
• WfMS – Workflow Management Systems, BPEL
• CBSE – components are not obsolete! → SOA provides a virtual component model
• WWW – loose coupling: heterogeneous, flexible, and dynamic orchestration
• Reuse (note: was an argument for CBSE, middleware, ...)
• Interface management (note: ditto)
• Business integration ("business goals with IT")

SLIDE 41

Virtualizing Components

[Figure: a virtual component (Web Service) is implemented by concrete components on different platforms: Assembly (.NET), (E)JB (J2EE), StP (DBMS), ...]

SLIDE 42

Web Services – introduction

• Effectiveness of simple protocols
• Complex applications with service integration
• Web service != web server
• Data representation and marshalling: XML
• SOAP protocol: how to package messages
• WSDL: service description ("IDL")
• UDDI: naming and discovery (did not work)
• XML Security: documents signed or encrypted
• Coordination through explicit protocols

SLIDE 43

Organizing Into A Platform

[Figure: Web services platform stack. Labels: Transport (transports); Messaging (XML, non-XML); Description (interface + bindings, policy); Quality of Service (security, reliable messaging, transactions); Components (atomic, composite; choreography, protocols, state); discovery, negotiation, agreement.]

SLIDE 44

The Bus And Standards

[Figure: the platform stack of the previous slide (Transport, Messaging, Description, Quality of Service, Components) annotated with standards: HTTP, TCP/IP, SMTP, FTP, ...; JMS, RMI/IIOP, ...; SOAP, WS-Addressing; WSDL*, WS-Policy*; WS-Security*, WS-RM, WS-AT, WS-BA, ...; BPEL; WS-C, WS-N*, ...; WS-RF; UDDI, WS-Addressing, MEX, ...]

SLIDE 45

Combinations of web services

[Figure: a client uses a travel agent service, which in turn combines flight booking a and b, hotel booking a and b, and hire car booking a and b.]

Value-added services from third parties provide new functionality

SLIDE 46

Web Services – principles (1)

• Interface offers a collection of operations, provided by a variety of different resources (programs, objects, databases, ...)
• Messages
  - XML-formatted SOAP messages call operations of interfaces
  - REST (representational state transfer): URLs and HTTP messages are used to manipulate data resources
• Amazon, Google, eBay, ... offer web services to manipulate their web resources (e.g. procurement application @Amazon, "sniping" @eBay)

SLIDE 47

Web Services – principles (2)

• Communication patterns
  - asynchronous exchange of documents
  - synchronous request/reply
  - event/notification also available
• No particular programming model
  - no remote object reference
  - no garbage collection
• XML representation:
  - more space (human readable?)
  - binary versions available
  - more time to process

SLIDE 48

Web Services – principles (3)

• Service references: URL (URI)
• Activation of services:
  - continuous operation
  - activation on demand
• Transparency:
  - none (which need not be bad)
  - However, for convenience, handling of XML and SOAP is hidden by APIs and/or tools.
• Proxies vs. dynamic invocation
  - Conversion to SOAP/XML static or on-the-fly

SLIDE 49

What is BPEL

• A language to specify the behavior of business processes
  - Between Web services...
  - ...and as Web services
• Same language to define executable processes and business protocols
• Executable processes
  - Can be performed in all compliant environments (portability)
  - Interoperability between heterogeneous environments
• Abstract processes
  - Specify constraints of message exchange
  - Are "views" on internal processes
• Combination of graph-based language (IBM WSFL) and calculus-based language (Microsoft XLANG)

SLIDE 50

Grid – motivation

• Middleware to enable the sharing of resources on a large scale (mainly data or computing power for data-intensive applications).
• Management coordinates the use of resources.

SLIDE 51

Heterogeneous Resources

Distributed physical clusters and storage

SLIDE 52

The Grid: Virtualizing Resources

Virtual clusters and storage

Grid middleware: service "bus" as Grid middleware

SLIDE 53

Cloud Computing

Computing Power as a configurable, payable Service

SLIDE 54

Peer to peer (P2P) – aims

• Enable large (global) scale
  - by eliminating (centrally) managed servers and infrastructures (administration and fault recovery cost, bandwidth bottleneck).
• Build a reliable resource sharing layer over an unreliable and untrusted collection of (unpredictable) nodes (probabilistic).
• Exploit available resources and construct applications that are scalable, reliable, and secure.
• Data and computational resources are contributed by many hosts (nodes) in an unmanaged way to participate in the provision of a uniform service.

SLIDE 55

Peer to peer (P2P) – challenges

• All nodes have the same functional capabilities and responsibilities
• Key problem: placement of data objects and subsequent provision for access, while balancing workload and availability.
→ Algorithms for placement and retrieval are a key aspect:
  - decentralized
  - self-organising
  - self-balancing (storage and processing load)
• Most effective with immutable data (file sharing, web caching, ...).
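One widely used placement technique that fits these requirements is consistent hashing; the slide does not name it, so take the following Python sketch as one possible illustration rather than the lecture's method. The `HashRing` class, the node names, and the number of virtual points are assumptions. The ring is kept in a single data structure here, whereas a real P2P overlay distributes this knowledge over the nodes.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring using SHA-1 (stable across runs)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Places each object on the first node clockwise from the object's hash."""

    def __init__(self, nodes=(), replicas=64):
        self._replicas = replicas  # virtual points per node, for better balance
        self._points = []          # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._replicas):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[index % len(self._points)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"file-{i}" for i in range(200)]
before = {key: ring.node_for(key) for key in keys}

ring.remove_node("node-b")
after = {key: ring.node_for(key) for key in keys}

# Only objects that lived on the departed node should have moved.
moved = [key for key in keys if before[key] != after[key]]
```

When a node leaves, only the objects it held are reassigned, which keeps the reorganization local and explains why such schemes work best when the stored data is immutable.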

SLIDE 56

Why P2P - from research point of view

• Challenges for future Internet applications
  - Scalability (nodes, users)
  - Dynamics (mobility, QoS-aware flexibility)
  - Heterogeneity (scopes of control)
  - Security (anonymity, censorship, availability)
  - Dependability (and performance)
• Centralized systems
  - single point of failure
  - single point of attack
  - bottleneck

SLIDE 57

Why P2P - Reality

SLIDE 58

Other systems

• Mobile and pervasive (ubiquitous) systems:
  - wireless connectivity of portable devices
  - device miniaturization and integration of computing devices with our everyday physical world
  → deal with frequent change!
• Distributed multimedia
  - Continuous streams of data in real time (video, VoIP, ...)
  - timely processing and delivery
  → Flow specifications and QoS contracts/management
  - Voice-data convergence and service integration

SLIDE 59

Technology Overview

• Distributed object-based systems
• Components and EAI
• Coordination-based systems
• WWW, SOA, Grid, P2P
• Lecture summary

SLIDE 60

Design goals in distributed systems

• Resource sharing (collaborative, competitive)
• Transparency
  - Hiding internal structure, complexity
• Openness
  - Portability, interoperability, ...
  - Services provided by standard rules
  - Separating policy from mechanism
• Scalability
  - Ability to expand the system easily
• Concurrency
  - inherently parallel (not just simulated)
• Fault tolerance (FT), availability

SLIDE 61

Dealing with complexity

• Abstraction (and modeling)
  - Client, server, service
  - Interface versus implementation
• Information hiding (encapsulation)
  - Interface design
• Separation of concerns
  - Layering (file system example: bytes, disk blocks, files)
  - Client and server
  - Components (granularity issues)
  - Policy vs. mechanism

SLIDE 62

Communication models

• Multiprocessors: shared memory
• Multicomputers: message passing
• Synchronization in shared memory:
  - Semaphores (atomic mutex variable)
  - Monitors: an abstract data type whose operations may be invoked by concurrent threads; different invocations are synchronized
• Synchronization in multicomputers: blocking in message passing

SLIDE 63

Architectural Styles

• Important styles of architecture for distributed systems
• Layered architectures (OSI)
• Object-based architectures (and components)
• Data-centered architectures (file based, database, resourceful WS, ...)
• Event-based architectures
• ... and combinations thereof

SLIDE 64

The 8 Fallacies of Distributed Computing

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

"Essentially everyone, when they first build a distributed application, makes the above eight assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences." (Peter Deutsch)
SLIDE 65

Concepts of distributed systems

• Communication
• Concurrency and operating system support (competitive, cooperative)
• Naming and discovery
• Synchronization and agreement
• Consistency and replication
• Fault tolerance
• Security

SLIDE 66

Communication

• Communication is the distinguishing characteristic of distributed applications
• Communication mechanisms may be explicit or implicit
• Different models of communication exist
• Synchronization and persistence
• Discrete and continuous media have distinct communication requirements
• Remote procedures, distributed objects, message queues, and streams are just four types of abstractions that can be used

SLIDE 67

Operating System Support

• Concurrency is naturally present in a distributed system and needs operating system support
• Concurrency may be exploited in several ways in distributed systems:
  - To improve performance by hiding delays due to blocking
  - To structure high-performance servers
  - To structure clients that hide server replication
• Other paradigms with different operating system support
  - code migration
  - virtualization

SLIDE 68

Naming and discovery

• Names are organized in name spaces; implemented in hierarchies and layers
• A naming service provides the mapping (resolution): name → attribute (typically address)
• Consistency of a distributed name service depends on the update algorithms used
• Caching and replication increase performance/availability
• A directory service provides a way to structure a name space according to attributes
• A discovery service supports ad hoc networks, dynamics, and large scale (e.g., P2P)
• Mobility is supported by location services
• Distributed garbage collection is challenging

SLIDE 69

Synchronization and agreement

• Distributed processes need to synchronize their actions to ensure cooperation or fair competition
• Lack of a global clock makes synchronization difficult
• Often, ordering is enough: logical clocks and vector timestamps reduce the cost of synchronization
• Distributed agreement algorithms are required when processes need to coordinate their actions.
• Mutex, election, global state, ...
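As a small illustration of "ordering is enough", here is a sketch of a Lamport logical clock in Python. The class and variable names are chosen for illustration; the update rules (increment on every local event or send, take the maximum plus one on receipt) are the standard ones.

```python
class LamportClock:
    """Logical clock: counters that respect the happened-before order."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the clock."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Message receipt: jump past the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()

p.tick()                          # local event at p
sent_at = p.tick()                # p sends a message, stamped with p's clock
q.tick()                          # unrelated local event at q
received_at = q.receive(sent_at)  # q's clock moves past the send timestamp
```

The receive timestamp is always larger than the send timestamp, without any synchronized physical clocks. The converse does not hold: a smaller timestamp does not imply that an event happened before another, which is what vector timestamps add.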

SLIDE 70

Consistency and replication

• Replication can help to achieve better performance and fault tolerance
• The chosen replication protocol depends on different parameters: consistency requirements, read/write ratio, number of clients, etc.
  - consistency model
  - update propagation methods
• Most important protocols:
  - Primary-backup replication
  - Coordinator-cohort/update-everywhere replication
  - Active replication
  - Quorum-based protocols
  - Epidemic protocols
• Need to be adapted to the domain:
  - distributed object system, file system, database system, service-oriented system, P2P system, etc.
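For the quorum-based protocols in the list above, the usual constraints on N replicas with read quorum R and write quorum W can be checked in a few lines. The following Python sketch uses hypothetical function names and models replicas as (version, value) pairs; it is an illustration of the overlap argument, not a complete protocol.

```python
def is_valid_quorum(n, r, w):
    """Check the classic constraints for n replicas, read quorum r, write quorum w."""
    read_write_overlap = r + w > n   # every read quorum meets every write quorum
    write_write_overlap = 2 * w > n  # two writes cannot succeed on disjoint sets
    return read_write_overlap and write_write_overlap


def quorum_read(replicas, r):
    """Read r replicas and return the value with the highest version number."""
    contacted = replicas[:r]  # assumption: any r replicas may be chosen
    version, value = max(contacted, key=lambda pair: pair[0])
    return value


# Five replicas as (version, value) pairs; the last write (version 2) reached w = 3 of them.
replicas = [(1, "old"), (1, "old"), (2, "new"), (2, "new"), (2, "new")]
latest = quorum_read(replicas, 3)
```

With N = 5 and R = W = 3, any three replicas include at least one that saw the latest write, so the reader can pick the newest version. Choosing R = 1 and W = N gives the read-one/write-all special case.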

SLIDE 71

Dependability and fault tolerance

• Dependability is a holistic concept
• Distributed systems can suffer partial failures
• Distributed systems can provide fault tolerance
• Faults can be due to process failures or communication failures
• Process replication (process groups) can help deal with process failures
• Reliable communication can be built on top of unreliable communication mechanisms
• The lost-reply problem has to be dealt with in client/server architectures
• A reliable multicast (group communication) is in many cases necessary for providing fault-tolerant distributed algorithms
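One common way to deal with the lost-reply problem is to let the client retransmit with the same request identifier and to let the server cache replies, so that a retransmitted request is answered but not executed again. The Python sketch below illustrates this; `Server`, `LossyChannel`, and `deposit` are hypothetical names, the channel is a test double that drops selected replies, and the reply cache is never pruned.

```python
import itertools


class Server:
    """Remembers replies by request id so that a retransmission is not executed twice."""

    def __init__(self):
        self.balance = 0
        self._replies = {}  # request id -> cached reply

    def handle(self, request_id, amount):
        if request_id in self._replies:  # duplicate: resend the old reply
            return self._replies[request_id]
        self.balance += amount           # execute the operation exactly once
        reply = self.balance
        self._replies[request_id] = reply
        return reply


class LossyChannel:
    """Test double: drops the reply for the attempt numbers listed in `lose_replies`."""

    def __init__(self, server, lose_replies):
        self._server = server
        self._lose = set(lose_replies)
        self._attempt = 0

    def call(self, request_id, amount):
        self._attempt += 1
        reply = self._server.handle(request_id, amount)  # the request always arrives
        if self._attempt in self._lose:
            raise TimeoutError("reply lost")
        return reply


_request_ids = itertools.count(1)


def deposit(channel, amount, max_attempts=3):
    """Client: retry with the SAME request id until a reply arrives."""
    request_id = next(_request_ids)
    for _ in range(max_attempts):
        try:
            return channel.call(request_id, amount)
        except TimeoutError:
            continue
    raise TimeoutError("no reply after retries")


server = Server()
channel = LossyChannel(server, lose_replies={1})  # the first reply is lost
new_balance = deposit(channel, 100)
```

Although the server receives the deposit request twice, the balance is increased only once, because the client cannot tell a lost request from a lost reply and the server absorbs the duplicate.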

SLIDE 72

Security

• Demand for security (unfortunately) obvious: e-banking, e-government, online auctions, etc.
• Security services are necessary to protect communications and transactions in open networks
• Security can be provided by secure channels and authorization services
• Authorization requires authentication and access control
• Encryption is used for secure communication
• Public key and secret key cryptography can be used for authentication (e.g. digital signatures)
• Distribution of encryption keys must be managed by a trusted third party or out-of-band communication.

SLIDE 73

Concepts, Paradigms, Technologies

Concepts: Communication; Processes & Concurrency; Naming and Discovery; Coordination & Agreement; Replication & Consistency; Dependability and FT; Transactions; Security; (Persistency, Durability)

Technologies: CORBA, J2EE, COM+, .NET, Web Services, GRID, P2P, Pervasive, Multimedia, WWW, Databases, VoIP, ...

How is naming and discovery realized in COM+ technology?

Systems Engineering

SLIDE 74

Future Prospects?

• SOA and Web services (standards-based)
• Grid and cloud computing (aggregation of nodes)
• Mobility (portable devices) and ad-hoc (MANET)
• Voice-data convergence and service integration
• Pervasive/ambient/ubiquitous computing (billions of nodes and new kinds of applications)
• Ultra large scale systems (complexity, emerging behaviour)
• Adaptive (self-*, autonomous) systems
• Bio-inspired methods

SLIDE 75

Follow-up lectures

• Advanced Distributed Systems
  - Dependable Systems
  - Adaptive Systems
  - Autonomic and Bio-inspired systems
  - Application examples: MMOG
• Technologien Verteilter Systeme (Technologies of Distributed Systems)
• Software Architekturen (Software Architectures)
• Entwurfsmethoden für Verteilte Systeme (Design Methods for Distributed Systems)
• More lecturing
→ http://www.infosys.tuwien.ac.at/teaching/

SLIDE 76

Interested?

• Join our national and international projects, where research meets industry!
• Practical course
• Diploma thesis
• Dissertation

• http://www.infosys.tuwien.ac.at/
• http://www.infosys.tuwien.ac.at/staff/kmg/
• http://www.dedisys.org/
• http://www.dedisys.org/trade/