Verteilte Systeme (Distributed Systems)

Karl M. Göschka Karl.Goeschka@tuwien.ac.at

http://www.infosys.tuwien.ac.at/teaching/courses/VerteilteSysteme/


Lecture 5: Naming and Discovery

 Name, Address, Identifier
 Name Space and Name Resolution (DNS)
 Directory Services (X.500/LDAP)
 Discovery Services
 Distributed Garbage Collection


Naming and discovery

 Identify and locate resources for communication and resource sharing
 Location transparency
 Scalability and performance
 DNS, NDS, ADS, X.500, LDAP, JNDI, JINI, UDDI, ...
 Dealing with mobile entities
 Removal of unreferenced entities


Types of names

 Names: string used to refer to an entity
   often human-readable
 Entities
   can be operated on (via access points)
   have attributes
 Address: name (location) of an access point
   for short: "address of an entity"
   address as name of an entity?
 Identifier (often machine-readable)
   refers to at most one entity
   each entity has only one identifier
   always refers to the same entity (never re-used)


Composed naming domains

Resolving the URL http://www.cdk3.net:8888/WebExamples/earth.html crosses several naming domains:

 URL → resource ID (IP number, port number, pathname): a DNS lookup yields the IP address 55.55.55.55, the port number is 8888, and the pathname is WebExamples/earth.html; together they identify the socket and the file on the web server
 IP address → network address 2:60:8c:2:b0:5a


Properties of names

 Location-independence (transparency)
   name independent from address
   do not encode the address in the name
   IP addresses are not location-independent (are they?)
 Uniqueness (how to achieve it?)
   static assignment of ranges of names (convention)
   use compound names
   use contexts or domains
   use an id-generation algorithm (sequence generator)
 Flat or hierarchical name spaces
 Simple and composite names; aliases
 Pure names (no entity information at all)
 Wildcards? Paths?


How to implement identifiers

 Keep a counter
   What if the machine fails? Keep the counter in stable storage
 Use a random-number generator
   unique with high probability, but not guaranteed
   keep a list of previously assigned names
 Concurrency control
   sequence generator
 Distribution:
   use the creating node
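The two options above can be sketched in Python. This is a minimal illustration, not a production scheme: the state file standing in for stable storage and the 64-bit id width are assumptions.

```python
import os
import secrets

class StableCounter:
    """Sequence generator whose state survives crashes by living in a
    file (standing in for stable storage; the file name is invented)."""
    def __init__(self, path="counter.state"):
        self.path = path

    def next_id(self):
        last = 0
        if os.path.exists(self.path):
            with open(self.path) as f:
                last = int(f.read() or 0)
        nxt = last + 1
        with open(self.path, "w") as f:
            f.write(str(nxt))          # persist before handing the id out
        return nxt

class RandomIds:
    """Random identifiers: unique with high probability, not guaranteed,
    so previously assigned ids are remembered and collisions retried."""
    def __init__(self):
        self.assigned = set()

    def next_id(self):
        while True:
            candidate = secrets.token_hex(8)   # 64 random bits
            if candidate not in self.assigned:
                self.assigned.add(candidate)
                return candidate
```

Restarting a `StableCounter` on the same file continues the sequence, which is exactly what keeping the counter in stable storage buys.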


Lecture 5: Naming and Discovery

 Name, Address, Identifier
 Name Space and Name Resolution (DNS)
 Directory Services (X.500/LDAP)
 Discovery Services
 Distributed Garbage Collection


Name Spaces (1)

 Organizes names in a labeled, directed graph
 A name is always relative to a directory node
 Leaf node:
   named entity (attributes or state)
 Directory node:
   labeled edges → node identifiers (directory table)
 Root node
 Path: sequence of labels
   absolute vs. relative (with respect to the root node)
   global vs. local (relative to the place of usage)
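A minimal in-memory sketch of such a naming graph; the node contents and labels are invented for illustration:

```python
class Leaf:
    """Named entity: stores the entity's attributes or state."""
    def __init__(self, state):
        self.state = state

class Directory:
    """Directory node: a directory table mapping edge labels to nodes."""
    def __init__(self):
        self.table = {}

    def add(self, label, node):
        self.table[label] = node
        return node

def resolve(root, path):
    """Resolve a path (a sequence of labels) relative to a directory node."""
    node = root
    for label in path.split("/"):
        if not isinstance(node, Directory):
            raise LookupError(f"'{label}': not reached via a directory node")
        node = node.table[label]
    return node

# Illustrative graph: home/steen names a mailbox entity.
root = Directory()
home = root.add("home", Directory())
home.add("steen", Leaf({"type": "mailbox"}))
```

Resolution is just a walk along labeled edges, consulting one directory table per label.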


Name Spaces (2)

 A general naming graph:
   only a single root?
   acyclic?
   strictly hierarchical (a tree, no "links")?


Linking and Mounting

 Alias
   multiple absolute paths ("hard link")
   leaf node stores an absolute path name ("symbolic link")
 Mounting
   merge different name spaces
   mount point: directory that stores the identifier of a directory node from a foreign name space
   to mount we need: access protocol, server, mounting point (each of these needs to be resolved!)
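The mounting idea can be sketched as follows. The "foreign" name space is simulated in-process, and the protocol and server names are invented placeholders, not a real NFS client:

```python
class Mount:
    """A mount point: stores the (protocol, server, foreign root) triple
    the slide mentions; here the foreign root is just another dict tree."""
    def __init__(self, protocol, server, root):
        self.protocol, self.server, self.root = protocol, server, root

def lookup(tree, path):
    """Resolve a path; crossing a Mount continues resolution in the
    foreign name space."""
    node = tree
    labels = path.strip("/").split("/")
    for i, label in enumerate(labels):
        node = node[label]
        if isinstance(node, Mount):
            # hand the remainder of the path to the foreign name space
            return lookup(node.root, "/".join(labels[i + 1:]))
    return node

# Illustrative name spaces, loosely modeled on the /remote/vu/mbox example.
remote = {"vu": {"mbox": "contents of mbox"}}
local = {"remote": Mount("nfs", "server.example.org", remote)}
```

The client never sees the boundary: /remote/vu/mbox resolves as one path even though "vu/mbox" lives in a different name space.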


Linking: Hard Link

The concept of a hard link explained in a naming graph.


Linking: Symbolic Link

The concept of a symbolic link explained in a naming graph.


Mounting

Mounting remote name spaces through a specific protocol, e.g. resolving /remote/vu/mbox across a mount point into a remote name space.


Merging trees (e.g., GNS)


Organization of the DEC Global Name Service: Names always (implicitly) include the identifier of the node from where the resolution should start.


Name resolution (1)

The general organization of the UNIX file system implementation on a logical disk of contiguous disk blocks.


Name resolution (2)

 Iterative process, whereby a name is repeatedly presented to naming contexts
 A naming context maps either
   onto primitive attributes, or
   onto a further naming context
 Aliases and cycles?
   threshold number vs. strong administration
 2 steps: name → node identifier → directory table (or leaf entity content)
   e.g., name → inode, inode → disk block (UNIX)
   e.g., name → name server, name server → IP address (DNS)
 ... is provided by a name service


Closure mechanism

 Selection of the initial node
   necessarily partly implicit!
 Implicit sequence (UNIX inodes, superblock)
 Implicit use of environment variables
 Initial data in a pre-agreed file (config.txt)
 Well-known port address
 Well-known authority
 Implicit method: phone number
 Implicit method: multicast/broadcast


How to implement a name space?

 A name service maintains a list/table or "database" of bindings for names
 It allows the database to be updated or queried
 Extensions:
   more than one attribute?
   attribute-based lookup (discovery service)?
   scalability?
   fault tolerance?
   high availability?
   access protection?


Name service interface

 resolve() or lookup():
   mapping from a name to data about the entity, e.g. an address in order to access it
   from human-readable to machine-readable
 bind(), rebind() and unbind():
   association between name and entity
   names are usually bound to attributes (property values) of the entity rather than to the entity itself, e.g. an address in order to access it
   e.g. DNS: domain name → IP address of host


Simple name service algorithm

 ARP: Address Resolution Protocol, IP address → physical (MAC) address
   broadcast-based
   maintains local caches
   correctness criterion: unique names
   initially, each node only knows the binding for itself
 Scalability?
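A toy version of this broadcast scheme, with the broadcast medium simulated as a list of nodes and all names and addresses invented:

```python
class Node:
    """Each node initially knows only its own binding."""
    def __init__(self, name, address):
        self.name, self.address = name, address

    def on_broadcast(self, name):
        # correctness relies on names being unique
        return self.address if name == self.name else None

class Resolver:
    def __init__(self, network):
        self.network = network        # stands in for the broadcast medium
        self.cache = {}               # local cache of resolved bindings

    def resolve(self, name):
        if name in self.cache:        # cache hit: no broadcast needed
            return self.cache[name]
        for node in self.network:     # broadcast: every host is interrupted
            addr = node.on_broadcast(name)
            if addr is not None:
                self.cache[name] = addr
                return addr
        raise LookupError(name)
```

The loop over every node makes the scalability problem concrete: each lookup without a cache hit costs work proportional to the size of the network.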


Dealing with large name spaces

 A flat name space is not scalable (how many unique names are there?)
 Broadcast as lookup is not scalable
 → hierarchical name spaces
 Use naming contexts or domains to divide the name space (e.g. www.infosys.tuwien.ac.at)
 The structure supports management of the name space along organizational lines
 Distributed name space management for scalability and availability


Name Space Distribution (1)

An example partitioning of the DNS name space, including Internet-accessible files, into three layers.


Name Space Distribution (2)

Item                             | Global    | Administrational | Managerial
---------------------------------|-----------|------------------|-------------
Geographical scale of network    | Worldwide | Organization     | Department
Total number of nodes            | Few       | Many             | Vast numbers
Responsiveness to lookups        | Seconds   | Milliseconds     | Immediate
Update propagation               | Lazy      | Immediate        | Immediate
Number of replicas (server-side) | Many      | None or few      | None
Is client-side caching applied?  | Yes       | Yes              | Sometimes

A comparison between name servers for implementing nodes from a large-scale name space partitioned into a global layer, an administrational layer, and a managerial layer.


Navigating distributed name services

 Large name systems are distributed
 Each (client) node has a local name resolver
 Each name server is responsible for a separate context of the name space
 A client's resolution request "navigates" through different name servers for full resolution
   client-controlled: iterative or multicast
   server-controlled: iterative or multicast
   server-controlled: recursive
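The iterative and recursive styles can be contrasted in a few lines. The three-server chain and the answer address are invented for illustration:

```python
class NameServer:
    """Holds a zone table: name -> ("answer", address) or
    ("referral", next_server)."""
    def __init__(self, zone):
        self.zone = zone

    def query(self, name):
        return self.zone[name]

def resolve_iterative(server, name):
    """Client-controlled: the client follows each referral itself."""
    while True:
        kind, value = server.query(name)
        if kind == "answer":
            return value
        server = value                # contact the next server ourselves

def resolve_recursive(server, name):
    """Server-controlled: each server passes the query on and relays
    the final answer back."""
    kind, value = server.query(name)
    return value if kind == "answer" else resolve_recursive(value, name)

# Invented chain: root refers to a TLD server, which refers to a leaf.
leaf = NameServer({"ftp.cs.vu.nl": ("answer", "192.0.2.7")})
tld = NameServer({"ftp.cs.vu.nl": ("referral", leaf)})
root = NameServer({"ftp.cs.vu.nl": ("referral", tld)})
```

Both yield the same result; they differ in who carries the burden of communication, which is exactly the trade-off the later DNS slides discuss.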


Client-controlled navigation

A client iteratively contacts the name servers NS1, NS2, and NS3 (in that order) to resolve a name.


Server-controlled navigation

A name server NS1 communicates with other name servers on behalf of a client: either iteratively (iterative server-controlled: NS1 itself contacts NS2 and NS3 in turn) or recursively (recursive server-controlled: each server passes the request on to the next).


Iterative Name Resolution

ftp://ftp.cs.vu.nl/pub/globe/index.html


Recursive Name Resolution

ftp://ftp.cs.vu.nl/pub/globe/index.html


Effects of caching and replication

 Caching
   reduces the time for name resolution on a cache hit
   lowers the load on the network
   increases the availability of the service: an important requirement for a name service
   try the dig and host utilities on UNIX
 Replication
   removes "hot spots": reduces accesses to high-level nodes
   reduces the time for name resolution when a closer replica is accessed
   increases availability


Cache consistency

 Cache consistency is relaxed or lazy
 The client is expected to deal with stale data
 Why not strict consistency (in a large name service)?
   updates take long, waiting for all sites to be updated
   lookups take long, waiting for data to stabilize


Domain Name System (DNS)

 Defines a naming standard for the Internet: one of the largest distributed name services
 Maps domain names to IP addresses
 Lookup of mail servers
 Uses caching and replication to achieve both performance and availability
 Organized as a rooted tree:
   subtree: domain
   path name: domain name (absolute or relative)
   node contents: resource records for the zone
   root servers: http://www.root-servers.org/
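As a rough stand-in for the dig and host utilities, the standard library's resolver interface can be queried from Python. This goes through whatever resolver the machine is configured with (which in turn speaks DNS), so the results are host-dependent:

```python
import socket

def lookup(domain):
    """Return the set of IP addresses the local resolver reports for a
    domain name, in the spirit of `host <domain>`."""
    infos = socket.getaddrinfo(domain, None)
    return {info[4][0] for info in infos}
```

For example, `lookup("localhost")` should contain a loopback address on any normally configured system; querying a real domain additionally exercises caching in the local resolver.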


DNS name space

Name servers and their corresponding domains (in parentheses); arrows in the original figure denote name server entries:

 a.root-servers.net (root)
 ns1.nic.uk (uk)
 ns0.ja.net (ac.uk)
 alpha.qmw.ac.uk (qmw.ac.uk)
 dns0.dcs.qmw.ac.uk (dcs.qmw.ac.uk)
 dns0-doc.ic.ac.uk (ic.ac.uk)
 ns.purdue.edu (purdue.edu)


DNS Implementation (1)

An excerpt from the DNS database for the zone cs.vu.nl.


DNS Implementation (2)

An excerpt from the DNS database for the zone cs.vu.nl.


DNS Implementation (3)

 Primary name server
 Secondary name server (zone transfer)
 Caching-only server (non-authoritative)
 Query: domain name + class (IN) + type

Name          | Record type | Record value
--------------|-------------|--------------
cs.vu.nl      | NS          | solo.cs.vu.nl
solo.cs.vu.nl | A           | 130.37.21.1

Part of the description for the vu.nl domain which contains the cs.vu.nl domain ("glue data").


Availability and performance

 Originally, all host names/addresses were kept in a single, central master file, downloaded by FTP
 Each client has the address of more than one name server
 Each name service has a primary and one or more secondary servers (and caching-only servers)
 Each name server stores the addresses of some root servers and of the authoritative server for the parent domain
 Clients cache previously resolved names
 Top-level (e.g. root) servers are replicated
 For performance, multiple requests and replies are combined


Lecture 5: Naming and Discovery

 Name, Address, Identifier
 Name Space and Name Resolution (DNS)
 Directory Services (X.500/LDAP)
 Discovery Services
 Distributed Garbage Collection


Directory services

 Basic role: attribute-based naming
 Add/remove names to/from the directory
 Get names from the directory according to a property-description pattern (e.g. wildcards)
 "Yellow pages"
 Assign access modes to names (e.g. read/write/execute)
 Enforce access control
 A useful component of many distributed applications (e.g. in chat or email)


X.500 Directory Service

 An ambitious attempt to compile information about the world-wide information system
 Not just names, but information about organizations
 Supports attribute-based retrieval
 Each level is responsible for maintaining the organization of its lower levels

The X.500 Name Space (1)

Part of the directory information tree.


The X.500 Name Space (2)

Attribute          | Abbr. | Value
-------------------|-------|-------------------------------------------
Country            | C     | NL
Locality           | L     | Amsterdam
Organization       | O     | Vrije Universiteit
OrganizationalUnit | OU    | Math. & Comp. Sc.
CommonName         | CN    | Main server
Mail_Servers       | –     | 130.37.24.6, 192.31.231, 192.31.231.66
FTP_Server         | –     | 130.37.21.11
WWW_Server         | –     | 130.37.21.11

Result of: read /C=NL/O=Vrije Universiteit/OU=Math.&Comp. Sc./CN=Main server/

A simple example of an X.500 directory entry using X.500 naming conventions.


The X.500 Name Space (3)

Attribute          | Value (entry 1)    | Value (entry 2)
-------------------|--------------------|-------------------
Country            | NL                 | NL
Locality           | Amsterdam          | Amsterdam
Organization       | Vrije Universiteit | Vrije Universiteit
OrganizationalUnit | Math. & Comp. Sc.  | Math. & Comp. Sc.
CommonName         | Main server        | Main server
Host_Name          | star               | zephyr
Host_Address       | 192.31.231.42      | 192.31.231.66

Result of: list /C=NL/O=Vrije Universiteit/OU=Math.&Comp. Sc./CN=Main server/ → star, zephyr (list returns names only)

Two directory entries having Host_Name as RDN.


Light-weight Directory Access Protocol

 The Directory Access Protocol "DAP" proved to be too "heavy"
 LDAP is a newer protocol with more efficient access to X.500 and simpler directories (RFC 2251)
   runs directly on top of TCP (instead of the OSI stack)
   parameters are passed as strings (instead of ASN.1)
 Makes it possible to write "directory-enabled" applications (such as email)
 De facto standard for Internet-based directory services (Windows 2000 ADS)


JNDI (1)

 Java Naming and Directory Interface
   APIs to access name services
 Supports access to
   COS (Common Object Services) Naming
   DNS (Domain Name System)
   LDAP (Lightweight Directory Access Protocol)
   NIS (Network Information System) and NIS+


JNDI (2)

Java-only solution


Lecture 5: Naming and Discovery

 Name, Address, Identifier
 Name Space and Name Resolution (DNS)
 Directory Services (X.500/LDAP)
 Discovery Services
 Distributed Garbage Collection


Discovery services

 When do we need a discovery service?
   In ad hoc or spontaneous networks, a group of hosts decides to share resources and services that change dynamically (e.g. Jini, P2P)
   Nodes appear and disappear often in a large-scale system (e.g. P2P)
   Mobility has to be supported
 Principles
   Hosts must be able to announce and discover available resources and services
   TTL is important in these networks: e.g., leases


Service discovery in Jini

Service discovery in Jini (client, lookup service, printing service, and corporate infoservice on a network, with groups such as admin and finance):

1. Client: "finance" lookup service?
2. Lookup service: Here I am: .....
3. Client: request printing
4. Client: use the printing service

JINI: "Jini Is Not Initials"


Effect of mobility

 Hosts on a network are no longer permanently fixed at one (topological) location
 Laptops, personal digital assistants, and their contents move from one location in the network and join at other points
 In general, objects and resources are not stationary
 How can name resolution work?
 Note: IP addresses refer to fixed locations within the network topology


Naming vs. Locating Entities (1)

 Traditional naming systems are not well suited for supporting name-to-address mappings that change often:
   the global and administrational layers are assumed to be stable
   caching, replication
   updates are usually restricted to a single name server
 Two immediate solutions
   change the address record (non-local lookup/update)
   add a symbolic link (chain of links)


Naming vs. Locating Entities (2)

a) Direct, single level mapping between names and addresses. b) Two-level mapping (e.g., using identities). Location service maps identifier to address.


Simple Solutions

 Broadcasting
   e.g. ARP
   bandwidth? number of hosts interrupted?
 Multicasting: restricted group of hosts
   multicast address as a general location service
   locate the nearest replica
 Forwarding pointers
   a moving entity leaves a reference behind
   simple, BUT: long chains, many intermediate locations, vulnerability to broken links, ...
   → keep chains relatively short
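A minimal sketch of forwarding pointers, including the shortcut step that keeps chains short; locations and the entity are invented:

```python
class Location:
    def __init__(self, entity=None):
        self.entity = entity          # present only at the chain's end
        self.forward = None           # pointer left behind after a move

def move(src, dst):
    """The entity moves and leaves a forwarding pointer behind."""
    dst.entity, src.entity = src.entity, None
    src.forward = dst

def locate(loc):
    """Follow the chain to the current location, then shortcut every
    visited hop so later lookups take a single step."""
    visited = []
    while loc.forward is not None:
        visited.append(loc)
        loc = loc.forward
    for hop in visited:               # keep chains relatively short
        hop.forward = loc
    return loc
```

Without the shortcut loop, every move lengthens the chain; with it, a single successful lookup collapses the chain for all hops it passed through.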


Forwarding Pointers (1)

Proxies can be passed as parameters: P1 passes p' to P2.

The principle of forwarding pointers using (proxy, skeleton) pairs: migration is completely transparent, but there is no address lookup; instead, the client's request is forwarded along the chain to the actual object.


Forwarding Pointers (2)

 Goal: keep chains relatively short!
 Redirect a forwarding pointer by storing a shortcut in a proxy
 Response directly or along the reverse path?
 A skeleton that is no longer referred to can be removed
 Problems arise when a (proxy, skeleton) pair crashes or becomes unreachable
   → home location (where the object was created)


Home-Based Approaches (1)

 Broadcasting and forwarding pointers impose scalability problems
 The home location keeps track of the current location (and must be highly dependable)
   e.g., as a fallback for forwarding pointers
 Mobile IP:
   home agent at a fixed address
   the mobile host registers a temporary care-of address with the home agent
   packets are tunneled; the sender is informed


Home-Based Approaches (2)

The principle of Mobile IP.


Home-Based Approaches (3)

 Mobile telephony (GSM) – a two-tiered scheme:
   first, check the local Visitor Location Register (VLR)
   then, contact the Home Location Register (HLR) to find the current location
 Drawbacks:
   communication latency
   fixed home location: availability; permanent migration
   → register the home location with a traditional naming service and let the client first look up the home (relatively stable, can be cached effectively)
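The two-tiered GSM-style lookup reduces to a few lines when the registers are modeled as plain dicts; subscriber ids and cell names are invented:

```python
def locate_mobile(entity, vlr, hlr):
    """Two-tiered lookup: try the local Visitor Location Register first,
    fall back to the entity's Home Location Register."""
    if entity in vlr:                 # local hit: no long-distance traffic
        return vlr[entity]
    address = hlr[entity]             # the home always knows the location
    vlr[entity] = address             # cache locally for subsequent lookups
    return address
```

The first lookup pays the full round trip to the home; every later local lookup is answered by the VLR, which is where the latency saving comes from.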


Hierarchical Approaches

 Hierarchical organization of a location service into domains, sub-domains, and leaf domains, each having an associated directory node.
 An entity is represented by a location record: in the directory node of a leaf domain it contains an address; otherwise it contains a pointer to the respective sub-domain.
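A sketch of such a tree of directory nodes; the domain layout, entity name, and address are all invented:

```python
class DomainNode:
    """Directory node of a (sub-)domain in the location-service tree."""
    def __init__(self, parent=None):
        self.parent = parent
        self.records = {}             # entity -> address (leaf) or child node

def insert(leaf, entity, address):
    """Store the address at the leaf domain and pointer records along
    the path up to the root."""
    leaf.records[entity] = address
    node, child = leaf.parent, leaf
    while node is not None:
        node.records[entity] = child
        node, child = node.parent, node

def lookup_loc(node, entity):
    """Expanding search: go up until a location record is found, then
    follow the pointers down to the leaf holding the address."""
    while entity not in node.records:
        node = node.parent
    value = node.records[entity]
    while isinstance(value, DomainNode):
        value = value.records[entity]
    return value
```

A lookup started in a nearby domain is resolved without reaching the root, while a lookup from elsewhere climbs until the paths meet, which is the locality property the hierarchy is designed for.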


Lecture 5: Naming and Discovery

 Name, Address, Identifier
 Name Space and Name Resolution (DNS)
 Directory Services (X.500/LDAP)
 Discovery Services
 Distributed Garbage Collection


Unreferenced Entities

 Naming and location services provide a global referencing service
 An entity that cannot be accessed should (often) be removed
 In many systems, entities are removed explicitly only
 It is often unknown whether there is (still) a reference to an entity
 Distributed garbage collection for remote objects is performed by skeletons and proxies (and thus hidden from clients and objects)


The Problem of Unreferenced Objects

An example of a graph representing objects containing references to each other.


Reference Counting (1)

 The problem: maintaining a proper reference count in the presence of unreliable communication; it is essential to detect duplicate messages.
 Reference counting is popular in uniprocessor systems, but leads to a number of problems in distributed systems.


Reference Counting (2)

a) Copying a reference to another process and incrementing the counter too late (race condition).
b) A solution, but:
   reliable communication is required
   three messages are needed for passing one reference (only after the acknowledgement is P2 allowed to remove the reference)


Reference Listing (Java RMI)

 The skeleton keeps track of the proxies
 Adding and removing are idempotent operations (increment/decrement are not!)
   no race condition
   communication need not be reliable
 Creating a reference: the id is sent to the skeleton; the proxy is created after the acknowledgement
 Passing a reference P1 → P2: P2 passes its id to the skeleton, receives an ACK, then the proxy is created at P2
 Race conditions: a temporary entry while (before) the remote reference is transmitted
 Keep-alive of the reference list for increased fault tolerance
 Scales badly → use leases
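The combination of reference listing and leases can be sketched as follows. This is an illustration of the idea, not Java RMI's actual implementation; the lease duration and process ids are invented:

```python
import time

class Skeleton:
    """Keeps the set of processes holding a proxy. add/remove are
    idempotent, so duplicated messages over unreliable channels are
    harmless; each entry expires unless its lease is renewed."""
    LEASE = 10.0                      # seconds; an assumed lease duration

    def __init__(self):
        self.holders = {}             # process id -> lease expiry time

    def add_ref(self, pid, now=None):
        now = time.monotonic() if now is None else now
        self.holders[pid] = now + self.LEASE   # re-adding just renews

    def remove_ref(self, pid):
        self.holders.pop(pid, None)            # removing twice is fine

    def collectable(self, now=None):
        """Drop expired leases; the object may be removed once no
        unexpired holder remains."""
        now = time.monotonic() if now is None else now
        self.holders = {p: t for p, t in self.holders.items() if t > now}
        return not self.holders
```

The lease turns a crashed or silent proxy holder into a bounded problem: after `LEASE` seconds without renewal its entry disappears, which is what lets the scheme scale where a plain keep-alive reference list does not.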


Summary

 Names are organized in name spaces, implemented in hierarchies and layers
 A naming service provides the mapping (resolution): name → attribute (typically an address)
 The consistency of a distributed name service depends on the update algorithms used
 Caching and replication increase performance/availability
 A directory service provides a way to structure a name space according to attributes
 A discovery service supports ad hoc networks, dynamics, and large scale
 Mobility is supported by location services
 Distributed garbage collection is challenging