SLIDE 1 Verteilte Systeme (Distributed Systems)
Karl M. Göschka Karl.Goeschka@tuwien.ac.at
http://www.infosys.tuwien.ac.at/teaching/courses/VerteilteSysteme/
SLIDE 2
Lecture 5: Naming and Discovery
Name, Address, Identifier Name Space and Name Resolution (DNS) Directory Services (X.500/LDAP) Discovery Services Distributed Garbage Collection
SLIDE 3 3
Naming and discovery
Identify and locate resources for communication and resource sharing Location transparency Scalability and performance DNS, NDS, ADS, X.500, LDAP, JNDI, JINI, UDDI, ... Dealing with mobile entities Removal of unreferenced entities
SLIDE 4 4
Types of names
Names: string to refer to an entity
often human readable Entities
can be operated on (via access points) have attributes
Address: name (location) of an access point
for short: address of an entity address as name of an entity?
Identifier (often machine-readable)
refers to at most one entity
each entity has only one identifier
always refers to the same entity (never re-used)
SLIDE 5 6
Composed naming domains
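[Figure: composed naming domains used to access a resource from a URL]
URL: http://www.cdk3.net:8888/WebExamples/earth.html
DNS lookup → Resource ID (IP number, port number, pathname): 55.55.55.55, 8888, WebExamples/earth.html
IP number 55.55.55.55 → network address 2:60:8c:2:b0:5a; port number 8888 → socket; pathname WebExamples/earth.html → file (on the Web server)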
SLIDE 6 7
Properties of names
Location-independence (transparency)
Name independent from address Do not encode address in name IP addresses are not location-independent (are they?)
Uniqueness (how to achieve?)
Static assignment of ranges of names (convention) Use compound names Use contexts or domains Use an id generation algorithm (sequence generator) Flat or hierarchical name spaces
Simple and composite names; aliases Pure names (no entity information at all) Wildcards? Paths?
SLIDE 7 8
How to implement identifiers
Keep a counter
What if machine fails? Keep counter in stable storage
Use a random-number generator
high probability but not guaranteed keep list of previously assigned names
Concurrency control
sequence generator
Distribution:
use creating node
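The options above can be combined. A minimal sketch, assuming a hypothetical IdGenerator class, a dict standing in for stable storage, and identifiers of the form <node>-<counter> (none of these names come from the lecture):

```python
import threading


class IdGenerator:
    """Sequence generator whose identifiers are prefixed with the creating node."""

    def __init__(self, node_id, stable_store):
        self.node_id = node_id
        self.store = stable_store     # stand-in for stable storage (e.g. a file)
        self.lock = threading.Lock()  # concurrency control for local callers

    def next_id(self):
        with self.lock:
            value = self.store.get("counter", 0) + 1
            self.store["counter"] = value  # persist before the id is handed out
            return f"{self.node_id}-{value}"


store = {}
gen = IdGenerator("nodeA", store)
first = gen.next_id()
second = gen.next_id()

# Simulated crash and restart: a new generator continues from the stored counter.
restarted = IdGenerator("nodeA", store)
third = restarted.next_id()
```

Because the counter is written back before the identifier is returned, a restarted generator never re-issues an identifier, and the node prefix keeps identifiers from different nodes apart without coordination.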
SLIDE 9 10
Name Spaces (1)
organizes names in a labeled, directed graph name always relative to a directory node leaf node:
named entity (attributes or state)
directory node:
labeled edges node identifiers (directory table)
root node path: sequence of labels
absolute vs. relative (with respect to root node) global vs. local (relative to place of usage)
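A naming graph and path resolution can be sketched as follows. This is an illustration only: the class names and the sample path home/steen/mbox are assumptions, not part of the lecture.

```python
class DirectoryNode:
    def __init__(self):
        self.table = {}  # directory table: edge label -> node


class LeafNode:
    def __init__(self, content):
        self.content = content  # attributes or state of the named entity


def resolve(start, labels):
    """Follow a path (sequence of labels) starting at the given node."""
    node = start
    for label in labels:
        if not isinstance(node, DirectoryNode):
            raise LookupError(f"{label!r}: not below a directory node")
        node = node.table[label]  # KeyError if the label is unknown
    return node


root, home, steen = DirectoryNode(), DirectoryNode(), DirectoryNode()
root.table["home"] = home
home.table["steen"] = steen
steen.table["mbox"] = LeafNode("mail data")

absolute = resolve(root, ["home", "steen", "mbox"])  # starts at the root node
relative = resolve(home, ["steen", "mbox"])          # starts at another directory node
```

Both calls end at the same leaf node; the only difference is the node from which resolution starts.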
SLIDE 10 11
Name Spaces (2)
A general naming graph: only a single root? acyclic? strictly hierarchical (tree, no “links”)?
SLIDE 11 12
Linking and Mounting
Alias
multiple absolute paths („hard link“) leaf node stores absolute path name („symbolic link“)
Mounting
merge different name spaces mount point: directory that stores identifier of directory node from foreign name space To mount we need: access protocol, server, mounting point (each of these needs to be resolved!)
SLIDE 12 13
Linking: Hard Link
The concept of a hard link explained in a naming graph.
SLIDE 13 14
Linking: Symbolic Link
The concept of a symbolic link explained in a naming graph.
SLIDE 14 15
Mounting
/remote/vu/mbox
Mounting remote name spaces through a specific protocol.
SLIDE 15 16
Merging trees (e.g., GNS)
home
Organization of the DEC Global Name Service: Names always (implicitly) include the identifier of the node from where the resolution should start.
SLIDE 16 20
Name resolution (1)
The general organization of the UNIX file system implementation on a logical disk of contiguous disk blocks.
SLIDE 17 21
Name resolution (2)
Iterative process, whereby a name is repeatedly presented to naming contexts
A naming context either maps
onto primitive attributes or onto further naming contexts
alias and cycles?
threshold number vs. strong administration
2 steps: name → node identifier → directory table (or leaf entity content)
e.g., inode → disk block / name → inode
e.g., IP → name server → IP
... is provided by a name service
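The "threshold number" remedy for aliases and cycles can be sketched like this. It assumes a name space held in nested dicts, a hypothetical SymbolicLink leaf that stores an absolute path, and an arbitrary limit of 8 followed links.

```python
MAX_LINKS = 8  # threshold: give up after this many symbolic links


class SymbolicLink:
    def __init__(self, target):
        self.target = target.strip("/").split("/")  # absolute path as labels


class TooManyLinks(Exception):
    pass


def resolve(root, path, followed=0):
    labels = path.strip("/").split("/") if isinstance(path, str) else path
    node = root
    for i, label in enumerate(labels):
        node = node[label]
        if isinstance(node, SymbolicLink):
            if followed >= MAX_LINKS:
                raise TooManyLinks(path)
            # Restart at the root with the link target plus the unresolved rest.
            return resolve(root, node.target + labels[i + 1:], followed + 1)
    return node


root = {
    "home": {"steen": {"keys": "key data"}},
    "keys": SymbolicLink("/home/steen/keys"),
    "a": SymbolicLink("/b"),  # a and b form a cycle
    "b": SymbolicLink("/a"),
}
```

Resolving /keys follows one link and succeeds, while resolving /a hits the threshold and is aborted instead of looping forever. The alternative named on the slide, strong administration, would prevent such cycles from being created in the first place.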
SLIDE 18 22
Closure mechanism
Selection of the initial node necessarily partly implicit!
Implicit sequence (UNIX inodes, superblock)
Implicit use of environment variables
Initial data in a pre-agreed file (config.txt)
Well-known port address
Well-known authority
Implicit method: Phone number
Implicit method: Multicast/Broadcast
SLIDE 19 23
How to implement a name space?
A name service maintains a list/table or “database” of bindings for names It allows the database to be updated or queried Extensions:
More than one attribute? Attribute-based lookup (discovery service)? Scalability? Fault-tolerance? High-availability? Access protection?
Attribute Name
SLIDE 20 24
Name service interface
resolve() or lookup():
mapping from name to data about the entity, e.g. address in order to access it from human-readable to machine-readable
bind(), rebind() and unbind():
association between name and entity
names are usually bound to attributes (property values) of the entity rather than the entity itself, e.g. address in order to access it
e.g. DNS: domain name → IP address of host
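The interface can be sketched with an in-memory table. The class name and the exact semantics are assumptions for illustration: here bind() refuses to overwrite an existing binding while rebind() replaces it, which is one common convention but not the only possible one. The host name and address are made up.

```python
class NameAlreadyBound(Exception):
    pass


class NameService:
    def __init__(self):
        self._bindings = {}  # name -> attributes of the entity

    def bind(self, name, attributes):
        if name in self._bindings:
            raise NameAlreadyBound(name)
        self._bindings[name] = attributes

    def rebind(self, name, attributes):
        self._bindings[name] = attributes  # create or overwrite

    def unbind(self, name):
        self._bindings.pop(name, None)     # no error if the name is unknown

    def lookup(self, name):
        return self._bindings[name]        # KeyError if the name is not bound


ns = NameService()
ns.bind("printer.example", {"address": "192.0.2.7"})     # hypothetical binding
ns.rebind("printer.example", {"address": "192.0.2.8"})   # entity got a new address
```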
SLIDE 21 26
Simple name service algorithm
ARP: address resolution protocol IP address → physical (MAC) address
broadcast-based maintains local caches correctness criterion: unique names
Scalability?
initially, each node only knows binding for itself
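A toy simulation of broadcast-based resolution with local caches, in the spirit of ARP but not an implementation of the real protocol. The Network and Host classes and all addresses are invented for this sketch.

```python
class Network:
    def __init__(self):
        self.hosts = []
        self.broadcasts = 0  # counts how often all hosts were interrupted

    def broadcast(self, sender, wanted_ip):
        self.broadcasts += 1
        for host in self.hosts:  # every host sees the request
            if host is not sender and host.ip == wanted_ip:
                return host.mac  # only the owner of the name answers
        return None


class Host:
    def __init__(self, ip, mac, network):
        self.ip, self.mac, self.network = ip, mac, network
        self.cache = {ip: mac}  # initially only the own binding is known
        network.hosts.append(self)

    def resolve(self, ip):
        if ip not in self.cache:
            mac = self.network.broadcast(self, ip)
            if mac is None:
                return None
            self.cache[ip] = mac
        return self.cache[ip]


net = Network()
a = Host("10.0.0.1", "aa:aa:aa:aa:aa:01", net)
b = Host("10.0.0.2", "aa:aa:aa:aa:aa:02", net)
```

A second lookup of the same address is answered from the cache. Each cache miss still interrupts every host on the network, which is why this approach does not scale.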
SLIDE 22 27
Dealing with large name spaces
Flat name space is not scalable (how many unique names are there?)
Broadcast as lookup is not scalable → hierarchical name spaces
Use naming contexts or domains to divide name space (e.g. www.infosys.tuwien.ac.at)
Structure supports management of name space according to organizational lines
Distributed name space management for scalability and availability
SLIDE 23 29
Name Space Distribution (1)
An example partitioning of the DNS name space, including Internet-accessible files, into three layers.
SLIDE 24 30
Name Space Distribution (2)
Item | Global | Administrational | Managerial
Geographical scale of network | Worldwide | Organization | Department
Total number of nodes | Few | Many | Vast numbers
Responsiveness to lookups | Seconds | Milliseconds | Immediate
Update propagation | Lazy | Immediate | Immediate
Number of replicas (server-side) | Many | None or few | None
Is client-side caching applied? | Yes | Yes | Sometimes
A comparison between name servers for implementing nodes from a large-scale name space partitioned into a global layer, an administrational layer, and a managerial layer.
SLIDE 25 31
Navigating distributed name services
Large name systems are distributed Each (client) node has a local name resolver Each name server is responsible for a separate context of the name space A client’s resolution request “navigates” through different name servers for full resolution
client controlled: iterative or multicast server controlled: iterative or multicast server controlled: recursive
SLIDE 26 32
Client-controlled navigation
[Figure: A client iteratively contacts name servers NS1–NS3 (steps 1, 2, 3) in order to resolve a name]
SLIDE 27 33
Server-controlled navigation
[Figure, two panels: iterative server-controlled navigation and recursive server-controlled navigation, each showing a client and name servers NS1–NS3]
A name server NS1 communicates with other name servers on behalf of a client
SLIDE 28 34
Iterative Name Resolution
ftp://ftp.cs.vu.nl/pub/globe/index.html
SLIDE 29 35
Recursive Name Resolution
ftp://ftp.cs.vu.nl/pub/globe/index.html
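The difference between iterative and recursive resolution of a name such as ftp.cs.vu.nl can be sketched with simulated name servers. The NameServer class and the address 192.0.2.10 are placeholders invented for this example, not data from the lecture.

```python
class NameServer:
    def __init__(self, name, table):
        self.name = name
        self.table = table  # label -> next NameServer, or final address (str)

    def lookup_one(self, labels):
        """Resolve only the first label; return (result, unresolved labels)."""
        return self.table[labels[0]], labels[1:]

    def lookup_recursive(self, labels):
        """Resolve the first label, then pass the rest on to the next server."""
        result, rest = self.lookup_one(labels)
        return result.lookup_recursive(rest) if rest else result


def resolve_iteratively(root, labels):
    """The client's resolver contacts each server itself."""
    current, contacted = root, []
    while labels:
        contacted.append(current.name)
        current, labels = current.lookup_one(labels)
    return current, contacted


cs = NameServer("cs.vu.nl", {"ftp": "192.0.2.10"})
vu = NameServer("vu.nl", {"cs": cs})
nl = NameServer("nl", {"vu": vu})
root = NameServer("root", {"nl": nl})

path = ["nl", "vu", "cs", "ftp"]
address, contacted = resolve_iteratively(root, path)
same_address = root.lookup_recursive(path)
```

In the iterative case the client sends four requests, one to each server. In the recursive case it sends a single request to the root, and each server can cache the result it passes back.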
SLIDE 30 38
Effects of caching and replication
CACHING
Reduce time for name resolution on cache hit Lower load on network Increase availability of service: important requirement for name service
Try dig and host utilities on UNIX
REPLICATION
Remove “hot spots”: reduce accesses to high-level nodes
Reduce time for name resolution if accessing closer replica
Increase availability
SLIDE 31 39
Cache consistency
Cache consistency is relaxed or lazy Client is expected to deal with stale data Why not strict consistency (in large name service)?
Updates take long, waiting for all sites to be updated Lookups take long, waiting for data to stabilize
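Relaxed consistency is often realized with a time-to-live (TTL) on cached bindings. The following sketch assumes a hypothetical CachingResolver, a dict as the authoritative source, and an injectable clock so that the passing of time can be simulated; names and addresses are made up.

```python
class CachingResolver:
    def __init__(self, authoritative, ttl, clock):
        self.authoritative = authoritative  # dict: name -> address
        self.ttl = ttl                      # seconds a cached binding stays valid
        self.clock = clock                  # callable returning the current time
        self.cache = {}                     # name -> (address, expiry time)

    def lookup(self, name):
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]                 # cache hit, possibly stale
        address = self.authoritative[name]  # cache miss: ask the name server
        self.cache[name] = (address, now + self.ttl)
        return address


now = [0]
zone = {"www.example": "192.0.2.1"}
resolver = CachingResolver(zone, ttl=60, clock=lambda: now[0])

before = resolver.lookup("www.example")
zone["www.example"] = "192.0.2.2"          # update at the authoritative server
stale = resolver.lookup("www.example")     # still within the TTL
now[0] = 61
fresh = resolver.lookup("www.example")     # TTL expired, binding is re-fetched
```

The update completes immediately at the server and lookups never wait, at the price of clients seeing the old address for up to one TTL.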
SLIDE 32 40
Domain Name System (DNS)
Defines a naming standard for the Internet: One of the largest distributed name services Maps domain names to IP addresses Lookup for mail servers Uses caching and replication to achieve both performance and availability Organized as rooted tree:
Subtree: Domain Path name: Domain name (absolute or relative) Node contents: Resource records for zone Root servers: http://www.root-servers.org/
SLIDE 33 41
DNS name space
Name server names are in italics, and the corresponding domains are in (parentheses). Arrows denote name server entries
Name servers (domains): a.root-servers.net (root); ns1.nic.uk (uk); ns0.ja.net (ac.uk); alpha.qmw.ac.uk (qmw.ac.uk); dns0.dcs.qmw.ac.uk (dcs.qmw.ac.uk); dns0-doc.ic.ac.uk (ic.ac.uk); ns.purdue.edu (purdue.edu)
Name server entries shown in the figure: uk, purdue.edu, yahoo.com, ac.uk, co.uk, ic.ac.uk, qmw.ac.uk, dcs.qmw.ac.uk
Host entries: *.qmw.ac.uk, *.ic.ac.uk, *.dcs.qmw.ac.uk, *.purdue.edu
SLIDE 34 43
DNS Implementation (1)
An excerpt from the DNS database for the zone cs.vu.nl.
SLIDE 35 44
DNS Implementation (2)
An excerpt from the DNS database for the zone cs.vu.nl.
SLIDE 36 45
DNS Implementation (3)
Primary name server Secondary name server (zone transfer) Caching-only server (non-authoritative) Query: domain name + class (IN) + type
Name | Record type | Record value
cs.vu.nl | NS | solo.cs.vu.nl
solo.cs.vu.nl | A | 130.37.21.1
Part of the description for the vu.nl domain which contains the cs.vu.nl domain (“glue data”).
SLIDE 37 46
Availability and performance
Originally: all host names/addresses in a single, central master file, downloaded by FTP Each client has address of more than one name server Each name service has a primary and one or more secondary servers (and caching-only servers) Each name server stores addresses of some root servers and authoritative server for parent domain Clients cache previously resolved names Top-level (e.g. root) servers are replicated For performance, combine multiple requests and replies
SLIDE 39 48
Directory services
Basic role: Attribute-based naming Add/remove names to/from directory Get names from directory according to property description pattern (e.g. wildcards) “Yellow pages” Assign access modes to names ( e.g. Read/write/execute) Enforce access control Useful component of many distributed applications (e.g. in chat or email)
SLIDE 40 49
X.500 Directory Service
An ambitious attempt to compile information about the world-wide information system Not just names but information about
Support attribute-based retrieval
Each level is responsible for maintaining the organization of its lower levels
The X.500 Name Space (1)
Part of the directory information tree.
SLIDE 42 52
The X.500 Name Space (2)
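Attribute | Abbr. | Value
Country | C | NL
Locality | L | Amsterdam
Organization | O | Vrije Universiteit
OrganizationalUnit | OU | Math.&Comp. Sc.
CommonName | CN | Main server
(attribute name lost) | | 130.37.24.6, 192.31.231,192.31.231.66
(attribute name lost) | | 130.37.21.11
(attribute name lost) | | 130.37.21.11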
Result of: read /C=NL/O=Vrije Universiteit/OU=Math.&Comp. Sc./CN=Main server/
A simple example of a X.500 directory entry using X.500 naming conventions.
SLIDE 43 53
The X.500 Name Space (3)
Attribute | Value | Attribute | Value
Country | NL | Country | NL
Locality | Amsterdam | Locality | Amsterdam
Organization | Vrije Universiteit | Organization | Vrije Universiteit
OrganizationalUnit | Math.&Comp. Sc. | OrganizationalUnit | Math.&Comp. Sc.
CommonName | Main server | CommonName | Main server
Host_Name | star | Host_Name | zephyr
Host_Address | 192.31.231.42 | Host_Address | 192.31.231.66
Result of: list /C=NL/O=Vrije Universiteit/OU=Math.&Comp. Sc./CN=Main server/ star, zephyr (list returns names only)
Two directory entries having Host_Name as RDN
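The difference between read, list, and attribute-based search can be sketched on a tiny directory tree that holds only the two entries above. The functions read, list_names, and search are illustrative names, not X.500 or LDAP API calls, and each entry stores just its host attributes.

```python
from fnmatch import fnmatch

BASE = "/C=NL/O=Vrije Universiteit/OU=Math.&Comp. Sc./CN=Main server"


def parse(path):
    """Split a path name into its sequence of RDNs."""
    return tuple(path.strip("/").split("/"))


dit = {
    parse(BASE + "/Host_Name=star"):
        {"Host_Name": "star", "Host_Address": "192.31.231.42"},
    parse(BASE + "/Host_Name=zephyr"):
        {"Host_Name": "zephyr", "Host_Address": "192.31.231.66"},
}


def read(path):
    """Return the attributes of exactly one entry."""
    return dit[parse(path)]


def list_names(path):
    """Return only the names of the entries directly below path."""
    prefix = parse(path)
    return sorted(
        key[-1].split("=", 1)[1]
        for key in dit
        if len(key) == len(prefix) + 1 and key[:len(prefix)] == prefix
    )


def search(attribute, pattern):
    """Attribute-based lookup ("yellow pages") with wildcards."""
    return sorted(
        entry["Host_Name"]
        for entry in dit.values()
        if fnmatch(entry.get(attribute, ""), pattern)
    )
```

For example, list_names(BASE + "/") yields the names star and zephyr, while search("Host_Name", "z*") selects entries by a property rather than by their position in the tree.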
SLIDE 44 58
Light-weight Directory Access Protocol
Directory Access Protocol “DAP” proved to be too “heavy”
LDAP is a newer protocol with more efficient access to X.500 and simpler directories (RFC 2251)
runs directly on top of TCP (instead of OSI)
parameters are passed as strings (instead of ASN.1)
Makes it possible to write “directory-enabled” applications (such as email)
De facto standard for Internet-based directory services (Windows 2000 ADS)
SLIDE 45 59
JNDI (1)
Java Naming and Directory Interface
APIs to Access Name Services
Supports access to
COS (Common Object Services) Naming DNS (Domain Name System) LDAP (Lightweight Directory Access Protocol) NIS (Network Information System) and NIS+
SLIDE 46 60
JNDI (2)
Java-only solution
SLIDE 48 62
Discovery services
When do we need a discovery service?
In ad hoc or spontaneous networks a group of hosts decide to share resources and services (e.g. Jini, P2P) that change dynamically Nodes appear and disappear often in a large-scale system (e.g., P2P) Mobility has to be supported
Principles
Hosts must have the ability to announce and discover available resources and services TTL is important in these networks: e.g., lease
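Announce, discover, and lease expiry can be sketched with a small registry. LookupService here is a hypothetical class, not the Jini API, and the clock is injected so that time can be simulated.

```python
class LookupService:
    def __init__(self, clock):
        self.clock = clock
        self.registrations = {}  # service name -> (address, lease expiry time)

    def announce(self, name, address, lease_seconds):
        """Register a service; calling it again renews the lease."""
        self.registrations[name] = (address, self.clock() + lease_seconds)

    def discover(self, name):
        entry = self.registrations.get(name)
        if entry is None:
            return None
        address, expires = entry
        if expires <= self.clock():  # lease ran out: the service is assumed gone
            del self.registrations[name]
            return None
        return address


now = [0]
lookup = LookupService(clock=lambda: now[0])
lookup.announce("printing", "host-17", lease_seconds=30)
```

A service that disappears without deregistering simply stops renewing its lease, so its entry vanishes after at most one lease period.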
SLIDE 49 63
Service discovery in Jini
[Figure: Jini service discovery on a network with a client, a corporate infoservice, printing services, and lookup services for the groups "admin" and "finance". Messages shown: 1. "... lookup service?", 2. "Here I am: .....", 3. Request printing service]
JINI: „Jini Is Not Initials“
SLIDE 50 64
Effect of mobility
Hosts on a network are no longer permanently fixed in one (topological) location Laptops, personal digital assistants, and their contents move from one location in the network and join in other points In general, objects and resources are not stationary How can name resolution work? Note: IP addresses refer to fixed locations within the network topology
SLIDE 51 65
Naming vs. Locating Entities (1)
Traditional naming systems are not well suited for supporting name-to-address mappings that change often:
global and administrational layer are assumed to be stable caching, replication updates are usually restricted to a single name server
Two immediate solutions
change address record (non-local lookup/update) add symbolic link (chain of links)
SLIDE 52 66
Naming vs. Locating Entities (2)
a) Direct, single level mapping between names and addresses. b) Two-level mapping (e.g., using identities). Location service maps identifier to address.
SLIDE 53 67
Simple Solutions
Broadcasting
e.g. ARP Bandwidth? # of hosts interrupted?
Multicasting: restricted group of hosts
multicast address as general location service locate the nearest replica
Forwarding Pointers
moving entity leaves reference behind simple, BUT: long chain, many intermediate locations, vulnerability to broken link, ... keep chains relatively short
SLIDE 54 68
Forwarding Pointers (1)
Proxies can be passed as parameters: P1 → P2 (p')
The principle of forwarding pointers using (proxy, skeleton) pairs. Migration is completely transparent, but this is not an address lookup; rather, the client's request is forwarded along the chain to the actual object.
SLIDE 55 69
Forwarding Pointers (2)
Goal: Keep chains relatively short!
Redirecting a forwarding pointer, by storing a shortcut in a proxy.
Response directly or along the reverse path?
A skeleton that is no longer referred to can be removed
Problems when a (proxy, skeleton) pair crashes or becomes unreachable → home location (where the object was created)
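Following a chain of forwarding pointers and then shortening it can be sketched as below. Location, Proxy, and move are simplified stand-ins for the (proxy, skeleton) pairs of the slides and are assumptions of this example.

```python
class Location:
    def __init__(self, name):
        self.name = name
        self.obj = None      # the object, if it currently lives here
        self.forward = None  # pointer left behind when the object moved on


def move(source, destination):
    """Migrate the object and leave a forwarding pointer behind."""
    destination.obj, source.obj = source.obj, None
    source.forward = destination


class Proxy:
    def __init__(self, location):
        self.location = location  # where the client believes the object is

    def invoke(self):
        here, hops = self.location, 0
        while here.obj is None:   # request is forwarded along the chain
            here = here.forward   # a broken chain would fail here
            hops += 1
        self.location = here      # shortcut: skip the chain next time
        return here.obj, hops


a, b, c = Location("A"), Location("B"), Location("C")
a.obj = "the object"
proxy = Proxy(a)
move(a, b)
move(b, c)
first_call = proxy.invoke()
second_call = proxy.invoke()
```

The first invocation travels two hops. After the shortcut is stored, the second one reaches the object directly, and the intermediate locations are no longer needed by this proxy.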
SLIDE 56 70
Home-Based Approaches (1)
Broadcasting and forwarding pointers impose scalability problems Home location keeps track of current location (highly dependable)
e.g., fallback with forwarding pointers
Mobile IP:
Home agent at fixed address mobile host registers temporary care-of-address with the home agent Packets are tunneled, sender is informed
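The role of the home agent can be sketched as a table from home address to care-of address. This is a strongly simplified model, not the Mobile IP message format; the class, the dict-based "tunnel", and the addresses are all assumptions made for illustration.

```python
class HomeAgent:
    def __init__(self):
        self.care_of = {}  # home address -> current care-of address

    def register(self, home_address, care_of_address):
        self.care_of[home_address] = care_of_address

    def deregister(self, home_address):
        self.care_of.pop(home_address, None)  # host has returned home

    def forward(self, home_address, payload):
        """Return (delivery address, packet, hint for the sender)."""
        coa = self.care_of.get(home_address)
        if coa is None:                        # host is at home
            return home_address, payload, None
        tunneled = {"outer_dst": coa, "inner_dst": home_address, "payload": payload}
        return coa, tunneled, coa              # sender learns the current location


agent = HomeAgent()
at_home = agent.forward("198.51.100.7", "hello")
agent.register("198.51.100.7", "203.0.113.9")  # mobile host is visiting elsewhere
away = agent.forward("198.51.100.7", "hello")
```

Senders keep using the fixed home address; only the home agent has to learn about each move.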
SLIDE 57 71
Home-Based Approaches (2)
The principle of Mobile IP.
SLIDE 58 72
Home-Based Approaches (3)
Mobile telephony (GSM) – two-tiered scheme:
first, check local Visitor Location Register (VLR) then, contact Home Location Register (HLR) to find current location
Drawbacks:
Communication latency
Fixed home location: availability; permanent migration
→ register home location with traditional naming service and let client first look up the home (relatively stable, can be effectively cached)
SLIDE 59 73
Hierarchical Approaches
Hierarchical organization of a location service into domains, sub domains, and leaf domains, each having an associated directory node. Entity is represented by location record: In the directory node of a leaf domain it contains an address, else it contains a pointer to the respective sub-domain
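Lookup in such a hierarchy climbs from the client's leaf domain until a directory node knows the entity, then follows pointers down to a leaf. A minimal sketch, assuming a hypothetical DomainNode class, one address per entity, and made-up domain names:

```python
class DomainNode:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.records = {}  # entity -> address (leaf) or child DomainNode (pointer)


def insert(leaf, entity, address):
    """Store the address in the leaf and pointers on the path towards the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None and entity not in node.records:
        node.records[entity] = child
        child, node = node, node.parent


def lookup(start_leaf, entity):
    node = start_leaf
    while entity not in node.records:  # climb until a location record is found
        if node.parent is None:
            return None
        node = node.parent
    record = node.records[entity]
    while isinstance(record, DomainNode):  # follow pointers down to the leaf
        record = record.records[entity]
    return record


root = DomainNode("root")
west, east = DomainNode("west", root), DomainNode("east", root)
leaf1, leaf2 = DomainNode("leaf1", west), DomainNode("leaf2", east)
insert(leaf2, "E", "address-of-E")
```

A lookup issued in the entity's own leaf domain stays local, while a lookup from leaf1 has to climb to the root before descending again, so locality of reference is exploited.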
SLIDE 61 82
Unreferenced Entities
Naming and location services provide a global referencing service
An entity that cannot be accessed should (often) be removed
In many systems, entities are removed explicitly only
It is often unknown whether there is (still) a reference to an entity
Distributed garbage collection for remote objects is performed by skeletons and proxies (and thus hidden from clients and objects)
SLIDE 62 83
The Problem of Unreferenced Objects
An example of a graph representing objects containing references to each other.
SLIDE 63 84
Reference Counting (1)
The problem of maintaining a proper reference count in the presence of unreliable communication. It is essential to detect duplicate messages.
... is popular in uniprocessor systems, but leads to a number of problems in distributed systems
SLIDE 64 85
Reference Counting (2)
a) Copying a reference to another process and incrementing the counter too late (race condition) b) A solution, but:
- reliable communication required
- Three messages for passing one reference
now P2 is allowed to remove the reference
SLIDE 65 90
Reference Listing (Java RMI)
Skeleton keeps track of the proxies
Adding and removing are idempotent operations (increment/decrement are not!)
no race condition
communication need not be reliable
Reference created: id sent to skeleton, proxy created after acknowledgement
Reference passed P1 → P2: P2 passes its id to the skeleton, ACK, then proxy at P2
Race conditions: temporary entry while (before) remote reference is transmitted
Keep-alive of reference list for increased fault tolerance (FT)
Scales badly → use leases
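Why idempotent add/remove tolerates duplicate messages while increment/decrement does not can be shown with two toy skeletons. The classes and the message format are assumptions made for this sketch and do not reflect the Java RMI implementation.

```python
class CountingSkeleton:
    def __init__(self):
        self.count = 0

    def handle(self, message):
        op, _proxy_id = message
        self.count += 1 if op == "inc" else -1  # not idempotent


class ListingSkeleton:
    def __init__(self):
        self.proxies = set()

    def handle(self, message):
        op, proxy_id = message
        if op == "add":
            self.proxies.add(proxy_id)      # adding twice changes nothing
        else:
            self.proxies.discard(proxy_id)  # removing twice changes nothing

    def is_garbage(self):
        return not self.proxies


def deliver_with_duplicate(skeleton, message):
    skeleton.handle(message)
    skeleton.handle(message)  # retransmission after a lost acknowledgement


counting, listing = CountingSkeleton(), ListingSkeleton()
deliver_with_duplicate(counting, ("inc", "p1"))
deliver_with_duplicate(listing, ("add", "p1"))
```

After the duplicated message the counter claims two references although only one proxy exists, so the object would never be collected. The reference list still holds exactly one entry.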
SLIDE 66 94
Summary
Names are organized in name spaces; implemented in hierarchies and layers
A naming service provides the mapping (resolution): name → attribute (typically address)
Consistency of distributed name service depends on update algorithms used
Caching and replication increase performance/availability
Directory service provides a way to structure a name space according to attributes
Discovery service supports ad hoc networks, dynamics, and large-scale
Mobility is supported by location services
Distributed garbage collection is challenging