Data Logistics in Network Computing
Martin Swany
Introduction and Motivation
- The goal of Computational Grids is to
mimic the electric power grid for computing power
- Service-orientation to make computing power a
utility
- Compute cycles aren’t fungible
- Data location and movement overhead is
critical
- In the case of Data Grids, data movement
is the key problem
- Managing the location of data is critical
Data Logistics
- The definition of Logistics
“…the process of planning, implementing, and controlling the efficient, effective flow and storage of goods, services and related information from point of origin to point of consumption.”
- Shipping and distribution enterprises
make use of storage (and transformation) when moving material
- Optimizing the flow, storage, and access
of data is necessary to make distributed
and Grid environments a viable reality
The Logistical Session Layer
- LSL allows systems to exploit “logistics”
in stream-oriented communication
- LSL service points (depots) provide short-
term logistical storage and cooperative data forwarding
- The primary focus is improved throughput
for reliable data streams
- Both unicast and multicast
- A wide range of new functionality is
possible
The Logistical Session Layer
Session Layer
- A session is the end-to-end composition of
segment-specific transports and signaling
- More responsive control loop via reduction of
signaling latency
- Adapt to local conditions with greater specificity
- Buffering in the network means retransmissions need
not come from the source
[Figure: protocol stacks at the source, depot, and sink, each with Physical, Data Link, Network, and Transport layers; the Session layer spans them end to end in user space]
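The buffering idea above — retransmissions need not come from the source — can be sketched in Python. This is a minimal, hypothetical model of a depot's buffer (the class and method names are illustrative, not the LSL API):

```python
class Depot:
    """Sketch of an LSL depot buffer: segments received from upstream are
    held in short-term storage and forwarded downstream; a downstream loss
    is repaired from the depot's buffer, not from the original source."""

    def __init__(self):
        self.buffer = {}            # seq -> segment, held until ACKed downstream

    def receive(self, seq, segment):
        self.buffer[seq] = segment  # short-term logistical storage
        return seq                  # ACK upstream immediately

    def forward(self, seq):
        return self.buffer[seq]     # send a buffered segment downstream

    def retransmit(self, seq):
        return self.buffer[seq]     # loss recovery without involving the source

    def acked(self, seq):
        self.buffer.pop(seq, None)  # downstream ACK frees the storage
```

Because the depot ACKs upstream as soon as it buffers a segment, the source's control loop runs over the shorter source-to-depot segment, which is the latency reduction the slide describes.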
Initial Deployment
LSL Performance Improvement
TCP Overview
- TCP provides reliable transmission of byte streams over best-effort
packet networks
- Sequence number to identify stream position inside segments
- Segments are buffered until acknowledged
- Congestion (sender) and flow control (receiver) “windows”
- Everyone obeys the same rules to promote stability, fairness, and
friendliness
- Congestion-control loop uses ACKs to clock segment transmission
- Round Trip Time (RTT) critical to responsiveness
- Conservative congestion windows
- Start with window O(1) and grow exponentially then linearly
- Additive increase, multiplicative decrease (AIMD) congestion window
based on loss inference
- “Sawtooth” steady-state
- Problems with high bandwidth
delay product networks
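The window dynamics above — start at O(1), grow exponentially during slow start, then linearly, halving on inferred loss — can be sketched in rounds. This is a simplified simulation (loss rounds are supplied explicitly; real TCP infers loss from missing ACKs):

```python
def tcp_cwnd_trace(rounds, losses, ssthresh=16.0):
    """Trace the congestion window (in segments) over RTT rounds.
    losses is the set of round indices where a loss is inferred."""
    cwnd, trace = 1.0, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in losses:
            ssthresh = max(cwnd / 2, 1.0)  # multiplicative decrease
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2                      # slow start: exponential growth
        else:
            cwnd += 1                      # congestion avoidance: additive increase
    return trace
```

With periodic losses, the trace shows the "sawtooth" steady state; since each ramp-up takes time proportional to the RTT, large bandwidth-delay products make recovery slow.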
Synchronous Multicast with LSL
- Each node sends the data and a recursively
encoded control subtree to its children
- The LSL connections exist simultaneously
- Synchronous distribution
- Reliability is provided by TCP
- Distribution is
logically half-duplex so the “upstream” channel can be used for negotiation and feedback
Build a Distribution Tree
Connections close once data is received
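The tree construction above can be sketched as follows — a k-ary distribution tree plus the recursively encoded control subtree each node ships to its children (the encoding here is an illustrative nested dict, not the wire format):

```python
def build_tree(nodes, fanout=2):
    """Arrange nodes into a k-ary distribution tree (nodes[0] is the root)."""
    tree = {n: [] for n in nodes}
    for i, n in enumerate(nodes):
        for j in range(1, fanout + 1):
            child = fanout * i + j
            if child < len(nodes):
                tree[n].append(nodes[child])
    return tree

def subtree(tree, root):
    """Recursively encode the control subtree a node sends to a child
    alongside the data, so the child knows where to forward next."""
    return {root: [subtree(tree, c) for c in tree[root]]}
```

Each node only needs its own subtree, so control state shrinks at every level of the distribution.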
Distribution Experiment
- 52 nodes in 4 clusters
- UIUC, UTK, UCSD
- Distributions originate from a single host
- Average times over 10 identical
distributions
- Without checksum
- Control case is a “flat” tree within the
same infrastructure
Distribution Time
Bandwidth Delivered
Internet Backplane Protocol
- LSL is closely related to IBP
- Depots are similar in spirit but don’t yet share
an implementation
- J. Plank, A. Bassi, M. Beck, T. Moore, M. Swany, R. Wolski, The Internet
Backplane Protocol: Storage in the Network, IEEE Internet Computing,
September/October 2001.
Exposed network buffers
LSL Implementation
- The LSL client library provides
compatibility with current socket applications
- Although more functionality is available using
the API directly
- LD_PRELOAD for function override
- socket(), bind(), connect(),
setsockopt()…
- Allows Un*x binaries to use LSL without
recompilation
- Daemon runs on all Un*x platforms
- Forwarding is better on Linux than on BSD
LSL Summary
- Logistical data overlays can significantly
improve performance for data movement
- Demonstrated speedup
- Think of a session as the composition of
network-specific transport layers
- There are many cases in which a single
transport protocol from end to end might not be the best choice
- Network heterogeneity
- Wireless
- Optical (with time-division multiplexing)
- Potential to become a new model rather than
short-term solution for TCP’s problems
The End to End Arguments
- Why aren’t techniques like this already in use?
- Recall the end-to-end arguments
- E2E Integrity
- Network elements can’t be trusted
- Duplication of function is inefficient
- Fate sharing
- State in the network related to a user
- Scalability
- Network transparency
- Network opacity
- The assumptions regarding scalability and
complexity may not hold true any longer
Cascaded TCP Dynamics
- Recall TCP’s sequence number and ACKs
- We can observe the progress of a TCP connection
by plotting the sequence number acknowledged by the receiver
- For this experiment, we captured packet-level
traces of both LSL and end-to-end connections
- 10 traces for each path and subpath were
gathered
- We compute the average growth of the sequence
number with respect to time
- The following graphs depict average progress
of a set of transfers
UCSB->Denver->UIUC (64M)
UCSB->Houston->UFL (64M)
Cost of Path Traversal
- With pipelined use of LSL depots, there is
some startup overhead, after which the time to transfer is dominated by the narrow link
- Model as a graph, treating edge cost as time to
transfer some amount of data
- 1 / achievable bandwidth
- The cost of a path is that of the maximum-
valued link in the path from source to sink: max(c_ij) over edges (i,j) in the path
- Or, the achievable bandwidth on a path is
constrained by the link with the smallest bandwidth
- Optimization for this condition is minimax
- minimize the maximum value
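The cost model above is a one-liner. A minimal sketch (bandwidths in Mb/s are illustrative inputs):

```python
def path_cost(bandwidths_mbps):
    """Edge cost = 1 / achievable bandwidth; the cost of a path is the
    maximum edge cost, i.e. the narrowest link bounds the whole path."""
    return max(1.0 / bw for bw in bandwidths_mbps)
```

For a path with 100, 10, and 1000 Mb/s links, the 10 Mb/s link dominates: the cost is 0.1, matching the slide's claim that achievable path bandwidth is constrained by the smallest-bandwidth link.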
Routing Connections
- Goal: Find the best path through the
network for a given source and sink
- Approach: Build a tree of best paths from a
single source to all nodes with a greedy algorithm similar to Shortest Path
- By walking the tree of Minimax paths (MMP)
we can extract the best path from the source node to each of the other nodes
- from source to a given destination to build a
complete source route
- produce a table of destination/next-hop pairs for
depot routing tables
- An O(m log n) operation, for m edges and n nodes
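The greedy algorithm above can be sketched as a Dijkstra-like search where the relaxation step takes the maximum edge cost along the path rather than the sum (a standard minimax/bottleneck-path variant; the graph encoding is illustrative):

```python
import heapq

def minimax_paths(graph, source):
    """Build a tree of minimax paths from source to all reachable nodes.
    graph: {u: {v: edge_cost}}. Path cost = max edge cost on the path.
    Returns (best cost per node, parent pointers forming the MMP tree)."""
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u].items():
            new_cost = max(cost, w)       # worst edge seen so far on this path
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                parent[v] = u             # next-hop info for depot routing tables
                heapq.heappush(heap, (new_cost, v))
    return best, parent
```

Walking the parent pointers from a destination back to the source yields either a complete source route or a destination/next-hop routing-table entry, as the slide describes.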
A Tree of Minimax Paths
- Bandwidth measurements vary slightly from
moment to moment
- Connections are bound by the same wide-area
connection
Edge Equivalence
- Edge Equivalence Threshold ε
- Modified algorithm considers edges within ε of one
another to have the same cost
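The threshold comparison can be sketched in one function (the ε value here is illustrative, not one from the talk):

```python
def equivalent(cost_a, cost_b, eps=0.05):
    """Treat edge costs within eps of one another as equal, since
    bandwidth measurements fluctuate slightly between samples."""
    return abs(cost_a - cost_b) <= eps
```

Replacing exact cost comparison with this test keeps the routing tree stable when connections are bound by the same wide-area link but measure marginally different bandwidths.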
Network Prediction/Forecasting
- Predicting network performance is difficult,
especially over links with high bandwidth-delay product
- Predictions are best generated from a history of
identical measurements
- Frequent probes cannot be intrusive
- How do we predict large transfers?
- Instrumentation data is inexpensive
- Approach: combine instrumentation data with
current lightweight probes to improve application- specific forecasts
What can short probes tell us?
[Figure: HTTP 16MB transfers, ANL -> UCSB; bandwidth (Mb/s) over 10 days]
[Figure: NWS 64KB network probes, ANL -> UCSB; bandwidth (Mb/s) over 10 days]
Multivariate Forecasting
[Figure: ECDFs of probe bandwidth and transfer bandwidth (Mb/s) vs. cumulative probability]
- quantile = CDF_X(value_X)
- prediction_Y = CDF_Y^{-1}(quantile)
- CDF(x) = Pr(X <= x)
- ECDF(x) = count(samples <= x) / total
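The quantile-mapping idea above can be sketched directly from the ECDF definitions: map the current probe through its own ECDF to a quantile, then read that quantile off the large-transfer ECDF. (This is a minimal sketch of the technique; function names and the use of `bisect` are illustrative.)

```python
from bisect import bisect_right

def ecdf_quantile(history, value):
    """ECDF(x) = count(samples <= x) / total."""
    h = sorted(history)
    return bisect_right(h, value) / len(h)

def ecdf_inverse(history, quantile):
    """Smallest sample whose ECDF reaches the given quantile."""
    h = sorted(history)
    idx = min(int(quantile * len(h)), len(h) - 1)
    return h[idx]

def multivariate_forecast(probe_history, transfer_history, current_probe):
    """Forecast a large transfer from a lightweight probe:
    quantile = CDF_X(probe), prediction = CDF_Y^{-1}(quantile)."""
    q = ecdf_quantile(probe_history, current_probe)
    return ecdf_inverse(transfer_history, q)
```

This lets inexpensive instrumentation data (small probes) stand in for the intrusive large transfers we actually want to predict.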
Experimental Configuration
- Collect 64KB bandwidth measurements
every 10 seconds
- Time 16MB HTTP transfers every 60
seconds
- Use the wget utility to get a file from the
filesystem
- Heavily used, general purpose systems
including a Solaris system at Argonne Nat’l Lab and a Linux machine at UCSB
- Forecasting error as a measure of
efficacy
- Difference in forecast and measured value
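The error measures used in the following comparisons can be written out explicitly (standard definitions, not code from the talk):

```python
import math

def mae(forecasts, measured):
    """Mean absolute error: average |forecast - measured|."""
    return sum(abs(f - m) for f, m in zip(forecasts, measured)) / len(measured)

def rmse(forecasts, measured):
    """Root mean square error: penalizes large misses more than MAE."""
    return math.sqrt(
        sum((f - m) ** 2 for f, m in zip(forecasts, measured)) / len(measured)
    )
```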
Comparison of Forecasting
[Figure: mean absolute error (Mb/s) vs. time between HTTP transfers (minutes), univariate vs. multivariate forecaster]
Comparison of Forecasting 2
[Figure: root mean square error vs. time between HTTP transfers (minutes), univariate vs. multivariate forecaster]
Last Value Prediction
[Figure: normalized mean absolute error vs. time between HTTP transfers (minutes), last-value predictor vs. multivariate forecaster]
Network Performance
- To make network scheduling/routing choices,
we need feedback from the network
- Obvious, but this remains an open challenge
- Global Grid Forum Network Measurements
Working Group
- The latest instantiation of performance monitoring
efforts
- Also NMA-RG
- Current JRA1 / Internet2 effort: SONAR
- Service Oriented Network monitoring ARchitecture
GGF NMWG
- First step in the NMWG was to define the
“Characteristics” hierarchy
- B. Lowekamp, B. Tierney, L. Cottrell, R. Hughes-Jones, T. Kielmann, M. Swany, Enabling Network Measurement Portability Through a Hierarchy of Characteristics, 4th International Workshop on Grid Computing (Grid2003), November 2003.
- Current work is focused on providing
standard schemata for representing and exchanging performance information
- Version 2 is in progress
- Enables complete extensibility
New Work
- GGF: Network Measurements for
Applications, Research Group (NMA-RG)
- LSL-NP
- Implementation of LSL on the Intel IXP
platform
- Phoebus
- Session Layer for lambda-switched Optical
Networks
- Mercury
- Malleable environment for protocols and
services
UltraNet End-to-End Session
General Network Programming
- Exposure of network, processing and storage
elements allows for optimization
- Sensor stream downsampling
- The question: How can we construct scalable
network programming systems that adhere to the end-to-end model while still providing the processing optimization that we need?
- Consider many possible optimization loci in the
network, but all are “best effort”
- Even premium service just increases the likelihood
that work will be prioritized. The underlying elements are still “best effort.”
Mercury
- Network processor environment
- Dynamic protocol/process assembly
- Dynamic network virtualization
- Control must come from the edge and be
tied to a user
- Minimization of control traffic
- As everything speeds up, the speed of light
becomes more of an issue
- Hierarchies of control