SLIDE 1

Data Logistics in Network Computing

Martin Swany

SLIDE 2

Introduction and Motivation

  • The goal of Computational Grids is to mimic the electric power grid for computing power
  • Service-orientation to make computing power a utility
  • Compute cycles aren’t fungible
  • Data location and movement overhead is critical
  • In the case of Data Grids, data movement is the key problem
  • Managing the location of data is critical

SLIDE 3

Data Logistics

  • The definition of Logistics: “…the process of planning, implementing, and controlling the efficient, effective flow and storage of goods, services and related information from point of origin to point of consumption.”
  • Shipping and distribution enterprises make use of storage (and transformation) when moving material
  • Optimizing the flow, storage, and access of data is necessary to make distributed and Grid environments a viable reality

SLIDE 4

The Logistical Session Layer

  • LSL allows systems to exploit “logistics” in stream-oriented communication
  • LSL service points (depots) provide short-term logistical storage and cooperative data forwarding
  • The primary focus is improved throughput for reliable data streams
  • Both unicast and multicast
  • A wide range of new functionality is possible

SLIDE 5

The Logistical Session Layer

SLIDE 6

Session Layer

  • A session is the end-to-end composition of segment-specific transports and signaling
  • More responsive control loop via reduction of signaling latency
  • Adapt to local conditions with greater specificity
  • Buffering in the network means retransmissions need not come from the source (see the relay sketch below)

[Diagram: protocol stacks at each hop (Physical, Data Link, Network, Transport) with a Session layer above Transport at the endpoints; the session function runs in user space]
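
The last bullet can be made concrete with a toy store-and-forward relay. The sketch below is illustrative, not the actual LSL depot code: it accepts one upstream TCP connection and forwards the bytes over an independent downstream connection, so each segment runs its own TCP control loop and a downstream loss is repaired from the relay’s buffer rather than from the original source.

```c
/* depot_relay.c: toy store-and-forward relay (illustrative sketch).
 * usage: depot_relay <listen_port> <next_host_ip> <next_port> */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s listen_port next_host_ip next_port\n", argv[0]);
        return 1;
    }

    /* Accept the upstream connection. */
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET,
                             .sin_addr.s_addr = htonl(INADDR_ANY),
                             .sin_port = htons(atoi(argv[1])) };
    if (bind(ls, (struct sockaddr *)&a, sizeof a) < 0) { perror("bind"); return 1; }
    listen(ls, 1);
    int up = accept(ls, NULL, NULL);

    /* Connect downstream to the next depot or the final sink. */
    int down = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in b = { .sin_family = AF_INET,
                             .sin_port = htons(atoi(argv[3])) };
    inet_pton(AF_INET, argv[2], &b.sin_addr);
    if (connect(down, (struct sockaddr *)&b, sizeof b) < 0) { perror("connect"); return 1; }

    /* This buffer is the "short-term logistical storage": TCP on the
     * downstream segment retransmits from here, not from the source. */
    char buf[64 * 1024];
    ssize_t n;
    while ((n = read(up, buf, sizeof buf)) > 0)
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(down, buf + off, n - off);
            if (w <= 0) return 1;
            off += w;
        }
    close(up);
    close(down);
    return 0;
}
```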

SLIDE 7

Initial Deployment

SLIDE 8

LSL Performance Improvement

SLIDE 9

TCP Overview

  • TCP provides reliable transmission of byte streams over best-effort packet networks
  • Sequence number to identify stream position inside segments
  • Segments are buffered until acknowledged
  • Congestion (sender) and flow control (receiver) “windows”
  • Everyone obeys the same rules to promote stability, fairness, and friendliness
  • Congestion-control loop uses ACKs to clock segment transmission
  • Round Trip Time (RTT) critical to responsiveness
  • Conservative congestion windows
  • Start with window O(1) and grow exponentially, then linearly
  • Additive increase, multiplicative decrease (AIMD) congestion window based on loss inference
  • “Sawtooth” steady state (see the sketch below)
  • Problems with high bandwidth-delay product networks
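
A toy model of the window dynamics just described; the constants (slow-start threshold, path capacity) are illustrative, not protocol values. Plotting its output reproduces the sawtooth, and since throughput is roughly cwnd/RTT, the slow additive recovery after each decrease is what hurts on high bandwidth-delay product paths.

```c
/* aimd.c: toy model of TCP congestion-window growth per RTT
 * (illustrative constants, not a protocol implementation). */
#include <stdio.h>

int main(void)
{
    double cwnd = 1.0;             /* start with a window of O(1) segments */
    double ssthresh = 64.0;        /* slow-start threshold, in segments    */
    const double capacity = 100.0; /* loss inferred when cwnd exceeds this */

    for (int rtt = 0; rtt < 200; rtt++) {
        printf("%d %.1f\n", rtt, cwnd);  /* one point per RTT: the sawtooth */
        if (cwnd > capacity) {
            ssthresh = cwnd / 2.0;       /* multiplicative decrease         */
            cwnd = ssthresh;
        } else if (cwnd < ssthresh) {
            cwnd *= 2.0;                 /* slow start: exponential growth  */
        } else {
            cwnd += 1.0;                 /* congestion avoidance: additive  */
        }
    }
    return 0;
}
```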

SLIDE 10

Synchronous Multicast with LSL

  • Each node sends the data and a recursively encoded control subtree to its children (see the sketch below)
  • The LSL connections exist simultaneously
  • Synchronous distribution
  • Reliability is provided by TCP
  • Distribution is logically half-duplex, so the “upstream” channel can be used for negotiation and feedback
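
A minimal sketch of the recursive control encoding; the data structures and the send_to() stand-in are hypothetical, not the LSL wire format. Each node forwards the payload to a child together with that child’s own subtree, and the child repeats the step for its children, so the whole tree forwards concurrently.

```c
/* multicast_tree.c: sketch of recursive subtree distribution
 * (data structures and send stand-in are hypothetical). */
#include <stdio.h>

struct node {
    const char  *host;
    int          nchildren;
    struct node *children;
};

/* Stand-in for a real depot send: push the payload plus the encoded
 * subtree over the open TCP connection to the child. */
static void send_to(const struct node *child, const char *data)
{
    printf("send data + control subtree -> %s (%d children below)\n",
           child->host, child->nchildren);
}

static void distribute(const struct node *self, const char *data)
{
    /* In the real system each node runs this step for its own
     * children; the recursion here just models the whole tree. */
    for (int i = 0; i < self->nchildren; i++) {
        send_to(&self->children[i], data);
        distribute(&self->children[i], data);
    }
}

int main(void)
{
    struct node leaves[2] = { { "depot-a", 0, NULL }, { "depot-b", 0, NULL } };
    struct node root = { "source", 2, leaves };
    distribute(&root, "payload");
    return 0;
}
```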

SLIDE 11

Build a Distribution Tree

SLIDE 12

Connections close once data is received

SLIDE 13

Distribution Experiment

  • 52 nodes in 4 clusters
  • UIUC, UTK, UCSD
  • Distributions originate from a single host
  • Average times over 10 identical distributions
  • Without checksum
  • Control case is a “flat” tree within the same infrastructure

SLIDE 14

Distribution Time

SLIDE 15

Bandwidth Delivered

SLIDE 16

Internet Backplane Protocol

  • LSL is closely related to IBP
  • Depots are similar in spirit but don’t yet share an implementation
  • J. Plank, A. Bassi, M. Beck, T. Moore, M. Swany, R. Wolski, “The Internet Backplane Protocol: Storage in the Network,” IEEE Internet Computing, September/October 2001.
  • Exposed network buffers

SLIDE 17

LSL Implementation

  • The LSL client library provides compatibility with current socket applications
  • Although more functionality is available using the API directly
  • LD_PRELOAD for function override: socket(), bind(), connect(), setsockopt()… (see the sketch below)
  • Allows Un*x binaries to use LSL without recompilation
  • Daemon runs on all Un*x platforms
  • Forwarding is better on Linux than on BSD
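
The interposition technique itself is standard. Below is a minimal sketch of how such a shim overrides connect(); the hook body and any depot-redirection logic are hypothetical, not the LSL library’s actual code.

```c
/* lsl_shim.c: sketch of LD_PRELOAD interposition on connect().
 * The logging/redirection body is hypothetical. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/socket.h>

typedef int (*connect_fn)(int, const struct sockaddr *, socklen_t);

int connect(int fd, const struct sockaddr *addr, socklen_t len)
{
    static connect_fn real_connect;
    if (!real_connect)        /* look up libc's connect() once */
        real_connect = (connect_fn)dlsym(RTLD_NEXT, "connect");

    /* A real shim would consult its routing table here and substitute
     * a depot's address for the ultimate destination. */
    fprintf(stderr, "lsl-shim: intercepted connect() on fd %d\n", fd);

    return real_connect(fd, addr, len);
}
```

Built as a shared object (gcc -shared -fPIC -o lsl_shim.so lsl_shim.c -ldl) and run with LD_PRELOAD=./lsl_shim.so, a shim like this takes effect on unmodified, dynamically linked binaries, which is what makes the no-recompilation deployment possible.
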
SLIDE 18

LSL Summary

  • Logistical data overlays can significantly improve performance for data movement
  • Demonstrated speedup
  • Think of a session as the composition of network-specific transport layers
  • There are many cases in which a single transport protocol from end to end might not be the best choice
  • Network heterogeneity
  • Wireless
  • Optical (with time-division multiplexing)
  • Potential to become a new model rather than a short-term solution for TCP’s problems

SLIDE 19

The End-to-End Arguments

  • Why aren’t techniques like this already in use?
  • Recall the end-to-end arguments
  • E2E Integrity
  • Network elements can’t be trusted
  • Duplication of function is inefficient
  • Fate sharing
  • State in the network related to a user
  • Scalability
  • Network transparency
  • Network opacity
  • The assumptions regarding scalability and complexity may not hold true any longer

SLIDE 20

Cascaded TCP Dynamics

  • Recall TCP’s sequence number and ACKs
  • We can observe the progress of a TCP connection by plotting the sequence number acknowledged by the receiver
  • For this experiment, we captured packet-level traces of both LSL and end-to-end connections
  • 10 traces for each path and subpath were gathered
  • We compute the average growth of the sequence number with respect to time (see the sketch below)
  • The following graphs depict the average progress of a set of transfers
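
One way to compute such an average-progress curve; the trace data below is made up, and real input would come from packet captures. Each trace’s (time, acked-sequence) points are resampled onto a common time grid by linear interpolation, then averaged across traces at each grid point.

```c
/* avg_progress.c: sketch of averaging sequence-number progress
 * across traces (toy data stands in for packet-capture output). */
#include <stdio.h>

#define NTRACE 2
#define NPTS   4

/* Linear interpolation of the acked-sequence curve at time x. */
static double interp(const double t[], const double seq[], int n, double x)
{
    if (x <= t[0]) return seq[0];
    for (int i = 1; i < n; i++)
        if (x <= t[i])
            return seq[i-1] + (seq[i] - seq[i-1]) *
                   (x - t[i-1]) / (t[i] - t[i-1]);
    return seq[n-1];
}

int main(void)
{
    /* Toy traces: time (s) and cumulative acked bytes. */
    double t[NTRACE][NPTS]   = { {0, 1, 2, 3}, {0, 1, 2, 3} };
    double seq[NTRACE][NPTS] = { {0, 2e6, 6e6, 9e6}, {0, 1e6, 5e6, 9e6} };

    for (double x = 0.0; x <= 3.0; x += 0.5) {
        double sum = 0.0;
        for (int i = 0; i < NTRACE; i++)
            sum += interp(t[i], seq[i], NPTS, x);
        printf("%.1f %.0f\n", x, sum / NTRACE);  /* time, mean progress */
    }
    return 0;
}
```
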
SLIDE 21

UCSB->Denver->UIUC (64M)

SLIDE 22

UCSB->Houston->UFL (64M)

SLIDE 23

Cost of Path Traversal

  • With pipelined use of LSL depots, there is some startup overhead, after which the time to transfer is dominated by the narrow link
  • Model as a graph, treating edge cost as the time to transfer some amount of data
  • 1 / achievable bandwidth
  • The cost of a path is that of the maximum-valued link in the path from source to sink: max(c_ij) over edges (i,j) in the path
  • Or, the achievable bandwidth on a path is constrained by the link with the smallest bandwidth
  • Optimization for this condition is minimax
  • Minimize the maximum value

SLIDE 24

Routing Connections

  • Goal: find the best path through the network for a given source and sink
  • Approach: build a tree of best paths from a single source to all nodes with a greedy algorithm similar to Shortest Path (see the sketch after this list)
  • By walking the tree of Minimax Paths (MMP) we can extract the best path from the source node to each of the other nodes
  • From source to a given destination, to build a complete source route
  • Produce a table of destination/next-hop pairs for depot routing tables
  • O(m log n) operation for each m
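
A minimal sketch of the greedy construction over an illustrative adjacency matrix: it is Dijkstra’s algorithm with the path cost redefined as the maximum edge cost on the path, max(best[u], c_uv), instead of the edge sum. Walking prev[] back from any destination yields a complete source route, and the first hop on that walk gives the next-hop table entry.

```c
/* minimax_tree.c: sketch of the greedy minimax-path (MMP) tree;
 * the 5-node adjacency matrix is illustrative. */
#include <stdio.h>

#define N   5
#define INF 1e30

int main(void)
{
    /* cost[i][j] = time to move a unit of data over edge (i,j),
     * i.e. 1 / achievable bandwidth; INF means no edge. */
    double cost[N][N] = {
        {INF, 1.0, 4.0, INF, INF},
        {1.0, INF, 2.0, 6.0, INF},
        {4.0, 2.0, INF, 3.0, 5.0},
        {INF, 6.0, 3.0, INF, 1.0},
        {INF, INF, 5.0, 1.0, INF},
    };
    double best[N];              /* minimax cost of best path from src */
    int prev[N], done[N] = {0};
    int src = 0;

    for (int i = 0; i < N; i++) { best[i] = INF; prev[i] = -1; }
    best[src] = 0.0;

    for (int iter = 0; iter < N; iter++) {
        int u = -1;              /* cheapest unfinished node */
        for (int i = 0; i < N; i++)
            if (!done[i] && (u < 0 || best[i] < best[u])) u = i;
        done[u] = 1;
        for (int v = 0; v < N; v++) {
            /* Path cost through u is the larger of the path so far
             * and the new edge: max(best[u], cost[u][v]). */
            double c = best[u] > cost[u][v] ? best[u] : cost[u][v];
            if (!done[v] && c < best[v]) { best[v] = c; prev[v] = u; }
        }
    }

    for (int v = 0; v < N; v++)
        printf("node %d: minimax cost %.1f via %d\n", v, best[v], prev[v]);
    return 0;
}
```
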
SLIDE 25

A Tree of Minimax Paths

SLIDE 26

Edge Equivalence

  • Bandwidth measurements vary slightly from moment to moment
  • Connections are bound by the same wide-area connection

SLIDE 27

Edge Equivalence Threshold ε

  • Modified algorithm considers edges within ε of one another to have the same cost (see the sketch below)
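
A sketch of the equivalence test; the threshold value is illustrative. Two edge costs that differ by less than ε compare as equal, so small measurement fluctuations do not reorder otherwise-equivalent paths in the minimax tree.

```c
/* edge_equiv.c: sketch of the edge-equivalence comparison
 * (the EPSILON value is illustrative). */
#include <stdio.h>

static const double EPSILON = 0.05;

/* Returns 1 only if a is smaller than b by more than EPSILON;
 * costs within EPSILON of each other are treated as equal. */
static int edge_less(double a, double b)
{
    return a < b - EPSILON;
}

int main(void)
{
    /* 1.00 vs 1.02 differ by less than EPSILON: equivalent edges. */
    printf("%d %d\n", edge_less(1.00, 1.02), edge_less(1.00, 2.00));
    return 0;
}
```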

SLIDE 28

Network Prediction/Forecasting

  • Predicting network performance is difficult, especially over links with high bandwidth-delay product
  • Predictions are best generated from a history of identical measurements
  • Frequent probes cannot be intrusive
  • How do we predict large transfers?
  • Instrumentation data is inexpensive
  • Approach: combine instrumentation data with current lightweight probes to improve application-specific forecasts

SLIDE 29

What can short probes tell us?

[Figure: two time series of BW (mb/s) over 10 days: “HTTP 16MB Transfers ANL -> UCSB” and “NWS 64K Network Probes ANL -> UCSB”]

SLIDE 30

Multivariate Forecasting

[Figure: empirical CDFs of probe bandwidth and transfer bandwidth; x-axis: Bandwidth in Mbit/sec, y-axis: ECDF Probability]

$$\mathrm{CDF}(x) = \Pr(X \le x) \qquad \mathrm{ECDF}(x) = \frac{\mathrm{count}(x_i \le x)}{\mathrm{total}}$$

$$\mathrm{quantile} = \mathrm{CDF}_X(\mathrm{value}_X) \qquad \mathrm{prediction}_Y = \mathrm{CDF}_Y^{-1}(\mathrm{quantile})$$
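
A minimal sketch of this quantile-mapping step; the histories and the current probe value below are made up. The current probe’s quantile in the probe-history ECDF is looked up in the transfer-history ECDF (inverted) to produce the forecast.

```c
/* cdf_forecast.c: sketch of CDF-coupling forecast (toy data). */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* ECDF(x) = count(x_i <= x) / total, over a sorted history. */
static double ecdf(const double *h, int n, double x)
{
    int c = 0;
    while (c < n && h[c] <= x) c++;
    return (double)c / n;
}

/* Inverse ECDF: smallest history value whose ECDF reaches q. */
static double ecdf_inv(const double *h, int n, double q)
{
    for (int i = 0; i < n; i++)
        if ((double)(i + 1) / n >= q) return h[i];
    return h[n - 1];
}

int main(void)
{
    /* Histories of probe bandwidths (X) and HTTP transfer
     * bandwidths (Y), in Mbit/s; the numbers are made up. */
    double probes[] = { 0.6, 0.9, 1.1, 1.3, 1.7 };
    double xfers[]  = { 2.0, 3.5, 4.0, 5.5, 7.0 };
    int n = 5;
    qsort(probes, n, sizeof *probes, cmp);
    qsort(xfers,  n, sizeof *xfers,  cmp);

    double now = 1.2;                 /* current lightweight probe  */
    double q = ecdf(probes, n, now);  /* its quantile in X          */
    printf("probe %.1f -> quantile %.2f -> forecast %.1f Mbit/s\n",
           now, q, ecdf_inv(xfers, n, q));  /* same quantile in Y   */
    return 0;
}
```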

SLIDE 31

Experimental Configuration

  • Collect 64KB bandwidth measurements every 10 seconds
  • Time 16MB HTTP transfers every 60 seconds
  • Use the wget utility to get a file from the filesystem
  • Heavily used, general-purpose systems including a Solaris system at Argonne Nat’l Lab and a Linux machine at UCSB
  • Forecasting error as a measure of efficacy
  • Difference in forecast and measured value

SLIDE 32

Comparison of Forecasting

[Figure: Mean Absolute Error (mb/s) vs. time between HTTP xfers (1 to 450 minutes), comparing the Univariate and Multivariate Forecasters]

SLIDE 33

Comparison of Forecasting 2

[Figure: Root Mean Square Error vs. time between HTTP xfers (1 to 450 minutes), comparing the Univariate and Multivariate Forecasters]

SLIDE 34

Last Value Prediction

[Figure: Normalized Mean Absolute Error vs. time between HTTP xfers (1 to 450 minutes), comparing Last Value Prediction and the Multivariate Forecaster]

SLIDE 35

Network Performance

  • To make network scheduling/routing choices, we need feedback from the network
  • Obvious, but this remains an open challenge
  • Global Grid Forum Network Measurements Working Group
  • The latest instantiation of performance monitoring efforts
  • Also NMA-RG
  • Current JRA1 / Internet2 effort: SONAR
  • Service Oriented Network monitoring ARchitecture

SLIDE 36

GGF NMWG

  • First step in the NMWG was to define the “Characteristics” hierarchy
  • B. Lowekamp, B. Tierney, L. Cottrell, R. Hughes-Jones, T. Kielmann, M. Swany, “Enabling Network Measurement Portability Through a Hierarchy of Characteristics,” 4th International Workshop on Grid Computing (Grid2003), November 2003.
  • Current work is focused on providing standard schemata for representing and exchanging performance information
  • Version 2 is in progress
  • Enables complete extensibility

SLIDE 37

New Work

  • GGF: Network Measurements for Applications Research Group (NMA-RG)
  • LSL-NP
  • Implementation of LSL on the Intel IXP platform
  • Phoebus
  • Session layer for lambda-switched optical networks
  • Mercury
  • Malleable environment for protocols and services

SLIDE 38

UltraNet End-to-End Session

SLIDE 39

General Network Programming

  • Exposure of network, processing, and storage elements allows for optimization
  • Sensor stream downsampling
  • The question: how can we construct scalable network programming systems that adhere to the end-to-end model?
  • While providing the process optimization that we need
  • Consider many possible optimization loci in the network, but all are “best effort”
  • Even premium service just increases the likelihood that work will be prioritized; the underlying elements are still “best effort”

SLIDE 40

Mercury

  • Network processor environment
  • Dynamic protocol/process assembly
  • Dynamic network virtualization
  • Control must come from the edge and be tied to a user
  • Minimization of control traffic
  • As everything speeds up, the speed of light becomes more of an issue

  • Hierarchies of control