Data Logistics in Network Computing
Martin Swany
Introduction and Motivation
- The goal of Computational Grids is to
mimic the electric power grid for computing power
- Service-orientation to make computing power a
utility
- Compute cycles aren’t fungible
- Data location and movement overhead is
critical
- In the case of Data Grids, data movement
is the key problem
- Managing the location of data is critical
Data Logistics
- The definition of Logistics
“…the process of planning, implementing, and controlling the efficient, effective flow and storage of goods, services and related information from point of origin to point of consumption.”
- Shipping and distribution enterprises
make use of storage (and transformation) when moving material
- Optimizing the flow, storage, and access
of data is necessary to make distributed
and Grid environments a viable reality
The Logistical Session Layer
- LSL allows systems to exploit “logistics”
in stream-oriented communication
- LSL service points (depots) provide short-
term logistical storage and cooperative data forwarding
- The primary focus is improved throughput
for reliable data streams
- Both unicast and multicast
- A wide range of new functionality is
possible
The Logistical Session Layer
Session Layer
- A session is the end-to-end composition of
segment-specific transports and signaling
- More responsive control loop via reduction of
signaling latency
- Adapt to local conditions with greater specificity
- Buffering in the network means retransmissions need
not come from the source
[Figure: protocol stacks at the source, depot, and sink, each with Physical, Data Link, Network, and Transport layers; the Session layer spans them end to end in user space]
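The buffering idea above — retransmissions need not come from the source — can be sketched in Python. This is a minimal, hypothetical model of a depot's buffer (the class and method names are illustrative, not the LSL API):

```python
class Depot:
    """Sketch of an LSL depot buffer: segments received from upstream are
    held in short-term storage and forwarded downstream; a downstream loss
    is repaired from the depot's buffer, not from the original source."""

    def __init__(self):
        self.buffer = {}            # seq -> segment, held until ACKed downstream

    def receive(self, seq, segment):
        self.buffer[seq] = segment  # short-term logistical storage
        return seq                  # ACK upstream immediately

    def forward(self, seq):
        return self.buffer[seq]     # send a buffered segment downstream

    def retransmit(self, seq):
        return self.buffer[seq]     # loss recovery without involving the source

    def acked(self, seq):
        self.buffer.pop(seq, None)  # downstream ACK frees the storage
```

Because the depot ACKs upstream as soon as it buffers a segment, the source's control loop runs over the shorter source-to-depot segment, which is the latency reduction the slide describes.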
Initial Deployment
LSL Performance Improvement
TCP Overview
- TCP provides reliable transmission of byte streams over best-effort
packet networks
- Sequence number to identify stream position inside segments
- Segments are buffered until acknowledged
- Congestion (sender) and flow control (receiver) “windows”
- Everyone obeys the same rules to promote stability, fairness, and
friendliness
- Congestion-control loop uses ACKs to clock segment transmission
- Round Trip Time (RTT) critical to responsiveness
- Conservative congestion windows
- Start with window O(1) and grow exponentially then linearly
- Additive increase, multiplicative decrease (AIMD) congestion window
based on loss inference
- “Sawtooth” steady-state
- Problems with high bandwidth
delay product networks
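The window dynamics above — start at O(1), grow exponentially during slow start, then linearly, halving on inferred loss — can be sketched in rounds. This is a simplified simulation (loss rounds are supplied explicitly; real TCP infers loss from missing ACKs):

```python
def tcp_cwnd_trace(rounds, losses, ssthresh=16.0):
    """Trace the congestion window (in segments) over RTT rounds.
    losses is the set of round indices where a loss is inferred."""
    cwnd, trace = 1.0, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in losses:
            ssthresh = max(cwnd / 2, 1.0)  # multiplicative decrease
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2                      # slow start: exponential growth
        else:
            cwnd += 1                      # congestion avoidance: additive increase
    return trace
```

With periodic losses, the trace shows the "sawtooth" steady state; since each ramp-up takes time proportional to the RTT, large bandwidth-delay products make recovery slow.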
Synchronous Multicast with LSL
- Each node sends the data and a recursively
encoded control subtree to its children
- The LSL connections exist simultaneously
- Synchronous distribution
- Reliability is provided by TCP
- Distribution is
logically half-duplex so the “upstream” channel can be used for negotiation and feedback
Build a Distribution Tree
Connections close once data is received
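The tree construction above can be sketched as follows — a k-ary distribution tree plus the recursively encoded control subtree each node ships to its children (the encoding here is an illustrative nested dict, not the wire format):

```python
def build_tree(nodes, fanout=2):
    """Arrange nodes into a k-ary distribution tree (nodes[0] is the root)."""
    tree = {n: [] for n in nodes}
    for i, n in enumerate(nodes):
        for j in range(1, fanout + 1):
            child = fanout * i + j
            if child < len(nodes):
                tree[n].append(nodes[child])
    return tree

def subtree(tree, root):
    """Recursively encode the control subtree a node sends to a child
    alongside the data, so the child knows where to forward next."""
    return {root: [subtree(tree, c) for c in tree[root]]}
```

Each node only needs its own subtree, so control state shrinks at every level of the distribution.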
Distribution Experiment
- 52 nodes in 4 clusters
- UIUC, UTK, UCSD
- Distributions originate from a single host
- Average times over 10 identical
distributions
- Without checksum
- Control case is a “flat” tree within the
same infrastructure
Distribution Time
Bandwidth Delivered
Internet Backplane Protocol
- LSL is closely related to IBP
- Depots are similar in spirit but don’t yet share
an implementation
- J. Plank, A. Bassi, M. Beck, T. Moore, M. Swany, R. Wolski, The Internet
Backplane Protocol: Storage in the Network, IEEE Internet Computing,
September/October 2001.
Exposed network buffers
LSL Implementation
- The LSL client library provides
compatibility with current socket applications
- Although more functionality is available using
the API directly
- LD_PRELOAD for function override
- socket(), bind(), connect(),
setsockopt()…
- Allows Un*x binaries to use LSL without
recompilation
- Daemon runs on all Un*x platforms
- Forwarding is better on Linux than on BSD
LSL Summary
- Logistical data overlays can significantly
improve performance for data movement
- Demonstrated speedup
- Think of a session as the composition of
network-specific transport layers
- There are many cases in which a single
transport protocol from end to end might not be the best choice
- Network heterogeneity
- Wireless
- Optical (with time-division multiplexing)
- Potential to become a new model rather than
short-term solution for TCP’s problems
The End to End Arguments
- Why aren’t techniques like this already in use?
- Recall the end-to-end arguments
- E2E Integrity
- Network elements can’t be trusted
- Duplication of function is inefficient
- Fate sharing
- State in the network related to a user
- Scalability
- Network transparency
- Network opacity
- The assumptions regarding scalability and
complexity may not hold true any longer
Cascaded TCP Dynamics
- Recall TCP’s sequence number and ACKs
- We can observe the progress of a TCP connection
by plotting the sequence number acknowledged by the receiver
- For this experiment, we captured packet-level
traces of both LSL and end-to-end connections
- 10 traces for each path and subpath were
gathered
- We compute the average growth of the sequence
number with respect to time
- The following graphs depict average progress
of a set of transfers
UCSB->Denver->UIUC (64M)
UCSB->Houston->UFL (64M)
Cost of Path Traversal
- With pipelined use of LSL depots, there is
some startup overhead, after which the time to transfer is dominated by the narrow link
- Model as a graph, treating edge cost as time to
transfer some amount of data
- 1 / achievable bandwidth
- The cost of a path is that of the maximum-
valued link in the path from source to sink: max(c_ij) over edges (i,j) in the path
- Or, the achievable bandwidth on a path is
constrained by the link with the smallest bandwidth
- Optimization for this condition is minimax
- minimize the maximum value
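The cost model above is a one-liner. A minimal sketch (bandwidths in Mb/s are illustrative inputs):

```python
def path_cost(bandwidths_mbps):
    """Edge cost = 1 / achievable bandwidth; the cost of a path is the
    maximum edge cost, i.e. the narrowest link bounds the whole path."""
    return max(1.0 / bw for bw in bandwidths_mbps)
```

For a path with 100, 10, and 1000 Mb/s links, the 10 Mb/s link dominates: the cost is 0.1, matching the slide's claim that achievable path bandwidth is constrained by the smallest-bandwidth link.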
Routing Connections
- Goal: Find the best path through the
network for a given source and sink
- Approach: Build a tree of best paths from a
single source to all nodes with a greedy algorithm similar to Shortest Path
- By walking the tree of Minimax paths (MMP)
we can extract the best path from the source node to each of the other nodes
- from source to a given destination to build a
complete source route
- produce a table of destination/next-hop pairs for
depot routing tables
- An O(m log n) operation, for m edges and n nodes
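The greedy algorithm above can be sketched as a Dijkstra-like search where the relaxation step takes the maximum edge cost along the path rather than the sum (a standard minimax/bottleneck-path variant; the graph encoding is illustrative):

```python
import heapq

def minimax_paths(graph, source):
    """Build a tree of minimax paths from source to all reachable nodes.
    graph: {u: {v: edge_cost}}. Path cost = max edge cost on the path.
    Returns (best cost per node, parent pointers forming the MMP tree)."""
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u].items():
            new_cost = max(cost, w)       # worst edge seen so far on this path
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                parent[v] = u             # next-hop info for depot routing tables
                heapq.heappush(heap, (new_cost, v))
    return best, parent
```

Walking the parent pointers from a destination back to the source yields either a complete source route or a destination/next-hop routing-table entry, as the slide describes.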
A Tree of Minimax Paths
- Bandwidth measurements vary slightly from
moment to moment
- Connections are bound by the same wide-area
connection
Edge Equivalence
- Edge Equivalence Threshold ε
- Modified algorithm considers edges within ε of one
another to have the same cost
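The threshold comparison can be sketched in one function (the ε value here is illustrative, not one from the talk):

```python
def equivalent(cost_a, cost_b, eps=0.05):
    """Treat edge costs within eps of one another as equal, since
    bandwidth measurements fluctuate slightly between samples."""
    return abs(cost_a - cost_b) <= eps
```

Replacing exact cost comparison with this test keeps the routing tree stable when connections are bound by the same wide-area link but measure marginally different bandwidths.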
Network Prediction/Forecasting
- Predicting network performance is difficult,
especially over links with high bandwidth-delay product
- Predictions are best generated from a history of
identical measurements
- Frequent probes cannot be intrusive
- How do we predict large transfers?
- Instrumentation data is inexpensive
- Approach: combine instrumentation data with
current lightweight probes to improve application- specific forecasts
What can short probes tell us?
[Figure: HTTP 16MB transfers, ANL -> UCSB; bandwidth (Mb/s) over 10 days]
[Figure: NWS 64KB network probes, ANL -> UCSB; bandwidth (Mb/s) over 10 days]
Multivariate Forecasting
[Figure: ECDFs of probe bandwidth and transfer bandwidth (Mb/s) vs. cumulative probability]
- quantile = CDF_X(value_X)
- prediction_Y = CDF_Y^{-1}(quantile)
- CDF(x) = Pr(X <= x)
- ECDF(x) = count(samples <= x) / total
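The quantile-mapping idea above can be sketched directly from the ECDF definitions: map the current probe through its own ECDF to a quantile, then read that quantile off the large-transfer ECDF. (This is a minimal sketch of the technique; function names and the use of `bisect` are illustrative.)

```python
from bisect import bisect_right

def ecdf_quantile(history, value):
    """ECDF(x) = count(samples <= x) / total."""
    h = sorted(history)
    return bisect_right(h, value) / len(h)

def ecdf_inverse(history, quantile):
    """Smallest sample whose ECDF reaches the given quantile."""
    h = sorted(history)
    idx = min(int(quantile * len(h)), len(h) - 1)
    return h[idx]

def multivariate_forecast(probe_history, transfer_history, current_probe):
    """Forecast a large transfer from a lightweight probe:
    quantile = CDF_X(probe), prediction = CDF_Y^{-1}(quantile)."""
    q = ecdf_quantile(probe_history, current_probe)
    return ecdf_inverse(transfer_history, q)
```

This lets inexpensive instrumentation data (small probes) stand in for the intrusive large transfers we actually want to predict.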
Experimental Configuration
- Collect 64KB bandwidth measurements
every 10 seconds
- Time 16MB HTTP transfers every 60
seconds
- Use the wget utility to get a file from the
filesystem
- Heavily used, general purpose systems
including a Solaris system at Argonne Nat’l Lab and a Linux machine at UCSB
- Forecasting error as a measure of
efficacy
- Difference in forecast and measured value
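The error measures used in the following comparisons can be written out explicitly (standard definitions, not code from the talk):

```python
import math

def mae(forecasts, measured):
    """Mean absolute error: average |forecast - measured|."""
    return sum(abs(f - m) for f, m in zip(forecasts, measured)) / len(measured)

def rmse(forecasts, measured):
    """Root mean square error: penalizes large misses more than MAE."""
    return math.sqrt(
        sum((f - m) ** 2 for f, m in zip(forecasts, measured)) / len(measured)
    )
```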
Comparison of Forecasting
[Figure: mean absolute error (Mb/s) vs. time between HTTP transfers (minutes), univariate vs. multivariate forecaster]
Comparison of Forecasting 2
[Figure: root mean square error vs. time between HTTP transfers (minutes), univariate vs. multivariate forecaster]
Last Value Prediction
[Figure: normalized mean absolute error vs. time between HTTP transfers (minutes), last-value predictor vs. multivariate forecaster]
Network Performance
- To make network scheduling/routing choices,
we need feedback from the network
- Obvious, but this remains an open challenge
- Global Grid Forum Network Measurements
Working Group
- The latest instantiation of performance monitoring
efforts
- Also NMA-RG
- Current JRA1 / Internet2 effort: SONAR
- Service Oriented Network monitoring ARchitecture
GGF NMWG
- First step in the NMWG was to define the
“Characteristics” hierarchy
- B. Lowekamp, B. Tierney, L. Cottrell, R. Hughes-Jones, T. Kielmann, M. Swany, Enabling Network Measurement Portability Through a Hierarchy of Characteristics, 4th International Workshop on Grid Computing (Grid2003), November 2003.
- Current work is focused on providing
standard schemata for representing and exchanging performance information
- Version 2 is in progress
- Enables complete extensibility
New Work
- GGF: Network Measurements for
Applications, Research Group (NMA-RG)
- LSL-NP
- Implementation of LSL on the Intel IXP
platform
- Phoebus
- Session Layer for lambda-switched Optical
Networks
- Mercury
- Malleable environment for protocols and
services
UltraNet End-to-End Session
General Network Programming
- Exposure of network, processing and storage
elements allows for optimization
- Sensor stream downsampling
- The question: How can we construct scalable
network programming systems that adhere to the end-to-end model while still providing the processing optimization that we need?
- Consider many possible optimization loci in the
network, but all are “best effort”
- Even premium service just increases the likelihood
that work will be prioritized. The underlying elements are still “best effort.”
Mercury
- Network processor environment
- Dynamic protocol/process assembly
- Dynamic network virtualization
- Control must come from the edge and be
tied to a user
- Minimization of control traffic
- As everything speeds up, the speed of light
becomes more of an issue
- Hierarchies of control