Naiad James Thomas Goals High-throughput batch processing - PowerPoint PPT Presentation

Jan 27, 2023


Naiad
James Thomas
Goals
● High-throughput batch processing
● Low-latency processing
● Iterative computation with streaming updates (novel contribution)
● For 100% in-memory workloads
Novel Application, CIDR 2013 paper
● Maintaining connected components of graph formed by @username mentions on Twitter
● Connected components is iterative algorithm
● Batches of updates with new @username mentions coming in from Twitter, need to maintain connected components in real time
● First system that can do this
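To make the connected-components workload above concrete, here is a minimal single-machine sketch in Python. It is not Naiad's implementation: it assumes mentions are only ever added (never retracted), treats a mention as an undirected edge, and uses plain min-label propagation that restarts from the previous labels each time a batch arrives. The names `MentionComponents`, `apply_batch`, and `propagate` are made up for illustration.

```python
from collections import defaultdict


def propagate(labels, adj, frontier):
    """Iteratively push smaller labels along edges until nothing changes."""
    while frontier:
        next_frontier = set()
        for u in frontier:
            for v in adj[u]:
                if labels[u] < labels[v]:
                    labels[v] = labels[u]
                    next_frontier.add(v)
        frontier = next_frontier
    return labels


class MentionComponents:
    """Keeps component labels up to date as batches of mentions arrive."""

    def __init__(self):
        self.adj = defaultdict(set)  # user -> users connected by a mention
        self.labels = {}             # user -> smallest user name in its component

    def apply_batch(self, mentions):
        """Add a batch of (user, mentioned_user) pairs and update labels."""
        touched = set()
        for a, b in mentions:
            for user in (a, b):
                self.labels.setdefault(user, user)
            self.adj[a].add(b)
            self.adj[b].add(a)
            touched.update((a, b))
        # Only users touched by the batch can start new label changes,
        # because all older edges were already at a fixed point.
        propagate(self.labels, self.adj, touched)
        return self.labels
```

Each call to `apply_batch` does work proportional to the part of the graph whose labels actually change, which is the intuition behind combining iteration with streaming updates; a real system would also have to handle deletions and distribution.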
Solution: Lower-Level API, Vertex Model
● Philosophy: hack at lower level if performance needed, otherwise use higher-level library
Low-level API Example
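The code from this slide did not survive in the transcript. As a stand-in, the following is a hypothetical Python analogue of a vertex written in the callback style of the vertex model. Only OnNotify is named in this deck; `on_recv`, `notify_at`, `send_by`, and the `DistinctCountVertex` class are assumed names for illustration, and output edges are modelled as plain lists.

```python
from collections import defaultdict


class DistinctCountVertex:
    """Emits each distinct record immediately and its count once a time is complete."""

    def __init__(self):
        self.counts = defaultdict(dict)  # time -> {record: count}
        self.pending = set()             # times with a requested notification
        self.distinct_out = []           # stand-in for the first output edge
        self.count_out = []              # stand-in for the second output edge

    def on_recv(self, msg, time):
        """Called for every incoming message; may run before the time is complete."""
        seen = self.counts[time]
        if not seen:
            # First message for this time: ask to be told when the time is done.
            self.notify_at(time)
        if msg not in seen:
            seen[msg] = 0
            self.send_by(self.distinct_out, msg, time)
        seen[msg] += 1

    def on_notify(self, time):
        """Called by the system once no more messages for `time` can arrive."""
        for record, n in self.counts.pop(time).items():
            self.send_by(self.count_out, (record, n), time)
        self.pending.discard(time)

    def notify_at(self, time):
        self.pending.add(time)

    def send_by(self, edge, msg, time):
        edge.append((time, msg))
```

In this sketch the caller plays the role of the system: it feeds messages through `on_recv` and calls `on_notify(t)` only when it knows time `t` is finished. Distinct records flow out with low latency, while counts wait for the notification.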
High-level Library Example
Distributed Implementation
Distributed Progress Tracking -- Timestamps
Distributed Progress Tracking -- Pointstamps
Distributed Progress Tracking -- Putting it Together
● Can deliver OnNotify at a vertex if the occurrence count (OC) for all lower or equal timestamps at predecessor vertices or edges is 0
  ○ This OnNotify is in the “frontier”
● In a distributed setting, a node’s local frontier is conservative: it assumes that other nodes haven’t made progress until it explicitly hears from them
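The delivery rule above can be sketched in a few lines of Python. This is a simplified, single-node illustration under stated assumptions: timestamps are plain integers (no loop counters), the set of predecessor vertices and edges for each vertex is precomputed and passed in, and the names `ProgressTracker`, `record`, and `can_notify` are hypothetical.

```python
from collections import Counter


class ProgressTracker:
    """Tracks occurrence counts of (time, location) pairs, i.e. pointstamps."""

    def __init__(self, upstream):
        # upstream: vertex -> set of predecessor vertices and edges (assumed given)
        self.upstream = upstream
        self.occurrence = Counter()

    def record(self, time, location, delta):
        """Apply +1 when work appears at a location, -1 when it is retired."""
        key = (time, location)
        self.occurrence[key] += delta
        if self.occurrence[key] == 0:
            del self.occurrence[key]

    def can_notify(self, time, vertex):
        """True if no outstanding work at a lower or equal time can still reach vertex."""
        blockers = self.upstream[vertex]
        return not any(
            count > 0 and loc in blockers and t <= time
            for (t, loc), count in self.occurrence.items()
        )
```

For example, with `upstream = {"count": {"input", "input->count"}}`, a message still queued on edge `"input->count"` at time 0 blocks a notification for time 0 at `"count"`, while a message at time 1 does not. In the distributed case each node would hold its own copy of these counts and update it only when it hears from other nodes, which is what makes the local frontier conservative.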
Fault Tolerance
● System calls user-defined Checkpoint() on vertices during a system-wide checkpoint, can Restore() them on failure
● Vertices can continuously log for better fault recovery at the expense of some throughput
● Higher burden on developer
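A rough idea of what the user-defined hooks might look like, written as a Python sketch rather than Naiad's actual interface. The `CountingVertex` class and its method names are hypothetical, and `pickle` is used only as a convenient stand-in for whatever serialization a real vertex would choose.

```python
import pickle


class CountingVertex:
    """A stateful vertex that the developer must make checkpointable."""

    def __init__(self):
        self.counts = {}  # mutable state that would be lost on failure

    def on_recv(self, msg):
        self.counts[msg] = self.counts.get(msg, 0) + 1

    def checkpoint(self) -> bytes:
        """Serialize all state needed to resume; called during a system-wide checkpoint."""
        return pickle.dumps(self.counts)

    def restore(self, blob: bytes) -> None:
        """Replace current state with a previously checkpointed snapshot."""
        self.counts = pickle.loads(blob)
```

The developer burden mentioned above shows up here: every piece of mutable state has to be covered by `checkpoint` and `restore`, and anything received after the snapshot is lost on restore unless it is logged or replayed.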
Fault Tolerance -- Comparison with Spark/MR
● Since Spark/MR work with stateless tasks, on the failure of a node only the failed tasks need to be re-executed, reading from persisted barrier output
● Naiad vertices continuously send data to one another and update mutable state, with no system-imposed barrier as in Spark/MR, so on the failure of ANY node Naiad must stop all nodes and restore them from the last system-wide checkpoint
● But to get this property in Spark/MR, the scheduler needs to be on the path of every job (to store the lineage of ops), which makes Spark/MR less suitable for low-latency work
Optimizations -- Prevent Micro-Stragglers
● Tune TCP for this workload (e.g. reduce retransmission timeouts)
● Tune GC so there are fewer stop-the-worlds
● Shared memory contention
● Keep message queues small
● Can’t solve stragglers if they still happen!
