CS 744: GEODE, Shivaram Venkataraman, Fall 2019

Nov 04, 2022

CS 744: GEODE Shivaram Venkataraman Fall 2019
ADMINISTRIVIA
- Assignment 2 grades
- Midterm coming up Tuesday!
- AEFIS feedback form
SQL IN BIG DATA SYSTEMS
- Scale: How do we handle large datasets, clusters?
- Wide-area: How do we handle queries across datacenters?
WIDE AREA ANALYTICS
MOTIVATION
GOALS / ASSUMPTIONS
- Support analytics queries (including joins)
- Minimize wide-area network usage
- Resources within a single DC are plentiful
- Primary metric: bandwidth cost, not latency
EXAMPLE
APPROACH
1. Join order selection
   - Choice of join algorithm
   - Order in which they are executed
2. Task assignment
3. Manage data replication
ARCHITECTURE
OPTIMIZER SETUP
Workload properties
- Data birth
- Sovereignty
- Fixed queries
SUB-QUERY DELTAS
Cache intermediate results in sub-queries
What does this help?
- Repeated queries (issued every hour etc.)
- Shared sub-queries (across data scientists?)
What does this not help with?
- Computation still happens within DC
- Extra storage for cache (how do you expire this?)
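A minimal sketch of the delta idea, under our own assumptions: the sending datacenter remembers the last result it shipped for each sub-query and, on a repeated run, ships only the rows that were added or removed. `DeltaCache` and `apply_delta` are hypothetical names, not part of Geode, and rows are treated with set semantics (no duplicate rows).

```python
from typing import Dict, FrozenSet, Set, Tuple

Row = Tuple  # one result row, e.g. (id, name)


class DeltaCache:
    """Sender side: remembers the last result shipped per sub-query (hypothetical helper)."""

    def __init__(self) -> None:
        self._last: Dict[str, FrozenSet[Row]] = {}

    def delta(self, subquery_id: str, result: Set[Row]):
        """Return (added, removed) relative to the previously shipped result."""
        previous = self._last.get(subquery_id, frozenset())
        added = result - previous
        removed = previous - result
        # Remember what the receiver will hold after applying this delta.
        self._last[subquery_id] = frozenset(result)
        return added, removed


def apply_delta(cached: Set[Row], added: Set[Row], removed: Set[Row]) -> Set[Row]:
    """Receiver side: rebuild the full result from its cached copy plus the delta."""
    return (cached - removed) | added
```

Only `added` and `removed` cross the wide-area link, so an hourly query whose result changes little transfers little. The sketch keeps every cached result forever, which is exactly the expiry question the slide raises.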
QUERY OPTIMIZER: CALCITE++
Apache Calcite: centralized SQL query planner
- Input: SQL parse tree. Output: optimized parse tree
- Similar to Catalyst, but includes cost-based optimization
Calcite++
- Estimate distributed join cost
- Important to pick the right plan, not to estimate cost accurately!
- Select join strategy, e.g. broadcast
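To make the "pick the right plan" point concrete, here is a rough sketch that compares two candidate join strategies by estimated cross-datacenter bytes. It is not Calcite++'s cost model; the function names, the two strategies and the byte counts are all illustrative assumptions.

```python
from typing import Dict


def estimate_join_transfer(small_home: str, small_bytes: int,
                           large_partitions: Dict[str, int]) -> Dict[str, int]:
    """Estimate cross-DC bytes for two candidate strategies (illustrative only)."""
    # Broadcast: copy the small table to every other DC holding a large-table partition.
    remote_sites = [dc for dc in large_partitions if dc != small_home]
    broadcast = small_bytes * len(remote_sites)
    # Centralize: ship every remote large-table partition to the small table's DC.
    centralize = sum(size for dc, size in large_partitions.items() if dc != small_home)
    return {"broadcast": broadcast, "centralize": centralize}


def pick_strategy(costs: Dict[str, int]) -> str:
    """Choose the strategy with the lowest estimated transfer."""
    return min(costs, key=costs.get)
```

Only the ranking of the two estimates affects the choice, so the estimates can be off by a wide margin as long as they order the plans correctly.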
PSEUDO DISTRIBUTED EXECUTION
[Figure: original execution vs. pseudo-distributed execution]
PSEUDO DISTRIBUTED EXECUTION
Key idea: Use stats from repeated executions
Advantages
Disadvantages?
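One way to read the key idea: while a query runs inside a single datacenter, tag each row with the datacenter it would live in under a distributed plan and count how many rows would have to cross a boundary. The sketch below assumes that reading; `measure_cross_dc_rows` and `target_dc` are hypothetical names, and row counts stand in for bytes.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple


def measure_cross_dc_rows(rows: Iterable[Tuple[str, Hashable]],
                          target_dc: Callable[[Hashable], str]) -> Counter:
    """Count rows that would cross each (source DC, destination DC) pair.

    rows      -- (home_dc, join_key) pairs observed during a local run
    target_dc -- assumed placement function: which DC processes a given key
    """
    traffic: Counter = Counter()
    for home_dc, key in rows:
        dest = target_dc(key)
        if dest != home_dc:
            # This row would travel over the wide-area network in a real deployment.
            traffic[(home_dc, dest)] += 1
    return traffic
```

Because the same queries are issued repeatedly, counts gathered on one run can inform the plan chosen for the next run.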
SITE SELECTION, DATA REPLICATION
Integer linear program formulation
Objective: Minimize replicationCost + executionCost
Constraints
- Disaster recovery
- Regulatory constraints
Solution
- Assignment of which task runs on which DC
- Which partition is replicated to which DC
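The slide does not spell out the program, so the following is only one plausible shape for it. All symbols are our own assumptions: $x_{p,d} = 1$ if partition $p$ is replicated to datacenter $d$, $y_{t,d} = 1$ if task $t$ runs at $d$, $r_{p,d}$ is the cost of keeping that replica up to date, $c_{t,d}$ is the cost of running $t$ at $d$ (including copying its inputs there), $f$ is the minimum number of copies required for disaster recovery, and $R$ is the set of (partition, datacenter) pairs that regulation forbids.

```latex
\begin{align*}
\min_{x,\,y}\quad
  & \underbrace{\sum_{p}\sum_{d} r_{p,d}\, x_{p,d}}_{\text{replicationCost}}
  + \underbrace{\sum_{t}\sum_{d} c_{t,d}\, y_{t,d}}_{\text{executionCost}} \\
\text{s.t.}\quad
  & \sum_{d} y_{t,d} = 1   && \forall t          && \text{(each task runs at exactly one DC)} \\
  & \sum_{d} x_{p,d} \ge f && \forall p          && \text{(disaster recovery)} \\
  & x_{p,d} = 0            && \forall (p,d) \in R && \text{(regulatory constraints)} \\
  & x_{p,d},\; y_{t,d} \in \{0,1\}
\end{align*}
```

In this simplified form the two cost terms are independent. A fuller model would let the execution cost of a task depend on which replicas exist, which is what couples the two decisions.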
SITE SELECTION, DATA REPLICATION
ILP doesn’t scale for large workloads
Greedy heuristic
- Greedily pick a datacenter for each task based on copying cost
- Plug in values, run ILP for replication strategy
Limitations
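The greedy step can be sketched as follows, assuming the copying cost of placing a task at a datacenter is simply the number of input bytes that live elsewhere. `greedy_site_selection` and its input layout are hypothetical, and the follow-up ILP for replication is not shown.

```python
from typing import Dict


def greedy_site_selection(task_inputs: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Map each task to the DC that needs the fewest input bytes copied in.

    task_inputs -- task name -> {datacenter: bytes of that task's input stored there}
    """
    assignment: Dict[str, str] = {}
    for task, bytes_by_dc in task_inputs.items():
        total = sum(bytes_by_dc.values())
        # Copy cost at a DC = all input bytes that are not already local to it.
        assignment[task] = min(bytes_by_dc, key=lambda dc: total - bytes_by_dc[dc])
    return assignment
```

Each task is placed in isolation here, so the sketch ignores how one task's output location changes the cost of the next task. That is one kind of limitation a greedy choice can have.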
SUMMARY
New area of wide-area big data analytics
Combine query optimization + network awareness
Main contributions
- Optimize data replication, task placement
- Intelligent caching to reuse sub-queries
DISCUSSION https://forms.gle/Qr142WN1LVNyVAfLA
Items(id: Int, name: String, price: Double)
Orders(id: Int, itemId: Int, count: Int, loc: String)
SELECT order.id, item.name, item.price, order.count
FROM item JOIN order
WHERE item.id = order.itemid and item.price < 1400 and order.count > 2
1. If the orders table were distributed across three geographic locations (US, Europe and Asia), how could the query be executed using Geode?
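As a starting point for the discussion, here is one possible execution, not necessarily the answer intended in class. It assumes the Items table lives at a single site, reads the second predicate as `order.count > 2`, filters orders locally at each location, and ships only the surviving rows to the site holding Items. `run_query` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[int, str, float]      # (id, name, price)
Order = Tuple[int, int, int, str]  # (id, itemId, count, loc)


def run_query(items: List[Item], items_site: str,
              orders_by_site: Dict[str, List[Order]]):
    """Return (result rows, number of order rows shipped across sites)."""
    # Apply the price filter once, at the site that holds Items.
    cheap = {item_id: (name, price) for item_id, name, price in items if price < 1400}
    rows, shipped = [], 0
    for site, orders in orders_by_site.items():
        # Each site filters and projects its own orders before sending anything.
        local = [(oid, iid, cnt) for oid, iid, cnt, _loc in orders if cnt > 2]
        if site != items_site:
            shipped += len(local)
        # Join the filtered orders against the filtered items.
        for oid, iid, cnt in local:
            if iid in cheap:
                name, price = cheap[iid]
                rows.append((oid, name, price, cnt))
    return sorted(rows), shipped


items = [(1, "laptop", 1200.0), (2, "server", 5000.0)]
orders = {
    "US": [(10, 1, 3, "US"), (11, 2, 5, "US")],
    "Europe": [(20, 1, 1, "Europe"), (21, 1, 4, "Europe")],
    "Asia": [(30, 2, 9, "Asia")],
}
result, shipped = run_query(items, "US", orders)
```

The alternative is to broadcast the filtered Items rows to Europe and Asia and join there. Which plan moves fewer bytes depends on the relative sizes of the filtered tables, which is the comparison the optimizer is meant to make.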
