CS 744: Big Data Systems, Shivaram Venkataraman, Fall 2018

Jun 26, 2023

CS 744: Big Data Systems Shivaram Venkataraman Fall 2018
Administrivia
• Course Project round 3 meetings signup!
• Final class on Dec 6th
• No class on Dec 11th
• Poster session Dec 13th
  – More details very soon!
RDMA: REMOTE DIRECT MEMORY ACCESS
MOTIVATION
Need to access remote data fast
- Increasing NIC speeds (up to 100Gbps)
- OS/CPU bottlenecks
RDMA
- Perform direct memory access (DMA) from NIC!
- Bypass remote CPU, OS etc.
RDMA cost / availability
FaRM
Approach
- Model distributed memory as shared address space
- Communication primitives over RDMA
Features
- Memory Management
- Transactions
- Data structures
COMMUNICATION PRIMITIVES
Key idea: One-sided RDMA reads/writes
How to implement writes?
- Circular buffer on receiver
- Receiver polls at “Head”
- Sender writes at “Tail”
- Ensure sender doesn’t overwrite unread messages
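The circular-buffer scheme on this slide can be illustrated with a small single-process model. This is only a sketch: the class and method names (`RingBuffer`, `send`, `poll`, `publish_head`) are hypothetical, there is no real RDMA involved, and the way the sender learns the receiver's head position (an explicit `publish_head` call) is an assumption made for illustration.

```python
class RingBuffer:
    """Toy model of a receiver-side circular message buffer (no real RDMA)."""

    def __init__(self, slots):
        self.slots = [None] * slots      # None marks an empty slot
        self.head = 0                    # next slot the receiver polls
        self.tail = 0                    # next slot the sender writes
        self.sender_view_of_head = 0     # sender's possibly stale copy of head

    def send(self, msg):
        # The sender refuses to write when it cannot prove the slot was consumed.
        if self.tail - self.sender_view_of_head >= len(self.slots):
            return False
        self.slots[self.tail % len(self.slots)] = msg
        self.tail += 1
        return True

    def poll(self):
        # The receiver checks the slot at "Head"; an empty slot means no message.
        slot = self.head % len(self.slots)
        msg = self.slots[slot]
        if msg is None:
            return None
        self.slots[slot] = None          # clear the slot so it can be reused
        self.head += 1
        return msg

    def publish_head(self):
        # Stand-in for the receiver telling the sender how far it has read.
        self.sender_view_of_head = self.head
```

With two slots, a third `send` fails until the receiver has both polled a message and published its new head; the sender's conservative view is what prevents it from overwriting unread data.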
COMMUNICATION PRIMITIVES
RDMA Challenges
Page Table Size
- Doing DMA requires NIC to cache page tables
- Need for larger pages to make page table smaller
- PhyCo – kernel driver that allocates 2GB pages!
Caching queue pair data
- Need a queue pair (connection) between every sender-receiver
- 2*m*t^2 for m machines, t threads per machine
- Solution: Share queue pair among threads – 2*m*t/q (q = threads sharing each queue pair)
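To get a feel for the two queue-pair formulas on this slide, the arithmetic can be worked through in a few lines. The function name `queue_pairs` and the example values of m, t, and q are illustrative assumptions, not numbers from the lecture.

```python
def queue_pairs(m, t, q=None):
    """Queue-pair count using the slide's formulas.

    m: machines, t: threads per machine,
    q: threads sharing one queue pair (None means no sharing).
    """
    if q is None:
        return 2 * m * t * t      # one queue pair per sender-receiver thread pair
    return 2 * m * t / q          # queue pairs shared among q threads


# Hypothetical cluster: 20 machines, 16 threads each.
unshared = queue_pairs(20, 16)        # 2 * 20 * 16^2
shared = queue_pairs(20, 16, q=4)     # 2 * 20 * 16 / 4
```

For these assumed values, sharing reduces the count from 10240 to 160, which is the kind of reduction that lets the NIC keep queue-pair state cached.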
CONNECTION MULTIPLEXING
FARM API
MEMORY MANAGEMENT
- Every 2GB allocation is a region
- Addresses: 32-bit region id, 32-bit offset
- Map regions onto a hash ring
Why multiple rings?
- Parallel recovery
- Load balancing
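The addressing scheme above (a 32-bit region id plus a 32-bit offset) can be sketched as packing and unpacking a 64-bit integer. The bit layout chosen here, with the region id in the high 32 bits, is an assumption for illustration, and `pack_addr` / `unpack_addr` are hypothetical helper names.

```python
REGION_SIZE = 2 * 1024**3   # 2GB regions, as on the slide


def pack_addr(region_id, offset):
    """Combine a region id and an offset into one 64-bit address (assumed layout)."""
    if not 0 <= region_id < 2**32:
        raise ValueError("region id must fit in 32 bits")
    if not 0 <= offset < REGION_SIZE:
        raise ValueError("offset must fall inside a 2GB region")
    return (region_id << 32) | offset


def unpack_addr(addr):
    """Split a 64-bit address back into (region_id, offset)."""
    return addr >> 32, addr & 0xFFFFFFFF
```

Because a 2GB region needs only 31 bits of offset, every valid offset fits comfortably in the 32-bit field.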
MEMORY ALLOCATION
Hierarchy
- Slabs, regions, blocks
- Thread-level, private slab allocators
- Blocks are multiples of 1MB in size
- Regions of size 2GB
Hints
- Applications request allocation “close” to a hint
- Same block as hint, or same region, or nearby position
TRANSACTIONS
Transaction components
- Reuse standard protocols from DB (2-phase commit, OCC)
- Components: Read set, write set
- Coordinator that runs transaction
Process
- Prepare messages to lock write set
- Validate messages to check read set
- Commit messages: first to replicas then to primaries
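The prepare / validate / commit sequence above can be sketched as a single-process optimistic concurrency control check. This is a simplified model under stated assumptions: there is no network, no replication (so the "replicas then primaries" ordering is not modeled), and the names `Obj` and `commit` are hypothetical.

```python
class Obj:
    """A stored object with a version number and a lock flag."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.locked = False


def commit(store, read_set, write_set):
    """read_set maps key -> version observed; write_set maps key -> new value."""
    locked = []

    def abort():
        for k in locked:
            store[k].locked = False
        return False

    # Prepare: lock every object in the write set.
    for key in sorted(write_set):
        obj = store[key]
        if obj.locked or (key in read_set and obj.version != read_set[key]):
            return abort()
        obj.locked = True
        locked.append(key)

    # Validate: objects that were only read must be unlocked and unchanged.
    for key, seen in read_set.items():
        if key in write_set:
            continue
        obj = store[key]
        if obj.locked or obj.version != seen:
            return abort()

    # Commit: install new values, bump versions, release locks.
    for key in locked:
        obj = store[key]
        obj.value = write_set[key]
        obj.version += 1
        obj.locked = False
    return True
```

A transaction that read a version which has since changed fails validation and leaves the store untouched, which is the behavior the read-set check is meant to provide.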
LOCK-FREE OPERATIONS
Locks are still expensive! → Design lock-free read operations
Version numbers stored per cache line
– Why do we need this?
Use memory barriers to update one line at a time
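One way to see why a version number per cache line helps: if a remote read can observe an object while a writer is partway through updating its lines, matching versions across all lines indicate a consistent snapshot. The sketch below models an object as a list of (version, payload) pairs, one per cache line; the function names and this representation are assumptions for illustration, and real memory barriers are not modeled.

```python
def write_object(lines, new_payloads, version):
    """Update an object one cache line at a time, stamping each with the version."""
    for i, payload in enumerate(new_payloads):
        lines[i] = (version, payload)


def read_consistent(snapshot):
    """Return the payloads if every line carries the same version, else None.

    A None result means the read raced with a writer and should be retried.
    """
    versions = {version for version, _ in snapshot}
    if len(versions) != 1:
        return None
    return [payload for _, payload in snapshot]
```

A reader that copies the lines while only the first has been rewritten sees two different versions and retries, so it never returns a mix of old and new data.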
HASHTABLE CHALLENGES
Goals
- Perform most operations using single RDMA read
- Achieve good utilization (avoid resizing hash table)
Challenges
- Chaining / Cuckoo hashing: Key could be in many disjoint locations
- Hopscotch hashing: Each bucket has a neighborhood of H-1 buckets
- But large H → more reads and small H → poor utilization
HASHTABLE SOLUTIONS
Soln: Chained associative hopscotch
Maintain overflow chain per bucket
- Add key to overflow if required
- Small chains limit overhead
- Inline values next to key
Other optimizations
- Lookups use lock-free read
- Combine updates in 1 transaction
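The idea of a hopscotch neighborhood backed by a per-bucket overflow chain can be sketched as follows. This is a deliberately simplified model, not the FaRM implementation: it skips hopscotch displacement entirely (a key goes to the overflow chain as soon as its neighborhood is full), has no deletes, and counts "reads" as one for the neighborhood plus one if a non-empty chain must be followed. The class name `ChainedHopscotch` and its methods are hypothetical.

```python
class ChainedHopscotch:
    """Toy hash table: H-bucket neighborhoods plus per-bucket overflow chains."""

    def __init__(self, size, H):
        self.size = size
        self.H = H
        self.buckets = [None] * size               # each entry: (key, value) inline
        self.overflow = [[] for _ in range(size)]  # chain per home bucket

    def _neighborhood(self, key):
        home = hash(key) % self.size
        return home, [(home + i) % self.size for i in range(self.H)]

    def insert(self, key, value):
        home, slots = self._neighborhood(key)
        # Update in place if the key is already present.
        for s in slots:
            if self.buckets[s] is not None and self.buckets[s][0] == key:
                self.buckets[s] = (key, value)
                return
        chain = self.overflow[home]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        # Otherwise use the first free neighborhood slot, else the overflow chain.
        for s in slots:
            if self.buckets[s] is None:
                self.buckets[s] = (key, value)
                return
        chain.append((key, value))

    def lookup(self, key):
        """Return (value or None, number of modeled reads)."""
        home, slots = self._neighborhood(key)
        reads = 1                                   # one read covers the neighborhood
        for s in slots:
            entry = self.buckets[s]
            if entry is not None and entry[0] == key:
                return entry[1], reads
        chain = self.overflow[home]
        if chain:
            reads += 1                              # extra read to follow the chain
            for k, v in chain:
                if k == key:
                    return v, reads
        return None, reads
```

Keys that land in their neighborhood are found with a single modeled read; only keys pushed to an overflow chain cost a second one, which is why keeping chains short limits overhead.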
SUMMARY
New networking hardware enables fast systems
Insights
- Avoid CPU overheads using RDMA read
- Design higher-level primitives based on that
Drawbacks
- Need to do multiple round trips?
- Hardware dependent wins?
