

  1. Data-Intensive Distributed Computing 431/451/631/651 (Fall 2020) Part 2: MapReduce Algorithm Design (2/3) Ali Abedi These slides are available at https://www.student.cs.uwaterloo.ca/~cs451/ 1

  2. HDFS MapReduce blissful ignorance Abstraction Storage/computing unpleasant truth Cluster of computers Although we argued for an abstraction layer to hide the complexities of the underlying infrastructure, today we want to have a quick look at the architecture of datacenters. This will help us later to understand the performance trade-offs of different algorithms. It also makes us appreciate these systems more ☺ 2

  3. A quick review of data center architecture 3

  4. The anatomy of a server Left: top view of a server. Right, top: two figures showing the front of the server with two storage configurations: 1) 16 2.5-inch drives, 2) 8 3.5-inch drives. Right, bottom: the back of the server, where we can see the network interfaces (7). 4

  5. The anatomy of a server rack We put multiple servers in a server rack. There is a network switch that connects the servers in a rack. This switch also connects the rack to other racks. 5

  6. The anatomy of a data center Clusters of racks of servers make up a data center. This is a very simplistic view of a data center. 6

  7. Storage Hierarchy From nearest to farthest: Local Machine (L1/L2/L3 cache, memory, SSD, magnetic disks) → Remote Machine, Same Rack → Remote Machine, Different Rack → Remote Machine, Different Datacenter (capacity, latency, bandwidth) Capacity, latency, and bandwidth for reading data change depending on where the data is. The lowest latency and highest bandwidth are achieved when the data we need is on our local server. We can increase capacity by utilizing other servers, but at the cost of higher latency and lower bandwidth. 7

  8. Latency numbers every programmer should know Demo https://colin-scott.github.io/personal_website/research/interactive_latency.html 8

  9. The anatomy of a data center Google’s data center video https://youtu.be/XZmGGAbHqa0 9

  10. Abstraction Storage/computing Cluster of computers 10

  11. Distributed File System How can we store a large file on a distributed system? 11

  12. File.txt 200 TB How do you store this file? S1 S2 S3 S19 S20 . . . 100 TB 100 TB 100 TB 100 TB 100 TB Assume that we have 20 identical networked servers, each with 100 TB of disk space. How would you store a 200 TB file on these servers? Note that the file is larger than any single server's disk, so it cannot be stored in one piece. This is the fundamental question in distributed file systems. 12

  13. File.txt Divide into smaller chunks S1 S2 S3 S19 S20 . . . 100 TB 100 TB 100 TB 100 TB 100 TB We can split the file into smaller chunks. 13

  14. File.txt 1 2 3 4 5 6 7 8 Assign chunks to servers S1 S2 S3 S19 S20 . . . 100 TB 100 TB 100 TB 100 TB 100 TB And assign the chunks (e.g., randomly) to the servers. 14

  15. File.txt Keep track of the chunks using a master server: 1 → S1, 2 → S3, … 8 → S19 S1 S2 S3 S19 S20 . . . 100 TB 100 TB 100 TB 100 TB 100 TB We need to track where each chunk is stored so that we can retrieve the file. 15
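
A minimal toy sketch of this made-up design in Java: split a file into fixed-size chunks, assign each chunk to a server, and record the chunk-to-server table on the master. The server names, chunk size, and random assignment are illustrative assumptions, not part of any real system.

```java
import java.util.*;

// Toy model of the made-up distributed file system on slides 12-15 (not HDFS):
// split a file into fixed-size chunks, assign each chunk to a server, and keep
// the chunk -> server table on a "master". All names and sizes are illustrative.
public class ToyChunkAssigner {

    static Map<Long, String> assignChunks(long fileSize, long chunkSize,
                                          List<String> servers, long seed) {
        Map<Long, String> chunkToServer = new HashMap<>();        // the master's table
        long numChunks = (fileSize + chunkSize - 1) / chunkSize;  // ceiling division
        Random rng = new Random(seed);
        for (long chunk = 0; chunk < numChunks; chunk++) {
            // Assign chunks to servers randomly (round-robin would also work).
            chunkToServer.put(chunk, servers.get(rng.nextInt(servers.size())));
        }
        return chunkToServer;
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>();
        for (int i = 1; i <= 20; i++) servers.add("S" + i);   // S1 .. S20

        // Small toy file so the table stays tiny; the slide's 200 TB file would
        // produce roughly 3.3 million chunks at 64 MB per chunk.
        Map<Long, String> table = assignChunks(512L * 1024 * 1024,  // 512 MB file
                                               64L * 1024 * 1024,   // 64 MB chunks
                                               servers, 42);
        table.forEach((chunk, server) ->
                System.out.println("chunk " + chunk + " -> " + server));
    }
}
```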

  16. File.txt 1 → S1 2 → S3 … 8 → S19 What happens when a server fails?! S1 S2 S3 S19 S20 . . . 100 TB 100 TB 100 TB 100 TB 100 TB If a server that contains one of the chunks fails, the file becomes corrupted. Since the failure rate is high on commodity servers, we need to figure out a solution. 16

  17. File.txt 1 2 3 4 5 6 7 8 FAULT TOLERANCE Store each chunk on multiple servers REPLICATION S1 S2 S3 S19 S20 . . . 100 TB 100 TB 100 TB 100 TB 100 TB If each chunk is stored on multiple servers, there is a backup when a server fails. The number of copies determines how much resilience we want. 17
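
Extending the toy sketch with replication: each chunk is placed on several distinct servers so that one server failure still leaves live copies. The replication factor of 3 and the shuffle-based selection are just illustrative choices.

```java
import java.util.*;

// Toy sketch of chunk replication (slide 17): each chunk is placed on
// REPLICATION distinct servers so that a single server failure still leaves
// usable copies. All names are made up for illustration.
public class ToyReplicaPlacement {
    static final int REPLICATION = 3;   // number of copies per chunk

    static Map<Integer, List<String>> placeChunks(int numChunks, List<String> servers, long seed) {
        Map<Integer, List<String>> placement = new HashMap<>();
        Random rng = new Random(seed);
        for (int chunk = 0; chunk < numChunks; chunk++) {
            // Pick REPLICATION distinct servers by shuffling a copy of the list.
            List<String> candidates = new ArrayList<>(servers);
            Collections.shuffle(candidates, rng);
            placement.put(chunk, new ArrayList<>(candidates.subList(0, REPLICATION)));
        }
        return placement;
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>();
        for (int i = 1; i <= 20; i++) servers.add("S" + i);

        Map<Integer, List<String>> placement = placeChunks(8, servers, 42);
        placement.forEach((chunk, copies) ->
                System.out.println("chunk " + chunk + " -> " + copies));

        // If one server fails, every chunk it held still has REPLICATION - 1
        // live copies on other servers.
    }
}
```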

  18. From our made-up distributed file system to a real one 18

  19. Hadoop Distributed File System (HDFS) Adapted from Erik Jonsson (UT Dallas) 19

  20. Goals of HDFS • Very Large Distributed File System • 10K nodes, 100 million files, 10PB • Assumes Commodity Hardware • Files are replicated to handle hardware failure • Detect failures and recover from them • Optimized for Batch Processing • Provides very high aggregate bandwidth 20

  21. Distributed File System • Data Coherency • Write-once-read-many access model • Client can only append to existing files • Files are broken up into blocks • Typically 64MB block size • Each block replicated on multiple DataNodes • Intelligent Client • Client can find location of blocks • Client accesses data directly from DataNode HDFS is not like a typical file system you use on Windows or Linux. It was specifically designed for Hadoop. It cannot perform some of the typical operations that other file systems support, such as random writes. Instead, it is optimized for large sequential reads and append-only writes. 21
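
A minimal sketch of the write-once, append-only access model using the Hadoop FileSystem Java API. It assumes a reachable HDFS cluster with append enabled and the hadoop-client library on the classpath; the path /user/demo/log.txt is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch of HDFS's write-once / append-only model via the Java
// FileSystem API. Assumes a reachable HDFS cluster and the hadoop-client
// dependency; the path below is hypothetical.
public class HdfsAppendOnlyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/demo/log.txt");   // hypothetical path

        // Write once: create the file and write an initial record.
        try (FSDataOutputStream out = fs.create(path, true /* overwrite */)) {
            out.write("first record\n".getBytes(StandardCharsets.UTF_8));
        }

        // Later writers can only append; random overwrites are not supported.
        try (FSDataOutputStream out = fs.append(path)) {
            out.write("appended record\n".getBytes(StandardCharsets.UTF_8));
        }

        // Reads can be large and sequential, which is what HDFS is optimized for.
        try (FSDataInputStream in = fs.open(path)) {
            byte[] buf = new byte[4096];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```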

  22. HDFS Architecture [Figure: the application passes (file name, block id) to the HDFS client; the HDFS namenode, which maintains the file namespace (e.g., /foo/bar → block 3df2), replies with (block id, block location); the client then requests (block id, byte range) from the HDFS datanodes and receives block data from their local Linux file systems; the namenode also sends instructions to the datanodes and receives datanode state.] Adapted from (Ghemawat et al., SOSP 2003) Note that the namenode is relatively lightweight: it only stores where the data is located on the datanodes, not the actual data. There may also be a redundant namenode in the background in case the primary one fails. The HDFS client gets block location information from the namenode and then interacts with the datanodes directly to read the data. The namenode also has to communicate with the datanodes to ensure consistency and redundancy of the data (e.g., when a new replica needs to be created). 22
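
A small sketch of what the intelligent client does with the namenode's answer: ask where the blocks of a file live, then contact those datanodes. It uses the standard Hadoop FileSystem API and assumes a reachable cluster; /foo/bar mirrors the file name in the figure.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of what the HDFS client does under the hood: ask the namenode where
// the blocks of a file live, then read from those datanodes. Assumes a
// reachable cluster; "/foo/bar" mirrors the file in the figure.
public class BlockLocationLookup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/foo/bar");

        FileStatus status = fs.getFileStatus(path);
        // The namenode answers with block locations; the API exposes them
        // as BlockLocation objects (offset, length, hosts).
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```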

  23. Functions of a NameNode • Manages File System Namespace • Maps a file name to a set of blocks • Maps a block to the DataNodes where it resides • Cluster Configuration Management • Replication Engine for Blocks 23

  24. NameNode Metadata • Metadata in Memory • The entire metadata is in main memory • No demand paging of metadata • Types of metadata • List of files • List of Blocks for each file • List of DataNodes for each block • File attributes, e.g. creation time, replication factor • A Transaction Log • Records file creations, file deletions, etc. 24
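
A toy, in-memory model of this metadata (file-to-blocks map, block-to-datanodes map, and an append-only transaction log). It is purely illustrative and not the actual HDFS data structures.

```java
import java.util.*;

// Toy in-memory sketch of the namenode metadata on slide 24: the file
// namespace and block maps live entirely in memory, and every mutation is
// appended to a transaction (edit) log. Illustrative only.
public class ToyNameNodeMetadata {
    private final Map<String, List<String>> fileToBlocks = new HashMap<>();    // file -> block ids
    private final Map<String, Set<String>> blockToDataNodes = new HashMap<>(); // block id -> datanodes
    private final List<String> editLog = new ArrayList<>();                    // transaction log

    public void createFile(String name) {
        fileToBlocks.put(name, new ArrayList<>());
        editLog.add("CREATE " + name);
    }

    public void addBlock(String file, String blockId, Set<String> dataNodes) {
        fileToBlocks.get(file).add(blockId);
        blockToDataNodes.put(blockId, new HashSet<>(dataNodes));
        editLog.add("ADD_BLOCK " + file + " " + blockId);
    }

    public void deleteFile(String name) {
        List<String> blocks = fileToBlocks.remove(name);
        if (blocks != null) blocks.forEach(blockToDataNodes::remove);
        editLog.add("DELETE " + name);
    }

    public static void main(String[] args) {
        ToyNameNodeMetadata nn = new ToyNameNodeMetadata();
        nn.createFile("/foo/bar");
        nn.addBlock("/foo/bar", "blk_3df2", new HashSet<>(Arrays.asList("dn1", "dn2", "dn3")));
        System.out.println(nn.editLog);
    }
}
```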

  25. DataNode • A Block Server • Stores data in the local file system (e.g. ext3) • Stores metadata of a block (e.g. CRC) • Serves data and metadata to Clients • Block Report • Periodically sends a report of all existing blocks to the NameNode • Facilitates Pipelining of Data • Forwards data to other specified DataNodes 25
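
A toy illustration of a block report: a datanode sends the list of block ids it stores, and the namenode folds that into its block-to-datanode map. Names and block ids are made up.

```java
import java.util.*;

// Toy sketch of a block report (slide 25): each datanode periodically tells
// the namenode the full list of blocks it currently stores, which lets the
// namenode rebuild and verify its block -> datanode map. Names are made up.
public class ToyBlockReport {
    public static void main(String[] args) {
        // What a single datanode would report: its id plus the block ids on its local disk.
        String dataNodeId = "dn1";
        List<String> localBlocks = List.of("blk_3df2", "blk_91a0", "blk_c44e");

        // Namenode side: merge the report into block -> {datanodes}.
        Map<String, Set<String>> blockToDataNodes = new HashMap<>();
        for (String block : localBlocks) {
            blockToDataNodes.computeIfAbsent(block, b -> new HashSet<>()).add(dataNodeId);
        }
        System.out.println(blockToDataNodes);
    }
}
```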

  26. Block Placement • Current Strategy • One replica on local node • Second replica on a remote rack • Third replica on same remote rack • Additional replicas are randomly placed • Clients read from nearest replicas 26
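
A toy version of this placement strategy for the first three replicas. The rack topology and node names are invented; the real HDFS placement policy also handles constraints (disk space, load) not modeled here.

```java
import java.util.*;

// Toy sketch of the default placement policy on slide 26: one replica on the
// local node, the second on a node in a different rack, the third on another
// node in that same remote rack. Rack and node names are made up.
public class ToyBlockPlacement {
    // rack -> nodes in that rack
    static final Map<String, List<String>> topology = Map.of(
            "rack1", List.of("r1n1", "r1n2", "r1n3"),
            "rack2", List.of("r2n1", "r2n2", "r2n3"));

    static List<String> placeReplicas(String localNode, String localRack, Random rng) {
        List<String> replicas = new ArrayList<>();
        replicas.add(localNode);                                   // replica 1: local node

        // replica 2: a random node on a different rack
        List<String> remoteRacks = new ArrayList<>(topology.keySet());
        remoteRacks.remove(localRack);
        String remoteRack = remoteRacks.get(rng.nextInt(remoteRacks.size()));
        List<String> remoteNodes = new ArrayList<>(topology.get(remoteRack));
        String second = remoteNodes.get(rng.nextInt(remoteNodes.size()));
        replicas.add(second);

        // replica 3: a different node on that same remote rack
        remoteNodes.remove(second);
        replicas.add(remoteNodes.get(rng.nextInt(remoteNodes.size())));
        return replicas;
    }

    public static void main(String[] args) {
        System.out.println(placeReplicas("r1n2", "rack1", new Random(7)));
        // e.g. [r1n2, r2n3, r2n1]
    }
}
```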

  27. Heartbeats • DataNodes send heartbeats to the NameNode • Once every 3 seconds • NameNode uses heartbeats to detect DataNode failure 27
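
A toy heartbeat monitor in the spirit of this slide: record the last heartbeat time per datanode and report nodes that have been silent too long. The 3-second interval comes from the slide; the 10-minute timeout below is an illustrative assumption, not the exact HDFS default.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of heartbeat-based failure detection (slide 27): datanodes report
// every few seconds, and the namenode marks a node dead if it has not heard
// from it within a timeout. The timeout value here is only illustrative.
public class ToyHeartbeatMonitor {
    static final long TIMEOUT_MS = 10 * 60 * 1000;   // illustrative: 10 minutes of silence

    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a heartbeat arrives from a datanode.
    public void onHeartbeat(String dataNodeId) {
        lastHeartbeat.put(dataNodeId, System.currentTimeMillis());
    }

    // Returns the datanodes that have been silent for longer than the timeout.
    public List<String> deadNodes() {
        long now = System.currentTimeMillis();
        List<String> dead = new ArrayList<>();
        lastHeartbeat.forEach((node, ts) -> {
            if (now - ts > TIMEOUT_MS) dead.add(node);
        });
        return dead;
    }

    public static void main(String[] args) {
        ToyHeartbeatMonitor monitor = new ToyHeartbeatMonitor();
        monitor.onHeartbeat("dn1");
        monitor.onHeartbeat("dn2");
        System.out.println("dead so far: " + monitor.deadNodes());  // empty right after heartbeats
    }
}
```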

  28. Replication Engine • NameNode detects DataNode failures • Chooses new DataNodes for new replicas • Balances disk usage • Balances communication traffic to DataNodes 28
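
A toy re-replication loop: when a datanode is declared dead, find blocks that dropped below the target replica count and copy them to the least-loaded surviving datanode. Data, node names, and the load metric are all illustrative.

```java
import java.util.*;

// Toy sketch of re-replication (slide 28): when a datanode fails, find the
// blocks whose live copy count dropped below the target and pick the
// least-loaded surviving datanode for each new replica. Illustrative only.
public class ToyReplicationEngine {
    static final int TARGET_REPLICAS = 3;

    public static void main(String[] args) {
        // block id -> datanodes currently holding it
        Map<String, Set<String>> blockLocations = new HashMap<>();
        blockLocations.put("blk_1", new HashSet<>(List.of("dn1", "dn2", "dn3")));
        blockLocations.put("blk_2", new HashSet<>(List.of("dn2", "dn3", "dn4")));

        // datanode -> number of blocks it holds (a crude measure of load)
        Map<String, Integer> load = new HashMap<>(Map.of("dn1", 1, "dn2", 2, "dn3", 2, "dn4", 1));

        // Suppose dn2 is declared dead: drop it everywhere, then re-replicate.
        String dead = "dn2";
        load.remove(dead);
        blockLocations.values().forEach(nodes -> nodes.remove(dead));

        for (Map.Entry<String, Set<String>> e : blockLocations.entrySet()) {
            while (e.getValue().size() < TARGET_REPLICAS) {
                // Choose the least-loaded datanode that does not already hold this block.
                String target = load.entrySet().stream()
                        .filter(d -> !e.getValue().contains(d.getKey()))
                        .min(Map.Entry.comparingByValue())
                        .map(Map.Entry::getKey)
                        .get();
                e.getValue().add(target);
                load.merge(target, 1, Integer::sum);
                System.out.println("re-replicating " + e.getKey() + " to " + target);
            }
        }
    }
}
```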

  29. HDFS Demo 29

  30. Google File System (GFS) Terminology differences: GFS master = Hadoop namenode GFS chunkservers = Hadoop datanodes Implementation differences: Different consistency model for file appends Implementation language Performance 30

  31. HDFS MapReduce Abstraction Storage/computing Cluster of computers 31

  32. Hadoop Cluster Architecture 32

  33. How do we get data to the workers? Let’s consider a typical supercomputer… SAN Compute Nodes SAN: Storage Area Network 33

  34. Compute-Intensive vs. Data-Intensive SAN Compute Nodes Why does this make sense for compute-intensive tasks? What's the issue for data-intensive tasks? This makes sense for compute-intensive tasks because the computations (for some chunk of data) are likely to take a long time even on such sophisticated hardware, so the communication costs are greatly outweighed by the computation costs. For data-intensive tasks, the computations (for some chunk of data) are not likely to take nearly as long, so the computation costs are greatly outweighed by the communication costs, and the SAN link is likely to become a latency and bandwidth bottleneck even with high-speed transfer. 34

  35. What's the solution? Don't move data to workers… move workers to the data! Key idea: co-locate storage and compute Start up worker on nodes that hold the data If a server is responsible for both data storage and processing, Hadoop can do a lot of optimization. For example, when assigning MapReduce tasks to servers, Hadoop considers which servers hold which parts of the file locally in order to minimize copying over the network. If all of the data can be processed locally where it is stored, there is no need to move the data. 35
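
A toy sketch of locality-aware task assignment: each map task carries the list of hosts that store its input split (obtained from the namenode), and the scheduler prefers a free host from that list before falling back to a remote one. Task and host names are made up; real Hadoop scheduling is considerably more involved.

```java
import java.util.*;

// Toy sketch of the locality-aware scheduling idea on slide 35: each map task
// knows which hosts hold its input split, and the scheduler prefers to run the
// task on one of those hosts so no data crosses the network. Names are made up.
public class ToyLocalityScheduler {
    public static void main(String[] args) {
        // task -> hosts that store its input split locally (from the namenode)
        Map<String, List<String>> taskHosts = new LinkedHashMap<>();
        taskHosts.put("map-0", List.of("worker1", "worker3"));
        taskHosts.put("map-1", List.of("worker2"));
        taskHosts.put("map-2", List.of("worker3"));

        // hosts with a free task slot right now
        Set<String> freeHosts = new HashSet<>(List.of("worker1", "worker2", "worker4"));

        for (Map.Entry<String, List<String>> task : taskHosts.entrySet()) {
            // Prefer a free host that already stores the split (data-local);
            // otherwise fall back to any free host and read the data remotely.
            String chosen = task.getValue().stream()
                    .filter(freeHosts::contains)
                    .findFirst()
                    .orElseGet(() -> freeHosts.iterator().next());
            boolean local = task.getValue().contains(chosen);
            System.out.println(task.getKey() + " -> " + chosen
                    + (local ? " (data-local)" : " (remote read)"));
            freeHosts.remove(chosen);
        }
    }
}
```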

  36. Putting everything together … NameNode Resource Manager Node Manager Node Manager Node Manager DataNode DataNode DataNode Linux file system Linux file system Linux file system … … … worker node worker node worker node This figure shows how computation and storage are co-located on a Hadoop cluster. The node manager manages the running tasks on a node (e.g., if there are spare resources, it starts the next task assigned to that node). The resource manager is responsible for managing the available resources in the cluster. 36
