SLIDE 1

CS 744: MAPREDUCE

Shivaram Venkataraman Fall 2020

welcome back!

SLIDE 2

ANNOUNCEMENTS

  • Assignment 1 deliverables

– Code (comments, formatting)
– Report

  • Partitioning analysis (graphs, tables, figures etc.)
  • Persistence analysis (graphs, tables, figures etc.)
  • Fault-tolerance analysis (graphs, tables, figures etc.)
  • See Piazza for Spark installation

→ Similar to the evaluation sections in papers

SLIDE 3

Course stack: Datacenter Architecture → Scalable Storage Systems → Resource Management → Computational Engines → Applications (Machine Learning, SQL, Streaming, Graph)

→ MapReduce is a computational engine; GFS/HDFS is the scalable storage layer; both sit on top of the datacenter architecture

SLIDE 4

BACKGROUND: PTHREADS

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void *myThreadFun(void *vargp) {
    sleep(1);
    printf("Hello World\n");
    return NULL;
}

int main() {
    pthread_t thread_id_1, thread_id_2;
    pthread_create(&thread_id_1, NULL, myThreadFun, NULL);
    pthread_create(&thread_id_2, NULL, myThreadFun, NULL);
    pthread_join(thread_id_1, NULL);
    pthread_join(thread_id_2, NULL);
    exit(0);
}

→ pthread_create launches two threads that execute myThreadFun in parallel; pthread_join waits for both of them

→ The threads share memory (the heap) within a single machine

→ Synchronization across threads uses locks and condition variables (CVs)

SLIDE 5

BACKGROUND: MPI

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Print off a hello world message
    printf("Hello world from rank %d out of %d processors\n",
           world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

mpirun -n 4 -f host_file ./mpi_hello_world

→ mpirun launches n processes with ranks 0, 1, 2, ..., n-1 for this job; each process calls MPI_Comm_rank to get its own rank

→ The hosts for this job are listed in host_file

→ MPI comes from the supercomputing world and supports fine-grained synchronization, e.g. MPI_Send to send a message from one process to another

SLIDE 6

MOTIVATION

Build Google Web Search

  • Crawl documents, build inverted indexes etc.

Need for

  • automatic parallelization
  • network, disk optimization
  • handling of machine failures
→ Didn't want programmers to worry about these

→ With MPI, if one process crashes, the whole mpirun job fails

SLIDE 7

OUTLINE

  • Programming Model
  • Execution Overview
  • Fault Tolerance
  • Optimizations
SLIDE 8

PROGRAMMING MODEL

Data type: Each record is (key, value)
Map function: (Kin, Vin) → list(Kinter, Vinter)
Reduce function: (Kinter, list(Vinter)) → list(Kout, Vout)

SLIDE 9

Example: Word Count

def mapper(line):
    for word in line.split():
        output(word, 1)

def reducer(key, values):
    output(key, sum(values))

→ The mapper emits intermediate (key, value) pairs; the reducer is called with each key and the list of all values for that key, e.g. (cow, list(1, 1)) → (cow, 2)
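To make the model concrete, here is a minimal single-machine sketch of what the framework does between map and reduce. The run_mapreduce driver and the rewrite of output() as yield are illustrative assumptions, not the actual MapReduce implementation.

from collections import defaultdict

def mapper(line):
    # emit (word, 1) for every word in an input record
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # sum the counts for one word
    yield (key, sum(values))

def run_mapreduce(records):
    # map phase: apply the mapper to every input record
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)   # shuffle: group values by key
    # reduce phase: call the reducer once per key, in sorted key order
    output = []
    for key in sorted(intermediate):
        output.extend(reducer(key, intermediate[key]))
    return output

docs = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
print(run_mapreduce(docs))   # ... ('brown', 2), ... ('fox', 2), ... ('the', 3) ...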
SLIDE 10

Word Count Execution

the quick brown fox the fox ate the mouse how now brown cow

Map Map Map Reduce Reduce Input Map Shuffle & Sort Reduce Output

→ In GFS the input data is chunked; one map task runs per chunk

→ A partition function decides which reducer gets each intermediate key, e.g. hash(key) % R with R = # reducers: hash("the") % 2 = 0, hash("brown") % 2 = 1

→ Each reducer receives the intermediate (key, value) pairs for its partition, e.g. (the, 1), (the, 1), (the, 1) → (the, 3) and (brown, 1), (brown, 1) → (brown, 2)
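A minimal sketch of that hash partitioning idea; the function name partition_for and the use of Python's built-in hash() are illustrative assumptions, not the framework's actual partitioner.

NUM_REDUCERS = 2   # R, chosen by the user

def partition_for(key, num_reducers=NUM_REDUCERS):
    # assign an intermediate key to one of R reduce partitions;
    # a real framework uses a deterministic hash, while Python's hash()
    # is randomized across runs for strings, so this is only illustrative
    return hash(key) % num_reducers

# every occurrence of the same key lands in the same partition,
# so a single reducer sees all the values for that key
assert partition_for("the") == partition_for("the")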

SLIDE 11

Word Count Execution

the quick brown fox the fox ate the mouse how now brown cow

Map Map Map Reduce Reduce

brown, 2 fox, 2 how, 1 now, 1 the, 3 ate, 1 cow, 1 mouse, 1 quick, 1

the, 1 brown, 1 fox, 1 quick, 1 the, 1 fox, 1 the, 1 how, 1 now, 1 brown, 1 ate, 1 mouse, 1 cow, 1

Input Map Shuffle & Sort Reduce Output

→ A combiner can pre-aggregate map output locally before the shuffle, e.g. (the, 1), (the, 1) → (the, 2) on the mapper that read the second line

→ Each mapper writes its intermediate output into files numbered by partition (0 .. R-1); reducers fetch the files for their partition from every mapper over the network
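A minimal sketch of the combiner idea; it works here because word-count's reduce (a sum) is associative and commutative, so the same logic can pre-aggregate on the map side. The combine() helper is an illustrative assumption, not the framework API.

from collections import Counter

def combine(map_output):
    # locally pre-aggregate one mapper's output before the shuffle,
    # e.g. [("the", 1), ("fox", 1), ("the", 1)] -> [("the", 2), ("fox", 1)],
    # which cuts the amount of data sent over the network
    counts = Counter()
    for key, value in map_output:
        counts[key] += value
    return list(counts.items())

print(combine([("the", 1), ("fox", 1), ("the", 1)]))   # [('the', 2), ('fox', 1)]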

SLIDE 12

ASSUMPTIONS

1. Failures are the norm
   → only have tasks; no shared memory or message passing other than intermediate KV pairs
2. Local storage (disk) is cheap, abundant
3. Applications can be written in this model
4. Input is splittable (a collection of records)

SLIDE 13

ASSUMPTIONS

1. Commodity networking, less bisection bandwidth
2. Failures are common
3. Local storage is cheap
4. Replicated FS

SLIDE 14

Word Count Execution

the quick brown fox

Map Map

the fox ate the mouse

Map

how now brown cow

Automatically split work
Schedule tasks with locality

JobTracker

Submit a Job

MapReduce frameworks

→ The MapReduce master (JobTracker) launches the map and reduce tasks of a job submitted by the user on worker nodes, and tracks task completion events

SLIDE 15

Fault Recovery

If a task crashes:
– Retry on another node
– If the same task repeatedly fails, end the job

the quick brown fox

Map Map

the fox ate the mouse

Map

how now brown cow

→ Input can still be read (replication)
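A minimal sketch of that retry policy, assuming a hypothetical task_fn callable per node and an illustrative attempt limit; this is not the framework's actual scheduler.

class TaskFailure(Exception):
    """One attempt of a task failed (illustrative exception type)."""

def run_with_retry(task_fn, nodes, max_attempts=4):
    # retry a crashed task on another node; if the same task keeps
    # failing (e.g. on a bad input record), give up and end the job
    last_error = None
    for node in nodes[:max_attempts]:
        try:
            return task_fn(node)
        except TaskFailure as err:
            last_error = err            # try the next node
    raise RuntimeError(f"task failed on {max_attempts} nodes") from last_error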

SLIDE 16

Fault Recovery

If a node crashes:
– Relaunch its current tasks on other nodes
What about task inputs? File system replication

the quick brown fox

Map Map

the fox ate the mouse

Map

how now brown cow


SLIDE 17

the quick brown fox

Map

Fault Recovery

If a task is going slowly (straggler):
– Launch second copy of task on another node
– Take the output of whichever finishes first

the quick brown fox

Map

the fox ate the mouse

Map

how now brown cow

→ If a task runs much slower than the other mappers, the assumption is that something about that node is making it slow

→ Tasks must be deterministic for running a second copy to be safe
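A minimal sketch of speculative execution under those assumptions (a deterministic task, and a hypothetical task_fn callable per node); it simply races two copies and keeps the first result, rather than reproducing the real scheduler.

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def speculative_run(task_fn, primary_node, backup_node):
    # launch a backup copy of a slow task on another node and take
    # the output of whichever copy finishes first; safe only because
    # the task is deterministic, so both copies produce the same output
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(task_fn, primary_node),
               pool.submit(task_fn, backup_node)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    result = next(iter(done)).result()
    pool.shutdown(wait=False)   # don't block on the straggler copy
    return result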

SLIDE 18

MORE DESIGN

Master failure
Locality
Task Granularity

→ Master failure: write checkpoints; the probability of the master failing is low, so otherwise just retry the job

→ Locality: schedule map tasks where the GFS chunks are

→ Task granularity: the master keeps metadata for every task

SLIDE 19

MAPREDUCE: SUMMARY

  • Simplify programming on large clusters with frequent failures
  • Limited but general functional API
  • Map, Reduce, Sort
  • No other synchronization / communication
  • Fault recovery, straggler mitigation through retries

→ intermediate data … "push data"
SLIDE 20

DISCUSSION

https://forms.gle/mAHD4QuMXko7vnjB6

SLIDE 21

DISCUSSION

List one similarity and one difference between MPI and MapReduce

Similarity: both MPI and MapReduce are frameworks for parallel computing (high performance).

Difference: MPI has an expressive programming model with fine-grained message passing over the network; MapReduce has a limited programming model but provides fault tolerance and exchanges intermediate KV pairs through storage.

SLIDE 22

DISCUSSION

Indexing pipeline where you start with HTML documents. You want to index the documents after removing the most commonly occurring words. 1. Compute most common words. 2. Remove them and build the index. What are the main shortcomings of using MapReduce to do this?

→ The common words are e.g. (the, a, ...). How does the second job access the list of common words?

→ Both MR jobs read the same inputs; we would like to compose them to avoid repeated disk I/O.
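A hedged sketch of the two-pass pipeline as two chained jobs, using a tiny single-machine run_job stand-in (the helper, the document IDs, and the common-word threshold are all illustrative assumptions). The shortcomings show up in the structure: job 2 cannot start until job 1 has finished and written its output, both jobs re-read the same input, and the common-word set must be shipped to every mapper of job 2.

from collections import defaultdict

def run_job(records, mapper, reducer):
    # tiny single-machine stand-in for one MapReduce job
    groups = defaultdict(list)
    for rec in records:
        for k, v in mapper(rec):
            groups[k].append(v)
    return [out for k in sorted(groups) for out in reducer(k, groups[k])]

docs = [("d1", "the cat sat on the mat"),
        ("d2", "the dog sat"),
        ("d3", "a cat and a dog")]

# Job 1: count words, then pick the most common ones (threshold is illustrative)
def count_mapper(rec):
    _, text = rec
    for w in text.split():
        yield (w, 1)

def count_reducer(word, ones):
    yield (word, sum(ones))

counts = dict(run_job(docs, count_mapper, count_reducer))
common = {w for w, c in counts.items() if c >= 3}   # {'the'} for these docs

# Job 2: re-reads the SAME input, filters the common words, builds the index;
# it can only start after job 1 has finished and written `common` out
def index_mapper(rec):
    doc_id, text = rec
    for w in text.split():
        if w not in common:        # the common-word list must reach every mapper
            yield (w, doc_id)

def index_reducer(word, doc_ids):
    yield (word, sorted(set(doc_ids)))

print(run_job(docs, index_mapper, index_reducer))
# [('a', ['d3']), ('and', ['d3']), ('cat', ['d1', 'd3']), ('dog', ['d2', 'd3']), ...]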

SLIDE 23

→ Failure resiliency: a run with failures finishes barely any slower than one without failures (in the paper's sort experiment, roughly 200 worker processes were killed)

→ But if a map task is redone, parts of the downstream shuffle/reduce work also need to be redone
SLIDE 24

Jeff Dean, LADIS 2009

SLIDE 25

NEXT STEPS

  • Next lecture: Spark
  • Assignment 1: Use Piazza!
