Experiences Scaling Use of Google's Sawzall, Jeffrey D. Oldham

slide-1
SLIDE 1

Experiences Scaling Use of Google's Sawzall

Jeffrey D. Oldham surname at company-name.com Google, Inc. 2011-03-13

slide-2
SLIDE 2

Programming, not Theory

The focus is not on theory: no theorems, no models, no algorithms. The focus is on users' programming of parallel systems. Users, not system developers, write the code. Users write the tests.

slide-3
SLIDE 3

Summary

Sawzall eases writing map reductions. Structured Sawzall scales. A parallel system's API should separate the fundamental concepts of its model. Ex: map reduction = map + reduce + record enumeration. Separated concepts ease writing test code.

slide-4
SLIDE 4

Outline

Map reductions and MapReduce
Map reductions and Saw + Sawzall
Structured Saw + Sawzall

slide-5
SLIDE 5

Map Reduction

slide-6
SLIDE 6

MapReduce: C++ Library

slide-7
SLIDE 7

Outline

Map reductions and MapReduce
Map reductions and Saw + Sawzall
Structured Saw + Sawzall

slide-8
SLIDE 8

Sawzall: Simpler Map Reductions

slide-9
SLIDE 9

Sawzall Mental Model: One Record

slide-10
SLIDE 10

Sample Program

Compute the number of queries per degree of latitude and longitude. Sawzall program query-location.szl:

proto "querylog.proto"
queries_per_degree: table sum[lat: int][lon: int] of int;
log_record: QueryLogProto = input;
loc: Location = locationinfo(log_record.ip);
emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;

Shell code:

saw --program=query-location.szl --input=… --output=…
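
The one-record model behind this program can be sketched in Python. This is an illustrative analogue, not Sawzall and not how saw is implemented: `location_info`, `FAKE_LOCATIONS`, `process_record`, and `run` are hypothetical names, records are assumed to be plain dicts with an `ip` field, and a dictionary stands in for the sum table.

```python
from collections import defaultdict

# Hypothetical stand-in for Sawzall's locationinfo(): IP -> (lat, lon).
FAKE_LOCATIONS = {
    "192.0.2.1": (37.42, -122.08),
    "192.0.2.2": (37.77, -122.41),
    "198.51.100.7": (51.50, -0.12),
}


def location_info(ip):
    """Look up a made-up location for an IP address."""
    return FAKE_LOCATIONS[ip]


def process_record(record, queries_per_degree):
    """Handle exactly one input record, as the Sawzall program body does."""
    lat, lon = location_info(record["ip"])
    # Analogue of: emit queries_per_degree[int(lat)][int(lon)] <- 1;
    queries_per_degree[(int(lat), int(lon))] += 1


def run(records):
    """Assumed stand-in for saw: enumerate records, aggregate by summing."""
    table = defaultdict(int)
    for record in records:
        process_record(record, table)
    return dict(table)


counts = run([
    {"ip": "192.0.2.1"},
    {"ip": "192.0.2.2"},
    {"ip": "198.51.100.7"},
])
print(counts)
```

The author of `process_record` never writes the loop or the aggregation; in this sketch those live in `run`, which mirrors the division of labor between the Sawzall program and saw.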

slide-11
SLIDE 11

Saw + Sawzall Use

Used since 2003 by 100s of Googlers in 1000s of programs to compute a lot of data that is directly or indirectly externally facing.

slide-12
SLIDE 12

Outline

Map reductions and MapReduce
Map reductions and Saw + Sawzall
Structured Saw + Sawzall

slide-13
SLIDE 13

Scaling Programs

Code ecosystems support sharing tested code.
+ Sawzall function libraries have tests.
– Programs shared by copying.
– Typically untested.

slide-14
SLIDE 14

Sawzall Testing Model: Map Reduction

slide-15
SLIDE 15

Structured Pgms: Separate Concepts

slide-16
SLIDE 16

Sample Program

Compute the number of queries per degree of latitude and longitude. Sawzall program query-location.szl:

proto "querylog.proto"
queries_per_degree: table sum[lat: int][lon: int] of int;
log_record: QueryLogProto = input;
loc: Location = locationinfo(log_record.ip);
emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;

Shell code:

saw --program=query-location.szl --input=… --output=…

slide-17
SLIDE 17

Structured Sample Program

Compute the number of queries per degree of latitude and longitude. Sawzall program query-location.szl:

proto "querylog.proto"

map: function(log: QueryLogProto, reduce: function(int, int)) {
  loc: Location = locationinfo(log.ip);
  reduce(int(loc.lat), int(loc.lon));
}

reduce: function(lat: int, lon: int) {
  queries_per_degree: table sum[lat: int][lon: int] of int;
  emit queries_per_degree[lat][lon] <- 1;
}

log_record: QueryLogProto = input;
map(log_record, reduce);

Shell code:

saw --program=query-location.szl --input=… --output=…
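
The separation of map, reduce, and record enumeration can be sketched in Python as follows. This is a hedged analogue rather than Sawzall: `map_record`, `make_reduce`, `enumerate_records`, `location_info`, and `FAKE_LOCATIONS` are hypothetical names, and records are assumed to be dicts with an `ip` field.

```python
from collections import defaultdict

# Hypothetical stand-in for Sawzall's locationinfo(): IP -> (lat, lon).
FAKE_LOCATIONS = {
    "192.0.2.1": (37.42, -122.08),
    "192.0.2.2": (37.77, -122.41),
    "198.51.100.7": (51.50, -0.12),
}


def location_info(ip):
    """Look up a made-up location for an IP address."""
    return FAKE_LOCATIONS[ip]


def map_record(log, reduce):
    """Map step: turn one record into a key and pass it to reduce."""
    lat, lon = location_info(log["ip"])
    reduce(int(lat), int(lon))


def make_reduce(table):
    """Reduce step: build a function that adds 1 to table[(lat, lon)]."""
    def reduce(lat, lon):
        table[(lat, lon)] += 1
    return reduce


def enumerate_records(records, map_fn, reduce_fn):
    """Record enumeration: the only piece that knows about the input."""
    for log in records:
        map_fn(log, reduce_fn)


table = defaultdict(int)
enumerate_records(
    [{"ip": "192.0.2.1"}, {"ip": "192.0.2.2"}, {"ip": "198.51.100.7"}],
    map_record,
    make_reduce(table),
)
counts = dict(table)
print(counts)
```

Because `map_record` receives its reduce function as an argument, it can be exercised with any callable, which is what makes it testable in isolation.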

slide-18
SLIDE 18

Structured Testing Model

slide-19
SLIDE 19

Test Structured Programs

Test map functions one record at a time, using a mocked reduce function.
Advantages: No distributed I/O. Single processor only.
Does not test reduce functions or order enumeration.

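Testing a map function against a mocked reduce function might look like the Python sketch below. It is an analogue under stated assumptions, not the Sawzall test tooling: `map_record`, `location_info`, and `FAKE_LOCATIONS` are hypothetical, and the standard-library `unittest.mock.Mock` plays the role of the mocked reduce.

```python
from unittest import mock

# Hypothetical stand-in for Sawzall's locationinfo(): IP -> (lat, lon).
FAKE_LOCATIONS = {"192.0.2.1": (37.42, -122.08)}


def location_info(ip):
    """Look up a made-up location for an IP address."""
    return FAKE_LOCATIONS[ip]


def map_record(log, reduce):
    """Map step under test: one record in, one reduce call out."""
    lat, lon = location_info(log["ip"])
    reduce(int(lat), int(lon))


def test_map_record_calls_reduce_once():
    """Feed one record and check what the map step passed to reduce."""
    mocked_reduce = mock.Mock()
    map_record({"ip": "192.0.2.1"}, mocked_reduce)
    # int() truncates toward zero, so -122.08 becomes -122.
    mocked_reduce.assert_called_once_with(37, -122)


test_map_record_calls_reduce_once()
```

The test runs in a single process with no input files and no aggregation, so it says nothing about the reduce step or about how records are enumerated.
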
slide-20
SLIDE 20

Summary

Sawzall eases writing map reductions. Structured Sawzall scales. A parallel system's API should separate the fundamental concepts of its model. Ex: map reduction = map + reduce + record enumeration. Separated concepts ease writing test code.

slide-21
SLIDE 21

Experiences Scaling Use of Google's Sawzall

Jeffrey D. Oldham surname at company-name.com Google, Inc. 2011-03-13

slide-22
SLIDE 22

References

Sawzall: Pike et al.; open-source implementation; Wikipedia article
MapReduce: Dean and Ghemawat (2004, 2008); Wikipedia article