Map Reduce and Design Patterns, Lecture 1 (Fang Yu)

Map Reduce and Design Patterns
Lecture 1
Fang Yu
Software Security Lab.
Department of Management Information Systems, College of Commerce, National Chengchi University
http://soslab.nccu.edu.tw
Cloud Computation, March 10, 2015

About Me
Yu, Fang
• 2014-present: Associate Professor, Department of Management Information Systems, National Chengchi University
• 2010-2014: Assistant Professor, Department of Management Information Systems, National Chengchi University
• 2005-2010: Ph.D. and M.S., Department of Computer Science, University of California at Santa Barbara
• 2001-2005: Institute of Information Science, Academia Sinica
• 1994-2000: M.B.A. and B.B.A., Department of Information Management, National Taiwan University

Hadoop and MapReduce
For a refresher, see Hadoop: The Definitive Guide or the Apache Hadoop website.
• A Hadoop MapReduce job is divided into a set of map tasks and reduce tasks that run in a distributed fashion on a cluster of computers.
• Each task works on the small subset of the data it has been assigned, so that the load is spread across the cluster.
• The map tasks generally load, parse, transform, and filter data.
• Each reduce task is responsible for handling a subset of the map task output.
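The division of work between map and reduce tasks can be illustrated with a small in-memory simulation. This is only a sketch in plain Python, not Hadoop code: the function names (`run_job`, `map_log_line`, `sum_values`) and the three-field log format are assumptions made up for the example.

```python
from collections import defaultdict

def map_log_line(line):
    """Map step: parse one log line, filter malformed input, emit (action, 1)."""
    parts = line.split()
    if len(parts) != 3:      # hypothetical format: "<timestamp> <user> <action>"
        return               # filtered out: nothing is emitted
    _timestamp, _user, action = parts
    yield action, 1

def sum_values(key, values):
    """Reduce step: collapse all values that share one key."""
    return sum(values)

def run_job(records, mapper, reducer):
    """Simulate one job in memory: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)          # stands in for the shuffle phase
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

log_lines = [
    "2015-03-10T09:15 alice login",
    "2015-03-10T09:16 alice query",
    "malformed line with too many fields",
    "2015-03-10T09:20 bob login",
    "2015-03-10T09:21 bob click_ad",
]
action_counts = run_job(log_lines, map_log_line, sum_values)
print(action_counts)
```

On a real cluster the map calls run in parallel on different input splits and the grouping is done by the framework's shuffle; the single loop here only mimics that data flow.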

Summarization
Grouping similar data together and then performing an operation such as calculating a statistic, building an index, or simply counting.
• Numerical summarizations
• Inverted index
• Counting with counters

Numerical Summarization
Group records together by a key and calculate a numerical aggregate per group.
• Consider $\theta$ to be a generic numerical summarization function
• Over some list of values $(v_1, v_2, \ldots, v_n)$, find $\lambda = \theta(v_1, v_2, \ldots, v_n)$
• $\theta$ could be the minimum, maximum, average, median, or standard deviation
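As a rough illustration of the pattern, the sketch below groups (key, value) pairs and applies an arbitrary summarization function θ to each group. The helper name `summarize` and the sample data are assumptions for the example; the θ candidates come from Python's built-ins and the standard `statistics` module.

```python
from collections import defaultdict
import statistics

def summarize(pairs, theta):
    """Group values by key, then compute lambda = theta(v_1, ..., v_n) per group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: theta(values) for key, values in groups.items()}

# Hypothetical (hour, comment length) pairs
pairs = [("09", 10), ("09", 30), ("10", 5), ("09", 20)]

minimum_per_hour = summarize(pairs, min)
maximum_per_hour = summarize(pairs, max)
median_per_hour = summarize(pairs, statistics.median)
print(minimum_per_hour, maximum_per_hour, median_per_hour)
```

Swapping `theta` changes the statistic without touching the grouping logic, which is the point of treating θ as generic.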

Motivation
Suppose your website logs every time a user logs on, enters a query, clicks an ad, or performs any other notable action.
• When is your website most active?
• How effective are your ads?

Minimum, maximum, and count example
Problem: Given a list of users' comments, determine the first and last time each user commented and the total number of comments from that user.
• Key: UserId, Value: MinMaxCountTuple
• Mapper?
• Reducer?

Ideas
• Mapper: For each comment, generate a pair <UserId, (CommentTime, CommentTime, 1)>
• Reducer: For each UserId group, find the minimum and maximum times and sum the counts

More details about the Reducer
For each value in a group:
• If the result's minimum is not yet set, or the value's minimum is less than the result's current minimum, set the result's minimum to the value's minimum.
• Do the same for the maximum, using a greater-than comparison.
• Add each value's count to a running sum.
Remark: the reducer code can also be used as a combiner, because minimum, maximum, and sum are associative and commutative, and the reducer's output has the same type as its input.
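The mapper and reducer described above can be sketched in plain Python as follows. This is a simulation under assumed names (`map_comment`, `reduce_min_max_count`, `group_by_key`, `run`) and made-up sample comments, not Hadoop code. The second half applies the reducer to each input split first, as a combiner would, and checks that the final result is unchanged.

```python
from collections import defaultdict
from datetime import datetime

def map_comment(comment):
    """Emit <UserId, (CommentTime, CommentTime, 1)> for one comment."""
    t = comment["time"]
    yield comment["user_id"], (t, t, 1)

def reduce_min_max_count(values):
    """Fold (min, max, count) tuples into a single (min, max, count) tuple."""
    result_min = result_max = None
    total = 0
    for v_min, v_max, v_count in values:
        if result_min is None or v_min < result_min:
            result_min = v_min
        if result_max is None or v_max > result_max:
            result_max = v_max
        total += v_count
    return (result_min, result_max, total)

def group_by_key(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run(comments, reducer):
    mapped = (pair for c in comments for pair in map_comment(c))
    return {key: reducer(values) for key, values in group_by_key(mapped).items()}

comments = [
    {"user_id": "u1", "time": datetime(2015, 3, 10, 9, 0)},
    {"user_id": "u2", "time": datetime(2015, 3, 10, 10, 30)},
    {"user_id": "u1", "time": datetime(2015, 3, 11, 8, 15)},
    {"user_id": "u1", "time": datetime(2015, 3, 9, 22, 45)},
]

# Without a combiner: one reduce over all mapped values.
direct = run(comments, reduce_min_max_count)

# With a combiner: reduce each split locally, then reduce the partial tuples.
partials = []
for split in (comments[:2], comments[2:]):
    partials.extend(run(split, reduce_min_max_count).items())
with_combiner = {
    key: reduce_min_max_count(values)
    for key, values in group_by_key(partials).items()
}
print(direct == with_combiner)
```

The reducer works as a combiner here because its output tuple has the same shape as its input tuples, so partial results can be fed back into it.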

Average example
Problem: Given a list of users' comments, determine the average comment length per hour of day.
• <Hour, CommentLength>
• Mapper?
• Reducer?
• Can the reducer code be used as a combiner?

Average example
To calculate an average, we need two values for each group: the sum of the values that we want to average and the number of values that went into the sum.
• <Hour, (Count, AvgCommentLength)>
• Mapper: For each comment, generate a pair <Hour, (1, CommentLength)>; a single comment is a group of one whose average is its own length
• Reducer: For each Hour group, accumulate Count and Sum, where each incoming value contributes Count × AvgCommentLength to Sum, and compute AvgCommentLength as Sum/Count. Output the pair <Hour, (Count, AvgCommentLength)>
• The reducer code can be used as a combiner, provided it weights each incoming average by its count as above; an average of averages computed without the counts would be wrong
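One way to make the averaging reducer safe to reuse as a combiner is to treat every incoming value as a (Count, Average) pair and weight the average by its count. The sketch below simulates this in plain Python; the function names and the sample comments are assumptions for illustration.

```python
from collections import defaultdict

def map_comment(comment):
    """Emit <Hour, (1, CommentLength)>: a group of one whose average is its length."""
    yield comment["hour"], (1, float(len(comment["text"])))

def reduce_average(values):
    """Fold (count, average) pairs into one (count, average) pair."""
    total_count = 0
    total_sum = 0.0
    for count, average in values:
        total_count += count
        total_sum += count * average   # recover each partial sum from its average
    return (total_count, total_sum / total_count)

def group_by_key(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

comments = [
    {"hour": 9, "text": "x" * 10},
    {"hour": 9, "text": "x" * 20},
    {"hour": 9, "text": "x" * 60},
]
mapped = [pair for c in comments for pair in map_comment(c)]

# Without a combiner.
direct = {h: reduce_average(v) for h, v in group_by_key(mapped).items()}

# With a combiner: the first split holds two comments, the second holds one.
partials = []
for split in (mapped[:2], mapped[2:]):
    partials.extend((h, reduce_average(v)) for h, v in group_by_key(split).items())
with_combiner = {h: reduce_average(v) for h, v in group_by_key(partials).items()}

print(direct, with_combiner)
```

With these splits the partial results are (2, 15.0) and (1, 60.0). Averaging the two averages without the counts would give 37.5, whereas the weighted reducer returns the true average of 30.0.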

Median and Standard Deviation
These can be more complicated:
• Median requires sorting
• Standard deviation requires the average to be discovered prior to reduction

Median and Standard Deviation
Problem: Given a list of users' comments, determine the median and standard deviation of comment lengths per hour of day.
• A naive idea: <Hour, CommentLength>
• Mapper: For each comment, generate a pair <Hour, CommentLength>
• Reducer: For each Hour group, collect the comment lengths in a list and sort it to find the median, accumulating the count and sum along the way to calculate the mean. Then revisit the list, accumulate the sum of squared differences between each comment length and the mean, and compute the standard deviation from that sum.
• A combiner cannot be used in this implementation, because the reducer needs every individual value in the group. Can we do better?
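A minimal sketch of the naive reducer, in plain Python rather than Hadoop code. The slide does not say whether the standard deviation is the sample or the population form; the sketch assumes the sample form (dividing by n − 1), and the function names and data are made up for the example.

```python
import math
from collections import defaultdict

def map_comment(comment):
    """Emit <Hour, CommentLength> for one comment."""
    yield comment["hour"], len(comment["text"])

def reduce_median_stddev(lengths):
    """Hold the whole group in memory: O(n) space for n comments."""
    ordered = sorted(lengths)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        median = float(ordered[mid])
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    mean = sum(ordered) / n
    # Second pass over the list: sum of squared deviations from the mean.
    sum_of_squares = sum((x - mean) ** 2 for x in ordered)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stddev = math.sqrt(sum_of_squares / (n - 1)) if n > 1 else 0.0
    return (median, stddev)

comments = [
    {"hour": 9, "text": "x" * 6},
    {"hour": 9, "text": "x" * 2},
    {"hour": 9, "text": "x" * 4},
    {"hour": 10, "text": "x" * 8},
]
groups = defaultdict(list)
for comment in comments:
    for hour, length in map_comment(comment):
        groups[hour].append(length)

stats_per_hour = {hour: reduce_median_stddev(v) for hour, v in groups.items()}
print(stats_per_hour)
```

The output type (median, stddev) differs from the input type (a single length), so feeding partial outputs back into this reducer is not possible, which is why no combiner applies.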

Memory-conscious median and standard deviation
Instead of a list that grows as O(n), where n is the number of comments, keep a map whose number of key/value pairs is O(m), where m is the maximum comment length (there cannot be more distinct lengths than that).
• <Hour, a sorted map of (CommentLength, Count)>
• Mapper: For each comment, generate a pair <Hour, a singleton map with (CommentLength, 1)>
• Reducer: For each Hour group, merge the incoming maps into one sorted map of (CommentLength, Count). Walk the map to find the median and the sum (and hence the mean), then accumulate the sum of squared deviations as Count × (CommentLength − mean)² for each entry and compute the standard deviation.
• A combiner can be used to merge the sorted maps before they reach the reducer
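The memory-conscious variant can be sketched with `collections.Counter` standing in for the sorted map (the keys are sorted when the median is located). As with any sketch here, the function names and data are assumptions, and the sample standard deviation (n − 1 denominator) is assumed because the slide does not specify which form is meant.

```python
import math
from collections import Counter

def map_comment(comment):
    """Emit <Hour, singleton map {CommentLength: 1}>."""
    yield comment["hour"], Counter({len(comment["text"]): 1})

def combine_maps(maps):
    """Combiner: merge length->count maps by adding counts."""
    merged = Counter()
    for m in maps:
        merged.update(m)          # Counter.update adds counts for mapping input
    return merged

def reduce_median_stddev(maps):
    merged = combine_maps(maps)
    n = sum(merged.values())
    # Locate the middle element(s) by walking lengths in sorted order.
    lower_index, upper_index = (n - 1) // 2, n // 2
    seen = 0
    lower = upper = None
    for length in sorted(merged):
        seen += merged[length]
        if lower is None and seen > lower_index:
            lower = length
        if seen > upper_index:
            upper = length
            break
    median = (lower + upper) / 2
    mean = sum(length * count for length, count in merged.items()) / n
    # Each distinct length contributes count * (length - mean)^2.
    sum_of_squares = sum(c * (l - mean) ** 2 for l, c in merged.items())
    stddev = math.sqrt(sum_of_squares / (n - 1)) if n > 1 else 0.0
    return (median, stddev)

comments = [{"hour": 9, "text": "x" * k} for k in (2, 6, 4, 2, 6)]
singletons = [m for c in comments for _hour, m in map_comment(c)]

direct = reduce_median_stddev(singletons)

# With a combiner: merge each split's maps first, then reduce the merged maps.
partial_maps = [combine_maps(singletons[:3]), combine_maps(singletons[3:])]
with_combiner = reduce_median_stddev(partial_maps)
print(direct, with_combiner, len(combine_maps(singletons)))
```

Five comments collapse into three map entries here; the saving grows when many comments share the same length.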
