Declarative MapReduce


  1. Declarative MapReduce (10/29/2018)

  2. MapReduce Examples
     - Filter: Map
     - Aggregate: Map, Reduce
     - Grouped aggregate: Map, Reduce
     - Equi-join: Map, Reduce
     - Non-equi-join: Map, Reduce

  3. Declarative Languages
     - Describe what you want to do, not how to do it.
     - The most popular example is SQL.
     - Can we compile SQL queries into MapReduce program(s)?

  4. Pig
     - A system built on top of Hadoop (now supports Spark as well).
     - Provides a SQL/ETL-like query language termed Pig Latin.
     - Compiles Pig Latin programs into MapReduce programs.

  5. Examples: Filter
     Return all the lines that have a user-specified response code, e.g., 200.

         log = LOAD 'logs.csv' USING PigStorage() AS (host, time, method, url, response, bytes);
         ok_lines = FILTER log BY response == '200';
         STORE ok_lines INTO 'filtered_output';

     Stages: Map
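
The filter above needs no grouping, which is why it fits in a map-only job. A minimal Python sketch of what such a map function might look like follows. It is an illustration, not Pig's generated code: the helper names `filter_mapper` and `run_map_only` are hypothetical, and the sample lines are tab-separated because `PigStorage()` with no argument splits fields on tabs.

```python
def filter_mapper(code):
    """Build a map function that keeps only lines with the given response code."""
    def mapper(line):
        host, time, method, url, response, nbytes = line.split("\t")
        if response == code:
            yield line  # map-only: records are emitted directly, no key is needed
    return mapper


def run_map_only(lines, mapper):
    """Hypothetical stand-in for a map-only job: no shuffle, no reduce."""
    return [out for line in lines for out in mapper(line)]


# Made-up sample data in the (host, time, method, url, response, bytes) layout.
log = [
    "h1\t1000\tGET\t/a\t200\t512",
    "h2\t2000\tGET\t/b\t404\t0",
    "h1\t3000\tPOST\t/c\t200\t128",
]
ok_lines = run_map_only(log, filter_mapper("200"))
```

Each input line is examined independently, so the work can be split across any number of map tasks without coordination.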

  6. Examples: Grouped aggregate
     Find the total number of bytes per response code.

         log = LOAD 'logs.csv' USING PigStorage() AS (host, time, method, url, response, bytes: int);
         grouped = GROUP log BY response;
         grouped_aggregate = FOREACH grouped GENERATE group, SUM(log.bytes);
         STORE grouped_aggregate INTO 'grouped_output';

     Stages: Map, Reduce
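
A grouped aggregate needs both phases: the map side picks the grouping key, and the reduce side folds each group. The sketch below simulates this in plain Python under stated assumptions: `run_job` is a hypothetical in-memory stand-in for a MapReduce job (not Hadoop's API), and the records are made-up tuples in the slide's field order.

```python
from collections import defaultdict


def map_bytes(record):
    # record: (host, time, method, url, response, bytes)
    yield record[4], record[5]  # key = response code, value = bytes


def reduce_sum(response, values):
    yield response, sum(values)


def run_job(records, mapper, reducer):
    """Hypothetical single-process MapReduce: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


log = [
    ("h1", 1000, "GET", "/a", "200", 512),
    ("h2", 2000, "GET", "/b", "404", 0),
    ("h1", 3000, "POST", "/c", "200", 128),
]
totals = run_job(log, map_bytes, reduce_sum)
```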

  7. Examples: Grouped aggregate
     Find the average number of bytes per response code.

         log = LOAD 'logs.csv' USING PigStorage() AS (host, time, method, url, response, bytes: int);
         grouped = GROUP log BY response;
         grouped_aggregate = FOREACH grouped GENERATE group, AVG(log.bytes);
         STORE grouped_aggregate INTO 'grouped_output';
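
Averages are a little trickier than sums because an average of averages is generally wrong. One common technique, sketched here as an assumption rather than as a description of Pig's internals, is to carry `(sum, count)` pairs so that partial results can be merged on the map side before the final division. The names `map_partial`, `combine`, and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_partial(record):
    yield record[4], (record[5], 1)  # (sum, count) for a single record


def combine(partials):
    """Merge (sum, count) pairs; safe to apply repeatedly."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


def reduce_avg(response, partials):
    total, count = combine(partials)
    yield response, total / count


def run_job(splits, mapper, reducer):
    """Hypothetical job runner; each split plays the role of one map task."""
    groups = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for rec in split:
            for key, value in mapper(rec):
                local[key].append(value)
        for key, values in local.items():  # combiner step on the map side
            groups[key].append(combine(values))
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


splits = [
    [("h1", 1000, "GET", "/a", "200", 500), ("h2", 2000, "GET", "/b", "404", 0)],
    [("h1", 3000, "POST", "/c", "200", 100), ("h3", 4000, "GET", "/a", "200", 300)],
]
averages = run_job(splits, map_partial, reduce_avg)
```

With this data the second split contributes a single partial `(400, 2)` for code 200, so only two values for that key cross the shuffle instead of three.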

  8. Examples: Join
     Find pairs of requests that ask for the same URL, coming from the same source.

         log1 = LOAD 'logs.csv' USING PigStorage() AS (host, time, method, url, response, bytes: int);
         log2 = LOAD 'logs.csv' USING PigStorage() AS (host, time, method, url, response, bytes: int);
         joined = JOIN log1 BY (url, host), log2 BY (url, host);
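
An equi-join maps naturally onto MapReduce because records with equal join keys meet in the same reduce call. The following is a sketch of a reduce-side join, a standard technique that is assumed here for illustration and not claimed to be the exact plan Pig emits. The names `tag_mapper`, `join_reducer`, and `run_join` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def tag_mapper(tag):
    """Map function that keys each record by (url, host) and tags its source."""
    def mapper(record):
        host, time, method, url, response, nbytes = record
        yield (url, host), (tag, record)
    return mapper


def join_reducer(key, tagged):
    left = [rec for tag, rec in tagged if tag == "log1"]
    right = [rec for tag, rec in tagged if tag == "log2"]
    for l, r in product(left, right):  # cross product within one key
        yield l + r


def run_join(log1, log2):
    """Hypothetical in-memory job: map both inputs, shuffle, reduce."""
    groups = defaultdict(list)
    for tag, records in (("log1", log1), ("log2", log2)):
        mapper = tag_mapper(tag)
        for rec in records:
            for key, value in mapper(rec):
                groups[key].append(value)
    return [out for key in sorted(groups) for out in join_reducer(key, groups[key])]


log = [
    ("h1", 1000, "GET", "/a", "200", 512),
    ("h1", 5000, "GET", "/a", "200", 512),
    ("h2", 2000, "GET", "/a", "404", 0),
]
joined = run_join(log, log)  # self-join, as in the slide
```

Because this is a self-join, every request also pairs with itself, just as it would in the Pig script.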

  9. Examples: Join
     Find pairs of requests that ask for the same URL, coming from the same source, and happened within an hour of each other.

         log1 = LOAD 'logs.csv' USING PigStorage() AS (host, time, method, url, response, bytes: int);
         log2 = LOAD 'logs.csv' USING PigStorage() AS (host, time, method, url, response, bytes: int);
         joined = JOIN log1 BY (url, host), log2 BY (url, host);
         filtered = FILTER joined BY ABS(log1::time - log2::time) < 3600000;
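
The time condition is not an equality, so it cannot be part of the shuffle key. A simple way to handle it, sketched below as an assumption, is to join on the equality part and test the remaining predicate on each candidate pair inside the reducer. `HOUR_MS` assumes `time` is in milliseconds, which the constant 3600000 suggests but the slides do not state. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

HOUR_MS = 3_600_000  # assumption: timestamps are in milliseconds


def map_by_url_host(record):
    host, time, method, url, response, nbytes = record
    yield (url, host), record


def reduce_pairs_within(window):
    def reducer(key, records):
        # Self-join within the group, then apply the non-equality predicate.
        for a, b in product(records, repeat=2):
            if abs(a[1] - b[1]) < window:
                yield a + b
    return reducer


def run_job(records, mapper, reducer):
    """Hypothetical single-process MapReduce: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


log = [
    ("h1", 0, "GET", "/a", "200", 512),
    ("h1", 1_000_000, "GET", "/a", "200", 512),
    ("h1", 9_000_000, "GET", "/a", "200", 512),
]
filtered = run_job(log, map_by_url_host, reduce_pairs_within(HOUR_MS))
```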

  10. How it works
      - LOAD operation: determines the input path and InputFormat.
      - STORE operation: determines the output path and OutputFormat.
      - FILTER and FOREACH: translated into map-only jobs.
      - GROUP (aggregation) and JOIN: translated into map-reduce jobs.
      - All are compiled into one or more MapReduce jobs.

  11. Additional Features
      - Lazy execution: nothing actually gets executed until the STORE command is reached.
      - Consolidation of map-only jobs: map-only jobs (FILTER and FOREACH) can be consolidated into the next job's map function or the previous job's reduce function.
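
Consolidation works because map-only steps process one record at a time, so they can be chained inside a single function. A small sketch of the idea follows. The `compose` helper and the two steps are hypothetical, and the `FOREACH` projection is an invented example rather than one from the slides.

```python
def compose(*steps):
    """Fuse several record-at-a-time steps into one map function."""
    def fused(record):
        current = [record]
        for step in steps:
            current = [out for rec in current for out in step(rec)]
        return current
    return fused


def keep_ok(record):
    # Corresponds to: FILTER log BY response == '200'
    if record[4] == "200":
        yield record


def project_url_bytes(record):
    # Corresponds to a hypothetical: FOREACH ok_lines GENERATE url, bytes
    yield record[3], record[5]


log = [
    ("h1", 1000, "GET", "/a", "200", 512),
    ("h2", 2000, "GET", "/b", "404", 0),
    ("h1", 3000, "POST", "/c", "200", 128),
]

# Two separate passes over the data (two map-only jobs).
step1 = [out for rec in log for out in keep_ok(rec)]
two_pass = [out for rec in step1 for out in project_url_bytes(rec)]

# One fused pass (a single map function).
fused = compose(keep_ok, project_url_bytes)
one_pass = [out for rec in log for out in fused(rec)]
```

Both paths produce the same records, but the fused version reads the input once and writes no intermediate output.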

  12. A Complex Example

          log1 = LOAD 'logs.csv' USING PigStorage() AS (…);
          log2 = LOAD 'logs.csv' USING PigStorage() AS (…);
          joined = JOIN log1 BY (url, host), log2 BY (url, host);
          filtered = FILTER joined BY ABS(log1::time - log2::time) < 3600000;
          grouped = GROUP filtered BY log1::host;
          agg_groups = FOREACH grouped GENERATE group, COUNT(filtered);
          STORE agg_groups INTO 'final_result';
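
One plausible plan for this script uses two chained MapReduce jobs: the first joins on `(url, host)` and applies the time filter in its reducer, and the second groups the surviving pairs by host and counts them. The Python sketch below simulates that plan under stated assumptions. `run_job` is a hypothetical in-memory runner, the data is made up, and the actual plan chosen by Pig may differ.

```python
from collections import defaultdict
from itertools import product


def run_job(records, mapper, reducer):
    """Hypothetical single-process MapReduce: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Job 1: JOIN on (url, host), with the FILTER folded into the reducer.
def map_join_key(rec):
    yield (rec[3], rec[0]), rec


def reduce_join_filter(key, recs):
    for a, b in product(recs, repeat=2):
        if abs(a[1] - b[1]) < 3_600_000:
            yield a + b


# Job 2: GROUP BY log1::host, with the FOREACH ... COUNT folded into the reducer.
def map_host(pair):
    yield pair[0], 1  # pair[0] is log1::host


def reduce_count(host, ones):
    yield host, sum(ones)


log = [
    ("h1", 0, "GET", "/a", "200", 512),
    ("h1", 1_000_000, "GET", "/a", "200", 512),
    ("h2", 0, "GET", "/a", "200", 64),
]
job1_output = run_job(log, map_join_key, reduce_join_filter)
final_result = run_job(job1_output, map_host, reduce_count)
```

The output of the first job is the input of the second, which mirrors how a chain of MapReduce jobs passes data through intermediate files.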

  13. Further Readings
      - Pig home page: https://pig.apache.org
      - Detailed documentation: http://pig.apache.org/docs/r0.17.0/
      - The original Pig Latin paper: Olston, Christopher, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. "Pig latin: a not-so-foreign language for data processing." In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 1099-1110. ACM, 2008.
