CSC 369: Distributed Computing
Alex Dekhtyar, May 6
Day 14: Java Hadoop API


  1. CSC 369: Distributed Computing Alex Dekhtyar May 6 Day 14: Java Hadoop API

  2. CSC 369: Distributed Computing Alex Dekhtyar May 6 Day 14: Java Hadoop API HAPPY EQUATOR DAY!

  3. Housekeeping. Lab 4 (mini-project): due Sunday night. Lab 5: due tonight (grace period tomorrow). Lab 6: full lab coming out Friday. Grading: slowly happening…

  4. Hadoop Java API

  5. Hadoop API. The current version is 3.2.1. Command-line tools: hadoop, hdfs, yarn. We limit ourselves to hadoop jar.
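
In practice, hadoop jar <jarfile> <MainClass> <args> runs the named class as the job's entry point. A minimal sketch of such an entry point, assuming the common Tool/ToolRunner pattern (the class name MyJob is illustrative; the actual Job setup appears on the following slides):

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // build and submit the MapReduce Job here (see the driver sketch below)
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options (-D key=value, etc.)
        // before handing the remaining args to run()
        System.exit(ToolRunner.run(new MyJob(), args));
      }
    }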

  6. Hadoop Java API: org.apache.hadoop. Let’s concentrate on the things we absolutely need.

  7. Hadoop Java API: org.apache.hadoop
      Core MapReduce classes: org.apache.hadoop.mapreduce
      Input/Output parsing: org.apache.hadoop.mapreduce.lib.input, org.apache.hadoop.mapreduce.lib.output
      Atomic type wrappers: org.apache.hadoop.io
      Job configuration: org.apache.hadoop.conf
      File system classes: org.apache.hadoop.fs

  8. org.apache.hadoop.mapreduce
      org.apache.hadoop.mapreduce.Job: a MapReduce job
      org.apache.hadoop.mapreduce.Mapper: extensible Mapper
      org.apache.hadoop.mapreduce.Reducer: extensible Reducer
      org.apache.hadoop.mapreduce.Partitioner: parent class for partitioning tasks
      org.apache.hadoop.mapreduce.InputFormat / org.apache.hadoop.mapreduce.OutputFormat: parent classes for input/output formats
      org.apache.hadoop.mapreduce.InputSplit: parent class for input splits
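
To see how these classes fit together, here is a minimal word-count driver, sketched after the standard Hadoop tutorial pattern. TokenizerMapper and IntSumReducer are the Mapper and Reducer subclasses sketched alongside the slides below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // org.apache.hadoop.conf
        Job job = Job.getInstance(conf, "word count");   // org.apache.hadoop.mapreduce.Job
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);       // Mapper subclass (slide 21)
        job.setCombinerClass(IntSumReducer.class);       // a Reducer used as a Combiner
        job.setReducerClass(IntSumReducer.class);        // Reducer subclass (slide 23)
        job.setOutputKeyClass(Text.class);               // org.apache.hadoop.io wrappers
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // org.apache.hadoop.fs
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }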

  9.–14. How it works [diagram sequence]: the input file is divided into InputSplits. The Job assigns each InputSplit to a Mapper running on a compute node (Compute Node 1, 2, 3 in the figures); each Mapper may be followed by a Combiner (a Reducer run locally on the map side), and Reducer tasks run on compute nodes to produce the final output.

  15.–16. [diagram] Zooming in on one node: Compute Node 1 reads its InputSplit, runs the Mapper, and feeds the map output to a Combiner (a Reducer applied locally).

  17.–20. [timeline diagram] Over time, each compute node (1, 2, 3) runs the MAP STAGE: Mapper, then Combiner, then Partitioner. The Shuffle STAGE then routes each partition of the map output to its reducer, and the Reduce STAGE runs a Reducer on each node.
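
The Partitioner is what the timeline adds at the end of the map stage: it decides which reducer each map-output pair is shuffled to. A sketch of a custom one (FirstLetterPartitioner and its routing scheme are purely illustrative; the default is HashPartitioner):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
        String s = key.toString();
        if (s.isEmpty()) return 0;
        // bucket keys by their first letter, so each reducer
        // sees a deterministic subset of the key space
        return Character.toLowerCase(s.charAt(0)) % numPartitions;
      }
    }

It would be registered on the Job with job.setPartitionerClass(FirstLetterPartitioner.class), just as a combiner is registered with job.setCombinerClass(...).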

  21. Mapper in a nutshell:
      protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
      protected void map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context)
      protected void cleanup(org.apache.hadoop.mapreduce.Mapper.Context context)
      void run(org.apache.hadoop.mapreduce.Mapper.Context context)

  22. run(Context c):
          setup(c);                    // run setup() once
          for each record in the split:
              map(key, value, c);      // run map() for each record
          end for;
          cleanup(c);                  // run cleanup() once
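
Put together, a concrete Mapper only overrides the pieces it needs. A word-count mapper sketch (the TokenizerMapper used by the driver above; setup() and cleanup() are shown empty just to mark where per-split work would go):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TokenizerMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void setup(Context context) {
        // runs once per split; e.g., read parameters from context.getConfiguration()
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // runs once per record: emit (token, 1) for every token in the line
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);
        }
      }

      @Override
      protected void cleanup(Context context) {
        // runs once per split, after the last map() call
      }
    }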

  23. Reducer in a nutshell:
      protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
      protected void reduce(KEYIN key, Iterable<VALUEIN> values, org.apache.hadoop.mapreduce.Reducer.Context context)
      protected void cleanup(org.apache.hadoop.mapreduce.Reducer.Context context)
      void run(org.apache.hadoop.mapreduce.Reducer.Context context)

  24. Shuffle, Sort, SecondarySort. On the reduce side, run(Context c) follows the same pattern:
          setup(c);                      // run setup() once
          for each (key, values) group in the shuffled, sorted input:
              reduce(key, values, c);    // run reduce() for each key group
          end for;
          cleanup(c);                    // run cleanup() once
      The shuffle/sort stage groups the map output by key before reduce() sees it; a secondary sort additionally controls the order of values within each group.
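
A concrete Reducer, matching the word-count driver above (the IntSumReducer sketch, which the driver also reuses as a combiner since summing is associative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

      private final IntWritable result = new IntWritable();

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        // values holds every count emitted for this key, grouped by shuffle/sort
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }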

  25.–26. Hadoop Java API: org.apache.hadoop (recap)
      Core MapReduce classes: org.apache.hadoop.mapreduce
      Input/Output parsing: org.apache.hadoop.mapreduce.lib.input, org.apache.hadoop.mapreduce.lib.output
      Atomic type wrappers: org.apache.hadoop.io
      Job configuration: org.apache.hadoop.conf
      File system classes: org.apache.hadoop.fs

  27.–28. org.apache.hadoop.mapreduce.lib.input
      FileInputFormat: generic single-file input format (the others extend it)
      TextInputFormat: text input
      KeyValueTextInputFormat: user-defined key-value pairs
      FixedLengthInputFormat: fixed-length records in the input
      NLineInputFormat: controls the size of a split (in terms of # of lines)
      Other important classes:
      MultipleInputs: multiple input files, each optionally with its own InputFormat and Mapper, feeding a single job
      FileSplit: a partition of a file
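
A sketch of wiring these input classes to a job (the paths are placeholders; TokenizerMapper from the earlier sketch works for both formats, since TextInputFormat and NLineInputFormat both produce (LongWritable offset, Text line) records):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class InputWiring {
      static void configureInputs(Job job) {
        // each NLineInputFormat split covers 1000 input lines
        NLineInputFormat.setNumLinesPerSplit(job, 1000);
        // two input directories, each with its own InputFormat
        MultipleInputs.addInputPath(job, new Path("in/logs"),
            TextInputFormat.class, TokenizerMapper.class);
        MultipleInputs.addInputPath(job, new Path("in/books"),
            NLineInputFormat.class, TokenizerMapper.class);
      }
    }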
