
Today’s Objectives

  • Phil’s Talk Review
  • Amazon Web Services

Ø Elastic MapReduce (EMR)

Oct 30, 2017 1 Sprenkle - CSCI325

Phil’s Talk


AMAZON WEB SERVICES (AWS)


What is Amazon Web Services?

  • A collection of remote computing services that together make up a cloud computing platform

Ø Offered over the Internet by Amazon.com

  • Grew out of Amazon’s need to rapidly provision and configure machines of standard configurations for its own business


http://aws.amazon.com


Amazon Web Services Architecture

  • AWS is located in 16 geographical Regions

Ø Region: geographic location, price, laws, network locality
Ø Each Region is wholly contained within a single country, and all of its data and services stay within the designated Region

  • Each Region has multiple Availability Zones

Ø Distinct data centers providing AWS services
Ø Isolated from each other to prevent outages from spreading between Zones
Ø 44 Availability Zones in total


https://aws.amazon.com/about-aws/global-infrastructure/

Terminology

  • Instance: One running virtual machine.
  • Instance Type: Hardware configuration: cores, memory, disk.
  • Instance Store Volume: Temporary disk associated with an instance.
  • Image (AMI): Stored bits which can be turned into instances.
  • Key Pair: Credentials used to access a VM from the command line.


The Amazon Web Services Universe


[Diagram: Infrastructure Services, Platform Services, Cross-Service Features, Management Interface]

Management Interface

  • Command-line interface (CLI): http://aws.amazon.com/cli/
  • SDKs, IDEs: http://aws.amazon.com/tools/
  • Management Console (Web): http://aws.amazon.com/console/


Infrastructure Services

  • EC2: http://aws.amazon.com/ec2/
  • S3: http://aws.amazon.com/s3/
  • EBS: http://aws.amazon.com/ebs/
  • VPC: http://aws.amazon.com/vpc/

Platform Services

  • EMR: https://aws.amazon.com/emr/
  • DynamoDB: http://aws.amazon.com/dynamodb/
  • Beanstalk: http://aws.amazon.com/elasticbeanstalk/
  • RDS: http://aws.amazon.com/rds/


Amazon Elastic MapReduce (EMR)

  • Web service that makes it easy to quickly and cost-effectively process vast amounts of data using Hadoop
  • Distributes data and processing across a resizable cluster of Amazon EC2 instances
  • Can launch a persistent cluster that stays up indefinitely or a temporary cluster that terminates after the analysis is complete

Ø Probably want to terminate the cluster


Amazon Elastic MapReduce (EMR)

  • Supports a variety of Amazon EC2 instance types and Amazon EC2 pricing options (On-Demand, Reserved, and Spot).
  • When launching an Amazon EMR cluster (also called a "job flow"), you choose how many and what type of Amazon EC2 instances to provision.
  • The Amazon EMR price is in addition to the Amazon EC2 price.
  • Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.
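The point that the EMR price is charged in addition to the EC2 price can be sketched as a small calculation. This is only an illustration: the hourly rates are made-up placeholders, not actual AWS prices, and the class and method names are hypothetical.

```java
// Sketch: estimated cluster cost when the EMR charge is added on top of the EC2 charge.
// All rates below are hypothetical placeholders, not real AWS prices.
class EmrCostSketch {

    /** Cost = instances * hours * (EC2 hourly rate + EMR hourly surcharge). */
    static double clusterCost(int instances, double hours,
                              double ec2RatePerHour, double emrRatePerHour) {
        return instances * hours * (ec2RatePerHour + emrRatePerHour);
    }

    public static void main(String[] args) {
        // Assumed scenario: 4 instances for 2 hours,
        // $0.10/hour for EC2 plus $0.025/hour for EMR per instance.
        double cost = clusterCost(4, 2.0, 0.10, 0.025);
        System.out.printf("Estimated cost: $%.2f%n", cost);
    }
}
```

Because the cost grows with hours of uptime, a temporary cluster that terminates when the job finishes is usually the cheaper choice for coursework.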


WordCount Mapper in Java


public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    // Reused for every output pair to avoid allocating new objects.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for each whitespace-separated token in the input value.
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
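To see what the mapper emits without running Hadoop, the same tokenizing loop can be sketched in plain Java. This is a simulation only: `MapStepSketch` and its `map` method are hypothetical names, and a list of pairs stands in for `context.write`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch of the map step; no Hadoop classes are used.
class MapStepSketch {

    /** Emits one (word, 1) pair per whitespace-separated token, as TokenizerMapper does. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // A repeated word is emitted once per occurrence; the summing happens in the reducer.
        System.out.println(map("to be or not to be"));
    }
}
```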

WordCount Reducer in Java


public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    // Receives one key with all of its values and emits (key, sum of values).
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
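The reduce step can be sketched the same way in plain Java. The framework is assumed to have already grouped all values for one key; `ReduceStepSketch` is a hypothetical name, and the return value stands in for `context.write(key, result)`.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the reduce step; no Hadoop classes are used.
class ReduceStepSketch {

    /** Sums all counts grouped under one key, as IntSumReducer does. */
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed input: the word "be" appeared three times across all map outputs.
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println("be\t" + reduce("be", ones));
    }
}
```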


WordCount.java


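import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // TokenizerMapper and IntSumReducer go here as static nested classes.

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner that pre-sums counts on each map task.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] is the input path, args[1] is the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}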
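The driver registers IntSumReducer as both combiner and reducer. That works for word count because addition is associative and commutative, so summing partial counts per map task and then summing the partial sums gives the same totals. A plain-Java sketch of that equivalence follows; the class and method names are hypothetical and no Hadoop classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: pre-summing per map task (combiner) does not change the final counts.
class CombinerSketch {

    /** Counts the words in one batch of tokens. */
    static Map<String, Integer> sumCounts(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    /** Merges partial counts from several map tasks into final totals. */
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((w, c) -> total.merge(w, c, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed: the input was split across two map tasks.
        List<String> task1 = Arrays.asList("to", "be", "or");
        List<String> task2 = Arrays.asList("not", "to", "be");

        // With a combiner: each task pre-sums locally, then the partial sums are merged.
        Map<String, Integer> withCombiner =
                mergeCounts(Arrays.asList(sumCounts(task1), sumCounts(task2)));

        // Without a combiner: every token reaches the reducer individually.
        Map<String, Integer> withoutCombiner =
                sumCounts(Arrays.asList("to", "be", "or", "not", "to", "be"));

        System.out.println(withCombiner.equals(withoutCombiner));
    }
}
```

A reducer that computes something non-associative, such as an average, could not be reused as a combiner this way.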

Nested Classes

  • Nested class: member of enclosing class
  • Non-static nested classes/inner classes

Ø Have access to members of enclosing class, even if private

  • Static nested classes do not have access to (instance) members of enclosing class
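A minimal sketch of the difference, using hypothetical class and field names:

```java
// Sketch: inner (non-static) vs. static nested classes.
class NestedDemo {
    private int instanceCount = 3;          // instance member
    private static String label = "outer";  // static member

    /** Inner class: tied to a NestedDemo instance, so it can read its private fields. */
    class Inner {
        int read() {
            return instanceCount;
        }
    }

    /** Static nested class: has no enclosing instance. */
    static class StaticNested {
        String read() {
            return label;  // static members are directly accessible
        }

        // int bad() { return instanceCount; }  // would not compile: no enclosing instance

        int readVia(NestedDemo outer) {
            return outer.instanceCount;  // allowed through an explicit reference
        }
    }

    public static void main(String[] args) {
        NestedDemo outer = new NestedDemo();
        NestedDemo.Inner inner = outer.new Inner();                    // needs an outer instance
        NestedDemo.StaticNested nested = new NestedDemo.StaticNested(); // does not

        System.out.println(inner.read());
        System.out.println(nested.read());
        System.out.println(nested.readVia(outer));
    }
}
```

TokenizerMapper and IntSumReducer are declared static in WordCount, so they can be created without an enclosing WordCount object.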


Solutions

  • Original code given

Ø All part of one Java class file

  • Alternative:

Ø Classes in separate Java class files/not inner classes
Ø The way I organized your example code in GitHub so that you may have an easier time with sharing/collaborating


Getting Data To The Mapper

[Diagram (InputFormat): Input files → InputSplits → RecordReaders → Mappers → (intermediates)]


Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

  • FileInputFormat: Key – offset of data in its file (with the default TextInputFormat, the byte offset of each line; the value is the line’s text)
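A plain-Java sketch of the (offset, line) records a mapper would receive, assuming the default TextInputFormat and newline-terminated UTF-8 text. `OffsetKeySketch` is a hypothetical name, and a map stands in for the RecordReader.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: each record's key is the byte offset at which its line starts.
class OffsetKeySketch {

    /** Returns (byte offset -> line) records for '\n'-terminated UTF-8 text. */
    static Map<Long, String> records(String fileContents) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            out.put(offset, line);
            // Advance past the line's bytes plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(records("to be\nor not\nto be\n"));
    }
}
```

The WordCount mapper declares its key as Object and never reads it, since the offset is irrelevant to counting words.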


Finally: Writing The Output

[Diagram (OutputFormat): Reducers → RecordWriters → output files (one per reducer)]
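As a rough illustration of the RecordWriter's job, the sketch below formats one reducer's output as tab-separated lines, which is the layout Hadoop's default TextOutputFormat produces. `RecordWriterSketch` and `formatRecords` are hypothetical names, and a string stands in for the output file.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: one reducer's (key, value) pairs written as "key<TAB>value" lines.
class RecordWriterSketch {

    /** Formats each pair as one tab-separated line, in the map's iteration order. */
    static String formatRecords(Map<String, Integer> reducerOutput) {
        StringBuilder file = new StringBuilder();
        for (Map.Entry<String, Integer> e : reducerOutput.entrySet()) {
            file.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return file.toString();
    }

    public static void main(String[] args) {
        // Assumed reducer output; a TreeMap keeps the keys in sorted order.
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("to", 2);
        counts.put("be", 2);
        counts.put("or", 1);
        System.out.print(formatRecords(counts));
    }
}
```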


Project 3

  • Use MapReduce and Amazon clusters to create an inverted index

Ø What is an inverted index?

  • Write mapper and reducer
  • Write query
  • Check out resources, run through the tutorials

Ø Don’t get overwhelmed!
Ø Important part of CS is learning tools, systems on your own
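As a starting point for the "What is an inverted index?" question: an inverted index maps each word to the set of documents that contain it. The sketch below builds one in memory in plain Java. It is not the MapReduce solution the project asks for; the class name, document names, and contents are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch of an inverted index: word -> documents containing that word.
class InvertedIndexSketch {

    /** Builds the index from a map of (document name -> document text). */
    static Map<String, SortedSet<String>> build(Map<String, String> docs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            StringTokenizer itr = new StringTokenizer(doc.getValue());
            while (itr.hasMoreTokens()) {
                // Create the posting set on first sight of a word, then add this document.
                index.computeIfAbsent(itr.nextToken(), w -> new TreeSet<>())
                     .add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical documents.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "to be or not");
        docs.put("b.txt", "to do");

        // A query is then a lookup: which documents contain "to"?
        System.out.println(build(docs).get("to"));
    }
}
```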
