Today's Objectives



  1. 10/30/17 Today’s Objectives
     • Phil’s Talk Review
     • Amazon Web Services
       - Elastic Map Reduce (EMR)
     Oct 30, 2017 Sprenkle - CSCI325

     Phil’s Talk

  2. AMAZON WEB SERVICES (AWS)

     What is Amazon Web Services?
     • A collection of remote computing services that together make up a cloud computing platform
       - offered over the Internet by Amazon.com
     • Grew out of Amazon’s need to rapidly provision and configure machines of standard configurations for its own business.
     http://aws.amazon.com

  3. Amazon Web Services Architecture
     • AWS is located in 16 geographical Regions
       - Region: geographic location, price, laws, network locality.
       - Each Region is wholly contained within a single country, and all of its data and services stay within the designated Region.
     • Each Region has multiple Availability Zones
       - distinct data centers providing AWS services
       - isolated from each other to prevent outages from spreading between Zones
       - 44 Availability Zones
     https://aws.amazon.com/about-aws/global-infrastructure/

     Terminology
     • Instance: One running virtual machine.
     • Instance Type: Hardware configuration (cores, memory, disk).
     • Instance Store Volume: Temporary disk associated with an instance.
     • Image (AMI): Stored bits that can be turned into instances.
     • Key Pair: Credentials used to access a VM from the command line.

  4. The Amazon Web Services Universe
     [Diagram: layers labeled Cross Service Features, Management Interface, Platform Services, Infrastructure Services]

     Management Interface
     • Web: Management Console, http://aws.amazon.com/console/
     • SDK: SDKs, IDEs, http://aws.amazon.com/tools/
     • CLI: Command-line interface, http://aws.amazon.com/cli/

  5. Infrastructure Services
     • EC2: http://aws.amazon.com/ec2/
     • VPC: http://aws.amazon.com/vpc/
     • S3: http://aws.amazon.com/s3/
     • EBS: http://aws.amazon.com/ebs/

     Platform Services
     • EMR: https://aws.amazon.com/emr/
     • RDS: http://aws.amazon.com/rds/
     • DynamoDB: http://aws.amazon.com/dynamodb/
     • Beanstalk: http://aws.amazon.com/elasticbeanstalk/

  6. Amazon Elastic MapReduce (EMR)
     • Web service that makes it easy to quickly and cost-effectively process vast amounts of data using Hadoop
     • Distributes data and processing across a resizable cluster of Amazon EC2 instances
     • Can launch a persistent cluster that stays up indefinitely or a temporary cluster that terminates after the analysis is complete
       - You probably want to terminate the cluster

     Amazon Elastic MapReduce (EMR)
     • Supports a variety of Amazon EC2 instance types and Amazon EC2 pricing options (On-Demand, Reserved, and Spot).
     • When launching an Amazon EMR cluster (also called a "job flow"), you choose how many and what type of Amazon EC2 instances to provision.
     • The Amazon EMR price is in addition to the Amazon EC2 price.
     • Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.
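The pricing point above (the EMR charge is added on top of the EC2 charge) can be sketched with a little arithmetic. The hourly rates here are made-up placeholder numbers, not actual AWS prices, and the class and method names are hypothetical. The sketch also ignores billing granularity and any discounts.

```java
// Sketch only: the rates used below are placeholders, NOT real AWS prices.
final class EmrCostSketch {

    /**
     * Total cluster cost = instances * hours * (EC2 rate + EMR rate).
     * Rates are in US cents per instance-hour to avoid floating-point rounding.
     */
    static long clusterCostCents(int instances, int hours,
                                 long ec2CentsPerHour, long emrCentsPerHour) {
        return (long) instances * hours * (ec2CentsPerHour + emrCentsPerHour);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 4 instances for 3 hours,
        // at 10 cents/hour (EC2) plus 3 cents/hour (EMR) per instance.
        long cents = clusterCostCents(4, 3, 10, 3);
        System.out.println("Total: " + cents + " cents"); // prints "Total: 156 cents"
    }
}
```

Because the cost grows with both the instance count and the hours the cluster stays up, a temporary cluster that terminates after the analysis keeps the `hours` factor small.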

  7. WordCount Mapper in Java

     public static class TokenizerMapper
         extends Mapper<Object, Text, Text, IntWritable> {

       private final static IntWritable one = new IntWritable(1);
       private Text word = new Text();

       // Emits (word, 1) for every token in the input value.
       public void map(Object key, Text value, Context context)
           throws IOException, InterruptedException {
         StringTokenizer itr = new StringTokenizer(value.toString());
         while (itr.hasMoreTokens()) {
           word.set(itr.nextToken());
           context.write(word, one);
         }
       }
     }

     WordCount Reducer in Java

     public static class IntSumReducer
         extends Reducer<Text, IntWritable, Text, IntWritable> {

       private IntWritable result = new IntWritable();

       // Sums all counts received for one key and emits (key, sum).
       public void reduce(Text key, Iterable<IntWritable> values, Context context)
           throws IOException, InterruptedException {
         int sum = 0;
         for (IntWritable val : values) {
           sum += val.get();
         }
         result.set(sum);
         context.write(key, result);
       }
     }
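To see the data flow of the mapper and reducer without a Hadoop installation, here is a minimal plain-Java sketch. It is not Hadoop code: the class `WordCountFlowSketch` and its methods are hypothetical stand-ins, and the grouping step only imitates what the framework does between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch of the WordCount data flow; it does not use Hadoop.
final class WordCountFlowSketch {

    // Map phase: emit a (word, 1) pair for every token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: sum all the counts emitted for one word.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), then reduce.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {cat=2, ran=1, sat=1, the=2}
        System.out.println(wordCount(List.of("the cat sat", "the cat ran")));
    }
}
```

In the real job, each call to `map` may run on a different machine, and the grouping is done by the framework rather than by a local `TreeMap`.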

  8. WordCount.java

     public class WordCount {
       public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         Job job = Job.getInstance(conf, "word count");
         job.setJarByClass(WordCount.class);
         job.setMapperClass(TokenizerMapper.class);
         job.setCombinerClass(IntSumReducer.class);
         job.setReducerClass(IntSumReducer.class);
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(IntWritable.class);
         FileInputFormat.addInputPath(job, new Path(args[0]));
         FileOutputFormat.setOutputPath(job, new Path(args[1]));
         System.exit(job.waitForCompletion(true) ? 0 : 1);
       }
     }

     Nested Classes
     • Nested class: member of enclosing class
     • Non-static nested classes (inner classes)
       - Have access to members of the enclosing class, even if private
     • Static nested classes cannot directly access instance members of the enclosing class, because they are not tied to an enclosing instance
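The difference between the two kinds of nested class can be sketched as follows. The class and member names (`NestedClassSketch`, `StaticNested`, `Inner`) are made up for illustration and do not come from the course code.

```java
// Sketch contrasting a static nested class with an inner (non-static) class.
final class NestedClassSketch {
    private int instanceCount = 41;               // private instance member
    private static final String LABEL = "job";    // private static member

    // Static nested class: not tied to an enclosing instance,
    // so only static members of the enclosing class are directly reachable.
    static class StaticNested {
        String describe() {
            return LABEL;   // OK: static member of the enclosing class
            // Referring to instanceCount here would not compile:
            // there is no enclosing instance to read it from.
        }
    }

    // Inner class: every Inner object belongs to one NestedClassSketch object
    // and can read that object's private fields.
    class Inner {
        int next() {
            return instanceCount + 1;
        }
    }

    public static void main(String[] args) {
        NestedClassSketch outer = new NestedClassSketch();
        StaticNested nested = new StaticNested();   // no outer instance needed
        Inner inner = outer.new Inner();            // requires an outer instance
        System.out.println(nested.describe() + " " + inner.next()); // prints "job 42"
    }
}
```

Note that `TokenizerMapper` and `IntSumReducer` on the earlier slides are declared `static`, so they can be created without an instance of `WordCount`.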

  9. Solutions
     • Original code given
       - All part of one Java source file
     • Alternative:
       - Classes in separate Java source files, not inner classes
       - The way I organized your example code in GitHub so that you may have an easier time with sharing/collaborating

     Getting Data To The Mapper
     [Diagram: the InputFormat divides two input files into four InputSplits; each InputSplit is read by a RecordReader, which feeds a Mapper, which emits intermediates]

  10. Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
      • FileInputFormat: Key is the offset of the data in its file

      Finally: Writing The Output
      [Diagram: three Reducers each pass their results through the OutputFormat to a RecordWriter, and each RecordWriter writes one output file]
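What "the key is the offset of the data in its file" means can be sketched in plain Java, assuming line-oriented text input with `\n` line endings. This is not Hadoop's RecordReader; `OffsetKeySketch` and `records` are hypothetical names used only to show which (key, value) pairs a mapper would see.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of turning line-oriented input into (byte offset, line) records.
final class OffsetKeySketch {

    // Returns a map from the byte offset of each line's first byte to the line's text.
    static Map<Long, String> records(String fileContents) {
        Map<Long, String> out = new LinkedHashMap<>();
        if (fileContents.isEmpty()) {
            return out; // no lines, no records
        }
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            out.put(offset, line);
            // Advance past the line's bytes plus the newline that split() removed.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints {0=hello world, 12=foo bar, 20=baz}
        System.out.println(records("hello world\nfoo bar\nbaz\n"));
    }
}
```

The WordCount mapper ignores this key entirely, which is why it can declare `KEYIN` as plain `Object`.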

  11. Project 3
      • Use MapReduce and Amazon clusters to create an inverted index
        - What is an inverted index?
      • Write mapper and reducer
      • Write query
      • Check out resources, run through the tutorials
        - Don’t get overwhelmed!
        - An important part of CS is learning tools and systems on your own
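As a rough answer to "What is an inverted index?": it maps each word to the documents that contain it, the reverse of a document-to-words listing. The sketch below builds one in memory in plain Java. It is only an illustration of the concept, not the MapReduce design the project asks for, and the class name, method names, and file names are all hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch of an inverted index: word -> set of documents containing it.
final class InvertedIndexSketch {

    // docs maps a document name to its text.
    static Map<String, Set<String>> build(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields one empty token
                }
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // Query: which documents contain the word?
    static Set<String> query(Map<String, Set<String>> index, String word) {
        return index.getOrDefault(word.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = build(Map.of(
                "a.txt", "the cat sat",
                "b.txt", "the dog ran"));
        System.out.println(query(index, "the")); // prints [a.txt, b.txt]
        System.out.println(query(index, "dog")); // prints [b.txt]
    }
}
```

In a MapReduce version, one natural split is for the mapper to emit (word, document) pairs and for the reducer to collect the documents for each word, but the details are left to the project.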
